Double compaction for completed partitioning job #7257

@anna-tran

Describe the bug
When running distributed Cortex with multiple compactors, the partitioning compactor can create duplicate blocks for a partition group, because the cleaner deletes the visit markers one cleaning cycle before it deletes the partition group info file (a side effect of PR #7209).

Scenario:
Compactor A completes partitioning job 0.
Compactor A's cleaner sees that the partition plan is complete, marks the blocks for deletion, and deletes the visit markers. It waits until the next cleaning cycle to delete the partition group info file.

Compactor B, in the grouping phase, sees that the partition group info file still exists and finds no visit marker for partitioning job 0.
Since PR #7156, blocks marked for deletion are no longer filtered out in the compaction phase.
Compactor B therefore sees nothing wrong and compacts that job a second time.
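
To make the window concrete, here is a minimal Go sketch of the scenario above. It is a toy model, not Cortex code: `groupState`, `cleanerFirstPass`, `cleanerSecondPass`, and `wouldCompact` are hypothetical names, and the grouping decision is reduced to the three conditions described in this report. The `filterDeletionMarked` flag stands in for the filtering that existed before PR #7156.

```go
package main

import "fmt"

// groupState is a hypothetical, simplified view of what a compactor can see
// in object storage for one partition group. It is not a Cortex type.
type groupState struct {
	partitionInfoExists bool         // partition group info file is present
	visitMarkers        map[int]bool // partition ID -> visit marker is present
	deletionMarked      bool         // the group's blocks are marked for deletion
}

// cleanerFirstPass models the first cleaner cycle from the scenario: blocks
// are marked for deletion and visit markers are removed, but the partition
// group info file is left in place.
func cleanerFirstPass(s *groupState) {
	s.deletionMarked = true
	for id := range s.visitMarkers {
		delete(s.visitMarkers, id)
	}
}

// cleanerSecondPass models the next cleaner cycle, which removes the
// partition group info file.
func cleanerSecondPass(s *groupState) {
	s.partitionInfoExists = false
}

// wouldCompact models the grouping decision as described in this report:
// a job is picked up when the info file exists and no visit marker is found.
// filterDeletionMarked mimics skipping blocks that are marked for deletion.
func wouldCompact(s groupState, partitionID int, filterDeletionMarked bool) bool {
	if !s.partitionInfoExists {
		return false
	}
	if s.visitMarkers[partitionID] {
		return false
	}
	if filterDeletionMarked && s.deletionMarked {
		return false
	}
	return true
}

func main() {
	// Compactor A has completed job 0 and its visit marker is still present.
	s := groupState{partitionInfoExists: true, visitMarkers: map[int]bool{0: true}}
	fmt.Println("after job completes:      ", wouldCompact(s, 0, false))

	cleanerFirstPass(&s)
	fmt.Println("between cleaner cycles:   ", wouldCompact(s, 0, false))
	fmt.Println("same, with mark filtering:", wouldCompact(s, 0, true))

	cleanerSecondPass(&s)
	fmt.Println("after second cleaner pass:", wouldCompact(s, 0, false))
}
```

In this model the job is only eligible for a second compaction between the two cleaner passes, and only when deletion-marked blocks are not filtered out, which matches the combination of changes described above.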

To Reproduce
Steps to reproduce the behavior:

  1. Start Cortex in a distributed environment on EKS with multiple compactors running.
  2. Configure the cleaner interval to be 15 min and the compaction interval to be 1 min.
  3. Observe that some compactors start compacting completed partitioning jobs after the visit markers have been deleted but before the partition group info file is deleted.
  4. Compare the resulting blocks' meta.json files for the duplicate compactions and see that they are identical aside from the result block ID.
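
The comparison in step 4 can be scripted. The following Go sketch assumes the block ID is stored in the top-level `ulid` field of meta.json and treats everything else as opaque JSON. `sameExceptULID` is a hypothetical helper, and the sample documents in `main` are made up and heavily trimmed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// sameExceptULID reports whether two meta.json documents are identical once
// the top-level "ulid" field (assumed to hold the block ID) is ignored.
func sameExceptULID(a, b []byte) (bool, error) {
	var ma, mb map[string]any
	if err := json.Unmarshal(a, &ma); err != nil {
		return false, fmt.Errorf("parse first meta.json: %w", err)
	}
	if err := json.Unmarshal(b, &mb); err != nil {
		return false, fmt.Errorf("parse second meta.json: %w", err)
	}
	delete(ma, "ulid")
	delete(mb, "ulid")
	return reflect.DeepEqual(ma, mb), nil
}

func main() {
	// Made-up, trimmed examples; real meta.json files contain more fields.
	first := []byte(`{"ulid":"BLOCK-A","minTime":0,"maxTime":7200000,"compaction":{"level":2}}`)
	second := []byte(`{"ulid":"BLOCK-B","minTime":0,"maxTime":7200000,"compaction":{"level":2}}`)

	same, err := sameExceptULID(first, second)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("identical apart from block ID:", same)
}
```

In practice the two byte slices would come from the meta.json files of the suspected duplicate blocks, for example read with `os.ReadFile` after downloading them from the bucket.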

Expected behavior
No duplicate compaction should happen for a completed partitioning job.

Environment:

  • Infrastructure: EKS
  • Deployment tool: Helm
