Description
Describe the bug
When running distributed Cortex with multiple compactors, the partitioning compactor creates duplicate blocks for a partition group because the cleaner deletes the visit markers before it deletes the partition group info file (a side effect of PR #7209).
Scenario:
Compactor A completes partitioning job 0.
Compactor A's cleaner sees that the partition plan is complete, marks the blocks for deletion, and deletes the visit markers. It waits until the next cleaning cycle to delete the partition group info file.
Compactor B, in the grouping phase, sees that the partition group info file still exists and finds no visit marker for partitioning job 0.
Since PR #7156, blocks marked for deletion are no longer filtered out in the compaction phase, so those blocks remain eligible for compaction.
Compactor B sees no issue and performs a duplicate compaction for that job.
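The window in the scenario above can be modelled with a small Go sketch. This is not Cortex code: `bucketState`, `cleanCompletedGroup`, and `looksPending` are hypothetical names, and the sketch assumes the grouper treats a job as pending whenever the partition group info file exists and the job has no visit marker.

```go
package main

import "fmt"

// bucketState is a hypothetical, simplified view of the objects involved.
// It is not a Cortex type.
type bucketState struct {
	partitionGroupInfoExists bool
	visitMarkers             map[int]bool // keyed by partition ID
}

// cleanCompletedGroup mimics the two-cycle cleanup described in the scenario:
// the first cycle deletes the visit markers, and only a later cycle deletes
// the partition group info file.
func cleanCompletedGroup(s *bucketState, cycle int) {
	if cycle == 1 {
		for id := range s.visitMarkers {
			delete(s.visitMarkers, id)
		}
		return
	}
	s.partitionGroupInfoExists = false
}

// looksPending is the assumed grouper check: the info file exists and
// nobody holds (or has completed) a visit marker for this partition.
func looksPending(s *bucketState, partitionID int) bool {
	return s.partitionGroupInfoExists && !s.visitMarkers[partitionID]
}

func main() {
	// Compactor A has completed partitioning job 0; its visit marker exists.
	s := &bucketState{
		partitionGroupInfoExists: true,
		visitMarkers:             map[int]bool{0: true},
	}
	fmt.Println("before cleanup:", looksPending(s, 0))

	// First cleaner cycle: markers are gone, info file is still there.
	cleanCompletedGroup(s, 1)
	fmt.Println("between cycles:", looksPending(s, 0))

	// Second cleaner cycle: info file is removed and the window closes.
	cleanCompletedGroup(s, 2)
	fmt.Println("after second cycle:", looksPending(s, 0))
}
```

In this model, job 0 only looks pending between the two cleaner cycles, which is the window in which Compactor B picks it up again.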
To Reproduce
Steps to reproduce the behavior:
- Start Cortex in a distributed environment on EKS with multiple compactors running.
- Configure the cleaner interval to be 15 min and compaction interval to be 1 min.
- Observe that some compactors start compacting completed partitioning jobs after the visit markers have been deleted but before the partition group info file is deleted.
- Compare the result block `meta.json` files for the duplicate compactions and see that they are the same aside from the result block ID.
Expected behavior
No duplicate compaction should happen for a completed partitioning job.
Environment:
- Infrastructure: EKS
- Deployment tool: Helm
Additional Context