fix(worker): Add tmp shard cleanup on indexing failure #805

Merged
brendan-kellam merged 5 commits into main from cursor/SOU-306-zoekt-failed-indexing-shards-c9df
Feb 7, 2026

Conversation

@msukkari
Contributor

@msukkari msukkari commented Jan 28, 2026

Clean up temporary Zoekt shard files on indexing failure to prevent disk space exhaustion.

When zoekt-git-index fails during repository indexing, it leaves behind .tmp shard files. These accumulate over time, especially for repos that repeatedly fail to index, leading to disk space issues. This PR adds logic to automatically remove these temporary files immediately after an indexing operation fails.

Fixes #804

Summary by CodeRabbit

  • Bug Fixes
    • Fixed an issue where temporary files created during index operations were not cleaned up after a failure, so they could accumulate and consume disk space.

When zoekt-git-index fails during repository indexing, it can leave behind
.tmp shard files that accumulate over time and fill up disk space. This is
especially problematic for large repos that repeatedly fail to index.

Changes:
- Add cleanupTempShards() function to zoekt.ts that removes temporary shard
  files (files with .tmp in their name) for a specific repository
- Call cleanupTempShards() in repoIndexManager.ts when indexGitRepository
  fails, before re-throwing the error

This ensures that even if a repository consistently fails to index, the
temporary files created during each attempt are cleaned up.

Co-authored-by: michael <michael@sourcebot.dev>
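
The cleanup described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `isTempShard`, `cleanupTempShardsSketch`, and the `indexDir` and `shardPrefix` parameters are hypothetical names, and the sketch assumes a repository's shards can be recognized by a file-name prefix.

```typescript
import { readdir, rm } from "node:fs/promises";
import { join } from "node:path";

// Hypothetical helper: true when a file name starts with the repository's
// shard prefix and contains ".tmp" anywhere in its name.
export function isTempShard(fileName: string, shardPrefix: string): boolean {
    return fileName.startsWith(shardPrefix) && fileName.includes(".tmp");
}

// Best-effort cleanup: never throws, and returns the file names it removed.
// `indexDir` stands in for the index cache directory (an assumption here).
export async function cleanupTempShardsSketch(
    indexDir: string,
    shardPrefix: string,
): Promise<string[]> {
    const removed: string[] = [];
    try {
        const entries = await readdir(indexDir);
        for (const name of entries.filter((n) => isTempShard(n, shardPrefix))) {
            // force: true ignores files that vanished after readdir returned.
            await rm(join(indexDir, name), { force: true });
            removed.push(name);
        }
    } catch (error) {
        // Cleanup failures are logged and swallowed so they cannot mask
        // the original indexing error.
        console.warn(`Failed to clean up temp shards in ${indexDir}:`, error);
    }
    return removed;
}
```

Keeping the name filter separate from the filesystem calls makes the matching rule easy to check on its own. Matching on the prefix as well as `.tmp` is what keeps one repository's cleanup from touching another repository's in-progress shards.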

@coderabbitai
Contributor

coderabbitai bot commented Jan 28, 2026

Caution

Review failed

The pull request is closed.

Walkthrough

Adds a best-effort cleanup of temporary Zoekt shard files on repository indexing failure by introducing cleanupTempShards and invoking it from the indexing error path before rethrowing the error. Also updates CHANGELOG.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Changelog<br>`CHANGELOG.md` | Adds note in Unreleased → Fixed about cleaning up temporary shard files created on index failure. |
| Indexing error handling<br>`packages/backend/src/repoIndexManager.ts` | Wraps `indexGitRepository` in try/catch; on error logs a warning, calls `cleanupTempShards(repo)`, then rethrows. Adjusts imports (adds bullmq types). |
| Zoekt cleanup utility<br>`packages/backend/src/zoekt.ts` | Adds exported `cleanupTempShards(repo: Repo)` which lists `INDEX_CACHE_DIR`, filters shard files by shard prefix and `.tmp`, deletes matching files with `rm(..., { force: true })`, and logs results; errors are caught and logged (best-effort). |
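
The error path summarized for `repoIndexManager.ts` (log a warning, clean up, rethrow) might look something like the sketch below. `withCleanupOnFailure` is a hypothetical generic wrapper used only for illustration; the PR itself wraps `indexGitRepository` directly, and this sketch assumes the cleanup callback is best-effort and does not throw.

```typescript
// Hypothetical wrapper: runs `operation`; if it fails, logs a warning,
// runs `cleanup`, and then rethrows the original error unchanged.
export async function withCleanupOnFailure<T>(
    operation: () => Promise<T>,
    cleanup: () => Promise<void>,
): Promise<T> {
    try {
        return await operation();
    } catch (error) {
        console.warn("Indexing failed, cleaning up temporary shards:", error);
        // Assumed to be best-effort (it catches its own errors), so the
        // original failure is what propagates to the caller.
        await cleanup();
        throw error;
    }
}
```

Rethrowing after cleanup means the caller still sees the job as failed, so existing retry and failure reporting keep working while the disk is tidied up between attempts.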

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Manager as RepoIndexManager
    participant Zoekt as zoekt/indexer
    participant FS as Filesystem

    Manager->>Zoekt: indexGitRepository(repo)
    alt success
        Zoekt-->>Manager: success
    else failure
        Zoekt-->>Manager: throws error
        Manager->>Zoekt: cleanupTempShards(repo)
        Zoekt->>FS: readdir(INDEX_CACHE_DIR)
        FS-->>Zoekt: list of files
        Zoekt->>FS: rm(matching .tmp shards, force: true)
        FS-->>Zoekt: deletion results
        Zoekt-->>Manager: cleanup result (logged)
        Manager-->>Manager: rethrow error
    end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title clearly summarizes the main change: adding automatic cleanup of temporary shard files when indexing fails, which directly matches the changeset modifications to error handling and the new cleanup mechanism. |

@brendan-kellam brendan-kellam marked this pull request as ready for review February 7, 2026 02:44
@claude

claude bot commented Feb 7, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

@brendan-kellam brendan-kellam changed the title from "Zoekt failed indexing shards" to "fix(worker): Add tmp shard cleanup on indexing failure" Feb 7, 2026
@claude

claude bot commented Feb 7, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

@brendan-kellam brendan-kellam merged commit 278172a into main Feb 7, 2026
8 of 9 checks passed
@brendan-kellam brendan-kellam deleted the cursor/SOU-306-zoekt-failed-indexing-shards-c9df branch February 7, 2026 02:53

Development

Successfully merging this pull request may close these issues.

[bug] Left over indexing shards are filling up the disk slowly

3 participants