fix(fs): cap file list and abort untracked scan to prevent session init timeout on large repos #2819
Open
ricardo-leiva wants to merge 2 commits into
…init OOM

Large repos (100k+ files, unignored __pycache__ / venvs) caused git ls-files --others to hang for minutes and allocate hundreds of MB in the workspace-server process. The resulting GC pressure starved the concurrent session initializationResult() promise, reliably hitting the 30s timeout when adding a large project.

Two-part fix in FsService.listRepoFiles:
- Abort git ls-files --others after 8 s via AbortController; fall back to []
- Cap combined tracked + untracked array at 50,000 entries before building the directory tree (avoids ~200MB+ allocation for very large repos)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TpGjPYBD4pgZpsAHjozzsN
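The two-part fix described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `ListFn` signature, the `capInPlace` helper, the `UNTRACKED_TIMEOUT_MS` name, and the way the listers are passed in as parameters are all assumptions made so the example is self-contained.

```typescript
// Hypothetical signature: the real listFiles / listUntrackedFiles are not shown here.
type ListFn = (repoPath: string, signal?: AbortSignal) => Promise<string[]>;

const MAX_REPO_FILES = 50_000;      // cap on raw file paths
const UNTRACKED_TIMEOUT_MS = 8_000; // 8 s, per the commit message (constant name assumed)

// Truncate in place: assigning to `length` drops the tail without copying.
function capInPlace<T>(items: T[], max: number): T[] {
  if (items.length > max) items.length = max;
  return items;
}

async function fetchAllFiles(
  repoPath: string,
  listFiles: ListFn,
  listUntrackedFiles: ListFn,
  timeoutMs: number = UNTRACKED_TIMEOUT_MS,
  maxFiles: number = MAX_REPO_FILES,
): Promise<string[]> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  // Resolves to [] once the abort fires, even if the lister ignores the signal.
  const aborted = new Promise<string[]>((resolve) => {
    controller.signal.addEventListener("abort", () => resolve([]), { once: true });
  });

  // Untracked scan: errors and timeouts both degrade to an empty list.
  const untrackedPromise = Promise.race([
    listUntrackedFiles(repoPath, controller.signal).catch((): string[] => []),
    aborted,
  ]).finally(() => clearTimeout(timer));

  // Tracked and untracked scans still run in parallel.
  const [tracked, untracked] = await Promise.all([
    listFiles(repoPath),
    untrackedPromise,
  ]);

  // Cap before any directory tree is built from the list.
  return capInPlace(tracked.concat(untracked), maxFiles);
}
```

With this shape, a slow or failing untracked scan only removes untracked entries from the result; the tracked list is returned either way.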
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
packages/workspace-server/src/services/fs/service.test.ts:59-69
**Weak cap assertion may miss over-budget entries from derived directories**
The cap (`MAX_REPO_FILES = 50_000`) is applied to the raw file-path list before `deriveDirectories` adds extra entries. The test asserts `entries.length <= 50_000`, but `entries` is the combined files + derived directory entries returned by `listRepoFiles`. For the flat test input (`file${i}.ts`, no path separators) there are zero derived directories, so `entries.length` equals exactly 50,000 — but if the test used files with nested paths, `entries.length` could legitimately exceed 50,000 even with the cap in place.
Using `toBe(50_000)` for the current flat input would make the assertion precise and document the expected count, while a follow-up test with nested paths would confirm that the entries total can exceed the cap when directories are derived.
Reviews (1): Last reviewed commit: "fix(fs): cap file list and timeout untra..."
… review

- Use toBe(50_000) for flat input: zero derived directories means total === cap
- Add test documenting that entries total can exceed 50k when nested paths produce extra directory entries via deriveDirectories (cap is on raw files, not on the final files+dirs result)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TpGjPYBD4pgZpsAHjozzsN
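The difference between flat and nested inputs can be illustrated with a small stand-in. The `deriveDirectories` below is a hypothetical reimplementation written only for this example (the real helper is not shown in this conversation), and `CAP` is a tiny placeholder for `MAX_REPO_FILES` so the counts are easy to follow.

```typescript
// Hypothetical stand-in: collects every ancestor directory of each file path.
function deriveDirectories(files: string[]): string[] {
  const dirs = new Set<string>();
  for (const file of files) {
    let idx = file.lastIndexOf("/");
    while (idx > 0) {
      const dir = file.slice(0, idx);
      if (dirs.has(dir)) break; // its ancestors were recorded already
      dirs.add(dir);
      idx = dir.lastIndexOf("/");
    }
  }
  return [...dirs];
}

// Total entries = capped raw files + directories derived from them.
function countEntries(files: string[]): number {
  return files.length + deriveDirectories(files).length;
}

const CAP = 4; // small placeholder for MAX_REPO_FILES

// Flat input: no path separators, so no directories are derived.
const flat = Array.from({ length: CAP }, (_, i) => `file${i}.ts`);

// Nested input: each file adds its own package directory under a shared root.
const nested = Array.from({ length: CAP }, (_, i) => `src/pkg${i}/file${i}.ts`);

console.log(countEntries(flat));   // 4: equals the cap exactly
console.log(countEntries(nested)); // 9: 4 files + "src" + 4 package directories
```

This is why an exact `toBe` assertion suits the flat case, while the nested case is better covered by a separate test that expects the total to exceed the cap.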
Problem
When adding a large project (e.g. a Django monorepo with 100k+ tracked files or unignored `__pycache__`/`venv`/`staticfiles` directories), session initialization consistently times out.

The root cause is `FsService.listRepoFiles`, which runs `git ls-files --others --exclude-standard` (untracked files) unbounded and concurrently with session startup. For repos with large untracked trees, this call can return millions of paths and keep the workspace-server event loop busy allocating hundreds of MB of JavaScript strings and objects, starving the `initializationResult()` IPC promise past the 30 s deadline.

Changes
`packages/workspace-server/src/services/fs/service.ts`
- Replaces the `listAllFiles()` call with a private `fetchAllFiles()` helper that:
  - Runs `listFiles` (tracked) and `listUntrackedFiles` (untracked) in parallel, as before
  - Puts an 8 s `AbortController` timeout on the untracked scan; falls back to `[]` on timeout or error, so tracked files always show up
  - Caps the combined list at 50,000 entries (`Array.length` truncation, no copy) before building the directory tree, bounding peak allocation regardless of repo size

`packages/workspace-server/src/services/fs/service.test.ts`
- Updates mocks from `listAllFiles` to `listFiles` + `listUntrackedFiles` (now mocked separately, matching the new implementation)

Screenshots
How did you test this?
- `FsService` unit tests pass (`vitest run`)
- `biome lint` (no fixes applied)
- `node scripts/check-host-boundaries.mjs`