
fix(fs): cap file list and abort untracked scan to prevent session init timeout on large repos#2819

Open
ricardo-leiva wants to merge 2 commits into PostHog:main from ricardo-leiva:fix/session-init-timeout


Conversation

ricardo-leiva commented Jun 21, 2026


Problem

When adding a large project (e.g. a Django monorepo with 100k+ tracked files or unignored __pycache__ / venv / staticfiles directories), the session initialization consistently times out with:

Session initialization timed out after 30000ms

The root cause is FsService.listRepoFiles, which runs git ls-files --others --exclude-standard (untracked files) with no bound, concurrently with session startup. For repos with large untracked trees, this call can return millions of paths and keep the workspace-server event loop busy allocating hundreds of MB of JavaScript strings and objects, starving the initializationResult() IPC promise past the 30 s deadline.

Changes

packages/workspace-server/src/services/fs/service.ts

  • Replaces the single listAllFiles() call with a private fetchAllFiles() helper that:
    • Runs listFiles (tracked) and listUntrackedFiles (untracked) in parallel, as before
    • Arms an 8-second AbortController on the untracked scan and falls back to [] on timeout or error, so tracked files are always listed
    • Caps the combined array at 50,000 entries (Array.length truncation, no copy) before building the directory tree, bounding peak allocation regardless of repo size
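
The helper described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `ListFn` type, the injected `listFiles` / `listUntrackedFiles` parameters, and the constant names `UNTRACKED_TIMEOUT_MS` and `MAX_REPO_FILES` as used here are assumptions made so the example is self-contained. It also races the scan against the abort signal, so the fallback applies even if the scan ignores the signal.

```typescript
// Hypothetical signature for the two listing functions; the real ones may differ.
type ListFn = (repoPath: string, signal?: AbortSignal) => Promise<string[]>;

const MAX_REPO_FILES = 50_000; // cap on raw file paths (tracked + untracked)
const UNTRACKED_TIMEOUT_MS = 8_000; // budget for the untracked scan

async function fetchAllFiles(
  repoPath: string,
  listFiles: ListFn,
  listUntrackedFiles: ListFn,
  timeoutMs: number = UNTRACKED_TIMEOUT_MS,
  maxFiles: number = MAX_REPO_FILES,
): Promise<string[]> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  // Resolves to [] as soon as the controller aborts.
  const aborted = new Promise<string[]>((resolve) => {
    controller.signal.addEventListener("abort", () => resolve([]), { once: true });
  });

  // Untracked scan: any error (including an abort error) degrades to [].
  const scan: Promise<string[]> = listUntrackedFiles(repoPath, controller.signal).catch(
    () => [],
  );

  const untracked = Promise.race([scan, aborted]).finally(() => clearTimeout(timer));

  // Tracked and untracked listings run in parallel.
  const [trackedFiles, untrackedFiles] = await Promise.all([listFiles(repoPath), untracked]);

  const all = trackedFiles.concat(untrackedFiles);
  // Truncate in place by assigning to length; no second array is allocated.
  if (all.length > maxFiles) all.length = maxFiles;
  return all;
}
```

Because tracked files come first in the combined array, truncation drops untracked paths before tracked ones in this sketch. Whether the PR orders the arrays the same way is not stated in the description.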

packages/workspace-server/src/services/fs/service.test.ts

  • Updates mocks from listAllFiles to listFiles + listUntrackedFiles (now mocked separately, matching the new implementation)
  • Adds two new test cases: cap enforcement at 50,000 entries and graceful fallback when the untracked scan is aborted

Screenshots

[Screenshots not preserved: two "before" images (before1, before2) and one "after" image (after)]

How did you test this?

  • Manual test: the prompt "assess repo architecture and give me a detailed report" no longer fails on a big repo (tested with the PostHog monorepo)
  • All 8 FsService unit tests pass (vitest run)
  • Biome lint: clean (biome lint — no fixes applied)
  • Host boundary check: no new violations (node scripts/check-host-boundaries.mjs)
  • Pre-commit hook: typecheck passed

Automatic notifications

  • Publish to changelog?
  • Alert Sales and Marketing teams?

…init OOM

Large repos (100k+ files, unignored __pycache__ / venvs) caused git ls-files
--others to hang for minutes and allocate hundreds of MB in the workspace-server
process. The resulting GC pressure starved the concurrent session initializationResult()
promise, reliably hitting the 30s timeout when adding a large project.

Two-part fix in FsService.listRepoFiles:
- Abort git ls-files --others after 8 s via AbortController; fall back to []
- Cap combined tracked + untracked array at 50,000 entries before building
  the directory tree (avoids ~200MB+ allocation for very large repos)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TpGjPYBD4pgZpsAHjozzsN
greptile-apps Bot commented Jun 21, 2026

Contributor
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
packages/workspace-server/src/services/fs/service.test.ts:59-69
**Weak cap assertion may miss over-budget entries from derived directories**

The cap (`MAX_REPO_FILES = 50_000`) is applied to the raw file-path list before `deriveDirectories` adds extra entries. The test asserts `entries.length <= 50_000`, but `entries` is the combined files + derived directory entries returned by `listRepoFiles`. For the flat test input (`file${i}.ts`, no path separators) there are zero derived directories, so `entries.length` equals exactly 50,000 — but if the test used files with nested paths, `entries.length` could legitimately exceed 50,000 even with the cap in place.

Using `toBe(50_000)` for the current flat input would make the assertion precise and document the expected count, while a follow-up test with nested paths would confirm that the entries total can exceed the cap when directories are derived.
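
To make the arithmetic behind this point concrete, here is a small sketch. The `deriveDirectories` function below is a hypothetical stand-in written for this example (the real helper's signature and output shape may differ), and the nested path layout is invented purely for illustration.

```typescript
// Hypothetical stand-in: returns every ancestor directory implied by the file paths.
function deriveDirectories(files: string[]): string[] {
  const dirs = new Set<string>();
  for (const file of files) {
    let idx = file.lastIndexOf("/");
    while (idx > 0) {
      const dir = file.slice(0, idx);
      // Once a directory is known, all of its ancestors were already recorded.
      if (dirs.has(dir)) break;
      dirs.add(dir);
      idx = dir.lastIndexOf("/");
    }
  }
  return Array.from(dirs);
}

const CAP = 50_000;

// Flat input, as in the existing test: no separators, so no derived directories.
const flat = Array.from({ length: CAP }, (_, i) => `file${i}.ts`);

// Nested input: the same number of files spread over 100 directories.
const nested = Array.from({ length: CAP }, (_, i) => `dir${i % 100}/file${i}.ts`);

// Total entries = capped files + derived directories.
const flatTotal = flat.length + deriveDirectories(flat).length; // 50_000
const nestedTotal = nested.length + deriveDirectories(nested).length; // 50_100
```

With flat input the total equals the cap exactly, which is why `toBe(50_000)` is the tighter assertion there. With nested input the total exceeds the cap by the number of distinct directories, since the cap applies to raw file paths only.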

Reviews (1): Last reviewed commit: "fix(fs): cap file list and timeout untra..."

Comment thread packages/workspace-server/src/services/fs/service.test.ts
… review

- Use toBe(50_000) for flat input — zero derived directories means total === cap
- Add test documenting that entries total can exceed 50k when nested paths
  produce extra directory entries via deriveDirectories (cap is on raw files,
  not on the final files+dirs result)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TpGjPYBD4pgZpsAHjozzsN