refactor(debug-files): scan paths via shared walkFiles walker#1150
Merged
Conversation
Contributor
|
Contributor
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit ba2848f. Configure here.
Contributor
Codecov Results 📊✅ Patch coverage is 90.00%. Project has 5132 uncovered lines. Files with missing lines (2)
Coverage diff@@ Coverage Diff @@
## main #PR +/-##
==========================================
+ Coverage 81.51% 81.54% +0.03%
==========================================
Files 399 399 —
Lines 27811 27807 -4
Branches 18065 18073 +8
==========================================
+ Hits 22669 22675 +6
- Misses 5142 5132 -10
- Partials 1863 1864 +1Generated by Codecov Action |
BYK
added a commit
that referenced
this pull request
Jul 2, 2026
…1163) ## What `sentry debug-files upload` now automatically extracts a **Portable PDB embedded inside a managed .NET PE assembly** (debug directory entry type 17, deflate-compressed) and uploads it as a separate `<name>.pdb` debug file — faithful to legacy `sentry-cli`'s `extract_embedded_ppdb`. Consumes the new WASM API from **`@sentry/symbolic@13.7.0`** (bumped from 13.5.0): `ObjectFile.asPe()` → `PeFile.embeddedPpdb()` (shipped in getsentry/symbolic#1004). ## Why the extraction lives at the *prepare* stage A managed PE's debug info lives entirely in its embedded Portable PDB, so the PE object itself typically reports **no** debug features and is dropped by the feature filter. Extraction therefore happens in `prepareDifs` (not in `buildDifList` over already-`prepared` files), independent of whether the PE passed the filters — mirroring legacy, which extracts from any PE with a debug id. The standalone extracted PPDB bytes are then run through the **same per-object filters** as any other file, so it: - honors `--type portablepdb` (a PE excluded by the `pe`-format filter is still read when `portablepdb` is wanted), - honors feature and `--id` filters, - is size-gated against the server's advertised max file size, - dedupes normally (debug id + content hash), - and works for PEs found inside scanned `.zip` archives. Extraction/decompression errors are swallowed (logged at debug level) so a malformed embed never aborts the scan. ## Tests - `test/lib/dif/embedded-ppdb.test.ts` — unit tests for `extractEmbeddedPpdb` (positive/negative, standalone-PPDB round-trip) against small committed .NET PE fixtures. - `test/commands/debug-files/upload.test.ts` — command-level: dry-run queues the `.pdb`, `--type portablepdb` extracts it, `--type elf` ignores it, PE-without-embed yields nothing, and the PPDB is threaded through to `uploadDebugFiles`. Fixtures (`embedded-ppdb.dll`, `pe-no-ppdb.dll`) are the small Sentry sample console binaries from `symbolic-testutils`. Verified locally: `typecheck`, `lint`, `test` (350 debug-files/dif/scan tests), `check:deps`, `check:fragments` all green. ## Notes - Part of the CLI ↔ symbolic WASM consumption series; sibling PR (IL2CPP line mappings, `--il2cpp-mapping`) to follow. - Soft-follows the shared-walker refactor (#1150) but touches different functions (prepare stage), so it's independent. --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Replace the bespoke recursive directory walker in dif/scan.ts with the project's shared, well-tested walkFiles walker (the DSN code-scanner's traversal foundation): parallel readdir, inode-based symlink-cycle detection, and far more test coverage. walkFiles is policy-free, so its DSN-tuned defaults are overridden to preserve debug-file scanning behavior: debug files are large binaries that live in gitignored build output, so respectGitignore and the default build-output skip-dirs are disabled and the 256 KB size cap is lifted (size gating stays in prepareDifs to keep the format-before-size ordering that drives oversizedCount). Adds a classifyBinary opt-out to walkFiles (default true) so consumers that ignore isBinary -- like this scanner -- skip the per-file 8 KB NUL sniff.
Addresses a Cursor Bugbot finding on the walkFiles migration: the root cwd frame is pushed directly rather than through maybeDescend (the only place inodes are recorded), so a descendant symlink pointing back at the scan root was followed once, re-listing the root subtree under a second path prefix. The pre-walkFiles DIF scanner avoided this by recording the root's realpath before reading children. Seed the root inode at walk init when followSymlinks is enabled to restore that behavior; the default (non-following) walk is unaffected.
f59f315 to
470347e
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

What
Replace the bespoke recursive directory walker in
src/lib/dif/scan.ts(
collectFiles) with the project's shared, well-testedwalkFileswalker(
src/lib/scan/) — the same traversal foundation the DSN code-scanner uses.This is a preliminary, self-contained refactor that the upcoming
debug-files uploadWASM-consumption features (embedded PPDB extraction,--il2cpp-mapping) will build on. It has no behavior change for users.Why
walkFilesgives parallelreaddir,dev:ino-based symlink-cycle detection(sturdier than the previous visited-
realpathset),AbortSignalsupport, andfar more test coverage than the hand-rolled walker.
Behavior preservation
walkFilesis policy-free, so its DSN-tuned defaults are explicitly overriddento match the existing debug-file scanning contract — debug files are large
binaries that routinely live in gitignored build output (
build/,target/, XcodeDerivedData):respectGitignorefalsealwaysSkipDirs[]maxFileSizeInfinityprepareDifs(format-before-size, drivesoversizedCount)followSymlinkstrueclassifyBinaryfalseThe deliberate format-before-size ordering that drives
PrepareDifsResult.oversizedCountis untouched and remains inprepareDifs.New shared-walker option:
classifyBinaryAdds
classifyBinary?: boolean(defaulttrue, preserving current behavior) toWalkOptions. Whenfalse, the walker skips the per-file ~8 KB NUL sniff usedto populate
WalkEntry.isBinaryand reportsisBinary: falsewithout readingfile contents. The DIF scanner ignores
isBinary, so it opts out to avoid anextra head-read per file on large binary trees. Also benefits other binary-tree
consumers (e.g. sourcemap discovery).
Tests
test/lib/scan/walker.test.ts:classifyBinary: falseyields a NUL blob withisBinary: falsewhile the default still reportstrue(proves the sniff isskipped).
test/lib/dif/scan.test.ts: newscanPathstraversal coverage — recursivediscovery, explicit-file passthrough, files under
.gitignore'd /node_modulesdirs are still found (regression guard for the swap),overlapping-arg dedupe, symlink-cycle safety, and ENOENT →
ValidationError.Verification
pnpm run typecheck,pnpm run lint,pnpm run check:depsclean;test/lib/scan+test/lib/difsuites pass (99 tests).