Skip to content

refactor(debug-files): scan paths via shared walkFiles walker#1150

Merged
BYK merged 2 commits into
mainfrom
byk/refactor/debug-files-shared-walker
Jul 2, 2026
Merged

refactor(debug-files): scan paths via shared walkFiles walker#1150
BYK merged 2 commits into
mainfrom
byk/refactor/debug-files-shared-walker

Conversation

@BYK

@BYK BYK commented Jun 26, 2026

Copy link
Copy Markdown
Member

What

Replace the bespoke recursive directory walker in src/lib/dif/scan.ts
(collectFiles) with the project's shared, well-tested walkFiles walker
(src/lib/scan/) — the same traversal foundation the DSN code-scanner uses.

This is a preliminary, self-contained refactor that the upcoming
debug-files upload WASM-consumption features (embedded PPDB extraction,
--il2cpp-mapping) will build on. It has no behavior change for users.

Why

walkFiles gives parallel readdir, dev:ino-based symlink-cycle detection
(sturdier than the previous visited-realpath set), AbortSignal support, and
far more test coverage than the hand-rolled walker.

Behavior preservation

walkFiles is policy-free, so its DSN-tuned defaults are explicitly overridden
to match the existing debug-file scanning contract — debug files are large
binaries that routinely live in gitignored build output (build/,
target/, Xcode DerivedData):

Option Value Why
respectGitignore false debug files live in gitignored dirs
alwaysSkipDirs [] default skip-dirs are build-output dirs — exactly where DIFs are
maxFileSize Infinity default is 256 KB; DIFs are MBs. Size gating stays in prepareDifs (format-before-size, drives oversizedCount)
followSymlinks true matches previous cycle-safe behavior
classifyBinary false see below

The deliberate format-before-size ordering that drives
PrepareDifsResult.oversizedCount is untouched and remains in prepareDifs.

New shared-walker option: classifyBinary

Adds classifyBinary?: boolean (default true, preserving current behavior) to
WalkOptions. When false, the walker skips the per-file ~8 KB NUL sniff used
to populate WalkEntry.isBinary and reports isBinary: false without reading
file contents. The DIF scanner ignores isBinary, so it opts out to avoid an
extra head-read per file on large binary trees. Also benefits other binary-tree
consumers (e.g. sourcemap discovery).

Tests

  • test/lib/scan/walker.test.ts: classifyBinary: false yields a NUL blob with
    isBinary: false while the default still reports true (proves the sniff is
    skipped).
  • test/lib/dif/scan.test.ts: new scanPaths traversal coverage — recursive
    discovery, explicit-file passthrough, files under .gitignore'd /
    node_modules dirs are still found
    (regression guard for the swap),
    overlapping-arg dedupe, symlink-cycle safety, and ENOENT → ValidationError.

Verification

pnpm run typecheck, pnpm run lint, pnpm run check:deps clean;
test/lib/scan + test/lib/dif suites pass (99 tests).

@github-actions github-actions Bot added the risk: medium PR risk score: medium label Jun 26, 2026
@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor
PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://cli.sentry.dev/_preview/pr-1150/

Built to branch gh-pages at 2026-07-02 12:26 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@cursor cursor Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit ba2848f. Configure here.

Comment thread src/lib/dif/scan.ts
@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Codecov Results 📊

✅ Patch coverage is 90.00%. Project has 5132 uncovered lines.
✅ Project coverage is 81.54%. Comparing base (base) to head (head).

Files with missing lines (2)
File Patch % Lines
src/lib/dif/scan.ts 87.50% ⚠️ 3 Missing and 2 partials
src/lib/scan/walker.ts 100.00% ⚠️ 1 partials
Coverage diff
@@            Coverage Diff             @@
##          main       #PR       +/-##
==========================================
+ Coverage    81.51%    81.54%    +0.03%
==========================================
  Files          399       399         —
  Lines        27811     27807        -4
  Branches     18065     18073        +8
==========================================
+ Hits         22669     22675        +6
- Misses        5142      5132       -10
- Partials      1863      1864        +1

Generated by Codecov Action

BYK added a commit that referenced this pull request Jul 2, 2026
…1163)

## What

`sentry debug-files upload` now automatically extracts a **Portable PDB
embedded inside a managed .NET PE assembly** (debug directory entry type
17, deflate-compressed) and uploads it as a separate `<name>.pdb` debug
file — faithful to legacy `sentry-cli`'s `extract_embedded_ppdb`.

Consumes the new WASM API from **`@sentry/symbolic@13.7.0`** (bumped
from 13.5.0):
`ObjectFile.asPe()` → `PeFile.embeddedPpdb()` (shipped in
getsentry/symbolic#1004).

## Why the extraction lives at the *prepare* stage

A managed PE's debug info lives entirely in its embedded Portable PDB,
so the PE object itself typically reports **no** debug features and is
dropped by the feature filter. Extraction therefore happens in
`prepareDifs` (not in `buildDifList` over already-`prepared` files),
independent of whether the PE passed the filters — mirroring legacy,
which extracts from any PE with a debug id.

The standalone extracted PPDB bytes are then run through the **same
per-object filters** as any other file, so it:
- honors `--type portablepdb` (a PE excluded by the `pe`-format filter
is still read when `portablepdb` is wanted),
- honors feature and `--id` filters,
- is size-gated against the server's advertised max file size,
- dedupes normally (debug id + content hash),
- and works for PEs found inside scanned `.zip` archives.

Extraction/decompression errors are swallowed (logged at debug level) so
a malformed embed never aborts the scan.

## Tests

- `test/lib/dif/embedded-ppdb.test.ts` — unit tests for
`extractEmbeddedPpdb` (positive/negative, standalone-PPDB round-trip)
against small committed .NET PE fixtures.
- `test/commands/debug-files/upload.test.ts` — command-level: dry-run
queues the `.pdb`, `--type portablepdb` extracts it, `--type elf`
ignores it, PE-without-embed yields nothing, and the PPDB is threaded
through to `uploadDebugFiles`.

Fixtures (`embedded-ppdb.dll`, `pe-no-ppdb.dll`) are the small Sentry
sample console binaries from `symbolic-testutils`.

Verified locally: `typecheck`, `lint`, `test` (350 debug-files/dif/scan
tests), `check:deps`, `check:fragments` all green.

## Notes
- Part of the CLI ↔ symbolic WASM consumption series; sibling PR (IL2CPP
line mappings, `--il2cpp-mapping`) to follow.
- Soft-follows the shared-walker refactor (#1150) but touches different
functions (prepare stage), so it's independent.

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
BYK added 2 commits July 2, 2026 12:23
Replace the bespoke recursive directory walker in dif/scan.ts with the
project's shared, well-tested walkFiles walker (the DSN code-scanner's
traversal foundation): parallel readdir, inode-based symlink-cycle
detection, and far more test coverage.

walkFiles is policy-free, so its DSN-tuned defaults are overridden to
preserve debug-file scanning behavior: debug files are large binaries
that live in gitignored build output, so respectGitignore and the default
build-output skip-dirs are disabled and the 256 KB size cap is lifted
(size gating stays in prepareDifs to keep the format-before-size ordering
that drives oversizedCount).

Adds a classifyBinary opt-out to walkFiles (default true) so consumers
that ignore isBinary -- like this scanner -- skip the per-file 8 KB NUL
sniff.
Addresses a Cursor Bugbot finding on the walkFiles migration: the root
cwd frame is pushed directly rather than through maybeDescend (the only
place inodes are recorded), so a descendant symlink pointing back at the
scan root was followed once, re-listing the root subtree under a second
path prefix. The pre-walkFiles DIF scanner avoided this by recording the
root's realpath before reading children. Seed the root inode at walk
init when followSymlinks is enabled to restore that behavior; the default
(non-following) walk is unaffected.
@BYK BYK force-pushed the byk/refactor/debug-files-shared-walker branch from f59f315 to 470347e Compare July 2, 2026 12:25
@BYK BYK merged commit 47cffe8 into main Jul 2, 2026
32 checks passed
@BYK BYK deleted the byk/refactor/debug-files-shared-walker branch July 2, 2026 12:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

risk: medium PR risk score: medium

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant