feat(debug-files): extract embedded Portable PDBs from managed PEs#1163
Conversation
Managed .NET PE assemblies can carry their Portable PDB debug companion inline (debug directory entry type 17, deflate-compressed). `debug-files upload` now extracts that PPDB during scanning and uploads it as a separate `<name>.pdb` DIF, faithful to legacy sentry-cli's `extract_embedded_ppdb`. A managed PE's debug info lives entirely in the embedded PPDB, so the PE object itself usually has no debug features and is dropped by the feature filter. Extraction therefore happens at the prepare stage, independent of the PE's own filtering; the standalone PPDB bytes then flow through the normal per-object filters, so it honors `--type portablepdb`, feature, and `--id` filters and is size-gated against the server's max file size. Consumes `ObjectFile.asPe()` / `PeFile.embeddedPpdb()` from @sentry/symbolic 13.7.0.
|
Codecov Results 📊✅ Patch coverage is 85.00%. Project has 5142 uncovered lines. Files with missing lines (2)
Coverage diff@@ Coverage Diff @@
## main #PR +/-##
==========================================
+ Coverage 81.49% 81.49% —%
==========================================
Files 399 399 —
Lines 27744 27787 +43
Branches 18025 18049 +24
==========================================
+ Hits 22609 22645 +36
- Misses 5135 5142 +7
- Partials 1859 1861 +2Generated by Codecov Action |
…sized PPDBs - Cheap-peek the buffer header and only attempt embedded-PPDB extraction for `pe` images, so non-PE files (ELF/Mach-O/Breakpad/...) no longer pay a full second parse on every scanned file. - When an extracted Portable PDB of a requested type exceeds the max file size, surface it through the oversized count so the command exits non-zero with a size message instead of a misleading "no debug files found" (exit 0).
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 07eed3d. Configure here.
| `Skipping ${path} for embedded PPDB extraction: size ${peeked.size} exceeds maximum file size ${maxFileSize}` | ||
| ); | ||
| return { dif: null, oversized: true }; | ||
| return { difs: [], oversized: false }; |
There was a problem hiding this comment.
Oversized PE skips embedded PPDB
Medium Severity
In prepareFileDif, when a PE’s on-disk size exceeds maxFileSize, the scan returns before reading the file, so embeddedPpdbDif never runs. The PR treats the decompressed Portable PDB as the upload payload and size-gates that blob separately; a large assembly with a small embedded PDB is dropped even though the extracted .pdb would be within the limit.
Reviewed by Cursor Bugbot for commit 07eed3d. Configure here.
There was a problem hiding this comment.
Intentional tradeoff, declining. The scanner deliberately never buffers a file that exceeds maxFileSize (see the size gate here and in the ZIP path) — this reader loads whole files into memory rather than memory-mapping them like the legacy Rust CLI, so that invariant is what bounds peak memory across a scan.
Reading an over-cap PE purely to extract a smaller embedded PPDB would break that invariant for every over-cap PE encountered (including large native PEs that carry no PPDB), which is exactly what the gate exists to prevent. In practice maxFileSize is the server-advertised cap (typically ≥1–2 GiB), and a managed .NET assembly that large does not occur, so the embedded PPDB of an over-cap PE is a case we accept skipping. The extracted PPDB is still independently size-gated on the normal (in-cap PE) path.
If a low-cap self-hosted configuration ever makes this matter, the right follow-up is a separate, bounded read budget for PPDB-only extraction — tracked as a follow-up rather than bundled here.
#1165) > Follows #1163 (embedded Portable PDB extraction), now merged. This PR is rebased onto `main`. (Re-opened from #1164, which GitHub auto-closed when its base branch was deleted on #1163's merge.) ## What Adds `sentry debug-files upload --il2cpp-mapping`: for each prepared debug file it computes a **Unity IL2CPP C++→C# line mapping** (from the file's primary debug-info object) and uploads it as a separate `il2cpp` DIF carrying the object's debug id — faithful to legacy `sentry-cli`'s `create_il2cpp_mappings`. Consumes the new WASM API from **`@sentry/symbolic@13.7.0`** (getsentry/symbolic#1005): - top-level `il2cppLineMapping(object, provider)` - `SourceBundleWriter.collectIl2cppSources` (present since 13.5.0) ## Behavior (matches legacy) - The generated C++ source files an object references are read from disk via a provider; IL2CPP `//<source_info:File.cs:line>` markers are parsed from them. Objects that yield an empty mapping are skipped. - The mapping DIF is named `<name>.il2cpp` and carries the source file's debug id. - When combined with `--include-sources`, the per-file source bundle **also** collects the referenced C# source files (`collectIl2cppSources`) — otherwise the source bundle is unchanged. - Missing/unreadable sources and mapping failures are swallowed (debug-logged), never aborting the upload. - `--symbol-maps` (BCSymbolMap resolution) remains the only deferred legacy upload option. ## Implementation - `src/lib/dif/index.ts`: `createIl2cppLineMapping(data, readSource)` + a `collectIl2cppSources` option on `createSourceBundle()`. - `src/commands/debug-files/upload.ts`: `--il2cpp-mapping` flag; extracted a shared `readSourceFile` disk-reader used by both the source bundle and the IL2CPP provider. ## Tests - `test/lib/dif/il2cpp.test.ts` — `createIl2cppLineMapping` (mapping JSON, provider-null → null, no-markers → null) and `createSourceBundle` `collectIl2cppSources` (C# included only when enabled, verified by unzipping the bundle). - `test/commands/debug-files/upload.test.ts` — `--il2cpp-mapping` queues the `.il2cpp` DIF, absent without the flag, and is threaded through to `uploadDebugFiles`. All text fixtures (Breakpad `FILE` records + synthetic C++/C#), no binaries. Verified locally: `typecheck`, `lint`, `test` (358 debug-files/dif/scan tests), `check:deps`, `check:fragments` all green.
…M handles (#1166) Post-merge follow-ups for the `debug-files` DIF work (#1163, #1165). ## 1. `fix`: map the targeted object for `--il2cpp-mapping` Addresses a Cursor Bugbot finding on #1165 ("IL2CPP mapping wrong object", Medium). For a fat archive, `createIl2cppLineMapping` mapped the first debug-info object (`selectBundledObject` over **all** objects), while the uploaded `.il2cpp` DIF was stamped with `file.debugId` (the **filter-matched primary**). With `--id` targeting a non-primary slice, the DIF could advertise one debug id while containing another object's line mappings. - `createIl2cppLineMapping` now takes a `targetDebugId` and maps that specific object (falling back to `selectBundledObject` when omitted/not found). - `appendIl2cppMapping` passes `file.debugId` and stamps the DIF with the **mapped object's own** id — so the id always matches the contents. Tests: added cases asserting the returned id tracks the targeted object and the fallback path. ## 2. `perf`: free WASM archive handles after use Addresses review finding M-1. The DIF helpers (`parseDebugFile`, `extractEmbeddedPpdb`, `createSourceBundle`, `createIl2cppLineMapping`, `listSources`) created `Archive`/`PeFile` WASM handles that were never released, so a scan accumulated them (and their backing byte views) in WASM linear memory until process exit. - Declared each with `using` so the handle is freed on return. - Each helper returns only plain values or copied byte buffers, so releasing the archive is safe. Verified empirically that disposing the archive independently of the objects derived from it does **not** double-free (symbolic's self-cell objects own their data), including after forcing GC. ## Verification `typecheck`, `lint`, `check:deps`, `check:fragments`, and the full debug-files/dif/scan suite (360 tests) all pass. No behavior change for the common single-object case.


What
sentry debug-files uploadnow automatically extracts a Portable PDB embedded inside a managed .NET PE assembly (debug directory entry type 17, deflate-compressed) and uploads it as a separate<name>.pdbdebug file — faithful to legacysentry-cli'sextract_embedded_ppdb.Consumes the new WASM API from
@sentry/symbolic@13.7.0(bumped from 13.5.0):ObjectFile.asPe()→PeFile.embeddedPpdb()(shipped in getsentry/symbolic#1004).Why the extraction lives at the prepare stage
A managed PE's debug info lives entirely in its embedded Portable PDB, so the PE object itself typically reports no debug features and is dropped by the feature filter. Extraction therefore happens in
prepareDifs(not inbuildDifListover already-preparedfiles), independent of whether the PE passed the filters — mirroring legacy, which extracts from any PE with a debug id.The standalone extracted PPDB bytes are then run through the same per-object filters as any other file, so it:
--type portablepdb(a PE excluded by thepe-format filter is still read whenportablepdbis wanted),--idfilters,.ziparchives.Extraction/decompression errors are swallowed (logged at debug level) so a malformed embed never aborts the scan.
Tests
test/lib/dif/embedded-ppdb.test.ts— unit tests forextractEmbeddedPpdb(positive/negative, standalone-PPDB round-trip) against small committed .NET PE fixtures.test/commands/debug-files/upload.test.ts— command-level: dry-run queues the.pdb,--type portablepdbextracts it,--type elfignores it, PE-without-embed yields nothing, and the PPDB is threaded through touploadDebugFiles.Fixtures (
embedded-ppdb.dll,pe-no-ppdb.dll) are the small Sentry sample console binaries fromsymbolic-testutils.Verified locally:
typecheck,lint,test(350 debug-files/dif/scan tests),check:deps,check:fragmentsall green.Notes
--il2cpp-mapping) to follow.