Skip to content

feat(debug-files): extract embedded Portable PDBs from managed PEs#1163

Merged
BYK merged 3 commits into
mainfrom
byk/feat/debug-files-embedded-ppdb
Jul 2, 2026
Merged

feat(debug-files): extract embedded Portable PDBs from managed PEs#1163
BYK merged 3 commits into
mainfrom
byk/feat/debug-files-embedded-ppdb

Conversation

@BYK

@BYK BYK commented Jul 1, 2026

Copy link
Copy Markdown
Member

What

sentry debug-files upload now automatically extracts a Portable PDB embedded inside a managed .NET PE assembly (debug directory entry type 17, deflate-compressed) and uploads it as a separate <name>.pdb debug file — faithful to legacy sentry-cli's extract_embedded_ppdb.

Consumes the new WASM API from @sentry/symbolic@13.7.0 (bumped from 13.5.0):
ObjectFile.asPe()PeFile.embeddedPpdb() (shipped in getsentry/symbolic#1004).

Why the extraction lives at the prepare stage

A managed PE's debug info lives entirely in its embedded Portable PDB, so the PE object itself typically reports no debug features and is dropped by the feature filter. Extraction therefore happens in prepareDifs (not in buildDifList over already-prepared files), independent of whether the PE passed the filters — mirroring legacy, which extracts from any PE with a debug id.

The standalone extracted PPDB bytes are then run through the same per-object filters as any other file, so it:

  • honors --type portablepdb (a PE excluded by the pe-format filter is still read when portablepdb is wanted),
  • honors feature and --id filters,
  • is size-gated against the server's advertised max file size,
  • dedupes normally (debug id + content hash),
  • and works for PEs found inside scanned .zip archives.

Extraction/decompression errors are swallowed (logged at debug level) so a malformed embed never aborts the scan.

Tests

  • test/lib/dif/embedded-ppdb.test.ts — unit tests for extractEmbeddedPpdb (positive/negative, standalone-PPDB round-trip) against small committed .NET PE fixtures.
  • test/commands/debug-files/upload.test.ts — command-level: dry-run queues the .pdb, --type portablepdb extracts it, --type elf ignores it, PE-without-embed yields nothing, and the PPDB is threaded through to uploadDebugFiles.

Fixtures (embedded-ppdb.dll, pe-no-ppdb.dll) are the small Sentry sample console binaries from symbolic-testutils.

Verified locally: typecheck, lint, test (350 debug-files/dif/scan tests), check:deps, check:fragments all green.

Notes

Managed .NET PE assemblies can carry their Portable PDB debug companion
inline (debug directory entry type 17, deflate-compressed). `debug-files
upload` now extracts that PPDB during scanning and uploads it as a
separate `<name>.pdb` DIF, faithful to legacy sentry-cli's
`extract_embedded_ppdb`.

A managed PE's debug info lives entirely in the embedded PPDB, so the PE
object itself usually has no debug features and is dropped by the feature
filter. Extraction therefore happens at the prepare stage, independent of
the PE's own filtering; the standalone PPDB bytes then flow through the
normal per-object filters, so it honors `--type portablepdb`, feature, and
`--id` filters and is size-gated against the server's max file size.

Consumes `ObjectFile.asPe()` / `PeFile.embeddedPpdb()` from
@sentry/symbolic 13.7.0.
@github-actions github-actions Bot added the risk: high PR risk score: high label Jul 1, 2026
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor
PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://cli.sentry.dev/_preview/pr-1163/

Built to branch gh-pages at 2026-07-02 09:30 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Codecov Results 📊

✅ Patch coverage is 85.00%. Project has 5142 uncovered lines.
✅ Project coverage is 81.49%. Comparing base (base) to head (head).

Files with missing lines (2)
File Patch % Lines
src/lib/dif/scan.ts 83.33% ⚠️ 8 Missing and 3 partials
src/lib/dif/index.ts 91.67% ⚠️ 1 Missing
Coverage diff
@@            Coverage Diff             @@
##          main       #PR       +/-##
==========================================
+ Coverage    81.49%    81.49%        —%
==========================================
  Files          399       399         —
  Lines        27744     27787       +43
  Branches     18025     18049       +24
==========================================
+ Hits         22609     22645       +36
- Misses        5135      5142        +7
- Partials      1859      1861        +2

Generated by Codecov Action

…sized PPDBs

- Cheap-peek the buffer header and only attempt embedded-PPDB extraction for
  `pe` images, so non-PE files (ELF/Mach-O/Breakpad/...) no longer pay a full
  second parse on every scanned file.
- When an extracted Portable PDB of a requested type exceeds the max file size,
  surface it through the oversized count so the command exits non-zero with a
  size message instead of a misleading "no debug files found" (exit 0).

@cursor cursor Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 07eed3d. Configure here.

Comment thread src/lib/dif/scan.ts
`Skipping ${path} for embedded PPDB extraction: size ${peeked.size} exceeds maximum file size ${maxFileSize}`
);
return { dif: null, oversized: true };
return { difs: [], oversized: false };

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oversized PE skips embedded PPDB

Medium Severity

In prepareFileDif, when a PE’s on-disk size exceeds maxFileSize, the scan returns before reading the file, so embeddedPpdbDif never runs. The PR treats the decompressed Portable PDB as the upload payload and size-gates that blob separately; a large assembly with a small embedded PDB is dropped even though the extracted .pdb would be within the limit.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 07eed3d. Configure here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional tradeoff, declining. The scanner deliberately never buffers a file that exceeds maxFileSize (see the size gate here and in the ZIP path) — this reader loads whole files into memory rather than memory-mapping them like the legacy Rust CLI, so that invariant is what bounds peak memory across a scan.

Reading an over-cap PE purely to extract a smaller embedded PPDB would break that invariant for every over-cap PE encountered (including large native PEs that carry no PPDB), which is exactly what the gate exists to prevent. In practice maxFileSize is the server-advertised cap (typically ≥1–2 GiB), and a managed .NET assembly that large does not occur, so the embedded PPDB of an over-cap PE is a case we accept skipping. The extracted PPDB is still independently size-gated on the normal (in-cap PE) path.

If a low-cap self-hosted configuration ever makes this matter, the right follow-up is a separate, bounded read budget for PPDB-only extraction — tracked as a follow-up rather than bundled here.

@BYK BYK merged commit 2b3699d into main Jul 2, 2026
32 checks passed
@BYK BYK deleted the byk/feat/debug-files-embedded-ppdb branch July 2, 2026 10:34
BYK added a commit that referenced this pull request Jul 2, 2026
#1165)

> Follows #1163 (embedded Portable PDB extraction), now merged. This PR
is rebased onto `main`. (Re-opened from #1164, which GitHub auto-closed
when its base branch was deleted on #1163's merge.)

## What

Adds `sentry debug-files upload --il2cpp-mapping`: for each prepared
debug file it computes a **Unity IL2CPP C++→C# line mapping** (from the
file's primary debug-info object) and uploads it as a separate `il2cpp`
DIF carrying the object's debug id — faithful to legacy `sentry-cli`'s
`create_il2cpp_mappings`.

Consumes the new WASM API from **`@sentry/symbolic@13.7.0`**
(getsentry/symbolic#1005):
- top-level `il2cppLineMapping(object, provider)`
- `SourceBundleWriter.collectIl2cppSources` (present since 13.5.0)

## Behavior (matches legacy)

- The generated C++ source files an object references are read from disk
via a provider; IL2CPP `//<source_info:File.cs:line>` markers are parsed
from them. Objects that yield an empty mapping are skipped.
- The mapping DIF is named `<name>.il2cpp` and carries the source file's
debug id.
- When combined with `--include-sources`, the per-file source bundle
**also** collects the referenced C# source files
(`collectIl2cppSources`) — otherwise the source bundle is unchanged.
- Missing/unreadable sources and mapping failures are swallowed
(debug-logged), never aborting the upload.
- `--symbol-maps` (BCSymbolMap resolution) remains the only deferred
legacy upload option.

## Implementation

- `src/lib/dif/index.ts`: `createIl2cppLineMapping(data, readSource)` +
a `collectIl2cppSources` option on `createSourceBundle()`.
- `src/commands/debug-files/upload.ts`: `--il2cpp-mapping` flag;
extracted a shared `readSourceFile` disk-reader used by both the source
bundle and the IL2CPP provider.

## Tests

- `test/lib/dif/il2cpp.test.ts` — `createIl2cppLineMapping` (mapping
JSON, provider-null → null, no-markers → null) and `createSourceBundle`
`collectIl2cppSources` (C# included only when enabled, verified by
unzipping the bundle).
- `test/commands/debug-files/upload.test.ts` — `--il2cpp-mapping` queues
the `.il2cpp` DIF, absent without the flag, and is threaded through to
`uploadDebugFiles`.

All text fixtures (Breakpad `FILE` records + synthetic C++/C#), no
binaries.

Verified locally: `typecheck`, `lint`, `test` (358 debug-files/dif/scan
tests), `check:deps`, `check:fragments` all green.
BYK added a commit that referenced this pull request Jul 2, 2026
…M handles (#1166)

Post-merge follow-ups for the `debug-files` DIF work (#1163, #1165).

## 1. `fix`: map the targeted object for `--il2cpp-mapping`

Addresses a Cursor Bugbot finding on #1165 ("IL2CPP mapping wrong
object", Medium).

For a fat archive, `createIl2cppLineMapping` mapped the first debug-info
object (`selectBundledObject` over **all** objects), while the uploaded
`.il2cpp` DIF was stamped with `file.debugId` (the **filter-matched
primary**). With `--id` targeting a non-primary slice, the DIF could
advertise one debug id while containing another object's line mappings.

- `createIl2cppLineMapping` now takes a `targetDebugId` and maps that
specific object (falling back to `selectBundledObject` when omitted/not
found).
- `appendIl2cppMapping` passes `file.debugId` and stamps the DIF with
the **mapped object's own** id — so the id always matches the contents.

Tests: added cases asserting the returned id tracks the targeted object
and the fallback path.

## 2. `perf`: free WASM archive handles after use

Addresses review finding M-1. The DIF helpers (`parseDebugFile`,
`extractEmbeddedPpdb`, `createSourceBundle`, `createIl2cppLineMapping`,
`listSources`) created `Archive`/`PeFile` WASM handles that were never
released, so a scan accumulated them (and their backing byte views) in
WASM linear memory until process exit.

- Declared each with `using` so the handle is freed on return.
- Each helper returns only plain values or copied byte buffers, so
releasing the archive is safe. Verified empirically that disposing the
archive independently of the objects derived from it does **not**
double-free (symbolic's self-cell objects own their data), including
after forcing GC.

## Verification
`typecheck`, `lint`, `check:deps`, `check:fragments`, and the full
debug-files/dif/scan suite (360 tests) all pass. No behavior change for
the common single-object case.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

risk: high PR risk score: high

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant