fix(native): macOS thread stack descriptor#1726
Open
mujacica wants to merge 8 commits into
Open
Conversation
Contributor
mujacica
commented
May 15, 2026
- macOS thread stack descriptor
- Tests for indirect heap memory
…ed SP
The macOS native backend's full path (when ``task_for_pid`` succeeds)
used to call ``thread_get_state()`` from inside the daemon to pick up
each thread's SP, then wrote that SP into the per-thread
``MINIDUMP_THREAD::stack.start_of_memory_range`` descriptor. By the
time the daemon runs, the crashing thread's signal handler has long
since moved its SP deep into libsystem code (waiting on the daemon
IPC), so every minidump emitted on macOS contained a thread descriptor
that pointed at libsystem code pages instead of the real thread stack
region. The thread's CONTEXT block (filled from the signal-handler-
captured ucontext) still had the right crash-time SP, so MemoryListStream
and the per-thread descriptor disagreed about where the stack lives —
breaking any reader that walks the stack from SP via the descriptor.
Match each Mach thread back to the signal-handler snapshot in
``crash_ctx->platform.threads[]`` by ``thread_id`` and use that saved
state for register + stack capture; ``thread_get_state`` only fills the
gap when the snapshot is missing the thread for some reason. Also clamp
``write_thread_stack`` to the VM region containing SP — without it, a
crashing thread whose SP sits near the top of a small worker stack
caused the fixed-size ``read_task_memory`` call to overshoot into the
guard page and the entire read failed, leaving the descriptor empty.
Linux already matched by ``tid`` so it was unaffected; Windows
delegates to ``MiniDumpWriteDump`` which always populates the
descriptor correctly.
Tests:
* extend ``test_native_minidump_streams`` (runs on every platform)
with a cross-stream assertion: every thread's
``MINIDUMP_THREAD::stack`` descriptor must contain the SP recorded
in that thread's CONTEXT. Regression guard against this class of
descriptor/context disagreement on any backend.
* add ``test_native_smart_mode_captures_indirect_heap_memory``
(macOS + Linux). The published behaviour of
``SENTRY_MINIDUMP_MODE_SMART`` is "stack-only dump plus ~1 KiB
around every writable-heap pointer reachable from registers + stack
words". The minimal-mode fallback on macOS (when ``task_for_pid``
fails) silently breaks that contract by emitting *only* module
header pages, with no heap region anywhere in MemoryListStream.
The test ad-hoc codesigns the example + ``sentry-crash`` daemon
with ``com.apple.security.cs.debugger`` + ``get-task-allow`` +
hardened-runtime so ``task_for_pid`` succeeds on macOS (no real
Apple Developer cert needed), then asserts at least one captured
memory region lands outside both the loaded-module image ranges
and the per-thread stack ranges. On Linux ``/proc/self/maps`` is
readable in-process so no codesigning is needed. Test skipped on
Windows since ``MiniDumpWriteDump`` uses different controls
(``MINIDUMP_TYPE`` flags).
Verified on:
* macOS 26 arm64: ``pytest tests/test_integration_native.py``
-> 23 passed, 2 skipped (Windows-only WER tests).
* Linux Ubuntu 24.04 arm64 (Docker): ``pytest tests/test_integration_native.py``
-> 22 passed, 3 skipped (Windows + OOM).
Windows ``sentry__write_minidump`` already wires ``SENTRY_MINIDUMP_MODE_SMART`` to ``MiniDumpWithIndirectlyReferencedMemory | MiniDumpWithDataSegs``, so the OS does the equivalent stack-word scan and the same "captured region outside modules + thread stacks" invariant should hold. Dropping the Windows skip lets CI catch any regression there. Test logic is platform-agnostic: ``_parse_minidump`` decodes SP from the correct ``CONTEXT`` offset per ``ProcessorArchitecture`` (AMD64 ``rsp`` at 0x98, x86 ``esp`` at 0xC4, ARM64 ``sp`` at 0x100 in both Breakpad and Microsoft layouts). ``_codesign_for_task_for_pid`` is darwin-only-guarded so Windows + Linux runs do nothing extra.
``MINIDUMP_EXCEPTION.exception_code`` on macOS must carry the **Mach exception type** (EXC_BAD_ACCESS = 1, EXC_BREAKPOINT = 6, ...) per the Breakpad / Crashpad convention every minidump reader respects. Earlier the writer put the BSD signal number directly into that field (#1576, commit 9cad572), which works as long as the consumer doesn't try to decode the field — but minidump-rs, the Breakpad processor, and Crashpad's own minidump tools all read it as a Mach exception type and produced wrong crash-reason labels (e.g. ``MacGeneral(EXC_RESOURCE, 0)`` for a plain SIGSEGV, since 11 = SIGSEGV in BSD signals AND EXC_RESOURCE in the Mach exception enum). This commit reverses the macOS kernel's signal-translation: given a (signum, siginfo) pair we infer the Mach exception type + subcode that produced it, write those into ``exception_code`` / ``exception_flags`` exactly like Breakpad's ``MinidumpGenerator::WriteExceptionStream`` does, and keep the BSD signal available to consumers via ``exception_information[0]`` so the previous "lldb wants the signal" use case still works. The translation table mirrors Apple's BSD ↔ Mach mapping: SIGSEGV → EXC_BAD_ACCESS (KERN_INVALID_ADDRESS / PROTECTION_FAILURE from si_code) SIGBUS → EXC_BAD_ACCESS (KERN_INVALID_ADDRESS / PROTECTION_FAILURE) SIGILL → EXC_BAD_INSTRUCTION SIGFPE → EXC_ARITHMETIC SIGTRAP → EXC_BREAKPOINT SIGABRT → EXC_CRASH (with signal encoded in code[0] low bits) SIGSYS → EXC_SOFTWARE / EXC_SOFT_SIGNAL other → EXC_SOFTWARE / EXC_SOFT_SIGNAL Verified end-to-end with minidump-rs reading the regenerated UAF dump: ``crash_reason_debug`` now decodes as ``MacBadAccessKern(...)`` instead of ``MacGeneral(EXC_RESOURCE, 0)``. Tests: * ``test_native_minidump_streams`` extended to assert ``exception_code`` is one of the valid Mach exception enum values and ``exception_information[0]`` carries a plausible BSD signal number. Catches regressions to the BSD-signum-as-exception-code convention used by the prior commit. Verified on: * macOS 26 arm64 (local): pytest tests/test_integration_native.py -> 23 passed, 2 skipped (Windows-only WER). * Linux Ubuntu 24.04 arm64 (Docker): both new tests pass.
Replaces magic numbers and hand-rolled bit shifts in
``bsd_signal_to_mach_exception`` with the corresponding constants from
``<mach/exception_types.h>`` / ``<mach/kern_return.h>`` / ``<signal.h>``.
Reads as a plain translation table — ``case SIGSEGV: out_type =
EXC_BAD_ACCESS; out_code = (si_code == SEGV_ACCERR ?
KERN_PROTECTION_FAILURE : KERN_INVALID_ADDRESS)`` — instead of opaque
integer literals.
Also corrects the SIGABRT / SIGSYS / SIGPIPE encodings to match what
xnu's ``ux_exception.c`` actually emits when the kernel translates a
Mach exception to a BSD signal:
* SIGABRT -> EXC_SOFTWARE, code = EXC_UNIX_ABORT (0x10002)
* SIGSYS -> EXC_SOFTWARE, code = EXC_UNIX_BAD_SYSCALL (0x10000)
* SIGPIPE -> EXC_SOFTWARE, code = EXC_UNIX_BAD_PIPE (0x10001)
The previous encoding emitted ``EXC_CRASH (10)`` with the signum in
bits 24..31 of code[0] for SIGABRT, which no minidump reader's
``CrashReason::from_exception`` matched — they all fell through to
``Unknown(10, ...)``. With the corrected EXC_SOFTWARE/EXC_UNIX_*
encoding, minidump-rs now decodes SIGABRT-derived dumps as
``MacSoftware(SIGABRT)`` directly. The EXC_UNIX_* constants live in
xnu's ``bsd/sys/ux_exception.h`` (kernel-only) so we define them
locally with a link to the canonical source.
The three EXC_UNIX_* constants have been stable across every macOS
release since the BSD/Mach merge, and the public mach exception type
+ kern_return enums are part of the SDK so there are no
``#if MAC_OS_X_VERSION_*`` gates needed.
Tests:
* fix offset bug in ``test_native_minidump_streams`` flagged by
review bot: the test read ``number_parameters`` from
``rva + 16`` (which lands on the always-zero ``exception_record``
pointer field), so the subsequent ``if number_parameters >= 1``
silently skipped the BSD-signal-in-information[0] assertion.
Corrected offsets per the full MINIDUMP_EXCEPTION_STREAM layout
(``number_parameters`` at +32, ``exception_information[0]`` at
+40) and tightened the conditional into a hard assert so the
check actually runs. Also added SIGSYS (12) to the set of valid
BSD signals.
Verified end-to-end with minidump-rs reading every regenerated dump:
precondition-panic -> MacBadAccessKern(KERN_PROTECTION_FAILURE)
use-after-free -> MacBadAccessKern(KERN_PROTECTION_FAILURE)
double-free -> MacSoftware(SIGABRT)
uninit-read -> MacBadAccessKern(KERN_PROTECTION_FAILURE)
deadlock -> MacSoftware(SIGABRT)
refcount-mismatch -> MacBadAccessKern(KERN_PROTECTION_FAILURE)
stack-overflow -> MacBadAccessKern(KERN_PROTECTION_FAILURE)
cpp-exception -> MacSoftware(SIGABRT)
Test suite: 23 passed, 2 skipped (Windows-only WER) on macOS 26 arm64.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit a16e4c4. Configure here.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
