Skip to content

fix(native): macOS thread stack descriptor#1726

Open
mujacica wants to merge 8 commits into
masterfrom
fix/native-macos-thread-stack-descriptor
Open

fix(native): macOS thread stack descriptor#1726
mujacica wants to merge 8 commits into
masterfrom
fix/native-macos-thread-stack-descriptor

Conversation

@mujacica
Copy link
Copy Markdown
Contributor

  • macOS thread stack descriptor
  • Tests for indirect heap memory

mujacica added 3 commits May 15, 2026 14:47
…ed SP

The macOS native backend's full path (when ``task_for_pid`` succeeds)
used to call ``thread_get_state()`` from inside the daemon to pick up
each thread's SP, then wrote that SP into the per-thread
``MINIDUMP_THREAD::stack.start_of_memory_range`` descriptor. By the
time the daemon runs, the crashing thread's signal handler has long
since moved its SP deep into libsystem code (waiting on the daemon
IPC), so every minidump emitted on macOS contained a thread descriptor
that pointed at libsystem code pages instead of the real thread stack
region. The thread's CONTEXT block (filled from the signal-handler-
captured ucontext) still had the right crash-time SP, so MemoryListStream
and the per-thread descriptor disagreed about where the stack lives —
breaking any reader that walks the stack from SP via the descriptor.

Match each Mach thread back to the signal-handler snapshot in
``crash_ctx->platform.threads[]`` by ``thread_id`` and use that saved
state for register + stack capture; ``thread_get_state`` only fills the
gap when the snapshot is missing the thread for some reason. Also clamp
``write_thread_stack`` to the VM region containing SP — without it, a
crashing thread whose SP sits near the top of a small worker stack
caused the fixed-size ``read_task_memory`` call to overshoot into the
guard page and the entire read failed, leaving the descriptor empty.

Linux already matched by ``tid`` so it was unaffected; Windows
delegates to ``MiniDumpWriteDump`` which always populates the
descriptor correctly.

Tests:

  * extend ``test_native_minidump_streams`` (runs on every platform)
    with a cross-stream assertion: every thread's
    ``MINIDUMP_THREAD::stack`` descriptor must contain the SP recorded
    in that thread's CONTEXT. Regression guard against this class of
    descriptor/context disagreement on any backend.

  * add ``test_native_smart_mode_captures_indirect_heap_memory``
    (macOS + Linux). The published behaviour of
    ``SENTRY_MINIDUMP_MODE_SMART`` is "stack-only dump plus ~1 KiB
    around every writable-heap pointer reachable from registers + stack
    words". The minimal-mode fallback on macOS (when ``task_for_pid``
    fails) silently breaks that contract by emitting *only* module
    header pages, with no heap region anywhere in MemoryListStream.
    The test ad-hoc codesigns the example + ``sentry-crash`` daemon
    with ``com.apple.security.cs.debugger`` + ``get-task-allow`` +
    hardened-runtime so ``task_for_pid`` succeeds on macOS (no real
    Apple Developer cert needed), then asserts at least one captured
    memory region lands outside both the loaded-module image ranges
    and the per-thread stack ranges. On Linux ``/proc/self/maps`` is
    readable in-process so no codesigning is needed. Test skipped on
    Windows since ``MiniDumpWriteDump`` uses different controls
    (``MINIDUMP_TYPE`` flags).

Verified on:

  * macOS 26 arm64: ``pytest tests/test_integration_native.py``
    -> 23 passed, 2 skipped (Windows-only WER tests).
  * Linux Ubuntu 24.04 arm64 (Docker): ``pytest tests/test_integration_native.py``
    -> 22 passed, 3 skipped (Windows + OOM).
Windows ``sentry__write_minidump`` already wires
``SENTRY_MINIDUMP_MODE_SMART`` to
``MiniDumpWithIndirectlyReferencedMemory | MiniDumpWithDataSegs``, so
the OS does the equivalent stack-word scan and the same "captured
region outside modules + thread stacks" invariant should hold.
Dropping the Windows skip lets CI catch any regression there.

Test logic is platform-agnostic: ``_parse_minidump`` decodes SP from
the correct ``CONTEXT`` offset per ``ProcessorArchitecture``
(AMD64 ``rsp`` at 0x98, x86 ``esp`` at 0xC4, ARM64 ``sp`` at 0x100 in
both Breakpad and Microsoft layouts). ``_codesign_for_task_for_pid``
is darwin-only-guarded so Windows + Linux runs do nothing extra.
Comment thread tests/test_integration_native.py Outdated
mujacica added 2 commits May 15, 2026 14:59
``MINIDUMP_EXCEPTION.exception_code`` on macOS must carry the **Mach
exception type** (EXC_BAD_ACCESS = 1, EXC_BREAKPOINT = 6, ...) per the
Breakpad / Crashpad convention every minidump reader respects. Earlier
the writer put the BSD signal number directly into that field (#1576,
commit 9cad572), which works as long as the consumer doesn't try to
decode the field — but minidump-rs, the Breakpad processor, and
Crashpad's own minidump tools all read it as a Mach exception type
and produced wrong crash-reason labels (e.g. ``MacGeneral(EXC_RESOURCE,
0)`` for a plain SIGSEGV, since 11 = SIGSEGV in BSD signals AND
EXC_RESOURCE in the Mach exception enum).

This commit reverses the macOS kernel's signal-translation: given a
(signum, siginfo) pair we infer the Mach exception type + subcode that
produced it, write those into ``exception_code`` / ``exception_flags``
exactly like Breakpad's ``MinidumpGenerator::WriteExceptionStream``
does, and keep the BSD signal available to consumers via
``exception_information[0]`` so the previous "lldb wants the signal"
use case still works.

The translation table mirrors Apple's BSD ↔ Mach mapping:

  SIGSEGV → EXC_BAD_ACCESS (KERN_INVALID_ADDRESS / PROTECTION_FAILURE
            from si_code)
  SIGBUS  → EXC_BAD_ACCESS (KERN_INVALID_ADDRESS / PROTECTION_FAILURE)
  SIGILL  → EXC_BAD_INSTRUCTION
  SIGFPE  → EXC_ARITHMETIC
  SIGTRAP → EXC_BREAKPOINT
  SIGABRT → EXC_CRASH (with signal encoded in code[0] low bits)
  SIGSYS  → EXC_SOFTWARE / EXC_SOFT_SIGNAL
  other   → EXC_SOFTWARE / EXC_SOFT_SIGNAL

Verified end-to-end with minidump-rs reading the regenerated UAF dump:
``crash_reason_debug`` now decodes as ``MacBadAccessKern(...)`` instead
of ``MacGeneral(EXC_RESOURCE, 0)``.

Tests:

  * ``test_native_minidump_streams`` extended to assert
    ``exception_code`` is one of the valid Mach exception enum values
    and ``exception_information[0]`` carries a plausible BSD signal
    number. Catches regressions to the BSD-signum-as-exception-code
    convention used by the prior commit.

Verified on:

  * macOS 26 arm64 (local): pytest tests/test_integration_native.py
    -> 23 passed, 2 skipped (Windows-only WER).
  * Linux Ubuntu 24.04 arm64 (Docker): both new tests pass.
Comment thread tests/test_integration_native.py Outdated
Comment thread tests/test_integration_native.py Outdated
Replaces magic numbers and hand-rolled bit shifts in
``bsd_signal_to_mach_exception`` with the corresponding constants from
``<mach/exception_types.h>`` / ``<mach/kern_return.h>`` / ``<signal.h>``.
Reads as a plain translation table — ``case SIGSEGV: out_type =
EXC_BAD_ACCESS; out_code = (si_code == SEGV_ACCERR ?
KERN_PROTECTION_FAILURE : KERN_INVALID_ADDRESS)`` — instead of opaque
integer literals.

Also corrects the SIGABRT / SIGSYS / SIGPIPE encodings to match what
xnu's ``ux_exception.c`` actually emits when the kernel translates a
Mach exception to a BSD signal:

  * SIGABRT  -> EXC_SOFTWARE, code = EXC_UNIX_ABORT       (0x10002)
  * SIGSYS   -> EXC_SOFTWARE, code = EXC_UNIX_BAD_SYSCALL (0x10000)
  * SIGPIPE  -> EXC_SOFTWARE, code = EXC_UNIX_BAD_PIPE    (0x10001)

The previous encoding emitted ``EXC_CRASH (10)`` with the signum in
bits 24..31 of code[0] for SIGABRT, which no minidump reader's
``CrashReason::from_exception`` matched — they all fell through to
``Unknown(10, ...)``. With the corrected EXC_SOFTWARE/EXC_UNIX_*
encoding, minidump-rs now decodes SIGABRT-derived dumps as
``MacSoftware(SIGABRT)`` directly. The EXC_UNIX_* constants live in
xnu's ``bsd/sys/ux_exception.h`` (kernel-only) so we define them
locally with a link to the canonical source.

The three EXC_UNIX_* constants have been stable across every macOS
release since the BSD/Mach merge, and the public mach exception type
+ kern_return enums are part of the SDK so there are no
``#if MAC_OS_X_VERSION_*`` gates needed.

Tests:

  * fix offset bug in ``test_native_minidump_streams`` flagged by
    review bot: the test read ``number_parameters`` from
    ``rva + 16`` (which lands on the always-zero ``exception_record``
    pointer field), so the subsequent ``if number_parameters >= 1``
    silently skipped the BSD-signal-in-information[0] assertion.
    Corrected offsets per the full MINIDUMP_EXCEPTION_STREAM layout
    (``number_parameters`` at +32, ``exception_information[0]`` at
    +40) and tightened the conditional into a hard assert so the
    check actually runs. Also added SIGSYS (12) to the set of valid
    BSD signals.

Verified end-to-end with minidump-rs reading every regenerated dump:

  precondition-panic   -> MacBadAccessKern(KERN_PROTECTION_FAILURE)
  use-after-free       -> MacBadAccessKern(KERN_PROTECTION_FAILURE)
  double-free          -> MacSoftware(SIGABRT)
  uninit-read          -> MacBadAccessKern(KERN_PROTECTION_FAILURE)
  deadlock             -> MacSoftware(SIGABRT)
  refcount-mismatch    -> MacBadAccessKern(KERN_PROTECTION_FAILURE)
  stack-overflow       -> MacBadAccessKern(KERN_PROTECTION_FAILURE)
  cpp-exception        -> MacSoftware(SIGABRT)

Test suite: 23 passed, 2 skipped (Windows-only WER) on macOS 26 arm64.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit a16e4c4. Configure here.

Comment thread src/backends/native/minidump/sentry_minidump_macos.c Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant