fix(android): mask signal during CHAIN_AT_START to survive chained Mono handler re-raises#1572
fix(android): mask signal during CHAIN_AT_START to survive chained Mono handler re-raises#1572
Conversation
89200fa to
ab26763
Compare
Yeah, this makes sense, but is there no Mono test in the downstream integration tests? It is quite painful that the two runtimes differ so severely, given that we seem to lack any early warning for that particular config. Or did this only happen with .NET 10?
Not critical is an understatement. It is as critical as with all other use cases, when the signal actually comes from code that the Native SDK should handle. Wouldn't it be better to just mask the incoming signal before we invoke the handler at start? This way, a |
This happens with both .NET 9 and .NET 10. Unfortunately, the PR that takes chained signal handling into use and would have revealed this problem, has been on hold until now due to other issues in .NET 10. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
supervacuus
left a comment
There was a problem hiding this comment.
This looks like a significant improvement RE: early warning for behavior in .NET on Android. Thanks!
It is also much better to do this rather than disable SA_NODEFER wholesale for chain-at-start, and the only critical code section is sufficiently highlighted, isolated, and short. I hope we can keep it that way.
Maybe align with #1579 before you merge it (but essentially, whichever way makes the most sense to you).
SA_NODEFER (added in #1446) is incompatible with the CHAIN_AT_START signal handler strategy. When chaining to the runtime's signal handler (e.g. Mono), the runtime may reset the signal to SIG_DFL and re-raise. With SA_NODEFER the re-raised signal is delivered immediately, killing the process before our handler can regain control. Without SA_NODEFER, the re-raised signal is blocked during handler execution, allowing the runtime handler to return and sentry-native to proceed with crash capture. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This reverts commit 91afd1a.
…ng process With SA_NODEFER, the chained handler's re-raise is delivered immediately and kills the process before we regain control. Mask the signal via raw rt_sigprocmask (to bypass Android's libsigchain), then after the chain: reinstall our handler if it was reset to SIG_DFL, consume any pending signal with sigtimedwait, and unmask. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sigtimedwait is not declared without _POSIX_C_SOURCE >= 199309L, so use the raw SYS_rt_sigtimedwait syscall instead. Also replace sizeof(sigset_t) with _NSIG/8 since the kernel expects 8 bytes, not glibc's 128. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
libsigchain's sigprocmask guard is only active inside its own special handlers, so our signal handler still gets filtered. Keep the raw syscall on Android and use the standard sigprocmask on other platforms. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…raises Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test: add Android emulator test for dotnet signal handling Extends the existing dotnet_signal fixture to build as an Android APK and run on an emulator. The test verifies CHAIN_AT_START signal handling with Mono: - Handled managed exception (NRE): no native crash registered - Unhandled managed exception: Mono aborts, native crash registered - Native crash (SIGSEGV in libcrash.so): crash envelope produced The fixture csproj now multi-targets net10.0 and net10.0-android. Program.cs exposes RunTest() for the Android MainActivity entry point. Database state is checked directly on-device via adb run-as. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: wrap skipif conditions in bool() to avoid pytest string evaluation CI sets TEST_X86/ANDROID_API/RUN_ANALYZER to empty strings. Python's `or` chain returns the last falsy value (empty string ""), which pytest then tries to compile() as a legacy condition expression, causing SyntaxError. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * ci: install .NET SDK and Android workload for Android CI jobs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: only target net10.0-android when ANDROID_HOME is set Desktop CI runners have .NET 10 but no android workload. MSBuild validates all TFMs even when building for a specific one, causing NETSDK1147. Conditionally add the android TFM only when ANDROID_HOME is present. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use ANDROID_API instead of ANDROID_HOME for android TFM condition Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test: print logcat output for Android test diagnostics Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use sh -c for run-as commands on Android run-as can't find the `test` binary on older API levels. Use sh -c with a single quoted command string so shell builtins are available. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test: include stdout/stderr in dotnet run returncode assertions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use am start -W to avoid race in Android test am start is asynchronous — it returns before the process starts. The pidof poll could find no process yet and break immediately, causing assertions to run before sentry_init, and the finally block's adb uninstall to kill the app mid-startup (deletePackageX). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: exclude Android sources from desktop dotnet build Microsoft.NET.Sdk (plain) compiles all .cs files regardless of directory conventions. Exclude Platforms/Android/ when not building for an Android target framework. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * try arm64-v8a * try x86_64 on linux * Test 22-30 * fix: move test logic to OnResume to avoid am start -W hang OnCreate fires before the activity reports launch completion, so if the test crashes the app, am start -W blocks indefinitely on older API levels. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * ci: use default emulator target instead of google_apis Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: replace sleeps with retry loops in Android test Use wait_for() with polling instead of fixed sleeps to handle timing variations across emulator API levels. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add timeout to am start -W with logcat dump on failure Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: run test directly on UI thread instead of background thread Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use DecorView.Post + worker thread for Android test Post ensures OnResume returns before the test runs (so am start -W completes). Worker thread avoids Android's main thread uncaught exception handler killing the process with SIGKILL. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: simplify Android test app and handle am start -W timeout Move test back to OnResume with worker thread (no DecorView.Post). Silently handle am start -W timeout since older API levels may not report activity launch completion before the app exits. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: emulate MAUI abort behavior for unhandled exceptions on Android Run the test on the main thread via Handler.Post and catch unhandled managed exceptions with try-catch + abort(), matching how MAUI handles them. This ensures sentry-native's signal handler captures the crash across all Android API levels. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: expect no crash for unhandled managed exceptions on Android Unhandled managed exceptions on Android go through Mono's exit(1), not a catchable signal. sentry-dotnet handles these at the managed layer via UnhandledExceptionRaiser, so sentry-native should not register a crash. Remove the abort() workaround and align expectations with the desktop test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * ci: drop broken Android API 22 job Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * ci: switch Android emulator to google_apis, drop API 27 Switch emulator target from default to google_apis to fix stuck emulator boots on API 23-25. Drop API 27 which has no google_apis x86_64 system image. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test: skip Android dotnet signal test on API < 26 Pre-tombstoned Android (API < 26) uses debuggerd which kills the process before sentry-native's signal handler can run when using CHAIN_AT_START strategy. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Try macos-15-large again, drop others but 26 * test: clean up dotnet signal test assertions and comments Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * ci: switch Android emulator back to default target Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use double quotes in run-as shell wrapper to preserve inner quotes The find command uses single-quoted '*.envelope' glob pattern which broke when wrapped in single-quoted sh -c. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This reverts commit b20d459.
5486c9f to
eb5c4d2
Compare
Summary
SA_NODEFERdoesn't let re-raises kill the processrt_sigprocmasksyscall to bypass Android'slibsigchain, whosesigprocmaskguard is only active inside its own special handlersSIG_DFL, consume any pending signal viasigtimedwait, and unmaskContext
With
SA_NODEFER, the chained Mono handler can reset the signal handler toSIG_DFLand re-raise. The re-raised signal is delivered immediately and kills the process before inproc can capture the crash.This is gated to Android because only the Mono runtime (used on Android) does the reset+raise pattern. On desktop Linux, CoreCLR modifies the ucontext and returns without re-raising.