[libcxxabi] Use InitByteFutex for __cxa_guard in WASM Workers #26283
AndreyRepko wants to merge 1 commit into emscripten-core:main
When using `-sWASM_WORKERS`, `__cxa_guard_acquire` uses the `GlobalMutex` implementation (`pthread_mutex_lock` + `pthread_cond_wait`), but libc links pthread stubs in which these are all no-ops. This is not just a performance problem: `GlobalMutex` does a non-atomic read-then-write on the init byte under a no-op lock, so two workers can both see `UNSET` and both become the initializer (double initialization / undefined behavior).

Switch to the `InitByteFutex` implementation, which uses atomic CAS for correctness. Wait/wake are no-ops, so losers spin in the CAS retry loop rather than sleeping. We cannot use the real `memory.atomic.wait32` because it traps on the main browser thread and there is no libcxxabi-compatible way to detect the main thread (`emscripten_is_main_browser_thread` is JS-only). In practice, contention on a single guard is rare and the spin is bounded by the static constructor duration (typically sub-microsecond).

Fixes emscripten-core#26277
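To make the race concrete, here is a minimal sketch that replays the problematic interleaving deterministically on one thread. It is not the libcxxabi code: `RacyGuard`, `CasGuard`, `kUnset` and `kPending` are hypothetical names, and the plain read/write pair only models what a check-then-set does when the surrounding lock is a no-op.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical state values, loosely modelled on the guard's init byte.
constexpr std::uint8_t kUnset = 0;
constexpr std::uint8_t kPending = 1;

// Check-then-set split into two steps so an interleaving can be replayed.
// With a no-op lock, nothing prevents another worker from running between them.
struct RacyGuard {
    std::uint8_t init_byte = kUnset;
    bool observe_unset() const { return init_byte == kUnset; }  // step 1: plain read
    void claim() { init_byte = kPending; }                      // step 2: plain write
};

// The same decision made with a single atomic compare-and-swap.
struct CasGuard {
    std::atomic<std::uint8_t> init_byte{kUnset};
    bool try_claim() {
        std::uint8_t expected = kUnset;
        return init_byte.compare_exchange_strong(expected, kPending,
                                                 std::memory_order_acq_rel);
    }
};

// Replays "A reads, B reads, A writes, B writes" and counts the initializers.
int racy_initializers() {
    RacyGuard guard;
    const bool a_saw_unset = guard.observe_unset();
    const bool b_saw_unset = guard.observe_unset();
    int initializers = 0;
    if (a_saw_unset) { guard.claim(); ++initializers; }
    if (b_saw_unset) { guard.claim(); ++initializers; }
    return initializers;
}

// Two workers attempt the CAS; only the first can move UNSET to PENDING.
int cas_initializers() {
    CasGuard guard;
    int initializers = 0;
    if (guard.try_claim()) ++initializers;
    if (guard.try_claim()) ++initializers;
    return initializers;
}
```

In this replay both workers pass the check in the racy version, while the CAS version admits exactly one, which is the property the switch to `InitByteFutex` relies on.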
Pull request overview
This PR fixes a critical bug in WASM Workers where __cxa_guard_acquire (used for C++ static local variable initialization) can cause double initialization or indefinite busy-spinning when multiple workers trigger the same static initialization simultaneously.
Changes:
- Switch libcxxabi from `GlobalMutex` to `InitByteFutex` guard implementation for WASM Workers
- Add no-op futex wait/wake functions for Emscripten shared memory contexts
- Configure build system to use `-D_LIBCXXABI_USE_FUTEX` flag for WASM Workers
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tools/system_libs.py | Add -D_LIBCXXABI_USE_FUTEX compiler flag for WASM Workers to select the futex-based guard implementation |
| system/lib/libcxxabi/src/cxa_guard_impl.h | Implement no-op futex wait/wake functions for Emscripten with shared memory, enabling use of atomic CAS operations for thread-safe initialization |
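For readers unfamiliar with the pattern, the following is a rough sketch of what a CAS-based guard with no-op wait/wake looks like. It is written against portable C++ threads rather than WASM Workers, and everything in the `sketch` namespace (`guard_acquire`, `guard_release`, `futex_wait`, `futex_wake`, the state values) is a hypothetical stand-in, not the code in `cxa_guard_impl.h`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

namespace sketch {

enum : std::uint8_t { kUnset = 0, kPending = 1, kComplete = 2 };

// No-op stand-ins for the wait/wake hooks: nobody sleeps, so nobody needs waking.
inline void futex_wait(std::atomic<std::uint8_t>&, std::uint8_t) {}
inline void futex_wake(std::atomic<std::uint8_t>&) {}

// Returns true if the caller won the race and must run the initializer.
inline bool guard_acquire(std::atomic<std::uint8_t>& guard) {
    for (;;) {
        std::uint8_t expected = kUnset;
        if (guard.compare_exchange_strong(expected, kPending,
                                          std::memory_order_acq_rel,
                                          std::memory_order_acquire)) {
            return true;   // UNSET -> PENDING: this caller initializes
        }
        if (expected == kComplete) {
            return false;  // someone else already finished
        }
        futex_wait(guard, kPending);  // no-op, so the loop simply retries (spins)
    }
}

// Publishes the initialized object and (nominally) wakes waiters.
inline void guard_release(std::atomic<std::uint8_t>& guard) {
    guard.store(kComplete, std::memory_order_release);
    futex_wake(guard);
}

// Starts `workers` threads that all hit the same guard and counts
// how many of them ended up running the initializer.
inline int count_initializations(int workers) {
    std::atomic<std::uint8_t> guard{kUnset};
    std::atomic<int> initializations{0};
    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i) {
        pool.emplace_back([&] {
            if (guard_acquire(guard)) {
                initializations.fetch_add(1, std::memory_order_relaxed);
                guard_release(guard);
            }
        });
    }
    for (auto& t : pool) t.join();
    return initializations.load();
}

}  // namespace sketch
```

The losers burn CPU only while the winner is between `guard_acquire` and `guard_release`, which is why the spin is bounded by the duration of the static constructor.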
// (non-atomic read-then-write allows double initialization).
// InitByteFutex uses atomic CAS for correct single initialization.
// Wait/wake are no-ops; losers spin in the CAS retry loop.
// Cannot use memory.atomic.wait32 (traps on the main browser thread).
This seems like something we should be using `emscripten_futex_wait` and `emscripten_futex_wake` for, although I don't think those are currently available in wasm workers.
They probably should be.
It looks like wasm workers do define `emscripten_atomic_wait_u32` and `emscripten_atomic_wait_u64`, but they are just wrappers around the `__builtin_wasm_memory_atomic_xx` functions, so they cannot be used on the main thread.
@juj @cwoffenden WDYT, should we make the higher level (and safe-to-call) emscripten_futex_wait API available in wasm workers and use it here?
I agree that a locking solution would be useful, especially one that works with the main thread and various workers. I watched the current state of locks bite the devs here new to Emscripten.
It's not something I have time to look at right now (or for the next few months).
We hit this in production with `thread_local std::optional<T>` (non-trivial destructor): first access triggers `__cxa_thread_atexit` → `static DtorsManager` → `__cxa_guard_acquire`. When multiple WASM Workers start executing tasks simultaneously, some workers busy-spin on the global mutex indefinitely, causing timeouts.

Fixes #26277
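For reference, the usage pattern described above can be sketched roughly as follows. This is not the production code: `Scratch`, `local_scratch` and `run_workers` are hypothetical names, and the sketch uses `std::thread` instead of WASM Workers, so it only illustrates the shape of the code (a `thread_local std::optional<T>` whose `T` has a non-trivial destructor, first touched by several workers at about the same time).

```cpp
#include <atomic>
#include <cassert>
#include <optional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical payload type; the user-provided destructor makes it non-trivial.
struct Scratch {
    std::string buffer;
    ~Scratch() { destroyed.fetch_add(1, std::memory_order_relaxed); }
    static inline std::atomic<int> destroyed{0};  // counts destructor runs
};

// First access on each thread constructs the thread_local and registers
// its destructor to run at thread exit.
Scratch& local_scratch() {
    thread_local std::optional<Scratch> slot;
    if (!slot) slot.emplace();
    return *slot;
}

// Starts `n` workers that all touch their thread_local on their first task,
// then returns how many Scratch destructors have run once all have exited.
int run_workers(int n) {
    std::vector<std::thread> pool;
    for (int i = 0; i < n; ++i) {
        pool.emplace_back([] { local_scratch().buffer = "task"; });
    }
    for (auto& t : pool) t.join();
    return Scratch::destroyed.load();
}
```

Each worker's first call to `local_scratch()` is where the destructor registration happens, so a pool of workers that all start at once reaches the shared guard at nearly the same moment.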