
[libcxxabi] Use InitByteFutex for __cxa_guard in WASM Workers#26283

Open: AndreyRepko wants to merge 1 commit into emscripten-core:main from AndreyRepko:fix/wasm-workers-cxa-guard-futex

@AndreyRepko

When using -sWASM_WORKERS, __cxa_guard_acquire uses the GlobalMutex
implementation (pthread_mutex_lock + pthread_cond_wait), but libc links
pthread stubs in which these calls are all no-ops. This is not just a
performance problem: GlobalMutex does a non-atomic read-then-write on the
init byte under a no-op lock, so two workers can both see UNSET and both
become the initializer (double initialization, i.e. undefined behavior).
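To make the race concrete, here is a minimal deterministic replay, assuming a simplified model of the guard rather than libcxxabi's actual code. `NoopMutex`, `replay_interleaving`, and `replay_with_cas` are hypothetical names, and the byte values are illustrative, not necessarily libcxxabi's encoding. It forces the interleaving in which both workers read the init byte before either writes it, first with check-then-set under a no-op lock and then with a compare-and-swap.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Illustrative init-byte values (hypothetical, not libcxxabi's exact encoding).
enum InitByte : std::uint8_t { UNSET = 0, PENDING = 2 };

// Stand-in for pthread_mutex_lock/unlock when they are linked as no-op stubs.
struct NoopMutex {
  void lock() {}  // provides no mutual exclusion at all
  void unlock() {}
};

// Replays one interleaving of two workers using GlobalMutex-style
// check-then-set: both read the byte before either writes it.
// Returns how many workers decided they were the initializer.
int replay_interleaving(std::uint8_t& init_byte) {
  NoopMutex global_mutex;
  global_mutex.lock();                      // worker A "acquires" the lock
  bool a_saw_unset = (init_byte == UNSET);  // A reads UNSET
  global_mutex.lock();                      // worker B "acquires" it too
  bool b_saw_unset = (init_byte == UNSET);  // B also reads UNSET
  if (a_saw_unset) init_byte = PENDING;     // A: "I am initializing"
  if (b_saw_unset) init_byte = PENDING;     // B writes the same claim
  global_mutex.unlock();
  global_mutex.unlock();
  return int(a_saw_unset) + int(b_saw_unset);
}

// Same interleaving, but each worker claims the byte with an atomic CAS,
// so only the first UNSET -> PENDING transition can succeed.
int replay_with_cas(std::atomic<std::uint8_t>& init_byte) {
  std::uint8_t expected_a = UNSET;
  std::uint8_t expected_b = UNSET;
  bool a_wins = init_byte.compare_exchange_strong(expected_a, PENDING);
  bool b_wins = init_byte.compare_exchange_strong(expected_b, PENDING);
  return int(a_wins) + int(b_wins);
}
```

With the stubbed lock both workers become initializers; with the CAS exactly one does, which is the property the switch to InitByteFutex relies on.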

Switch to the InitByteFutex implementation, which uses an atomic CAS for
correctness. Wait/wake are no-ops, so losers spin in the CAS retry loop
rather than sleeping. We cannot use a real memory.atomic.wait32, because it
traps on the main browser thread and there is no libcxxabi-compatible way
to detect the main thread (emscripten_is_main_browser_thread is JS-only).
In practice, contention on a single guard is rare, and the spin lasts only
as long as the static constructor runs (typically sub-microsecond).
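The CAS-and-spin behavior described above can be sketched roughly as follows. This is a simplified model written for illustration, not libcxxabi's InitByteFutex code: everything in namespace `sketch`, the byte values, and `run_contended_init` are hypothetical, and the sketch omits the abort path and the waiting bit. As in the PR, wait and wake are no-ops, so a loser keeps retrying the CAS until the winner publishes COMPLETE.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

namespace sketch {

// Illustrative init-byte states (hypothetical values).
constexpr std::uint8_t kUnset = 0;
constexpr std::uint8_t kComplete = 1;
constexpr std::uint8_t kPending = 2;

// No-op platform hooks, mirroring the PR's approach: a "waiter" never
// sleeps, it returns immediately and the caller retries the CAS.
inline void futex_wait_noop(std::atomic<std::uint8_t>*, std::uint8_t) {}
inline void futex_wake_noop(std::atomic<std::uint8_t>*) {}

// Returns true if the caller must run the initializer, false if another
// thread has already completed it.
inline bool guard_acquire(std::atomic<std::uint8_t>& init_byte) {
  for (;;) {
    std::uint8_t expected = kUnset;
    if (init_byte.compare_exchange_strong(expected, kPending,
                                          std::memory_order_acquire)) {
      return true;  // won the race: this thread is the initializer
    }
    if (expected == kComplete) return false;  // already initialized
    // expected == kPending: someone else is initializing. The wait is a
    // no-op, so this loop spins until the byte changes.
    futex_wait_noop(&init_byte, expected);
  }
}

inline void guard_release(std::atomic<std::uint8_t>& init_byte) {
  init_byte.store(kComplete, std::memory_order_release);
  futex_wake_noop(&init_byte);  // nothing to wake: losers are spinning
}

}  // namespace sketch

// Starts `n` threads that all try to initialize the same guarded value and
// returns how many times the initializer actually ran.
inline int run_contended_init(int n) {
  std::atomic<std::uint8_t> init_byte{sketch::kUnset};
  std::atomic<int> init_runs{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < n; ++i) {
    threads.emplace_back([&] {
      if (sketch::guard_acquire(init_byte)) {
        init_runs.fetch_add(1);  // stands in for the static constructor
        sketch::guard_release(init_byte);
      }
    });
  }
  for (auto& t : threads) t.join();
  return init_runs.load();
}
```

`run_contended_init(8)` should report that the initializer ran exactly once. The cost of the no-op wait is that losers burn CPU for as long as the constructor runs, which is the trade-off the PR accepts.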

We hit this in production with a thread_local std::optional<T> (non-trivial
destructor): the first access triggers __cxa_thread_atexit → static
DtorsManager → __cxa_guard_acquire. When multiple WASM Workers start
executing tasks simultaneously, some workers busy-spin on the global mutex
indefinitely, causing timeouts.
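The access pattern can be illustrated with a small portable program. It is a hypothetical stand-in, not the production code: `tls_cache`, `touch_tls`, and `run_workers` are invented names, std::thread substitutes for WASM Workers so it runs natively, and the claim that first access reaches a guarded static (DtorsManager) comes from the PR description, not from anything this program checks.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the production pattern: a thread_local
// std::optional<T> whose T (std::string) has a non-trivial destructor.
// Per the PR description, registering the per-thread destructor goes
// through __cxa_thread_atexit, which reaches a function-local static
// (DtorsManager) protected by __cxa_guard_acquire.
thread_local std::optional<std::string> tls_cache;

// The first call on each thread constructs the cached value.
std::size_t touch_tls(int worker_id) {
  if (!tls_cache) {
    tls_cache.emplace("worker-" + std::to_string(worker_id));
  }
  return tls_cache->size();
}

// Starts `n` threads at roughly the same time, each touching the
// thread_local for the first time. Under Emscripten with -sWASM_WORKERS
// these would be WASM Workers rather than std::thread.
int run_workers(int n) {
  std::atomic<int> finished{0};
  std::vector<std::thread> workers;
  for (int i = 0; i < n; ++i) {
    workers.emplace_back([i, &finished] {
      if (touch_tls(i) > 0) finished.fetch_add(1);
    });
  }
  for (auto& w : workers) w.join();
  return finished.load();
}
```

Natively every thread finishes; the reported hang only appears when the guard underneath is the stubbed GlobalMutex.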

Fixes #26277

Copilot AI review requested due to automatic review settings February 17, 2026 11:41

Copilot AI left a comment


Pull request overview

This PR fixes a critical bug in WASM Workers where __cxa_guard_acquire (used for C++ static local variable initialization) can cause double initialization or indefinite busy-spinning when multiple workers trigger the same static initialization simultaneously.

Changes:

  • Switch libcxxabi from GlobalMutex to InitByteFutex guard implementation for WASM Workers
  • Add no-op futex wait/wake functions for Emscripten shared memory contexts
  • Configure build system to use -D_LIBCXXABI_USE_FUTEX flag for WASM Workers
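For readers unfamiliar with how a single -D flag switches the guard implementation, a rough sketch follows. It is a hypothetical, simplified model: `GuardImpl`, `kSelectedGuard`, and `selected_guard_name` are invented names, and the real header's selection logic may be structured differently. Only the macros _LIBCXXABI_HAS_NO_THREADS and _LIBCXXABI_USE_FUTEX are taken from libcxxabi (the latter is the flag this PR adds for WASM Workers).

```cpp
#include <cassert>
#include <string_view>

// Hypothetical model of compile-time guard selection. It only shows how a
// -D flag added by the build changes which implementation backs
// __cxa_guard_acquire; it is not the header's actual code.
enum class GuardImpl { NoThreads, GlobalMutex, InitByteFutex };

#if defined(_LIBCXXABI_HAS_NO_THREADS)
constexpr GuardImpl kSelectedGuard = GuardImpl::NoThreads;
#elif defined(_LIBCXXABI_USE_FUTEX)
constexpr GuardImpl kSelectedGuard = GuardImpl::InitByteFutex;
#else
constexpr GuardImpl kSelectedGuard = GuardImpl::GlobalMutex;  // default
#endif

// Human-readable name of the implementation chosen at compile time.
constexpr std::string_view selected_guard_name() {
  switch (kSelectedGuard) {
    case GuardImpl::NoThreads: return "NoThreads";
    case GuardImpl::GlobalMutex: return "GlobalMutex";
    case GuardImpl::InitByteFutex: return "InitByteFutex";
  }
  return "unknown";
}
```

Compiled with no extra flags this reports GlobalMutex; adding -D_LIBCXXABI_USE_FUTEX, as the PR does in tools/system_libs.py for WASM Workers builds, selects InitByteFutex instead.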

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

Reviewed files:
  • tools/system_libs.py: Add the -D_LIBCXXABI_USE_FUTEX compiler flag for WASM Workers to select the futex-based guard implementation.
  • system/lib/libcxxabi/src/cxa_guard_impl.h: Implement no-op futex wait/wake functions for Emscripten with shared memory, enabling the use of atomic CAS operations for thread-safe initialization.


// (non-atomic read-then-write allows double initialization).
// InitByteFutex uses atomic CAS for correct single initialization.
// Wait/wake are no-ops — losers spin in the CAS retry loop.
// Cannot use memory.atomic.wait32 (traps on the main browser thread).
Collaborator
This seems like something we should be using emscripten_futex_wait and emscripten_futex_wake for, although I don't think those are currently available in WASM Workers.

They probably should be.

It looks like WASM Workers does define emscripten_atomic_wait_u32 and emscripten_atomic_wait_u64, but they are just wrappers around the __builtin_wasm_memory_atomic_xx builtins, so they cannot be used on the main thread.

@juj @cwoffenden WDYT, should we make the higher-level (and safe-to-call) emscripten_futex_wait API available in WASM Workers and use it here?
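One possible shape for this suggestion, sketched under assumptions: `guard_wait`, `guard_wake_all`, and the macro `HYPOTHETICAL_HAVE_EMSCRIPTEN_FUTEX` are invented for illustration, and the Emscripten branch assumes emscripten_futex_wait/emscripten_futex_wake (declared in emscripten/threading.h for pthreads builds) were made available to WASM Workers, which per this comment is not currently the case. Off Emscripten the sketch falls back to a yield loop so it can be compiled and exercised natively.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// HYPOTHETICAL_HAVE_EMSCRIPTEN_FUTEX is an invented macro meaning
// "emscripten_futex_wait/wake are usable in this build".
#if defined(__EMSCRIPTEN__) && defined(HYPOTHETICAL_HAVE_EMSCRIPTEN_FUTEX)
#include <climits>
#include <cmath>
#include <emscripten/threading.h>
#define GUARD_USE_EMSCRIPTEN_FUTEX 1
#endif

// Waits until *addr may have changed from `expected`. Futex waits can
// return early, so callers must re-check the value in a loop.
inline void guard_wait(std::atomic<std::uint32_t>* addr,
                       std::uint32_t expected) {
#ifdef GUARD_USE_EMSCRIPTEN_FUTEX
  // Unlike a raw memory.atomic.wait32, this higher-level call is meant to
  // be callable from the main browser thread as well.
  emscripten_futex_wait(addr, expected, INFINITY);
#else
  // Portable fallback for this sketch: poll and yield until it changes.
  while (addr->load(std::memory_order_acquire) == expected) {
    std::this_thread::yield();
  }
#endif
}

// Wakes every thread blocked in guard_wait on `addr`.
inline void guard_wake_all(std::atomic<std::uint32_t>* addr) {
#ifdef GUARD_USE_EMSCRIPTEN_FUTEX
  emscripten_futex_wake(addr, INT_MAX);
#else
  (void)addr;  // fallback waiters poll, so there is nothing to wake
#endif
}
```

With hooks like these, the CAS retry loop in the guard would sleep instead of spinning, while staying safe on the main thread.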

Collaborator

I agree that a locking solution would be useful, especially one that works across the main thread and the various workers. I have watched the current state of locking bite developers here who are new to Emscripten.

It's not something I have time to look at right now, or in the coming months.



Development: successfully merging this pull request may close these issues:
  • __cxa_guard_acquire busy-spins in WASM Workers leads to dead-lock

3 participants
