FIX: per-frame watchdog so a nested timed call can't disarm the outer timeout#423
Open
ursm wants to merge 1 commit into
Open
FIX: per-frame watchdog so a nested timed call can't disarm the outer timeout#423ursm wants to merge 1 commit into
ursm wants to merge 1 commit into
Conversation
…imeout The per-Context watchdog state (c->wd: mtx/cv/cancel) was shared by every v8_timedwait frame. When a timed dispatch re-enters Ruby and triggers another timed dispatch on the same Context — a host function (or, with the module API, an instantiate resolver) calling eval/call/compile_module — the inner frame's `cancel = 1; pthread_cond_signal()` also woke the OUTER frame's watchdog, which saw the shared cancel flag, broke out, and exited. The outer call then ran with no timeout at all. (The `cancel = 0` reset was also written outside the mutex, a data race with a still-waiting watchdog.) Move the watchdog state into a stack-allocated `struct Watchdog` owned by each v8_timedwait frame and torn down before it returns, so cancelling an inner watchdog can no longer disarm the outer one. Drops the now-unused c->wd field and its init/teardown. Adds a regression test (host fn does a fast nested eval, then the outer script spins) asserting the outer timeout still fires; it fails against the shared watchdog (the spin runs to the test's safety stop) and passes here in ~0.1s. Note: a separate, narrower subtlety remains — because TerminateExecution is isolate-global, if an outer deadline fires *while* a nested call is running, the nested handler's CancelTerminateExecution can still clear it. That needs nesting-aware termination ownership and is left as a follow-up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The per-
Contextwatchdog state (c->wd—mtx/cv/cancel) was shared by everyv8_timedwaitframe. When a timed dispatch re-enters Ruby and triggers another timed dispatch on the same Context — e.g. a host function callingeval/callagain (and, with the in-flight ES module API, aninstantiateresolver callingcompile_module) — the inner frame's teardown did:The shared
cancel/cvmeant cancelling the inner (usually fast) call's watchdog also woke the outer frame's watchdog, which sawcancel == 1, broke out, and exited — so the outer call then ran with no timeout at all.This is pre-existing on
main; it reproduces with just a timeout and a host fn that re-enterseval:Fix
Move the watchdog into a stack-allocated
struct Watchdogowned by eachv8_timedwaitframe (its ownmtx/cv/cancel/deadline), created when the call arms a timeout and torn down before that frame returns. Cancelling an inner watchdog now touches only the inner state, so it can no longer disarm the outer one. The now-unusedc->wdfield and its init/teardown are dropped, and thecanceldata race is gone (the flag is per-frame and only written under its own mutex). The per-frame condvar uses the same clock asdeadline_ms(CLOCK_MONOTONICoff Apple).Test
test_nested_timed_call_does_not_disarm_outer_timeoutruns a fast nestedevalfrom a host fn, then spins in the outer script, and asserts the outer timeout still fires. A safetystopturns a regression into a slow failure instead of a hang: against the shared watchdog it runs to the 5 s safety stop and fails (elapsed ~5s); with this fix it terminates in ~0.1 s.Known follow-up (out of scope here)
TerminateExecutionis isolate-global, so if an outer deadline fires while a nested call is still running, the nested handler'sCancelTerminateExecution()can still clear it. Closing that requires nesting-aware termination ownership and is left for a separate change.🤖 Generated with Claude Code