Resolve room handle synchronously in on_disconnect#1066
Conversation
The handler spawned an async task that retrieved the FfiRoom via the DashMap inside the spawn body and unwrap()'d the result. The synchronous DisconnectResponse returned to the foreign-language client before the spawn was scheduled, so a client that disposed its room handle as soon as the request returned would race the tokio scheduler. When the spawn finally polled, retrieve_handle saw the entry already removed by drop_handle and the unwrap aborted the process via SIGABRT inside the runtime worker. On iOS the race lost roughly half the time. Move the retrieve_handle + clone before the spawn. The spawn now owns its Arc<FfiRoom> clone, so the close runs even if the foreign side has already removed the DashMap entry. Also replace the unwrap with `?` so a genuinely missing handle surfaces as a proper FfiResult error rather than an abort. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
No changeset foundThis PR modifies the following packages but doesn't include a changeset: Directly changed:
Click here to create a changeset The link pre-populates a changeset file with If this change doesn't require a version bump, add the |
|
Would the fix in max/fix-on-disconnect-handle-race not be needed if we dispose the room only after the disconnect event arrives back in Unity? |
|
What about the ffi_room.close call on an already disposed room? Isn't the ffi_room now cached and in the spawn it could be invalid already? What kind of error behaviour will happen if we lose the race now? |
|
Already fixed by DZ in #1069 |
Summary
on_disconnectspawned an async task that retrieved theFfiRoomvia the DashMap inside the spawn body andunwrap()'d the result. The synchronousDisconnectResponsereturned to the foreign-language client before the spawn was scheduled, so a client that disposed its room handle as soon as the request returned would race the tokio scheduler. When the spawn finally polled,retrieve_handlesaw the entry already removed bydrop_handleand the unwrap aborted the process viaSIGABRTinside the runtime worker. On iOS the race lost roughly half the time.Move the
retrieve_handle + clonebefore the spawn. The spawn now owns itsArc<FfiRoom>clone, so the close runs even if the foreign side has already removed the DashMap entry. Also replace theunwrapwith?so a genuinely missing handle surfaces as a properFfiResulterror rather than an abort.Surfaced in the Unity SDK after a separate change started disposing the room handle synchronously on disconnect (rather than waiting for the
Disconnectedevent + GC of the SafeHandle). Either change is reasonable on the foreign side; the FFI server should be robust to it.Test plan
Unity
Found in this Unity PR:
livekit/client-sdk-unity#278