fix: stop commands flush batches and resolve on hardware write#906
Open
qdot wants to merge 3 commits into
Open
fix: stop commands flush batches and resolve on hardware write#906qdot wants to merge 3 commits into
qdot wants to merge 3 commits into
Conversation
Stop commands previously fire-and-forgot into the device io channel, where message_gap batching could leave the stop write sitting in pending_commands until the batch deadline. Since shutdown order is stop then disconnect, the disconnect routinely beat the deadline and dropped the pending stop write unflushed, leaving the device running. The io-channel payload is now DeviceTaskMessage, carrying the commands plus an optional oneshot write-acknowledgement sender. A message with an ack is urgent: the io task merges it into any pending batch (existing dedupe), flushes everything to hardware immediately regardless of the batch deadline, then fires the ack. Messages without an ack keep the exact prior batching behaviour, so normal output is unchanged. The stop path accumulates the hardware commands from every per-feature stop OutputCmd into a single write-acknowledged batch and awaits the ack, so stop() (and therefore stop_devices()/shutdown) resolves only once the stop write has reached hardware. The wait is bounded by a runtime-agnostic 1s timeout (select! against async_manager::sleep, not tokio::time, for WASM parity); a wedged or dead device resolves Ok rather than hanging shutdown. The io task's token-cancel and channel-close exits now best-effort flush pending_commands first so a stop in the batch window still lands; the hardware-Disconnected exit deliberately does not flush, since the hardware is gone. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Two RED-verified lifecycle tests pinning the stop-command write-ack contract, plus the minimal test-util change they need. Test util: TestDevice now carries a configurable message_gap (default 1ms, unchanged for all existing tests). add_test_device_with_message_gap and test_server_with_device_and_message_gap let a test widen the device io task's batch window so the stop-write race is deterministic rather than relying on the 1ms default. test_stop_resolves_only_after_stop_write_reaches_hardware (Test A): with a 500ms message gap and an active vibrate, a StopCmd must not resolve until the resulting stop write ([0xF1, 0x00]) has reached the hardware channel. Pre-fix this fails deterministically — StopCmd fire-and-forgets and resolves while the write sits in the batch window. test_shutdown_writes_stop_before_resolving (Test B): the original incident. With the same batched device in an active output state, server.shutdown() must record the device's stop write before it resolves. Pre-fix the disconnect tore the io task down mid-batch and dropped the pending stop write. Both retire the write-observation limitation documented on test_shutdown_under_load_drains_subtree (1ms harness gap made it flaky). RED verified by reverting the two production files to HEAD~1 (2 failed / 0 passed across three runs); restored exactly (empty production diff); GREEN and stable across five runs. Full buttplug_tests suite green incl. 808 conformance. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes the stop-command loss bug surfaced during #905's testing (pre-existing on master):
DeviceHandle::stop()/stop_devices()/ server shutdown resolved when stop commands were queued to the device io task, not when they were written. Withmessage_gapbatching, a disconnect immediately after stop could tear down the io task inside the batch window and silently drop the pending stop write — leaving a device running after the server believed it was stopped.Stacked on #905 (base:
feat/task-scope-lifecycle) — it depends on the io task's cancellation arm and the lifecycle test harness from that branch.Mechanism
DeviceTaskMessage { commands, ack: Option<oneshot::Sender<()>> }. An ack'd batch is urgent: the io task merges it with any pending batched commands (existingoverlapsdedupe — stop is written last, conflicting pending outputs are replaced), flushes everything to hardware immediately, then fires the ack.handle_stop_device_cmdaccumulates the hardware commands for all stop output features into a single ack'd send. Per-feature acks would have corrupted multi-feature devices' wire sequences (one accumulated write, not six) — caught by the 808-test conformance suite during development.select!+async_manager::sleep— runtime-agnostic, WASM-safe, notokio::time::timeout. A wedged or dead device logs and resolvesOkrather than hanging shutdown.Disconnectedexit deliberately does not.Tests (both RED-verified against the unfixed code, deterministic across runs)
test_stop_resolves_only_after_stop_write_reaches_hardware— 500ms batch gap, active output, then StopCmd; asserts the stop write is on the wire by the time stop resolves (pure ordering, no sleeps).test_shutdown_writes_stop_before_resolving— the original incident: batched device + active output →server.shutdown()→ stop write recorded before shutdown returns.message_gapis now per-device configurable (default 1ms unchanged; all existing tests byte-identical).Test Plan
cargo test -p buttplug_server— 26 passedcargo test -p buttplug_tests— all green incl. 808 protocol conformancetest_task_lifecycle5x parallel-stablecargo check -p buttplug_server --no-default-features --features wasm --target wasm32-unknown-unknown🤖 Generated with Claude Code