Skip to content

fix: stop commands flush batches and resolve on hardware write#906

Open
qdot wants to merge 3 commits into
feat/task-scope-lifecyclefrom
fix/stop-command-write-ack
Open

fix: stop commands flush batches and resolve on hardware write#906
qdot wants to merge 3 commits into
feat/task-scope-lifecyclefrom
fix/stop-command-write-ack

Conversation

@qdot

@qdot qdot commented Jun 10, 2026

Copy link
Copy Markdown
Member

Summary

Fixes the stop-command loss bug surfaced during #905's testing (pre-existing on master): DeviceHandle::stop() / stop_devices() / server shutdown resolved when stop commands were queued to the device io task, not when they were written. With message_gap batching, a disconnect immediately after stop could tear down the io task inside the batch window and silently drop the pending stop write — leaving a device running after the server believed it was stopped.

Stacked on #905 (base: feat/task-scope-lifecycle) — it depends on the io task's cancellation arm and the lifecycle test harness from that branch.

Mechanism

  • The io channel now carries DeviceTaskMessage { commands, ack: Option<oneshot::Sender<()>> }. An ack'd batch is urgent: the io task merges it with any pending batched commands (existing overlaps dedupe — stop is written last, conflicting pending outputs are replaced), flushes everything to hardware immediately, then fires the ack.
  • handle_stop_device_cmd accumulates the hardware commands for all stop output features into a single ack'd send. Per-feature acks would have corrupted multi-feature devices' wire sequences (one accumulated write, not six) — caught by the 808-test conformance suite during development.
  • The ack wait is bounded (1s) via select! + async_manager::sleep — runtime-agnostic, WASM-safe, no tokio::time::timeout. A wedged or dead device logs and resolves Ok rather than hanging shutdown.
  • Hardening: the io task flushes pending batched commands on its token-cancel and channel-close exits (hardware still present); the hardware-Disconnected exit deliberately does not.
  • Normal output paths are behaviourally identical — dedupe map, output observations, and batching are unchanged for non-stop commands.

Tests (both RED-verified against the unfixed code, deterministic across runs)

  • test_stop_resolves_only_after_stop_write_reaches_hardware — 500ms batch gap, active output, then StopCmd; asserts the stop write is on the wire by the time stop resolves (pure ordering, no sleeps).
  • test_shutdown_writes_stop_before_resolving — the original incident: batched device + active output → server.shutdown() → stop write recorded before shutdown returns.
  • Test util: message_gap is now per-device configurable (default 1ms unchanged; all existing tests byte-identical).

Test Plan

  • cargo test -p buttplug_server — 26 passed
  • cargo test -p buttplug_tests — all green incl. 808 protocol conformance
  • test_task_lifecycle 5x parallel-stable
  • RED→GREEN verified on both new tests (reviewer independently reproduced)
  • WASM: cargo check -p buttplug_server --no-default-features --features wasm --target wasm32-unknown-unknown

🤖 Generated with Claude Code

qdot and others added 3 commits June 10, 2026 13:55
Stop commands previously fire-and-forgot into the device io channel, where
message_gap batching could leave the stop write sitting in pending_commands
until the batch deadline. Since shutdown order is stop then disconnect, the
disconnect routinely beat the deadline and dropped the pending stop write
unflushed, leaving the device running.

The io-channel payload is now DeviceTaskMessage, carrying the commands plus an
optional oneshot write-acknowledgement sender. A message with an ack is urgent:
the io task merges it into any pending batch (existing dedupe), flushes
everything to hardware immediately regardless of the batch deadline, then fires
the ack. Messages without an ack keep the exact prior batching behaviour, so
normal output is unchanged.

The stop path accumulates the hardware commands from every per-feature stop
OutputCmd into a single write-acknowledged batch and awaits the ack, so stop()
(and therefore stop_devices()/shutdown) resolves only once the stop write has
reached hardware. The wait is bounded by a runtime-agnostic 1s timeout (select!
against async_manager::sleep, not tokio::time, for WASM parity); a wedged or
dead device resolves Ok rather than hanging shutdown.

The io task's token-cancel and channel-close exits now best-effort flush
pending_commands first so a stop in the batch window still lands; the
hardware-Disconnected exit deliberately does not flush, since the hardware is
gone.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Two RED-verified lifecycle tests pinning the stop-command write-ack contract,
plus the minimal test-util change they need.

Test util: TestDevice now carries a configurable message_gap (default 1ms,
unchanged for all existing tests). add_test_device_with_message_gap and
test_server_with_device_and_message_gap let a test widen the device io task's
batch window so the stop-write race is deterministic rather than relying on the
1ms default.

test_stop_resolves_only_after_stop_write_reaches_hardware (Test A): with a 500ms
message gap and an active vibrate, a StopCmd must not resolve until the resulting
stop write ([0xF1, 0x00]) has reached the hardware channel. Pre-fix this fails
deterministically — StopCmd fire-and-forgets and resolves while the write sits
in the batch window.

test_shutdown_writes_stop_before_resolving (Test B): the original incident.
With the same batched device in an active output state, server.shutdown() must
record the device's stop write before it resolves. Pre-fix the disconnect tore
the io task down mid-batch and dropped the pending stop write.

Both retire the write-observation limitation documented on
test_shutdown_under_load_drains_subtree (1ms harness gap made it flaky). RED
verified by reverting the two production files to HEAD~1 (2 failed / 0 passed
across three runs); restored exactly (empty production diff); GREEN and stable
across five runs. Full buttplug_tests suite green incl. 808 conformance.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant