fix(websocket): increment failover counter on externally-closed connections #704

Draft
cl-efornaciari wants to merge 1 commit into main from feature/ws-1005-fix-failover-counter

@cl-efornaciari

Summary

  • Fixes the WebSocket 1005 reconnect loop bug where externally-closed connections never incremented the failover counter, trapping the EA on the same failing endpoint indefinitely.
  • Adds an else if branch in WebSocketTransport.streamHandler that detects when a connection was closed externally (e.g., provider drops with code 1005) and increments streamHandlerInvocationsWithNoConnection, enabling wsSelectUrl to try alternate URLs.
  • Adjusts the 3 bug-demonstration tests added in #703 (test(websocket): add tests reproducing 1005 reconnect loop failover bug) so that they validate the fix: the counter now increments during rapid 1005 close loops, and the EA recovers after the server stabilizes.
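
To illustrate why a non-zero counter matters for failover, here is a minimal sketch of a counter-driven URL selector. This is not the framework's wsSelectUrl, whose signature and selection policy are not shown in this PR; pickUrlForAttempt, attemptsPerUrl, and the example URLs are hypothetical names chosen for illustration only.

```typescript
// Hypothetical selector, NOT the framework's wsSelectUrl.
// It rotates through the configured URLs as the failure counter grows.
function pickUrlForAttempt(
  urls: readonly string[],
  failedInvocations: number, // stands in for streamHandlerInvocationsWithNoConnection
  attemptsPerUrl = 1, // assumed knob: failures tolerated before moving on
): string {
  if (urls.length === 0) {
    throw new Error('at least one URL is required')
  }
  const index = Math.floor(failedInvocations / attemptsPerUrl) % urls.length
  const url = urls[index]
  if (url === undefined) {
    throw new Error('URL index out of range')
  }
  return url
}

// Placeholder endpoints for the example.
const exampleUrls = ['wss://primary.example', 'wss://secondary.example']
```

Under this model, a counter stuck at 0 always yields the first URL, which matches the "trapped on the same endpoint" symptom described above; once the counter increments, the selector moves on to the next URL.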

Root Cause

When a provider drops a WebSocket connection with close code 1005, streamHandler finds connectionClosed == true and immediately reconnects, resetting connectionOpenedAt. On the next cycle, timeSinceConnectionOpened is only ~5s (one BACKGROUND_EXECUTE_MS_WS interval), which is below WS_SUBSCRIPTION_UNRESPONSIVE_TTL (default 120s). Since connectionUnresponsive never becomes true, the failover counter stays at 0 and the EA reconnects to the same failing endpoint forever.
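
The loop described above can be reproduced with a small model. This is a simplified sketch, not the framework's streamHandler: it assumes that connectionUnresponsive depends only on timeSinceConnectionOpened exceeding the TTL, and it takes the 5s interval and the 120s default from the figures quoted above. ModelState, cycleBeforeFix, and simulateCloseLoop are hypothetical names.

```typescript
// Simplified model of the pre-fix behaviour; not the framework's actual code.
const BACKGROUND_EXECUTE_MS_WS = 5_000 // assumed from the "~5s" figure above
const WS_SUBSCRIPTION_UNRESPONSIVE_TTL = 120_000 // default quoted above

interface ModelState {
  connectionOpenedAt: number
  failoverCounter: number
}

// One cycle in which the provider has already dropped the socket (code 1005).
function cycleBeforeFix(state: ModelState, now: number): ModelState {
  const timeSinceConnectionOpened = now - state.connectionOpenedAt
  const connectionUnresponsive =
    timeSinceConnectionOpened > WS_SUBSCRIPTION_UNRESPONSIVE_TTL
  // Only the unresponsive path increments the counter before the fix.
  const failoverCounter = connectionUnresponsive
    ? state.failoverCounter + 1
    : state.failoverCounter
  // The immediate reconnect resets connectionOpenedAt to the current time.
  return { connectionOpenedAt: now, failoverCounter }
}

// Runs the given number of close/reconnect cycles and returns the final counter.
function simulateCloseLoop(cycles: number): number {
  let state: ModelState = { connectionOpenedAt: 0, failoverCounter: 0 }
  for (let i = 1; i <= cycles; i++) {
    state = cycleBeforeFix(state, i * BACKGROUND_EXECUTE_MS_WS)
  }
  return state.failoverCounter
}
```

Because every cycle observes a connection that is only one interval old, the counter in this model never leaves 0, no matter how many cycles run.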

Fix

After the existing connectionUnresponsive check, add a second branch that checks connectionClosed && this.wsConnection, which indicates that the connection was closed externally rather than by the framework, and increments the failover counter. Using else if prevents double-counting when both conditions are true.
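
The branching can be sketched as a pure function. This is an illustration of the described control flow under stated assumptions, not the diff itself: CycleInput, hasWsConnection (standing in for this.wsConnection being set), and nextFailoverCounter are hypothetical names.

```typescript
// Illustrative model of the post-fix branching; names are hypothetical.
interface CycleInput {
  connectionUnresponsive: boolean
  connectionClosed: boolean
  hasWsConnection: boolean // stands in for `this.wsConnection` being set
}

// Returns the counter value after one streamHandler cycle.
function nextFailoverCounter(counter: number, c: CycleInput): number {
  if (c.connectionUnresponsive) {
    // Existing path: the connection is open but unresponsive.
    return counter + 1
  } else if (c.connectionClosed && c.hasWsConnection) {
    // New path: the connection was closed externally (e.g. code 1005).
    return counter + 1
  }
  return counter
}
```

With else if, a cycle that satisfies both conditions adds 1 rather than 2, and a closed state with no wsConnection leaves the counter unchanged.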

Related

Test plan

  • All 18 existing + new websocket tests pass
  • Test 1 validates failover counter increments during rapid 1005 close loop
  • Test 2 (control) validates unresponsive-but-open connections still increment counter (unchanged)
  • Test 3 validates EA recovers from reconnect loop after server stabilizes (serves prices again)

…ctions

When a WebSocket connection is closed externally (e.g., provider drops
with code 1005), streamHandler now detects this and increments the
failover counter. Previously, only open-but-unresponsive connections
incremented the counter, causing externally-closed connections to
reconnect to the same failing endpoint indefinitely.

The fix adds an `else if` branch after the existing connectionUnresponsive
check that detects when connectionClosed is true with an existing
wsConnection, indicating the server dropped the connection between
streamHandler cycles.

Tests are adjusted from bug-demonstration (asserting counter === 0) to
fix-validation (asserting counter > 0 and EA recovery after server
stabilizes).
@github-actions

github-actions bot commented Mar 5, 2026

NPM Publishing labels 🏷️

🛑 This PR needs labels to indicate how to increase the current package version in the automated workflows. Please add one of the following labels: none, patch, minor, or major.
