Conversation
11b393a to
2903de2
Compare
2903de2 to
5e25405
Compare
4115b83 to
cbffa11
Compare
|
/coder-agents-review |
There was a problem hiding this comment.
Clean instrumentation layer. SshTelemetry and WebSocketTelemetry absorb real complexity (sampling thresholds, reconnect cycle aggregation, dedup guards) that would otherwise pollute their consumers. The TelemetryReporter interface extraction with NOOP default is well-designed ISP. Test coverage is approximately 1:1 with source, and assertions verify event shapes rather than just counts. The network sampling logic is well-calibrated (p2p flips, DERP changes, latency swings, heartbeat).
Severity summary: 1 P2, 10 P3, 6 Nit. The P2 is a one-line fix. Most P3s are telemetry data model concerns that are cheapest to address before the events ship to dashboards.
"Someone querying ssh.process.discovered events to measure discovery success rate or latency would count these ghost discoveries as successes." (Chopper)
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
- ssh.process.discovered: emit found=true|false so abandoned searches are distinguishable - Add ssh.process.disposed as a dedicated terminal event with wasLost flag - ssh.process.replaced: always emit, never conflate with recovered - Remove dead process_changed and disposed members from ProcessLossCause - Rename connection events to past-tense and url -> route property - Route normal_close through disconnectWithReason for state-machine consistency - Distinguish certificate_refresh from manual_reconnect on cycle reason - Clear #reconnectCycle in WebSocketTelemetry.reset() - Restore Only attempt refresh once per connection cycle comment - Unblock SshProcessMonitor.delay() on dispose so abandoned traces complete - Add unit and integration coverage for the new event shapes
There was a problem hiding this comment.
All 17 R1 findings verified fixed in 5ed31ab. Clean work. The refactored disposed() method, #reconnectInternal cert-refresh path, past-tense event naming, and expanded integration test coverage all address root causes.
Three new P3 findings surfaced during re-review: one concurrent-access regression in the new pendingDelayResolve mechanism, and two test accuracy gaps. The integration test for stale-network recovery exercises the wrong event chain (4/6 reviewers converged on this independently).
Severity summary: 0 P2, 3 P3.
"These events are from two completely different monitoring cycles. The recovery is not 'around' the stale-network reconnect." (Bisky)
🤖 This review was automatically generated with Coder Agents.
- Replace single pendingDelayResolve with a Set so dispose unblocks every concurrent searchForLogFile/monitorNetwork delay, not just the most recent one. - Use a single PID throughout the stale-network reconnect test so the asserted recovered event matches the stale-network cycle the test name describes. - Dispose the monitor before the negative recovered assertion in the process-replaced test to keep the 888 monitoring loop from racing. - Add dedicated processRecovered unit tests (early return, emission shape, single-emit guard).
|
/coder-agents-review |
There was a problem hiding this comment.
All 20 prior findings (R1 + R3) verified fixed. The pendingDelays Set, processRecovered() unit tests, and deterministic dispose-before-assert all address root causes correctly. Code quality is high; 114 tests pass, every assertion has teeth.
One new P3 on event naming semantics, two Nits from the mechanical pass. The P3 is cheapest to address before the events ship to dashboards.
Severity summary: 0 P2, 1 P3, 2 Nit.
"The verb is the event. 'connection.reconnected' breaks this pattern: it emits with result: 'error' when the reconnect cycle terminates without reconnecting." (Mafuuu)
🤖 This review was automatically generated with Coder Agents.
- Rename connection.reconnected to connection.reconnect_resolved so the event name reflects "cycle ended, outcome in result" instead of implying success. Failed cycles emit with result=error and are no longer miscounted by name-only consumers. - Drop unused export on ProcessDiscoveryResult. - Replace em-dash in sshProcess.test.ts comment.
Summary
Closes #905.
Validation
pnpm test:extension ./test/unit/remote/sshProcess.test.tspnpm test:extension ./test/unit/websocket/reconnectingWebSocket.test.tspnpm typecheckpnpm lintpnpm format:checkImplementation plan and decisions
Confirmed decisions:
CoderApiinstances; long-lived primary/remote clients receive telemetry.connection.openuses the sanitized route (pathname + search) rather than the full URL.DISCONNECTEDoutcomes as well as successful recovery.stale_network_info,missing_network_info,process_changed,disposed.derpuses thepreferred_derpstring, empty string if unavailable.Implemented:
SshProcessMonitor:ssh.process.discovered,ssh.process.lost,ssh.process.recovered, and sampledssh.network.info.ReconnectingWebSocket:connection.state_transition,connection.open,connection.drop, and aggregatedconnection.reconnect.Generated by Coder Agents.