fix: propagate SSE stream errors to waiting requests#2122
Conversation
…rsion _handle_reconnection previously returned None, making it impossible for callers to distinguish between a successful response delivery and exhausted retries. This changes the return type to bool (True on success, False when max attempts exceeded) and fixes two issues: - The attempt counter at line 426 was reset to 0 on each reconnection, causing infinite recursion when streams kept ending without delivering a response. Now increments attempt on each recursion. - All recursive calls now use `return await` so the result propagates back to the original caller. MAX_RECONNECTION_ATTEMPTS increased from 2 to 5 to accommodate legitimate multi-reconnection patterns where the server intentionally closes streams between checkpoints. Github-Issue: modelcontextprotocol#1401
When an SSE stream ends prematurely (e.g. due to a read timeout), the client would hang forever waiting for a response that will never arrive. Now _handle_sse_response checks the return value of _handle_reconnection and, if reconnection did not deliver a response, sends a JSONRPCError with INTERNAL_ERROR to the read stream. This unblocks the waiting request and surfaces the failure as an MCPError to the caller. Github-Issue: modelcontextprotocol#1401
Only reset the attempt counter when events were actually received during the connection. Connections that close immediately without delivering events now count toward MAX_RECONNECTION_ATTEMPTS. Github-Issue:modelcontextprotocol#1401
Transport errors that are not tied to a specific pending request (e.g., GET stream failures) were silently swallowed by the default message handler. Add a warning log so these exceptions are at least visible in logs as an observability safety net. Github-Issue: modelcontextprotocol#1401
a20a405 to
c1fffe8
Compare
Add test_sse_error_when_reconnection_exhausted to exercise the _handle_sse_response path where SSE events are received (setting last_event_id) but reconnection fails, ensuring the JSONRPCError is sent to unblock the waiting request.
90c334d to
f0af07e
Compare
|
I took a pass at refreshing this because the current PR is marked conflicting, and this issue is still referenced from #1401 by downstream users. I did not want to open a duplicate PR without checking first, so I rebuilt the same narrow direction on current
Validation run locally on the refreshed branch: The refreshed patch keeps the original behavior intent: Happy to either open this as a refresh PR against |
Summary
Fixes #1401. Also fixes #1789 (closed as duplicate).
When an SSE read timeout occurs during a StreamableHTTP POST request, the pending
send_requestcall hangs indefinitely. The transport catches the exception but never sends an error back through the read stream, leaving the caller blocked onresponse_stream_reader.receive()with nothing to receive.This PR fixes error propagation at the transport level so that SSE stream failures produce a
JSONRPCErrorkeyed to the original request ID.BaseSession._handle_responseroutes it to the correct per-request response stream, andsend_requestsurfaces it asMCPErrorto the caller. This approach keeps failures isolated to the affected request rather than tearing down the entire session.What changed
_handle_sse_responsenow sends aJSONRPCError(INTERNAL_ERROR, "SSE stream ended without a response")when the SSE stream ends without delivering a complete response, whether due to a read timeout, network error, or unexpected server close. If alast_event_idwas received, reconnection is attempted first; the error is only sent after reconnection is exhausted._handle_reconnectionreturnsboolinstead ofNoneso callers can distinguish success (response delivered) from failure (attempts exhausted). The method also fixes an infinite recursion bug: the attempt counter was reset to 0 on every stream end (even when no complete response was delivered), which combined with httpx read timeouts causing graceful stream termination meant the reconnection loop could run forever.handle_get_streamapplies the same fix to the GET stream's reconnection loop: the attempt counter only resets when events were actually received during the connection. Empty connections that close immediately count towardMAX_RECONNECTION_ATTEMPTS._default_message_handlernow logs a warning for exceptions instead of silently discarding them, providing observability for transport errors not tied to a specific request.Test plan
MCPErroris raised instead of hanging_handle_reconnectionreturnsFalsewhen called at max attemptsruff check .) and type checking (pyright) clean