Problem
In production we periodically see LlmAgent → Vertex calls go silent for minutes, then surface as a bare asyncio.TimeoutError with no actionable message. A real incident: one LLM call completed cleanly at 08:16:03, the next went silent until 08:19:29 (~3.5 minutes) before raising. Nothing in the logs in the silent window. From the bare exception, we cannot distinguish any of these:
- Never connected (DNS / TCP / TLS failure)
- Connected, request accepted, no response body ever returned
- Mid-stream stall (some bytes arrived, then connection went silent)
- Vertex honored its server-side deadline and returned 5xx
- TCP reset / FIN / NAT eviction
- Our client-side deadline fired against a still-processing server
Without that distinction we cannot decide whether to retry, fail over, escalate, or fix a real bug.
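To make the cases above concrete, here is a minimal sketch of the classification we would like to be able to run once per-phase transport state is available. Everything in it is hypothetical: FailureMode, TransportState, and classify are illustrative names, not ADK or google-genai APIs, and the mapping is one heuristic reading of the list above. It also assumes connection loss surfaces as a builtin ConnectionError; aiohttp's own exception types would need their own mapping.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureMode(Enum):
    """Hypothetical labels mirroring the failure cases listed above."""
    NEVER_CONNECTED = "never_connected"
    NO_RESPONSE_BODY = "no_response_body"
    MID_STREAM_STALL = "mid_stream_stall"
    SERVER_5XX = "server_5xx"
    CONNECTION_LOST = "connection_lost"
    CLIENT_DEADLINE = "client_deadline"


@dataclass
class TransportState:
    """What the client had observed when the call failed (assumed fields)."""
    connected: bool = False       # a connection was established
    status: Optional[int] = None  # HTTP status, if response headers arrived
    body_bytes: int = 0           # response body bytes received so far


def classify(state: TransportState, exc: Optional[BaseException]) -> FailureMode:
    """Heuristic mapping from observed state plus exception to a failure mode."""
    if not state.connected:
        return FailureMode.NEVER_CONNECTED
    if state.status is not None and state.status >= 500:
        return FailureMode.SERVER_5XX
    if isinstance(exc, ConnectionError):
        # Covers ConnectionResetError, BrokenPipeError, ConnectionAbortedError.
        return FailureMode.CONNECTION_LOST
    if state.body_bytes > 0:
        return FailureMode.MID_STREAM_STALL
    if state.status is not None:
        return FailureMode.NO_RESPONSE_BODY
    # Connected, but no headers and no bytes: our deadline fired while waiting.
    return FailureMode.CLIENT_DEADLINE
```

With only a bare asyncio.TimeoutError, none of the TransportState fields are known, which is exactly why every branch of this function is unreachable for us today.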
Why the current surface isn't enough
- HttpOptions.timeout is a single integer (ms) that becomes aiohttp.ClientTimeout(total=...) in google/genai/_api_client.py (the per-request call to session.request(..., timeout=aiohttp.ClientTimeout(total=http_request.timeout))). Only total is set: no sock_read, no sock_connect, no connect.
- aiohttp.ClientTimeout.total alone does not guarantee the deadline fires on a truly silent read; see aio-libs/aiohttp#11740 (maintainer-confirmed Oct 2025). sock_read is required for a hard ceiling.
- There is no TraceConfig seam. The SDK builds its own internal aiohttp.ClientSession lazily and never exposes it. We cannot attach hooks for on_connection_create_start, on_request_end, on_response_chunk_received, on_request_exception. Without those, "silent for 3.5 minutes" gives zero signal about what state the transport was in.
- Per-request HttpOptions(aiohttp_client=session) doesn't work under ADK. ADK's tracing serializes GenerateContentConfig via model_dump, and a live aiohttp.ClientSession is not pydantic-serializable. Result: PydanticSerializationError.
- HttpOptions(async_client_args={'timeout': ...}) raises TypeError; see googleapis/python-genai#1899.
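For reference, this is roughly the kind of hook wiring the missing TraceConfig seam prevents. It is a sketch under stated assumptions, not SDK code: PhaseRecorder and TRACED_SIGNALS are hypothetical names, and the recorder is duck-typed so it works with any object exposing the four aiohttp.TraceConfig signals named above. The callback signature (session, trace_config_ctx, params) is the one aiohttp uses for trace callbacks.

```python
import time
from typing import List, Optional, Tuple

# The aiohttp.TraceConfig signals named in the list above.
TRACED_SIGNALS = (
    "on_connection_create_start",
    "on_request_end",
    "on_response_chunk_received",
    "on_request_exception",
)


class PhaseRecorder:
    """Hypothetical helper: remembers which transport signals fired, and when.

    One recorder shared by a whole session mixes concurrent requests together;
    a per-request variant would keep its state on trace_config_ctx instead.
    """

    def __init__(self) -> None:
        self.events: List[Tuple[float, str]] = []  # (monotonic seconds, signal)

    def _hook(self, signal_name: str):
        # aiohttp invokes trace callbacks as coroutines with these 3 arguments.
        async def on_signal(session, trace_config_ctx, params) -> None:
            self.events.append((time.monotonic(), signal_name))
        return on_signal

    def attach(self, trace_config) -> None:
        """Append one callback to each traced signal on the given config."""
        for signal_name in TRACED_SIGNALS:
            getattr(trace_config, signal_name).append(self._hook(signal_name))

    def last_phase(self) -> Optional[str]:
        """Name of the most recent signal, or None if nothing fired yet."""
        return self.events[-1][1] if self.events else None
```

On a caller-owned session this would be used as recorder.attach(trace_config) with trace_config = aiohttp.TraceConfig(), followed by aiohttp.ClientSession(trace_configs=[trace_config], timeout=aiohttp.ClientTimeout(total=..., sock_connect=..., sock_read=...)). When the timeout finally raises, last_phase() says how far the request got.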
Current workaround
Subclass Gemini and override the api_client @cached_property (per the docstring at google_llm.py:95-112), returning a Client(http_options=HttpOptions(aiohttp_client=<custom session with TraceConfig + per-phase timeouts>)).
This works, but it has real sharp edges:
- We must manually re-derive _tracking_headers(), retry_options, base_url, api_version, and the vertexai=True branch. Any default that api_client gains upstream and that we fail to mirror is silently lost. (Tracking headers in particular are easy to drop and hard to notice.)
- Session lifecycle is on the caller. The session is event-loop-affine, so it must be built inside the FastAPI lifespan and closed on shutdown. ADK provides no documented helper or lifespan hook for this, so every adopter rediscovers it.
- The injected session is not visible to ADK telemetry, so per-call diagnostic fields land in our own logs rather than ADK spans.
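The lifecycle point is the one every adopter ends up writing by hand, so here is a minimal sketch of it. The names make_lifespan, session_factory, and app.state.http_session are our own illustrative choices, not ADK helpers. The factory is assumed to return an object with an async close(), as aiohttp.ClientSession does, and is called inside the lifespan so the session is created on the running event loop.

```python
from contextlib import asynccontextmanager
from typing import Any, Callable


def make_lifespan(session_factory: Callable[[], Any]):
    """Build a lifespan context manager that owns one HTTP session.

    session_factory is assumed to return an object with an async close(),
    for example: lambda: aiohttp.ClientSession(...).
    """

    @asynccontextmanager
    async def lifespan(app):
        # Created here, not at import time, so the session binds to the
        # event loop that actually serves requests.
        session = session_factory()
        app.state.http_session = session
        try:
            yield
        finally:
            # We created the session, so we are responsible for closing it.
            await session.close()

    return lifespan
```

It would be wired in as FastAPI(lifespan=make_lifespan(...)), with the Gemini subclass reading the session from app.state when it builds its Client.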
What we'd like
Any of the following, in roughly preferred order:
- Land #4345 (already open; adds custom_api_client / custom_live_api_client constructor params). This removes the need to subclass for the "I built my own Client" case. Related: #2560, #5027.
- Expose per-phase timeouts on HttpOptions for callers who don't want to own a full session. Even just HttpOptions(sock_read_timeout=..., sock_connect_timeout=...) would fix the silent-read ceiling issue (the ClientTimeout.total point above) without anyone needing to inject a session.
- Optional in-flight transport hooks: an ADK-level callback similar to before_model_callback, but firing on transport transitions (on_connect_start, on_request_end, on_first_byte, on_chunk_received, on_request_exception). This is the diagnostic surface the OpenTelemetry GenAI spec marks as TODO and nothing in the ecosystem provides yet.
- At minimum, document the session-injection contract more loudly: loop-affinity, lifecycle ownership (_api_client.py:2168-2174 already skips closing user sessions, but this isn't called out anywhere user-facing), and which default kwargs an api_client override must preserve.
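As an illustration of the per-phase timeout request, the translation involved is small. The sketch below is hypothetical throughout: sock_connect_timeout_ms and sock_read_timeout_ms are proposed field names that do not exist on HttpOptions today, and client_timeout_kwargs is our own helper. The only facts it relies on are that HttpOptions.timeout is in milliseconds and that aiohttp.ClientTimeout takes seconds and accepts total, sock_connect, and sock_read.

```python
from typing import Dict, Optional


def _to_seconds(ms: Optional[int]) -> Optional[float]:
    """Convert milliseconds to seconds, passing None through as 'no limit'."""
    return None if ms is None else ms / 1000.0


def client_timeout_kwargs(
    timeout_ms: Optional[int] = None,
    sock_connect_timeout_ms: Optional[int] = None,  # proposed, not a real field
    sock_read_timeout_ms: Optional[int] = None,     # proposed, not a real field
) -> Dict[str, Optional[float]]:
    """Keyword arguments suitable for aiohttp.ClientTimeout(**kwargs)."""
    return {
        "total": _to_seconds(timeout_ms),
        "sock_connect": _to_seconds(sock_connect_timeout_ms),
        "sock_read": _to_seconds(sock_read_timeout_ms),
    }
```

For example, client_timeout_kwargs(300_000, 10_000, 60_000) would keep a five-minute overall budget while capping any single silent read at 60 seconds.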
Related upstream issues
- HttpOptions(timeout=...) partially ignored
- async_client_args["timeout"] raises TypeError (googleapis/python-genai#1899)
- ClientTimeout.total alone can hang on silent reads (aio-libs/aiohttp#11740)
Environment
- google-adk: 1.32.0
- google-genai: 1.75.0
- aiohttp: 3.12.13
- Python: 3.12.13
- Runtime: FastAPI on AWS EKS, Vertex AI (Gemini-3 family)