Gemini: cannot bound or diagnose silent reads — `HttpOptions.timeout` only sets aiohttp `total` #5802

## Problem

In production we periodically see `LlmAgent` → Vertex calls go silent for minutes, then surface as a bare `asyncio.TimeoutError` with no actionable message. In one real incident, an LLM call completed cleanly at 08:16:03 and the next call was silent until 08:19:29 (~3.5 minutes) before raising. Nothing was logged during the silent window. From the bare exception alone, we cannot distinguish any of these:

- Never connected (DNS / TCP / TLS failure)
- Connected, request accepted, no response body ever returned
- Mid-stream stall (some bytes arrived, then connection went silent)
- Vertex honored its server-side deadline and returned 5xx
- TCP reset / FIN / NAT eviction
- Our client-side deadline fired against a still-processing server

Without that distinction we cannot decide whether to retry, fail over, escalate, or fix a real bug.

## Why the current surface isn't enough

1. **`HttpOptions.timeout` is a single integer (ms)** that becomes `aiohttp.ClientTimeout(total=...)` in `google/genai/_api_client.py` (the per-request call to `session.request(..., timeout=aiohttp.ClientTimeout(total=http_request.timeout))`). Only `total` is set — no `sock_read`, no `sock_connect`, no `connect`.

2. **`aiohttp.ClientTimeout.total` alone does not guarantee the deadline fires** on a truly silent read — see [aio-libs/aiohttp#11740](https://github.com/aio-libs/aiohttp/issues/11740) (maintainer-confirmed Oct 2025). `sock_read` is required for a hard ceiling.

3. **There is no `TraceConfig` seam.** The SDK builds its own internal `aiohttp.ClientSession` lazily and never exposes it. We cannot attach hooks for `on_connection_create_start`, `on_request_end`, `on_response_chunk_received`, `on_request_exception`. Without those, "silent for 3.5 minutes" gives zero signal about what state the transport was in.

4. **Per-request `HttpOptions(aiohttp_client=session)` doesn't work under ADK.** ADK's tracing serializes `GenerateContentConfig` via `model_dump`, and a live `aiohttp.ClientSession` is not pydantic-serializable. Result: `PydanticSerializationError`.

5. **`HttpOptions(async_client_args={'timeout': ...})` raises `TypeError`** — see [googleapis/python-genai#1899](https://github.com/googleapis/python-genai/issues/1899).
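To make points 1 and 2 concrete, here is a minimal sketch of per-phase timeouts at the aiohttp level, and of how the exception type then narrows down which phase fired. The numeric values and the `classify_timeout` helper are illustrative assumptions, not something the SDK provides. `ConnectionTimeoutError` and `SocketTimeoutError` assume aiohttp 3.10 or newer, and we have not verified that the SDK lets them propagate unwrapped.

```python
import asyncio

import aiohttp

# Illustrative values only; tune them for your workload.
PER_PHASE_TIMEOUT = aiohttp.ClientTimeout(
    total=300,        # overall ceiling for the whole request, in seconds
    sock_connect=10,  # connecting to the peer for a new connection
    sock_read=60,     # maximum gap between two reads from the peer
)


def classify_timeout(exc: BaseException) -> str:
    """Map a timeout exception to the phase that most likely fired it.

    Hypothetical helper. The aiohttp subclasses are checked first because
    both of them also inherit from asyncio.TimeoutError.
    """
    if isinstance(exc, aiohttp.ConnectionTimeoutError):
        return "connect"
    if isinstance(exc, aiohttp.SocketTimeoutError):
        return "sock_read"
    if isinstance(exc, asyncio.TimeoutError):
        # This is all we get today: a deadline fired, phase unknown.
        return "total (phase unknown)"
    return "not a timeout"
```

With only `total` set, every timeout falls into the third branch, which is the bare `asyncio.TimeoutError` described in the Problem section.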

## Current workaround

Subclass `Gemini` and override the `api_client` `@cached_property` (per the docstring at `google_llm.py:95-112`), returning a `Client(http_options=HttpOptions(aiohttp_client=<custom session with TraceConfig + per-phase timeouts>))`.

This works, but it has real sharp edges:

- We must manually re-derive `_tracking_headers()`, `retry_options`, `base_url`, `api_version`, and the `vertexai=True` branch. Any default that upstream later adds to `api_client` and that our override doesn't mirror is silently lost. (Tracking headers in particular are easy to drop and hard to notice.)
- Session lifecycle is on the caller. The session is event-loop-affine, so it must be built inside FastAPI `lifespan` and closed on shutdown. ADK provides no documented helper or `lifespan` hook for this — every adopter rediscovers it.
- The injected session is not visible to ADK telemetry, so per-call diagnostic fields land in our own logs rather than ADK spans.
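For reference, the session side of this workaround looks roughly like the sketch below. It covers only the aiohttp part (a `TraceConfig` that logs phase transitions, plus a context manager that owns the session lifecycle); the `Gemini` subclass and `Client(...)` wiring are omitted. The names `build_trace_config` and `traced_session`, the logger name, and the timeout values are our own illustrative choices.

```python
import asyncio
import contextlib
import logging
import time
from types import SimpleNamespace

import aiohttp

log = logging.getLogger("transport")


def build_trace_config() -> aiohttp.TraceConfig:
    """Log each transport phase with the time elapsed since the first hook."""

    def phase(name: str):
        async def hook(session, ctx: SimpleNamespace, params) -> None:
            # ctx is per-request state that aiohttp passes to every hook.
            started = getattr(ctx, "started", None)
            if started is None:
                started = ctx.started = time.monotonic()
            ctx.last_phase = name
            log.info("phase=%s elapsed=%.3fs", name, time.monotonic() - started)

        return hook

    tc = aiohttp.TraceConfig()
    tc.on_request_start.append(phase("request_start"))
    tc.on_connection_create_start.append(phase("connection_create_start"))
    tc.on_connection_create_end.append(phase("connection_create_end"))
    tc.on_response_chunk_received.append(phase("response_chunk_received"))
    tc.on_request_end.append(phase("request_end"))
    tc.on_request_exception.append(phase("request_exception"))
    return tc


@contextlib.asynccontextmanager
async def traced_session(timeout: aiohttp.ClientTimeout):
    """Create the session on the running loop and close it on exit."""
    session = aiohttp.ClientSession(
        timeout=timeout, trace_configs=[build_trace_config()]
    )
    try:
        yield session
    finally:
        await session.close()


async def main() -> None:
    timeout = aiohttp.ClientTimeout(total=300, sock_connect=10, sock_read=60)
    async with traced_session(timeout) as session:
        # Hand `session` to the api_client override here.
        log.info("session ready, closed=%s", session.closed)


if __name__ == "__main__":
    asyncio.run(main())
```

In a FastAPI app, the `async with traced_session(...)` block would sit inside the `lifespan` context manager so the session is created and closed on the serving loop. One thing worth checking in any such setup: in aiohttp, a `timeout` passed to an individual `session.request(...)` call replaces the session-level default, so session-level `sock_read` only helps if the per-request timeout does not override it.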

## What we'd like

Any of the following, in roughly preferred order:

1. **Land [#4345](https://github.com/google/adk-python/pull/4345)** (already open, adds `custom_api_client` / `custom_live_api_client` constructor params). This removes the need to subclass for the "I built my own Client" case. Related: [#2560](https://github.com/google/adk-python/issues/2560), [#5027](https://github.com/google/adk-python/issues/5027).

2. **Expose per-phase timeouts on `HttpOptions`** for callers who don't want to own a full session. Even just `HttpOptions(sock_read_timeout=..., sock_connect_timeout=...)` would fix the silent-read ceiling issue (point 2 above) without anyone needing to inject a session.

3. **Optional in-flight transport hooks** (an ADK-level callback similar to `before_model_callback`, but firing on transport transitions: `on_connect_start`, `on_request_end`, `on_first_byte`, `on_chunk_received`, `on_request_exception`). This is the diagnostic surface that the OpenTelemetry GenAI spec [marks as TODO](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/) and that, as far as we can tell, nothing in the ecosystem provides yet.

4. **At minimum, document the session-injection contract more loudly** — loop-affinity, lifecycle ownership (`_api_client.py:2168-2174` already skips closing user sessions, but this isn't called out anywhere user-facing), and which default kwargs an `api_client` override must preserve.
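To illustrate option 3, one possible shape for such hooks is sketched below. Everything here is hypothetical: `TransportEvent`, `TransportHooks`, and `fire` do not exist in ADK, and only the five hook names come from the list above. It is meant as a conversation starter, not a proposed final API.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

# Hypothetical types; nothing in this block exists in ADK today.
HOOK_NAMES = frozenset({
    "on_connect_start",
    "on_request_end",
    "on_first_byte",
    "on_chunk_received",
    "on_request_exception",
})


@dataclass(frozen=True)
class TransportEvent:
    name: str         # one of HOOK_NAMES
    elapsed_s: float  # seconds since the model call started
    detail: str = ""  # free-form, e.g. an exception repr or a byte count


TransportHook = Callable[[TransportEvent], Awaitable[None]]


@dataclass
class TransportHooks:
    on_connect_start: Optional[TransportHook] = None
    on_request_end: Optional[TransportHook] = None
    on_first_byte: Optional[TransportHook] = None
    on_chunk_received: Optional[TransportHook] = None
    on_request_exception: Optional[TransportHook] = None

    async def fire(self, event: TransportEvent) -> None:
        """Invoke the hook registered for this event, if there is one."""
        if event.name not in HOOK_NAMES:
            raise ValueError(f"unknown transport event: {event.name}")
        hook = getattr(self, event.name)
        if hook is not None:
            await hook(event)


async def main() -> None:
    async def log_event(event: TransportEvent) -> None:
        print(f"{event.name} at {event.elapsed_s:.2f}s {event.detail}")

    hooks = TransportHooks(on_first_byte=log_event)
    await hooks.fire(TransportEvent("on_first_byte", 0.42))
    await hooks.fire(TransportEvent("on_chunk_received", 0.50))  # no hook set


if __name__ == "__main__":
    asyncio.run(main())
```

Even a surface this small would let us record the last transport transition before a timeout, which is the piece of information missing from the incident described in the Problem section.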

## Related upstream issues

- [googleapis/python-genai#1893](https://github.com/googleapis/python-genai/issues/1893) — genai request hangs in "zombie state" with open socket and no data (gemini-2.5-flash)
- [googleapis/python-genai#911](https://github.com/googleapis/python-genai/issues/911) — `HttpOptions(timeout=...)` partially ignored
- [googleapis/python-genai#1899](https://github.com/googleapis/python-genai/issues/1899) — `async_client_args["timeout"]` raises `TypeError`
- [aio-libs/aiohttp#11740](https://github.com/aio-libs/aiohttp/issues/11740) — `ClientTimeout.total` alone can hang on silent reads

## Environment

- `google-adk`: 1.32.0
- `google-genai`: 1.75.0
- `aiohttp`: 3.12.13
- Python: 3.12.13
- Runtime: FastAPI on AWS EKS, Vertex AI (Gemini-3 family)

