-
Notifications
You must be signed in to change notification settings - Fork 3.1k
Description
Summary
DatabaseSessionService.get_session() and list_sessions() are pure read operations, but they execute inside SQLAlchemy's default read-write transactions. On Cloud Spanner, this means they use OCC read-write transactions instead of read-only snapshots, which causes RetryAborted errors when another request is concurrently writing to the same session via append_event().
Observed behavior
When two requests for the same session overlap (e.g. a new turn arrives while the previous turn is still writing events), the get_session() read in the new request can fail with RetryAborted from google.cloud.spanner_dbapi.exceptions. This is a transient Spanner OCC conflict — the read-write transaction's read set intersects with the concurrent append_event() write.
This is surprising because get_session() never writes — it only executes SELECT statements.
Real-world impact
In production workloads with rapid successive turns on the same session, the get_session() call fails with RetryAborted, causing the turn to start with missing session state. Application-level retry works around this, but it shouldn't be necessary for a read-only operation.
Root cause
DatabaseSessionService uses a single async_sessionmaker for all operations:
# database_session_service.py, line ~143
self.database_session_factory: async_sessionmaker[DatabaseSessionFactory] = (
async_sessionmaker(bind=self.db_engine, expire_on_commit=False)
)Both get_session() and list_sessions() use async with self.database_session_factory() as sql_session: which opens a read-write transaction. On Spanner, read-write transactions participate in OCC and can abort when conflicting with concurrent writes.
The spanner_dbapi connection already supports read_only=True (see google.cloud.spanner_dbapi.connection, line ~78), which switches to snapshot reads that:
- Never abort due to OCC conflicts
- Have lower latency (no commit phase)
- Don't hold locks
Proposed fix
Create a separate async_sessionmaker with read_only=True for read-only operations, and use it in get_session() and list_sessions(). This would only affect the Spanner dialect — other databases (PostgreSQL, MySQL, SQLite) would see no behavioral difference since they don't have Spanner's OCC semantics.
One approach:
# In __init__:
self.database_session_factory = async_sessionmaker(bind=self.db_engine, expire_on_commit=False)
# For Spanner, create a read-only engine/session factory
# Option A: execution_options on the session
# Option B: separate engine with connect_args={"read_only": True}The cleanest path may be passing execution_options or using SQLAlchemy's Session.execute() with connection-level options, but the exact mechanism depends on how sqlalchemy-spanner propagates read_only to the underlying spanner_dbapi connection.
Environment
- ADK version: 1.23.x
- Backend: Spanner (
spanner+spanner:///...) - SQLAlchemy-Spanner: 1.14+
Related
- DatabaseSessionService.append_event reload-and-continue in 1.24+ corrupts session under concurrent run_async #4751 — OCC corruption from concurrent
append_event(same underlying Spanner concurrency theme) - DatabaseSessionService.append_event: unnecessary FOR UPDATE lock on app_states/user_states when no state delta exists #4655 — unnecessary
FOR UPDATElock on app/user states