Skip to content

DatabaseSessionService: get_session / list_sessions use read-write transactions unnecessarily, causing RetryAborted on Spanner #4771

@pztang9

Description

@pztang9

Summary

DatabaseSessionService.get_session() and list_sessions() are pure read operations, but they execute inside SQLAlchemy's default read-write transactions. On Cloud Spanner, this means they use OCC read-write transactions instead of read-only snapshots, which causes RetryAborted errors when another request is concurrently writing to the same session via append_event().

Observed behavior

When two requests for the same session overlap (e.g. a new turn arrives while the previous turn is still writing events), the get_session() read in the new request can fail with RetryAborted from google.cloud.spanner_dbapi.exceptions. This is a transient Spanner OCC conflict — the read-write transaction's read set intersects with the concurrent append_event() write.

This is surprising because get_session() never writes — it only executes SELECT statements.

Real-world impact

In production workloads with rapid successive turns on the same session, the get_session() call fails with RetryAborted, causing the turn to start with missing session state. Application-level retry works around this, but it shouldn't be necessary for a read-only operation.

Root cause

DatabaseSessionService uses a single async_sessionmaker for all operations:

# database_session_service.py, line ~143
self.database_session_factory: async_sessionmaker[DatabaseSessionFactory] = (
    async_sessionmaker(bind=self.db_engine, expire_on_commit=False)
)

Both get_session() and list_sessions() use async with self.database_session_factory() as sql_session: which opens a read-write transaction. On Spanner, read-write transactions participate in OCC and can abort when conflicting with concurrent writes.

The spanner_dbapi connection already supports read_only=True (see google.cloud.spanner_dbapi.connection, line ~78), which switches to snapshot reads that:

  • Never abort due to OCC conflicts
  • Have lower latency (no commit phase)
  • Don't hold locks

Proposed fix

Create a separate async_sessionmaker with read_only=True for read-only operations, and use it in get_session() and list_sessions(). This would only affect the Spanner dialect — other databases (PostgreSQL, MySQL, SQLite) would see no behavioral difference since they don't have Spanner's OCC semantics.

One approach:

# In __init__:
self.database_session_factory = async_sessionmaker(bind=self.db_engine, expire_on_commit=False)

# For Spanner, create a read-only engine/session factory
# Option A: execution_options on the session
# Option B: separate engine with connect_args={"read_only": True}

The cleanest path may be passing execution_options or using SQLAlchemy's Session.execute() with connection-level options, but the exact mechanism depends on how sqlalchemy-spanner propagates read_only to the underlying spanner_dbapi connection.

Environment

  • ADK version: 1.23.x
  • Backend: Spanner (spanner+spanner:///...)
  • SQLAlchemy-Spanner: 1.14+

Related

Metadata

Metadata

Labels

services[Component] This issue is related to runtime services, e.g. sessions, memory, artifacts, etc

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions