Add Lifecycle Error Callbacks (on_agent_error, on_run_error) to ADK Framework #4774

@evekhm

Description

🔴 Required Information

Is your feature request related to a specific problem?

Yes. The ADK BasePlugin architecture currently does not report errors at the agent or invocation (run) level. The framework supports on_model_error_callback and on_tool_error_callback, but when an unhandled exception crashes the agent (e.g. a fatal tool error, or a model failure that propagates out of the runner.run_async() loop), the after_agent_callback and after_run_callback hooks are skipped.
Consequently, the AGENT_COMPLETED and INVOCATION_COMPLETED events are never emitted to observability sinks (such as the BigQueryAgentAnalyticsPlugin). Because the failures are never formally registered, the failed runs drop out of the denominator in standard reports, which then show artificially favorable durations and artificially high success rates.
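
To make the skew concrete, here is a small worked example. The counts are illustrative, not measured data, and the function names are hypothetical. It assumes a report that computes the success rate only over runs that emitted a completion event.

```python
def observed_success_rate(succeeded: int, failed_logged: int) -> float:
    """Success rate as a report sees it: only runs that emitted a completion event count."""
    return succeeded / (succeeded + failed_logged)


def actual_success_rate(succeeded: int, failed_logged: int, crashed_unlogged: int) -> float:
    """Success rate over every run that started, including crashes that emitted nothing."""
    return succeeded / (succeeded + failed_logged + crashed_unlogged)


# Illustrative numbers: 100 runs started, 90 completed successfully, 10 crashed
# with an unhandled exception and therefore never emitted INVOCATION_COMPLETED.
print(observed_success_rate(succeeded=90, failed_logged=0))                       # 1.0
print(actual_success_rate(succeeded=90, failed_logged=0, crashed_unlogged=10))    # 0.9
```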

Describe the Solution You'd Like

The ADK BasePlugin should introduce new dedicated lifecycle error hooks:

  • on_agent_error_callback(agent: Agent, error: Exception, ...)
  • on_run_error_callback(invocation_id: str, error: Exception, ...)

The BigQueryAgentAnalyticsPlugin (and other telemetry plugins) should implement these to formally emit AGENT_ERROR and INVOCATION_ERROR event types into the downstream datastore immediately prior to the process dying.
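
A minimal sketch of how the proposed hooks could behave. It does not use the real ADK classes: SketchPlugin, RecordingPlugin, and run_invocation are hypothetical stand-ins, and the keyword arguments (agent_name instead of an Agent object) are assumptions made to keep the example self-contained. The idea is that the runner catches the unhandled exception, notifies plugins through the error hooks, and then re-raises so the failure is still visible to the caller.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Tuple


class SketchPlugin:
    """Hypothetical stand-in for a plugin base class; NOT the real ADK BasePlugin."""

    async def after_agent_callback(self, *, agent_name: str) -> None: ...
    async def after_run_callback(self, *, invocation_id: str) -> None: ...
    # Proposed hooks (signatures are assumptions for this sketch).
    async def on_agent_error_callback(self, *, agent_name: str, error: Exception) -> None: ...
    async def on_run_error_callback(self, *, invocation_id: str, error: Exception) -> None: ...


class RecordingPlugin(SketchPlugin):
    """Records event tuples in memory, standing in for a telemetry sink."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, ...]] = []

    async def after_agent_callback(self, *, agent_name: str) -> None:
        self.events.append(("AGENT_COMPLETED", agent_name))

    async def after_run_callback(self, *, invocation_id: str) -> None:
        self.events.append(("INVOCATION_COMPLETED", invocation_id))

    async def on_agent_error_callback(self, *, agent_name: str, error: Exception) -> None:
        self.events.append(("AGENT_ERROR", agent_name, type(error).__name__))

    async def on_run_error_callback(self, *, invocation_id: str, error: Exception) -> None:
        self.events.append(("INVOCATION_ERROR", invocation_id, type(error).__name__))


async def run_invocation(
    invocation_id: str,
    agent_name: str,
    agent_fn: Callable[[], Awaitable[Any]],
    plugins: List[SketchPlugin],
) -> Any:
    """Run agent_fn; on an unhandled exception, fire the error hooks, then re-raise."""
    try:
        result = await agent_fn()
    except Exception as error:
        for plugin in plugins:
            await plugin.on_agent_error_callback(agent_name=agent_name, error=error)
        for plugin in plugins:
            await plugin.on_run_error_callback(invocation_id=invocation_id, error=error)
        raise  # the failure still propagates; it is just no longer silent
    for plugin in plugins:
        await plugin.after_agent_callback(agent_name=agent_name)
    for plugin in plugins:
        await plugin.after_run_callback(invocation_id=invocation_id)
    return result


if __name__ == "__main__":
    async def failing_agent() -> None:
        raise RuntimeError("fatal tool error")

    sink = RecordingPlugin()
    try:
        asyncio.run(run_invocation("inv-1", "root_agent", failing_agent, [sink]))
    except RuntimeError:
        pass
    print(sink.events)
```

With hooks like these, a failed run produces explicit AGENT_ERROR and INVOCATION_ERROR records instead of a missing completion event, so a streaming aggregation can count failures directly.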

Impact on your work

Without these hooks, teams adopting ADK for reliable enterprise deployment cannot accurately measure the failure rate of their agentic pipelines. Observability platforms must reconstruct failures with complicated post-processing state engines rather than simple streaming event aggregations, which makes these hooks critical for enterprise observability.

🟡 Recommended Information

Describe Alternatives You've Considered

  1. Time-Based Heuristic (current workaround): Implementing custom BigQuery views that apply a timeout. If a STARTING event has not received a corresponding COMPLETED event within the timeout window, we manually flag it as an ERROR. This risks false positives for exceptionally long-running agents, and forces downstream dashboard metrics to compensate for missing data instead of relying on proper upstream telemetry.
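
The workaround's logic can be sketched in plain Python instead of a BigQuery view. The event field names (invocation_id, event_type, timestamp) and the function flag_timed_out are hypothetical, chosen only for illustration; the actual schema may differ.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def flag_timed_out(events: List[Dict], now: datetime, timeout: timedelta) -> List[str]:
    """Return invocation ids with a STARTING event older than `timeout` and no COMPLETED event."""
    completed = {e["invocation_id"] for e in events if e["event_type"] == "COMPLETED"}
    return sorted(
        e["invocation_id"]
        for e in events
        if e["event_type"] == "STARTING"
        and e["invocation_id"] not in completed
        and now - e["timestamp"] > timeout
    )


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)
    events = [
        {"invocation_id": "a", "event_type": "STARTING", "timestamp": now - timedelta(minutes=30)},
        {"invocation_id": "a", "event_type": "COMPLETED", "timestamp": now - timedelta(minutes=29)},
        # "b" crashed: it started but never completed.
        {"invocation_id": "b", "event_type": "STARTING", "timestamp": now - timedelta(minutes=30)},
        # "c" is a healthy long-running agent that simply has not finished yet.
        {"invocation_id": "c", "event_type": "STARTING", "timestamp": now - timedelta(minutes=20)},
    ]
    # With a 10-minute timeout, both "b" and "c" are flagged as errors.
    print(flag_timed_out(events, now, timedelta(minutes=10)))
```

In this sketch the still-running invocation "c" is flagged alongside the crashed invocation "b", which is the false-positive risk described above.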

Metadata

Labels

needs review: [Status] The PR/issue is awaiting review from the maintainer
tracing: [Component] This issue is related to OpenTelemetry tracing
