make async tools first-class: direct registration, survives pause/resume by longcw · Pull Request #5711 · livekit/agents

longcw · 2026-05-12T09:57:09Z

Summary

Direct registration. Any @function_tool with an AsyncRunContext parameter is routed through an activity-scoped _AsyncToolManager automatically — no need to put it inside an AsyncToolset. get_running_tasks / cancel_task surface on the activity only while a task is in flight. AsyncRunContext is re-exported from livekit.agents.
Survives pause/resume. Async toolsets and the activity's manager outlive a pause. When an agent hands off to a sub-task (e.g. GetEmailTask), in-flight tasks keep running; their replies are delivered once the original activity is current and idle again. Toolset setup moved into start() so MCP / async-tool state persists across resume().
wait_for_idle gates the deferred reply. New on both AgentActivity and AgentSession. It drains in-flight speech, the speech queue, and _background_speeches, so a deferred tool reply doesn't fire while a regular tool's dispatch (one that handed off to a sub-task) is still running. wait_for_inactive is deprecated and forwards to wait_for_idle.
mock_tools() works with async tools. The mock-precedence check moved ahead of the async-manager branch in dispatch, so async tools can be replaced with mocks in tests the same way regular tools can.
Unified ToolError contract. Argument validation raises ToolError directly (one place owns the message the LLM sees); execute_function_call no longer logs a traceback for ToolError. Regular and async dispatch share a single _execute(ctx) closure.

Backward compatibility

AsyncToolset(tools=[...]) is unchanged: the _wrap_tool body now delegates to the toolset's own _AsyncToolManager, but external behavior (duplicate handling, progress narration, cancel) is preserved.

Example

import asyncio

from livekit.agents import Agent, AsyncRunContext
from livekit.agents.llm import function_tool

class MyAgent(Agent):
    @function_tool
    async def search_news(self, ctx: AsyncRunContext, topic: str) -> str:
        await ctx.update(f"searching the news for {topic}...")
        await asyncio.sleep(10)
        return f"Top story on {topic}: nothing dramatic happened today."

No AsyncToolset wrapping. The agent acknowledges immediately via the update, then narrates the final result.

See examples/voice_agents/basic_agent.py for the full example.

Extract `_AsyncToolManager` from `AsyncToolset` so the lifecycle pieces (AsyncRunContext creation, background task spawn, duplicate handling, progress reply coalescing) can be used independently of the toolset wrapper. Each `AgentActivity` now owns one such manager (`on_duplicate_call="reject"`), and at tool dispatch any tool whose signature includes `AsyncRunContext` is routed through that manager, with no upfront grouping or `AsyncToolset` wrapping needed.

`AsyncToolset` continues to work unchanged: it delegates to its own internal manager and still always exposes `get_running_tasks` / `cancel_task`. The activity surfaces them dynamically, only while at least one activity-managed task is live, so agents that never invoke an async tool see no extra tools in their LLM context. `ToolContext` dedupes by name, so both can coexist.

Tools placed inside an explicit `AsyncToolset` retain its richer semantics (toolset-scoped lifetime that survives agent handoff, configurable `on_duplicate_call`). Tools placed directly in `agent.tools` get the activity-scoped default: tasks die on handoff.

`AsyncRunContext` is now re-exported from `livekit.agents` next to `RunContext`, so tools can declare it without reaching into the internal module path.
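The signature-based routing described above could be sketched roughly as follows. This is a minimal illustration, not the library's code: the two context classes are empty stand-ins, and `wants_async_context` and `route` are hypothetical helper names introduced only for this example.

```python
import inspect
from typing import get_type_hints


class RunContext:
    """Empty stand-in for the regular per-call context (assumption)."""


class AsyncRunContext:
    """Empty stand-in for the async-tool context (assumption)."""


def wants_async_context(fn) -> bool:
    """Return True if any parameter of fn is annotated as AsyncRunContext."""
    hints = get_type_hints(fn)
    params = inspect.signature(fn).parameters
    return any(hints.get(name) is AsyncRunContext for name in params)


def route(fn) -> str:
    # Pick a dispatch path from the signature alone; no upfront grouping.
    return "async_manager" if wants_async_context(fn) else "inline"


async def search_news(ctx: AsyncRunContext, topic: str) -> str:
    return f"news about {topic}"


async def get_weather(ctx: RunContext, city: str) -> str:
    return f"weather in {city}"
```

Here `route(search_news)` selects the manager path and `route(get_weather)` selects the inline path, which mirrors the idea that the tool author only has to declare the parameter type.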

…ync tools

Two PR-review fixes:

1. _close_session() now aclose()s the activity's _AsyncToolManager so the pause() path also cancels in-flight bare async tool tasks. Previously only aclose() did this, so a late ctx.update() / return from a paused activity could touch the next agent's chat context via _enqueue_reply.
2. mock_tools() now takes precedence over the activity manager. The check moved ahead of the use_async_manager branch in _execute_tools_task. For async tools with a mock, we pass raw JSON kwargs straight through; _run_mock trims to the mock's actual signature, so the AsyncRunContext param of the original tool isn't required on the mock.
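Trimming keyword arguments to a mock's signature can be illustrated with `inspect.signature`. The sketch below is an assumption about the general technique, not the actual `_run_mock`; `trim_kwargs`, `run_mock`, and `mock_search_news` are hypothetical names.

```python
import inspect
from typing import Any, Callable


def trim_kwargs(mock: Callable[..., Any], raw_kwargs: dict) -> dict:
    """Keep only the keyword arguments that the mock's signature accepts."""
    params = inspect.signature(mock).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(raw_kwargs)  # the mock takes **kwargs, so pass everything
    return {k: v for k, v in raw_kwargs.items() if k in params}


def run_mock(mock: Callable[..., Any], raw_kwargs: dict) -> Any:
    # Call the mock with whatever subset of arguments it declares.
    return mock(**trim_kwargs(mock, raw_kwargs))


def mock_search_news(topic: str) -> str:
    # The mock omits the context parameter that the real tool declares.
    return f"mocked news for {topic}"
```

Because the mock is called only with parameters it names, a test can replace an async tool without reproducing its context parameter.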

Cleanup on top of the activity-scoped manager:

1. `prepare_function_arguments` now raises `ToolError` directly for argument validation failures (bad JSON, ValidationError, missing required params, context-type mismatch). Removes the matching try/except boilerplate from `execute_function_call`, `tool_proxy.py::_handle_call`, and the now-deleted helper in `generation.py`. One place owns the validation→ToolError contract; the message LLMs see is consistent.
2. `_execute_tools_task` now uses a single `_execute(ctx)` closure for sync and async tools. The closure preps args against the tool's signature and either runs the mock or the real tool. Sync path partial-binds with a plain `RunContext`; async path hands the closure to `async_tool_manager.spawn`, which calls it with an `AsyncRunContext`. `_run_mock` moves to module level in `run_result.py` so it's shared between dispatch sites.
3. `_AsyncToolManager.spawn` now takes a `function_callable` (the body) and `run_ctx` only, with no `tool` or `raw_arguments`. Callers build the closure that captures whatever they need. The manager is a pure async-lifecycle service.
4. AsyncToolset wrappers handle their own mock lookup so mocks of wrapped async tools run through the toolset's manager (full async lifecycle). `_is_async_toolset_wrapper(tool)` is the internal contract that tells dispatch in `_execute_tools_task` to defer mock handling.
5. Activity-scoped manager teardown moved into `_close_session` so the pause() path (not just aclose) cancels in-flight bare async tool tasks. Otherwise a late `ctx.update()` from a paused activity could touch the next agent's chat context.

Tests collapsed into `tests/test_async_tool.py`:

- 13 unit tests for the manager and dispatch helpers, no real session.
- 2 e2e tests driving a real `AgentSession` with a scripted `FakeLLM` and `mock_tools`, including the post-update-raise path that proves errors surface to the LLM via `_enqueue_reply` → `REPLY_INSTRUCTIONS` → `generate_reply`.
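The shared-closure idea can be sketched with plain asyncio. Everything here is a simplified assumption for illustration: `TinyManager` is a hypothetical stand-in whose `spawn` takes only the callable (the real signature described above also takes a `run_ctx`), and the context classes carry a `kind` attribute solely so the two paths are distinguishable.

```python
import asyncio
from functools import partial
from typing import Any, Awaitable, Callable


class RunContext:
    kind = "regular"


class AsyncRunContext:
    kind = "async"


class TinyManager:
    """Hypothetical stand-in: owns background tasks, knows nothing about tools."""

    def __init__(self) -> None:
        self.tasks = set()

    def spawn(self, function_callable: Callable[[AsyncRunContext], Awaitable[Any]]) -> asyncio.Task:
        task = asyncio.create_task(function_callable(AsyncRunContext()))
        self.tasks.add(task)  # keep a reference until the task finishes
        task.add_done_callback(self.tasks.discard)
        return task


async def dispatch(tool, kwargs: dict, manager: TinyManager, use_async: bool):
    async def _execute(ctx):
        # One body for both paths: bind the arguments, then run the tool.
        return await tool(ctx, **kwargs)

    if use_async:
        return manager.spawn(_execute)  # runs in the background
    return partial(_execute, RunContext())  # caller awaits it inline


async def echo(ctx, text: str) -> str:
    return f"{ctx.kind}:{text}"


async def demo():
    manager = TinyManager()
    inline = await dispatch(echo, {"text": "hi"}, manager, use_async=False)
    background = await dispatch(echo, {"text": "hi"}, manager, use_async=True)
    return await inline(), await background
```

Keeping the manager unaware of tools and raw arguments means the only thing that differs between the two paths is which context object the closure receives.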

prepare_function_arguments now wraps validation failures as ToolError ("Error parsing arguments for ..."). Update the three tests that still expected ValueError or the old "invalid parameters" JSON message.
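As a rough illustration of a single place owning the validation-to-ToolError contract, the sketch below wraps JSON and required-parameter failures in one exception type. `ToolError` here is a local stand-in class, `parse_arguments` is a hypothetical helper, and the text after the "Error parsing arguments for" prefix is an assumption, since the source elides it.

```python
import json


class ToolError(Exception):
    """Local stand-in for the library's ToolError (assumption)."""


def parse_arguments(tool_name: str, raw: str, required: tuple) -> dict:
    """Parse raw JSON arguments, raising ToolError for any validation failure."""
    prefix = f"Error parsing arguments for {tool_name}"
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ToolError(f"{prefix}: {e}") from e
    if not isinstance(args, dict):
        raise ToolError(f"{prefix}: expected a JSON object")
    missing = [name for name in required if name not in args]
    if missing:
        raise ToolError(f"{prefix}: missing required parameters {missing}")
    return args
```

Callers then need a single `except ToolError` branch, and a test can assert on the exception type rather than on ValueError or a specific JSON message.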

ToolError from arg parsing is already logged inside prepare_function_arguments; user-raised ToolError is an intentional message to the LLM. Neither needs a traceback at logger.exception level.

Async toolsets and the activity-scoped async-tool manager now outlive a pause (e.g. handoff to a sub-task like GetEmailTask). Replies queued by a background tool are deferred until the originating activity is current, unpaused, and has no in-flight speech or tool dispatch.

Key pieces:

- Toolset setup moves from `_start_session` to `start()` so MCP and AsyncToolset state persist across resume. Tear-down only runs in `aclose()`, not in the pause path.
- `_AsyncToolManager` now knows its `owning_activity`; the eager chat_ctx insert and the deferred reply route through that activity's agent (or the current activity for session-scoped toolsets).
- `AgentActivity.wait_for_idle` / `AgentSession.wait_for_idle` are the new readiness gates; `wait_for_idle` also drains `_background_speeches` so an async tool that fires during another tool's dispatch waits for that dispatch to complete.
- The deferred-reply task is watched by the active `RunResult` so a run in flight doesn't return before the reply lands.
- `REPLY_INSTRUCTIONS` asks the LLM to return an empty response when the update was already conveyed, replacing the timestamp-based skip that produced silent drops.
- `ActivityClosedError` lives next to `AgentActivity`; `_deliver_reply` catches it via lazy import to avoid a runtime cycle.
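The readiness gate can be pictured as an `asyncio.Event` that a deferred reply waits on before speaking. This is a deliberately small sketch under that assumption; the `Activity` class and its methods are hypothetical and model only a single in-flight speech, not the speech queue or background speeches.

```python
import asyncio


class Activity:
    """Hypothetical activity that is idle unless a speech is in flight."""

    def __init__(self) -> None:
        self._idle = asyncio.Event()
        self._idle.set()
        self.log = []

    def start_speech(self) -> None:
        self._idle.clear()

    def finish_speech(self) -> None:
        self._idle.set()

    async def wait_for_idle(self) -> None:
        await self._idle.wait()

    async def deliver_reply(self, text: str) -> None:
        # The deferred reply is held back until the activity is idle.
        await self.wait_for_idle()
        self.log.append(f"reply:{text}")


async def demo():
    activity = Activity()
    activity.start_speech()
    reply = asyncio.create_task(activity.deliver_reply("done"))
    await asyncio.sleep(0)  # let the reply task reach the gate
    activity.log.append("speech finished")
    activity.finish_speech()
    await reply
    return activity.log
```

In the demo the reply is recorded only after the speech finishes, which is the ordering the deferred-reply gate is meant to guarantee.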

Drop the update_chat_ctx await between the _pending_updates snapshot/clear and return — preserves the 'no await after this line' invariant so a concurrent _enqueue_reply can't stash an update that the in-flight _deliver_reply never sees. The chat_ctx for the handoff case is built synchronously and threaded into session.generate_reply via the chat_ctx kwarg. Also mirror normal tool dispatch by inserting items into session.history at enqueue time.
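Why the "no await after this line" invariant matters can be shown with a small coalescing queue. The sketch assumes a design in which enqueueing starts a delivery only when none is in flight; `ReplyQueue` and its members are hypothetical names, not the manager's real fields.

```python
import asyncio


class ReplyQueue:
    """Hypothetical queue that coalesces updates into one delivery."""

    def __init__(self) -> None:
        self._pending = []
        self._task = None
        self.batches = []

    def enqueue(self, update: str) -> None:
        self._pending.append(update)
        # Start a delivery only if none is in flight; otherwise the
        # running delivery is expected to pick this update up.
        if self._task is None or self._task.done():
            self._task = asyncio.create_task(self._deliver())

    async def _deliver(self) -> None:
        await asyncio.sleep(0)  # stand-in for waiting until the agent is idle
        # No await after this line: snapshot and clear happen in one step,
        # so no enqueue can slip in between the snapshot and the return.
        batch, self._pending = self._pending, []
        self.batches.append(batch)


async def demo():
    queue = ReplyQueue()
    queue.enqueue("a")
    queue.enqueue("b")  # coalesced into the in-flight delivery
    await queue._task
    queue.enqueue("c")  # previous delivery finished, so a new one starts
    await queue._task
    return queue.batches
```

If `_deliver` awaited anything after the snapshot, an update enqueued during that await would see a task still in flight, skip starting a new one, and then sit in `_pending` unseen.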

When _aclose_impl set _activity = None it left any pending wait_for_idle waiter blocked on _activity_changed_event forever. Set the event on close, and have wait_for_idle bail with ActivityClosedError once self._closing is set and no activity remains.
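The hang and its fix can be reproduced in miniature: a waiter blocked on an event only wakes if close sets that event, and it then needs a closing check to bail out. The `Session` class below is a hypothetical reduction that borrows the field names from the description; `ActivityClosedError` is a local stand-in.

```python
import asyncio


class ActivityClosedError(Exception):
    """Local stand-in for the library's ActivityClosedError (assumption)."""


class Session:
    def __init__(self) -> None:
        self._activity = None
        self._closing = False
        self._activity_changed_event = asyncio.Event()

    async def wait_for_idle(self) -> str:
        while self._activity is None:
            if self._closing:
                # Closed with no activity left: bail instead of waiting forever.
                raise ActivityClosedError("session closed")
            self._activity_changed_event.clear()
            await self._activity_changed_event.wait()
        return self._activity

    def close(self) -> None:
        self._closing = True
        self._activity = None
        # Wake any waiter so it can observe the close.
        self._activity_changed_event.set()


async def demo() -> str:
    session = Session()
    waiter = asyncio.create_task(session.wait_for_idle())
    await asyncio.sleep(0)  # let the waiter block on the event
    session.close()
    try:
        await waiter
    except ActivityClosedError:
        return "closed"
    return "unexpected"
```

Without the `set()` call in `close`, the waiter in `demo` would never be rescheduled and the final `await` would block indefinitely.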

chenghao-mou requested a review from a team May 12, 2026 09:57


longcw added 3 commits May 13, 2026 09:22

Merge remote-tracking branch 'origin/main' into longc/async-task-manager

4191c59


longcw added 2 commits May 13, 2026 18:12

test: align tool-arg validation tests with ToolError contract

d06f0f9


fix: don't log ToolError with traceback in execute_function_call

f70e222


longcw requested a review from theomonnom May 13, 2026 10:23

longcw changed the title ~~async tools work outside AsyncToolset via activity-scoped manager~~ make async tools first-class: direct registration, survives pause/resume May 14, 2026


longcw wants to merge 9 commits into main from longc/async-task-manager

2 participants
