core: cache the tool search handler per session #27258
let handler = Arc::new(ToolSearchHandler::new(search_infos.clone()));
Can we use a fingerprint instead of storing a whole copy of the original search_infos?
@mzeng-openai - To my understanding, you will drive broader tool caching (Slack). The broader approach may eliminate the need for this PR, unless it caches at multiple levels. What do you think though and what is your timeline? Are you thinking I proceed with this PR for initial improvement?
I'll do some research and let you know. In the meantime, this seems like a viable interim solution to unblock the launch, since basically half of the latency comes from the BM25 index. But I would love to get thoughts from the core agent folks.
Updated this so ToolSearchHandler owns the original ordered search_infos, and the cache compares directly against the handler's actual inputs. This removes the search_infos copy and avoids a separate derived cache key. Still open to the fingerprint approach if others feel strongly though.
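For comparison, the fingerprint alternative raised earlier in this thread could look roughly like the sketch below. It is only an illustration, not code from this PR: `fingerprint_of` is a hypothetical helper, and it assumes the search metadata type implements `Hash`. Unlike direct comparison against the handler's inputs, a 64-bit fingerprint can collide in principle, and every field that affects search has to take part in the hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: derives an order-sensitive fingerprint for a slice
/// of hashable items. Slice hashing feeds the length and then each element
/// in order, so reordering the items changes the result.
pub fn fingerprint_of<T: Hash>(items: &[T]) -> u64 {
    let mut hasher = DefaultHasher::new();
    items.hash(&mut hasher);
    hasher.finish()
}
```

A cache keyed this way would store only the `u64` next to the handler, at the cost of keeping the hashed fields in sync with whatever the index actually reads.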
Why
Tool router construction rebuilds the deferred-tool BM25 index during session initialization and before each sampling continuation, even when the searchable tool metadata is unchanged. Local profiling measured append_tool_search_executor at roughly 113 ms per continuation, making repeated index construction the largest measured router-building cost.

What changed
- Added ToolSearchHandlerCache so continuations and user turns can reuse the existing handler.
- The cache compares against the handler's ordered Vec<ToolSearchInfo>, rebuilding when searchable text, loadable tool specs, source metadata, or ordering changes.

Verification
- cache_reuses_identical_search_infos_and_rebuilds_changed_inputs covers exact cache reuse and invalidation when the ordered search metadata changes.
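
The reuse-or-rebuild behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: the fields of `ToolSearchInfo`, the `search_infos()` accessor, and the `get_or_build` method name are all hypothetical, and the real handler would build a BM25 index where the sketch only stores its inputs.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the real ToolSearchInfo; field names are assumptions.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ToolSearchInfo {
    pub name: String,
    pub search_text: String,
}

// Hypothetical stand-in for the handler. It owns its ordered inputs so the
// cache can compare against them directly.
pub struct ToolSearchHandler {
    search_infos: Vec<ToolSearchInfo>,
}

impl ToolSearchHandler {
    pub fn new(search_infos: Vec<ToolSearchInfo>) -> Self {
        // The expensive index construction would happen here.
        Self { search_infos }
    }

    pub fn search_infos(&self) -> &[ToolSearchInfo] {
        &self.search_infos
    }
}

// One cache per session; holds at most one handler.
#[derive(Default)]
pub struct ToolSearchHandlerCache {
    cached: Option<Arc<ToolSearchHandler>>,
}

impl ToolSearchHandlerCache {
    // Hypothetical method name. Returns the cached handler when the ordered
    // inputs are identical, otherwise builds and stores a new one.
    pub fn get_or_build(&mut self, search_infos: &[ToolSearchInfo]) -> Arc<ToolSearchHandler> {
        if let Some(handler) = &self.cached {
            // Slice equality is element-wise and order-sensitive.
            if handler.search_infos() == search_infos {
                return Arc::clone(handler);
            }
        }
        let handler = Arc::new(ToolSearchHandler::new(search_infos.to_vec()));
        self.cached = Some(Arc::clone(&handler));
        handler
    }
}
```

In this sketch, two calls with identical inputs return the same `Arc` (checkable with `Arc::ptr_eq`), while reordering the inputs forces a rebuild, mirroring what the verification test is described as covering.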