Replies: 1 comment
You could even do it without official tool calling (i.e. pull shell calls out of the plain-text reply), which I have found works very quickly for small models (obviously not great for large contexts); not having tool-call schema/output requirements can speed things up even more.
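As an illustration of the schema-free approach the comment describes, here is a minimal sketch: scan the model's free-form reply for shell commands instead of parsing a structured tool-call payload. The `$ ` prefix convention and the function name are assumptions for the example, not goose's actual implementation.

```python
def extract_shell_calls(text: str) -> list[str]:
    """Pull shell commands out of free-form model output.

    Hypothetical convention: the small model is prompted to prefix
    each command it wants run with "$ " on its own line.
    """
    return [line[2:].strip()
            for line in text.splitlines()
            if line.startswith("$ ")]
```

Skipping the tool-call schema means the small model spends no tokens emitting JSON structure, which is part of why this can be fast.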
Overview
Add a fast-path execution mode that races a lightweight model against the full agent for simple queries in autonomous mode. When the fast model can handle a request, users get near-instant responses without waiting for the full agent machinery.
Motivation
Simple requests like `ls`, `cat file.txt`, "hello", etc. don't need the full agent machinery.
Proposed Behavior
When Fast Path Activates
The fast path uses the `complete_fast()` provider method (already exists).
Fast Path Rules
The fast model either answers directly or replies "<<PASS>>"; prior tool calls appear in its context as "<tool calls omitted>" placeholders. On "<<PASS>>" → hand off to the full agent.
Racing Logic
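A minimal sketch of the racing logic, under stated assumptions: `fast_call` and `full_call` are hypothetical coroutine factories standing in for the two model requests (not goose's actual API), and `<<PASS>>` is the hand-off sentinel described above.

```python
import asyncio

PASS = "<<PASS>>"  # sentinel the fast model emits when it can't handle the request

async def race(fast_call, full_call, fast_timeout: float = 2.0):
    """Start both models; prefer the fast answer unless it passes or times out.

    The full-agent task is started immediately so no latency is lost
    if the fast model ends up deferring.
    """
    full_task = asyncio.create_task(full_call())
    try:
        fast_answer = await asyncio.wait_for(fast_call(), timeout=fast_timeout)
    except asyncio.TimeoutError:
        fast_answer = PASS
    if fast_answer != PASS:
        full_task.cancel()   # saves the slow model's output tokens
        return fast_answer
    return await full_task   # hand off to the full agent
```

Cancelling `full_task` as soon as the fast model wins is what realizes the cost savings described below: the slow request's input tokens are already spent, but its (more expensive) output generation stops.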
Cost/Performance Profile
If the fast model wins, we cancel the slow request: we are still charged for its input tokens, but not its output tokens, which are typically ~5x more expensive. If the fast model loses, we have only added the cost of the fast request, whose input tokens are ~25x cheaper than the slow model's output tokens. With the reduced number of input tokens, we should be able to make interactions with goose both faster and cheaper.
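A back-of-envelope check of those ratios. The prices are made-up units: only the 5x output/input ratio and the 25x fast-input/slow-output ratio come from the paragraph above, and the fast model's own output price is an extra assumption.

```python
# Illustrative prices per token (arbitrary units).
SLOW_IN = 1.0
SLOW_OUT = 5.0              # output ~5x input price, per the text
FAST_IN = SLOW_OUT / 25     # fast input ~25x cheaper than slow output
FAST_OUT = FAST_IN * 5      # assumption: fast model has the same 5x ratio

def cost(n_in, n_out, p_in, p_out):
    return n_in * p_in + n_out * p_out

n_in, n_out = 1000, 500     # example request size
slow_only = cost(n_in, n_out, SLOW_IN, SLOW_OUT)
# Fast model wins: pay the fast request in full, plus the slow model's
# input tokens (its output was cancelled before generation finished).
fast_wins = cost(n_in, n_out, FAST_IN, FAST_OUT) + n_in * SLOW_IN
# Fast model loses: pay both requests in full.
fast_loses = slow_only + cost(n_in, n_out, FAST_IN, FAST_OUT)
```

Under these numbers a fast-path win costs roughly half the baseline, while a loss adds about 20% overhead, so the race pays off whenever the fast model wins even a modest fraction of the time.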