fix(model_runner): request token usage on streaming completions #248
Streaming chat completions from OpenAI-compatible providers omit the
`usage` block unless the request sets `stream_options.include_usage`.
Because the streaming call never set it, `Tape._extract_usage` found no
usage on any chunk, so every streamed run recorded zero tokens and zero
cost in the tape.
Pass `stream_options={"include_usage": True}` when streaming, gated on the
provider base class so only OpenAI-compatible providers (which accept the
field) receive it — anthropic/gemini reject it and are left untouched.
Non-streaming completions already carry usage in the response body and are
unaffected.
Adds unit tests covering the gate: streaming OpenAI-compatible providers
get the field; non-streaming and non-OpenAI providers do not.
Collapse four near-duplicate tests into a single parametrized test that guards the non-obvious intent (non-OpenAI providers must not receive stream_options) and the primary path, dropping the redundant cases.
The gate is a two-line pure helper; a dedicated test mostly mirrors its implementation rather than guarding real behavior.
Temporary: aligns the workspace bub source with the fork that carries the streaming stream_options.include_usage fix (bubbuild/bub#248), so installs that pull both this plugin and the fixed bub don't hit uv's conflicting-URL error. Revert to bubbuild/bub once #248 is released.
Wow, nice and simple! Can you also add a test?
bub's streaming completions never set `stream_options.include_usage`, so OpenAI-compatible providers returned no usage block and every bub run recorded zero tokens / zero cost in the tape. Fixed upstream in a bub fork (bubbuild/bub#248); this wires the eval to that fork.
- Install bub by name with a uv `--overrides` file pinning it to the fork branch. Installing the fork as a top-level git URL conflicts with the otel plugin's own bub dependency ("conflicting URLs for package bub", even when identical); the override unifies every bub requirement to one source, so bub-contrib needs no change.
- Derive the disk-checkpoint filename from a hash of the full install spec, so changing the bub/plugin source auto-invalidates the cache.
- Correct the stale comment that claimed bub never records usage.
Verified end-to-end: real non-zero usage (138.7k in / 3.9k out / 121.9k cache-read) and otel traces both flow; cost is derived from the pricing table when a model is given. Revert to `bub>=0.3.0a1` once #248 ships.
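Hashing the install spec into the checkpoint filename could look like the minimal sketch below. `checkpoint_filename`, the `bub-env` prefix, the 16-character digest, and the `.tar` suffix are all hypothetical choices; the commit only states that the filename is derived from a hash of the full install spec.

```python
import hashlib


def checkpoint_filename(install_spec: str, prefix: str = "bub-env") -> str:
    # Hypothetical helper: hash the full install spec (package specs, override
    # pins, plugin sources) so any change to it yields a different filename.
    # A stale checkpoint is then never looked up again, which invalidates the
    # cache without explicit cleanup logic.
    digest = hashlib.sha256(install_spec.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}-{digest}.tar"
```

Switching the spec from `bub>=0.3.0a1` to a fork pin changes the digest, so the old checkpoint is simply skipped; an identical spec always maps to the same file.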
@yihong0618 After seeing your comment, I had a look at the style of the other tests in this repo and tried to build a more e2e-style regression test. The idea was to verify not only that `stream_options` is passed, but also that streaming usage eventually gets recorded into the tape. That said, this version now feels a bit heavy, and I'm not sure whether it is actually best practice for this repo.
Or did you mean screenshot proof that the bug exists and that it works after the fix, rather than an actual test file? Or something else? 🤔 I found this Bub issue in an eval case while building an agent eval framework. On a Bub version with this PR's fix, the eval already runs successfully.
On my side, either is OK; a simple test is also fine~
I see the CI error. I will fix it.
Problem
Streamed runs against OpenAI-compatible providers record zero tokens and zero cost in the tape.
The `run` event's `usage` ends up empty, so any consumer that reads usage from the tape (cost reporting, the OTel tapestore's `gen_ai.usage.*` attributes, etc.) sees `0`.
Root cause
`ModelRunner.completion_response` calls `acompletion(..., stream=True)` but never sets `stream_options.include_usage`. Per the OpenAI streaming spec, a streaming chat completion omits the `usage` block unless `stream_options.include_usage` is explicitly requested. As a result, `Tape._extract_usage(chunk)` never finds usage on any chunk and `state.usage` stays unset.
This is reproducible directly against an OpenAI-compatible endpoint:
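As an offline illustration of the failure mode (not a live reproduction), the sketch below models streamed chunks as plain dicts shaped like OpenAI `chat.completion.chunk` objects. `fake_stream` and `extract_usage` are hypothetical helpers, and `extract_usage` only approximates applying `Tape._extract_usage` to each chunk and keeping the last usage it finds. The token counts are made-up illustrative values.

```python
from typing import Any, Iterable, Optional


def fake_stream(include_usage: bool) -> list[dict[str, Any]]:
    # Hypothetical content chunks; without include_usage, none carries usage.
    chunks: list[dict[str, Any]] = [
        {"choices": [{"delta": {"content": "Hel"}}], "usage": None},
        {"choices": [{"delta": {"content": "lo"}}], "usage": None},
    ]
    if include_usage:
        # With stream_options.include_usage, one extra final chunk with an
        # empty choices list carries usage for the whole request.
        # (Token counts here are illustrative, not measured.)
        chunks.append({
            "choices": [],
            "usage": {"prompt_tokens": 12, "completion_tokens": 2, "total_tokens": 14},
        })
    return chunks


def extract_usage(chunks: Iterable[dict[str, Any]]) -> Optional[dict[str, int]]:
    # Hypothetical stand-in for per-chunk usage extraction:
    # remember the last non-empty usage block seen in the stream.
    usage = None
    for chunk in chunks:
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage
```

Without the flag, the extractor returns `None` for the whole stream, which is exactly the "zero tokens, zero cost" symptom. Because the usage chunk arrives with empty `choices`, a consumer that only looks at chunks carrying choices would miss it too.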
Fix
Pass `stream_options={"include_usage": True}` when streaming, gated on the provider base class so only OpenAI-compatible providers receive it:
- `stream_options.include_usage` is an OpenAI-completions field; `BaseOpenAIProvider` subclasses (openai, openrouter, …) accept it, while anthropic/gemini reject it, so the gate leaves those paths untouched.
- Non-streaming completions already carry `usage` in the response body and are unaffected.
- Usage extraction itself is unchanged (`Tape._extract_usage`); it simply never had usage to read until now.
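The gate can be sketched as a small pure helper. This is a minimal illustration, not the PR's code: `stream_options_for` and `completion_kwargs` are hypothetical names, and the provider classes are empty stand-ins for `BaseOpenAIProvider` and its subclasses (openai, openrouter) versus a non-OpenAI provider such as anthropic.

```python
from typing import Any, Optional


class BaseOpenAIProvider:
    """Stand-in for the OpenAI-compatible provider base class."""


class OpenAIProvider(BaseOpenAIProvider):
    pass


class OpenRouterProvider(BaseOpenAIProvider):
    pass


class AnthropicProvider:
    """Stand-in for a provider that rejects stream_options."""


def stream_options_for(provider: object, stream: bool) -> Optional[dict[str, bool]]:
    # Only streaming calls to OpenAI-compatible providers need the field;
    # non-streaming responses already include usage in the body.
    if stream and isinstance(provider, BaseOpenAIProvider):
        return {"include_usage": True}
    return None


def completion_kwargs(provider: object, stream: bool) -> dict[str, Any]:
    # Hypothetical: assemble keyword arguments for an acompletion(...) call,
    # adding stream_options only when the gate returns a value.
    kwargs: dict[str, Any] = {"stream": stream}
    options = stream_options_for(provider, stream)
    if options is not None:
        kwargs["stream_options"] = options
    return kwargs
```

Omitting the `stream_options` key entirely, rather than passing `None`, keeps the request for non-OpenAI providers and non-streaming calls the same as it was before the change.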
Adds unit tests covering the gate:
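| Case | `stream_options` passed |
|---|---|
| Streaming, OpenAI-compatible provider | `{"include_usage": True}` |
| Non-streaming | `None` |
| Streaming, non-OpenAI provider (anthropic/gemini) | `None` |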