
fix(model_runner): request token usage on streaming completions #248

Merged
frostming merged 5 commits into bubbuild:main from CorrectRoadH:fix/streaming-usage-include-usage
Jun 30, 2026

Conversation

@CorrectRoadH

Contributor

Problem

Streamed runs against OpenAI-compatible providers record zero tokens and zero cost in the tape.

The run event's usage ends up empty, so any consumer that reads usage from the tape (cost reporting, the OTel tapestore's gen_ai.usage.* attributes, etc.) sees 0.

Root cause

ModelRunner.completion_response calls acompletion(..., stream=True) but never sets stream_options.include_usage. Per the OpenAI streaming spec, a streaming chat-completion omits the usage block unless stream_options.include_usage is explicitly requested. As a result Tape._extract_usage(chunk) never finds usage on any chunk and state.usage stays unset.

This is reproducible directly against an OpenAI-compatible endpoint:

stream=True, no stream_options          -> no usage in any chunk
stream=True, include_usage=True         -> final chunk carries {prompt_tokens, completion_tokens, total_tokens}
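
The two outcomes above can be modelled without a network call. The sketch below is a simulation under stated assumptions: `fake_stream` is a hypothetical stand-in for an OpenAI-compatible streaming endpoint, `extract_usage` is an illustrative reader rather than the actual `Tape._extract_usage`, and the token counts are made-up values.

```python
from typing import Any, Iterator, Optional


def fake_stream(stream_options: Optional[dict] = None) -> Iterator[dict[str, Any]]:
    """Hypothetical stand-in for an OpenAI-compatible streaming response."""
    # Content chunks carry text deltas and no usage information.
    for piece in ("Hel", "lo"):
        yield {"choices": [{"delta": {"content": piece}}]}
    # The usage-bearing final chunk is only sent when the caller opted in.
    if stream_options and stream_options.get("include_usage"):
        yield {
            "choices": [],
            # Illustrative numbers, not measured values.
            "usage": {"prompt_tokens": 12, "completion_tokens": 2, "total_tokens": 14},
        }


def extract_usage(chunks: Iterator[dict[str, Any]]) -> Optional[dict[str, int]]:
    """Return the last usage block seen in the stream, or None if none arrived."""
    usage = None
    for chunk in chunks:
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage
```

Calling `extract_usage(fake_stream())` models the buggy path (nothing to record), while `extract_usage(fake_stream({"include_usage": True}))` models the fixed path.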

Fix

Pass stream_options={"include_usage": True} when streaming, gated on the provider base class so only OpenAI-compatible providers receive it:

if stream and isinstance(llm, BaseOpenAIProvider):
    return {"include_usage": True}
  • stream_options.include_usage is an OpenAI-completions field; BaseOpenAIProvider subclasses (openai, openrouter, …) accept it, while anthropic/gemini reject it — the gate leaves those paths untouched.
  • Non-streaming completions already carry usage in the response body and are unaffected.
  • The streaming consumer already extracts usage from chunks (Tape._extract_usage); it simply never had usage to read until now.
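
A minimal, self-contained sketch of how such a gate might feed the completion call. Everything here except the two-line condition quoted above is an assumption for illustration: the provider classes are empty stand-ins, and `stream_options_for` and `build_call_kwargs` are hypothetical names, not necessarily those used in the PR.

```python
from typing import Any, Optional


class BaseOpenAIProvider:
    """Stand-in for the provider base class named in the PR."""


class OpenRouterProvider(BaseOpenAIProvider):
    """Stand-in for an OpenAI-compatible provider."""


class AnthropicProvider:
    """Stand-in for a provider that does not accept stream_options."""


def stream_options_for(llm: object, stream: bool) -> Optional[dict[str, bool]]:
    # Only streaming calls to OpenAI-compatible providers opt in to usage.
    if stream and isinstance(llm, BaseOpenAIProvider):
        return {"include_usage": True}
    return None


def build_call_kwargs(llm: object, stream: bool, **kwargs: Any) -> dict[str, Any]:
    """Assemble keyword arguments, adding stream_options only when the gate allows it."""
    call_kwargs: dict[str, Any] = {**kwargs, "stream": stream}
    options = stream_options_for(llm, stream)
    if options is not None:
        call_kwargs["stream_options"] = options
    return call_kwargs
```

Omitting the key entirely, rather than passing `stream_options=None`, keeps the request for non-OpenAI providers identical to what it was before the change.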

Tests

Adds unit tests covering the gate:

  • streaming OpenAI / OpenAI-compatible provider → {"include_usage": True}
  • non-streaming → None
  • non-OpenAI provider (anthropic) → None
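
The three cases listed above lend themselves to a table-driven check. The following is a sketch only, written with plain Python so it runs without a test framework; the helper name `stream_options_for`, the stand-in provider classes, and `failing_gate_cases` are hypothetical and do not reproduce the repository's actual test file.

```python
from typing import Optional


class BaseOpenAIProvider:
    """Stand-in for the provider base class named in the PR."""


class OpenAIProvider(BaseOpenAIProvider):
    """Stand-in for an OpenAI-compatible provider."""


class AnthropicProvider:
    """Stand-in for a non-OpenAI provider."""


def stream_options_for(llm: object, stream: bool) -> Optional[dict[str, bool]]:
    # Same condition as the gate described in the Fix section.
    if stream and isinstance(llm, BaseOpenAIProvider):
        return {"include_usage": True}
    return None


# (label, provider instance, stream flag, expected stream_options)
CASES = [
    ("streaming openai-compatible", OpenAIProvider(), True, {"include_usage": True}),
    ("non-streaming", OpenAIProvider(), False, None),
    ("streaming anthropic", AnthropicProvider(), True, None),
]


def failing_gate_cases() -> list[str]:
    """Return the labels of cases whose result differs from the expectation."""
    return [
        label
        for label, llm, stream, expected in CASES
        if stream_options_for(llm, stream) != expected
    ]
```

In a pytest suite the same table would typically become the argument list of a single parametrized test.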

Streaming chat completions from OpenAI-compatible providers omit the
`usage` block unless the request sets `stream_options.include_usage`.
Because the streaming call never set it, `Tape._extract_usage` found no
usage on any chunk, so every streamed run recorded zero tokens and zero
cost in the tape.

Pass `stream_options={"include_usage": True}` when streaming, gated on the
provider base class so only OpenAI-compatible providers (which accept the
field) receive it — anthropic/gemini reject it and are left untouched.
Non-streaming completions already carry usage in the response body and are
unaffected.

Adds unit tests covering the gate: streaming OpenAI-compatible providers
get the field; non-streaming and non-OpenAI providers do not.

Collapse four near-duplicate tests into a single parametrized test that
guards the non-obvious intent (non-OpenAI providers must not receive
stream_options) and the primary path, dropping the redundant cases.

The gate is a two-line pure helper; a dedicated test mostly mirrors its
implementation rather than guarding real behavior.
CorrectRoadH added a commit to CorrectRoadH/bub-contrib that referenced this pull request Jun 29, 2026
Temporary: aligns the workspace bub source with the fork that carries the
streaming stream_options.include_usage fix (bubbuild/bub#248), so installs
that pull both this plugin and the fixed bub don't hit uv's conflicting-URL
error. Revert to bubbuild/bub once #248 is released.
@yihong0618

Collaborator

wow nice and simple can you also add a test?

CorrectRoadH added a commit to CorrectRoadH/coding-agent-memory-evals that referenced this pull request Jun 29, 2026
bub's streaming completions never set stream_options.include_usage, so
OpenAI-compatible providers returned no usage block and every bub run
recorded zero tokens / zero cost in the tape. Fixed upstream in a bub fork
(bubbuild/bub#248); this wires the eval to that fork.

- Install bub by name with a uv `--overrides` file pinning it to the fork
  branch. Installing the fork as a top-level git URL conflicts with the
  otel plugin's own bub dependency ("conflicting URLs for package bub",
  even when identical); the override unifies every bub requirement to one
  source, so bub-contrib needs no change.
- Derive the disk-checkpoint filename from a hash of the full install spec
  so changing the bub/plugin source auto-invalidates the cache.
- Correct the stale comment that claimed bub never records usage.

Verified end-to-end: real non-zero usage (138.7k in / 3.9k out / 121.9k
cache-read) and otel traces both flow; cost is derived from the pricing
table when a model is given. Revert to 'bub>=0.3.0a1' once #248 ships.
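
The cache-invalidation idea in the second bullet of that commit message can be sketched as follows. This is an illustration under assumptions, not the eval repository's code: `checkpoint_filename`, the `bub-env` prefix, the `.ckpt` extension, and the example install specs are all hypothetical.

```python
import hashlib


def checkpoint_filename(install_spec: list[str], prefix: str = "bub-env") -> str:
    """Derive a cache filename that changes whenever the install spec changes."""
    # Join with NUL so ["a", "b"] and ["ab"] cannot hash to the same input.
    joined = "\0".join(install_spec)
    digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}-{digest}.ckpt"
```

Because the filename is a pure function of the spec, switching the bub or plugin source yields a new name and the stale checkpoint is simply never looked up again.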
@CorrectRoadH

Contributor Author

@yihong0618
Earlier, when AI helped me fix this issue, it also wrote a test, but I felt that version wasn’t great. It felt a bit like the test was just asserting its own setup, so I removed it.

After seeing your comment, I asked it to look at the style of the other tests in this repo and try to build a more e2e-style regression test. The idea was to verify not only that stream_options is passed, but also that streaming usage eventually gets recorded into the tape. That said, this version now feels a bit heavy, and I’m not sure whether it is the best practice for this repo.

@CorrectRoadH

Contributor Author

Or did you mean a screenshot showing that the bug exists and that it works after the fix, rather than an actual test file? Or something else? 🤔

I found this Bub issue in an eval case while building an agent eval framework. On the Bub version with this PR’s fix, it already works.

@yihong0618

Collaborator

> Or did you mean a screenshot showing that the bug exists and that it works after the fix, rather than an actual test file? Or something else? 🤔
>
> I found this Bub issue in an eval case while building an agent eval framework. On the Bub version with this PR’s fix, it already works.

From my side both are OK; just a simple test is also fine~
Can you help fix the lint?

@CorrectRoadH

Contributor Author

I see the CI error. I will fix it.

@CorrectRoadH CorrectRoadH marked this pull request as draft June 30, 2026 05:43
@CorrectRoadH CorrectRoadH marked this pull request as ready for review June 30, 2026 05:43
@frostming frostming merged commit c4e09e3 into bubbuild:main Jun 30, 2026
10 checks passed

3 participants