fix(model_runner): request token usage on streaming completions by CorrectRoadH · Pull Request #248 · bubbuild/bub

CorrectRoadH · 2026-06-29T03:48:20Z

Problem

Streamed runs against OpenAI-compatible providers record zero tokens and zero cost in the tape.

The run event's usage ends up empty, so any consumer that reads usage from the tape (cost reporting, the OTel tapestore's gen_ai.usage.* attributes, etc.) sees 0.

Root cause

ModelRunner.completion_response calls acompletion(..., stream=True) but never sets stream_options.include_usage. Per the OpenAI streaming spec, a streaming chat-completion omits the usage block unless stream_options.include_usage is explicitly requested. As a result Tape._extract_usage(chunk) never finds usage on any chunk and state.usage stays unset.

This is reproducible directly against an OpenAI-compatible endpoint:

stream=True, no stream_options          -> no usage in any chunk
stream=True, include_usage=True         -> final chunk carries {prompt_tokens, completion_tokens, total_tokens}
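The two cases above can be illustrated offline with a small simulation. This is a sketch under stated assumptions, not bub's code: `fake_stream`, `extract_usage` and `collect_usage` are hypothetical stand-ins (the second loosely mirrors what `Tape._extract_usage` is described as doing), chunks are modeled as plain dicts, and the token counts are made up.

```python
from typing import Iterator, Optional


def fake_stream(include_usage: bool) -> Iterator[dict]:
    """Hypothetical stand-in for an OpenAI-compatible streaming response."""
    yield {"choices": [{"delta": {"content": "Hel"}}]}
    yield {"choices": [{"delta": {"content": "lo"}}]}
    if include_usage:
        # When usage is requested, a final chunk arrives with empty choices
        # and a populated usage block (numbers here are invented).
        yield {
            "choices": [],
            "usage": {"prompt_tokens": 12, "completion_tokens": 2, "total_tokens": 14},
        }


def extract_usage(chunk: dict) -> Optional[dict]:
    """Hypothetical stand-in for Tape._extract_usage."""
    return chunk.get("usage") or None


def collect_usage(chunks: Iterator[dict]) -> Optional[dict]:
    """Keep the last usage block seen, or None if no chunk carried one."""
    usage = None
    for chunk in chunks:
        found = extract_usage(chunk)
        if found is not None:
            usage = found
    return usage


without_usage = collect_usage(fake_stream(include_usage=False))
with_usage = collect_usage(fake_stream(include_usage=True))
```

In this model the consumer is correct in both runs; the only difference is whether the stream ever contains a usage-bearing chunk.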

Fix

Pass stream_options={"include_usage": True} when streaming, gated on the provider base class so only OpenAI-compatible providers receive it:

if stream and isinstance(llm, BaseOpenAIProvider):
    return {"include_usage": True}

stream_options.include_usage is an OpenAI-completions field; BaseOpenAIProvider subclasses (openai, openrouter, …) accept it, while anthropic/gemini reject it — the gate leaves those paths untouched.
Non-streaming completions already carry usage in the response body and are unaffected.
The streaming consumer already extracts usage from chunks (Tape._extract_usage); it simply never had usage to read until now.
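A minimal, self-contained sketch of the gate follows. Only the condition and the returned dict come from the PR; the helper names `stream_options_for` and `completion_kwargs` are hypothetical, and the three provider classes are local stand-ins rather than the real provider types.

```python
from typing import Optional


class BaseOpenAIProvider:
    """Local stand-in for the provider base class named in the PR."""


class OpenRouterProvider(BaseOpenAIProvider):
    """Local stand-in for an OpenAI-compatible provider."""


class AnthropicProvider:
    """Local stand-in for a provider that is not OpenAI-compatible."""


def stream_options_for(llm: object, stream: bool) -> Optional[dict]:
    # Hypothetical helper name; the condition matches the snippet in the PR.
    if stream and isinstance(llm, BaseOpenAIProvider):
        return {"include_usage": True}
    return None


def completion_kwargs(llm: object, stream: bool) -> dict:
    """Build request kwargs, adding stream_options only when the gate allows it."""
    kwargs: dict = {"stream": stream}
    options = stream_options_for(llm, stream)
    if options is not None:
        # Omit the key entirely rather than sending stream_options=None.
        kwargs["stream_options"] = options
    return kwargs
```

Leaving the key out when the gate returns None is one possible design choice here; it keeps requests to non-OpenAI providers identical to what they were before the change.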

Tests

Adds unit tests covering the gate:

streaming OpenAI / OpenAI-compatible provider → {"include_usage": True}
non-streaming → None
non-OpenAI provider (anthropic) → None

Streaming chat completions from OpenAI-compatible providers omit the `usage` block unless the request sets `stream_options.include_usage`. Because the streaming call never set it, `Tape._extract_usage` found no usage on any chunk, so every streamed run recorded zero tokens and zero cost in the tape. Pass `stream_options={"include_usage": True}` when streaming, gated on the provider base class so only OpenAI-compatible providers (which accept the field) receive it — anthropic/gemini reject it and are left untouched. Non-streaming completions already carry usage in the response body and are unaffected. Adds unit tests covering the gate: streaming OpenAI-compatible providers get the field; non-streaming and non-OpenAI providers do not.

Temporary: aligns the workspace bub source with the fork that carries the streaming stream_options.include_usage fix (bubbuild/bub#248), so installs that pull both this plugin and the fixed bub don't hit uv's conflicting-URL error. Revert to bubbuild/bub once #248 is released.

yihong0618 · 2026-06-29T05:38:39Z

wow nice and simple can you also add a test?

bub's streaming completions never set stream_options.include_usage, so OpenAI-compatible providers returned no usage block and every bub run recorded zero tokens / zero cost in the tape. Fixed upstream in a bub fork (bubbuild/bub#248); this wires the eval to that fork.

- Install bub by name with a uv `--overrides` file pinning it to the fork branch. Installing the fork as a top-level git URL conflicts with the otel plugin's own bub dependency ("conflicting URLs for package bub", even when identical); the override unifies every bub requirement to one source, so bub-contrib needs no change.
- Derive the disk-checkpoint filename from a hash of the full install spec so changing the bub/plugin source auto-invalidates the cache.
- Correct the stale comment that claimed bub never records usage.

Verified end-to-end: real non-zero usage (138.7k in / 3.9k out / 121.9k cache-read) and otel traces both flow; cost is derived from the pricing table when a model is given.

Revert to 'bub>=0.3.0a1' once #248 ships.
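The hash-derived checkpoint name mentioned in that commit message could look roughly like the sketch below. Everything here is an assumption for illustration: the function name `checkpoint_name`, the `bub-env` prefix, the `.tar` suffix, the 16-character digest length, and the placeholder install specs (the `example.invalid` URLs are not the real fork).

```python
import hashlib
from typing import Sequence


def checkpoint_name(install_spec: Sequence[str], prefix: str = "bub-env") -> str:
    """Derive a cache filename that changes whenever the install spec changes."""
    # Join with newlines so ["a b"] and ["a", "b"] hash differently.
    payload = "\n".join(install_spec).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()[:16]
    return f"{prefix}-{digest}.tar"


# Placeholder specs: package by name plus the contents of an overrides file.
spec_fork = ["bub", "override: bub @ git+https://example.invalid/fork/bub@some-branch"]
spec_other = ["bub", "override: bub @ git+https://example.invalid/fork/bub@other-branch"]

name_fork = checkpoint_name(spec_fork)
name_fork_again = checkpoint_name(spec_fork)
name_other = checkpoint_name(spec_other)
```

Because the name is a pure function of the spec, an unchanged spec reuses the existing checkpoint, while any edit to the bub or plugin source yields a new filename and therefore a cache miss.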

CorrectRoadH · 2026-06-30T05:33:09Z

@yihong0618
Earlier, when AI helped me fix this issue, it also wrote a test, but I felt that version wasn’t great. It felt a bit like the test was just asserting its own setup, so I removed it.

After seeing your comment, I asked it to look at the style of the other tests in this repo and to try building a more e2e-style regression test. The idea was to verify not only that stream_options is passed, but also that streaming usage eventually gets recorded into the tape. That said, this version now feels a bit heavy, and I’m not sure whether it is actually the best practice for this repo.
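For readers who want a feel for what such an e2e-style check involves, here is a rough sketch, not the test that was merged. `fake_acompletion`, `FakeTape` and `run_streaming` are hypothetical stand-ins: the fake provider emits a usage chunk only when `stream_options.include_usage` is set, and the fake tape records whatever usage it sees.

```python
import asyncio
from typing import Any, AsyncIterator, Optional


async def fake_acompletion(**kwargs: Any) -> AsyncIterator[dict]:
    """Hypothetical provider double: usage appears only when it was requested."""
    include = bool((kwargs.get("stream_options") or {}).get("include_usage"))

    async def chunks() -> AsyncIterator[dict]:
        yield {"choices": [{"delta": {"content": "hi"}}]}
        if include:
            yield {"choices": [], "usage": {"prompt_tokens": 5, "completion_tokens": 1, "total_tokens": 6}}

    return chunks()


class FakeTape:
    """Hypothetical tape double that keeps the last usage block it saw."""

    def __init__(self) -> None:
        self.usage: Optional[dict] = None

    def record(self, chunk: dict) -> None:
        usage = chunk.get("usage")
        if usage:
            self.usage = usage


async def run_streaming(tape: FakeTape, *, request_usage: bool) -> Optional[dict]:
    kwargs: dict = {"stream": True}
    if request_usage:
        kwargs["stream_options"] = {"include_usage": True}
    stream = await fake_acompletion(**kwargs)
    async for chunk in stream:
        tape.record(chunk)
    return tape.usage


before_fix = asyncio.run(run_streaming(FakeTape(), request_usage=False))
after_fix = asyncio.run(run_streaming(FakeTape(), request_usage=True))
```

The point of testing at this level is that the assertion is on the recorded usage, so it fails if either the request option or the chunk-to-tape path regresses.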

CorrectRoadH · 2026-06-30T05:36:50Z

Or did you mean the successful screenshot proof that the bug exists and works after the fix, rather than an actual test file? Or anything🤔

I found this Bub issue in an eval case while building an agent eval framework. On the Bub version with this PR’s fix, it already works successfully.

yihong0618 · 2026-06-30T05:37:52Z

For my side both are OK; just a simple test is also fine~
Can you help fix the lint?

CorrectRoadH · 2026-06-30T05:38:06Z

I see the CI error. I will fix it.

CorrectRoadH added 3 commits June 29, 2026 11:47

test: trim usage-gate tests to one parametrized case

e5560d8

Collapse four near-duplicate tests into a single parametrized test that guards the non-obvious intent (non-OpenAI providers must not receive stream_options) and the primary path, dropping the redundant cases.

test: drop usage-gate test

cddc95f

The gate is a two-line pure helper; a dedicated test mostly mirrors its implementation rather than guarding real behavior.

test: cover streaming usage tape recording

3bb26e9

test: format model runner test

6bf1953

CorrectRoadH marked this pull request as draft June 30, 2026 05:43

CorrectRoadH marked this pull request as ready for review June 30, 2026 05:43

frostming approved these changes Jun 30, 2026

frostming merged commit c4e09e3 into bubbuild:main Jun 30, 2026
10 checks passed

frostming merged 5 commits into bubbuild:main from CorrectRoadH:fix/streaming-usage-include-usage
