Skip to content

Add thread_with_system template vars#200

Merged
Alex R (alexr17) merged 2 commits into
mainfrom
codex-thread-with-system-autoevals
Jun 10, 2026
Merged

Add thread_with_system template vars#200
Alex R (alexr17) merged 2 commits into
mainfrom
codex-thread-with-system-autoevals

Conversation

@alexr17

@alexr17 Alex R (alexr17) commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Summary

  • make thread_with_system available anywhere prompt templates already support thread
  • let scorer templates use {{thread_with_system}} when they need the full conversation, including system messages

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown

Braintrust eval report

Autoevals (HEAD-1781058054)

Score Average Improvements Regressions
NumericDiff 78.8% (+0pp) 8 🟢 5 🔴
Time_to_first_token 8.9tok (+0.03tok) 103 🟢 108 🔴
Llm_calls 1.55 (+0) - -
Tool_calls 0 (+0) - -
Errors 0 (+0) - 1 🔴
Llm_errors 0 (+0) - -
Tool_errors 0 (+0) - -
Prompt_tokens 526.68tok (+17.83tok) 1 🟢 7 🔴
Prompt_cached_tokens 0tok (+0tok) - -
Prompt_cache_creation_tokens 0tok (+0tok) - -
Prompt_cache_creation_5m_tokens 0tok (+0tok) - -
Prompt_cache_creation_1h_tokens 0tok (+0tok) - -
Completion_tokens 465.82tok (+16.64tok) 86 🟢 121 🔴
Completion_reasoning_tokens 353.75tok (+15.42tok) 76 🟢 104 🔴
Completion_accepted_prediction_tokens 0tok (+0tok) - -
Completion_rejected_prediction_tokens 0tok (+0tok) - -
Completion_audio_tokens 0tok (+0tok) - -
Total_tokens 992.5tok (+34.47tok) 86 🟢 121 🔴
Estimated_cost 0$ (+0$) 77 🟢 103 🔴
Duration 8.92s (+0.19s) 103 🟢 113 🔴
Llm_duration 9.61s (+0.19s) 103 🟢 112 🔴

@alexr17 Alex R (alexr17) marked this pull request as ready for review June 10, 2026 02:21

@ankrgyl Ankur Goyal (ankrgyl) left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. and actually furthers the back-compat case because autoevals was filtering system messages in {{thread}}.

@alexr17 Alex R (alexr17) merged commit e541f54 into main Jun 10, 2026
15 checks passed
@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown

Braintrust eval report

Autoevals (main-1781074744)

Score Average Improvements Regressions
NumericDiff 77.9% (-1pp) 4 🟢 7 🔴
Time_to_first_token 10.77tok (+1.87tok) 36 🟢 181 🔴
Llm_calls 1.55 (+0) - -
Tool_calls 0 (+0) - -
Errors 0 (+0) 1 🟢 1 🔴
Llm_errors 0 (+0) - -
Tool_errors 0 (+0) - -
Prompt_tokens 527.21tok (+0.54tok) 1 🟢 1 🔴
Prompt_cached_tokens 0tok (+0tok) - -
Prompt_cache_creation_tokens 0tok (+0tok) - -
Prompt_cache_creation_5m_tokens 0tok (+0tok) - -
Prompt_cache_creation_1h_tokens 0tok (+0tok) - -
Completion_tokens 464.9tok (-0.92tok) 104 🟢 95 🔴
Completion_reasoning_tokens 354.33tok (+0.58tok) 87 🟢 89 🔴
Completion_accepted_prediction_tokens 0tok (+0tok) - -
Completion_rejected_prediction_tokens 0tok (+0tok) - -
Completion_audio_tokens 0tok (+0tok) - -
Total_tokens 992.11tok (-0.39tok) 104 🟢 95 🔴
Estimated_cost 0$ (0$) 93 🟢 85 🔴
Duration 10.79s (+1.87s) 37 🟢 182 🔴
Llm_duration 11.75s (+2.14s) 29 🟢 189 🔴

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants