Skip to content

🤖 tests: lock alias-based OpenAI reasoning summary gating#3198

Open
ibetitsmike wants to merge 1 commit intomainfrom
mike/treat-as-thinking-levels
Open

🤖 tests: lock alias-based OpenAI reasoning summary gating#3198
ibetitsmike wants to merge 1 commit intomainfrom
mike/treat-as-thinking-levels

Conversation

@ibetitsmike
Copy link
Copy Markdown
Contributor

@ibetitsmike ibetitsmike commented Apr 27, 2026

Mux is acting on Mike's behalf.

Summary

Add a regression test that locks OpenAI reasoning summary compatibility to the mapped alias target, so aliases that Treat as gpt-5.3-codex-spark continue to omit reasoningSummary while preserving encrypted reasoning content.

Background

The previous PR draft described a revert that did not exist and used a test that did not actually depend on alias resolution. This update narrows the PR to the OpenAI behavior that mappedToModel really changes today.

Implementation

Added a buildProviderOptions() regression test for openai:custom-spark mapped to openai:gpt-5.3-codex-spark. The test verifies the mapped model drives the reasoningSummary gate while leaving the rest of the OpenAI reasoning payload intact. There are no production code changes in this PR.

Validation

  • bun test src/common/utils/ai/providerOptions.test.ts
  • make static-check

Risks

Low. Test-only change that documents existing alias resolution behavior.


Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $7.53

@ibetitsmike
Copy link
Copy Markdown
Contributor Author

/coder-agents-review

@ibetitsmike
Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Bravo.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@ibetitsmike
Copy link
Copy Markdown
Contributor Author

/coder-agents-review

Copy link
Copy Markdown

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent is good: locking down Treat-as reasoning inheritance with a comment and regression test. The test scaffolding itself is well-constructed, using existing helpers and asserting specific provider option fields.

4 P2, 2 P3 (1 P3 in body). The headline: the test cannot catch the regression it was built to prevent. Both assertions produce the same values with or without the Treat-as mapping. A deeper thread surfaced during review: enforceThinkingPolicy resolves thinking levels from the raw model regex, not through Treat-as, so xhigh is unreachable for truly custom models even when capability resolution is correct.

The commit uses fix: but the production diff is comment-only. P3 [DEREM-5] Should be test: or docs: to match what actually changed. Git log, changelogs, and semver tooling will record this as a bugfix that never was. (Mafu-san P2, Leorio P3)

"This is costume jewelry. The test executes the Treat as plumbing and asserts values that would be the same without it." Bisky

🤖 This review was automatically generated with Coder Agents.

Comment thread src/common/utils/ai/providerOptions.ts Outdated
Comment thread src/common/utils/ai/providerOptions.ts Outdated
Comment thread src/common/utils/ai/providerOptions.ts Outdated
Comment thread src/common/utils/ai/providerOptions.test.ts Outdated
Comment thread src/common/utils/ai/providerOptions.test.ts Outdated
Copy link
Copy Markdown

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 2 is blocked. No new commits were pushed and none of the 6 open findings received an author response.

Unaddressed findings (4 P2, 2 P3):

  • DEREM-1 P2: PR description claims a code revert that does not exist in the diff.
  • DEREM-2 P2: Test does not exercise the Treat-as mapping; both assertions are model-independent.
  • DEREM-3 P2: Comment misleads about what capModelName controls in the OpenAI path.
  • DEREM-4 P2: enforceThinkingPolicy resolves via raw model regex, not Treat-as; xhigh is unreachable for custom model names.
  • DEREM-5 P3: Commit type fix: for a comment-only change.
  • DEREM-6 P3: Test name uses inconsistent terminology and reverses the mapping direction.

Further review is blocked until the author responds to or pushes fixes for these findings.

🤖 This review was automatically generated with Coder Agents.

@ibetitsmike ibetitsmike force-pushed the mike/treat-as-thinking-levels branch from 00665ea to 622094e Compare April 27, 2026 17:23
@ibetitsmike ibetitsmike changed the title 🤖 tests: lock Treat as reasoning inheritance 🤖 tests: lock alias-based OpenAI reasoning summary gating Apr 27, 2026
@ibetitsmike
Copy link
Copy Markdown
Contributor Author

/coder-agents-review

@ibetitsmike
Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Breezy!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Copy Markdown

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All 6 findings from R1 addressed. The PR was rewritten from a mixed production+test change with a false revert claim into a clean, minimal test-only PR that does what it says.

The new test is genuine: openai:custom-spark mapped to openai:gpt-5.3-codex-spark via providersConfig, asserting reasoningSummary is undefined. If alias resolution broke, capModelName would be "custom-spark" (not in the blocklist), reasoningSummary would be "detailed", and the assertion would fail. 11 of 12 panel reviewers independently verified this.

One architectural note (not actionable here): enforceThinkingPolicy resolves thinking levels from the raw model regex, not through alias resolution. A custom model requesting xhigh gets silently clamped to high before buildProviderOptions runs. This is a pre-existing gap in policy.ts, outside this PR's scope. The test correctly sidesteps it by using "medium". Worth a separate ticket if alias-based thinking policy matters.

"I tried to build a case against this and couldn't. The problem is correctly understood, the solution is proportional, and the fix is at the right level." Pariston

🤖 This review was automatically generated with Coder Agents.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant