🤖 tests: lock alias-based OpenAI reasoning summary gating#3198
🤖 tests: lock alias-based OpenAI reasoning summary gating#3198ibetitsmike wants to merge 1 commit intomainfrom
Conversation
|
/coder-agents-review |
|
@codex review |
|
Codex Review: Didn't find any major issues. Bravo. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
/coder-agents-review |
There was a problem hiding this comment.
The intent is good: locking down Treat-as reasoning inheritance with a comment and regression test. The test scaffolding itself is well-constructed, using existing helpers and asserting specific provider option fields.
4 P2, 2 P3 (1 P3 in body). The headline: the test cannot catch the regression it was built to prevent. Both assertions produce the same values with or without the Treat-as mapping. A deeper thread surfaced during review: enforceThinkingPolicy resolves thinking levels from the raw model regex, not through Treat-as, so xhigh is unreachable for truly custom models even when capability resolution is correct.
The commit uses fix: but the production diff is comment-only. P3 [DEREM-5] Should be test: or docs: to match what actually changed. Git log, changelogs, and semver tooling will record this as a bugfix that never was. (Mafu-san P2, Leorio P3)
"This is costume jewelry. The test executes the Treat as plumbing and asserts values that would be the same without it." Bisky
🤖 This review was automatically generated with Coder Agents.
There was a problem hiding this comment.
Round 2 is blocked. No new commits were pushed and none of the 6 open findings received an author response.
Unaddressed findings (4 P2, 2 P3):
- DEREM-1 P2: PR description claims a code revert that does not exist in the diff.
- DEREM-2 P2: Test does not exercise the Treat-as mapping; both assertions are model-independent.
- DEREM-3 P2: Comment misleads about what
capModelNamecontrols in the OpenAI path. - DEREM-4 P2:
enforceThinkingPolicyresolves via raw model regex, not Treat-as; xhigh is unreachable for custom model names. - DEREM-5 P3: Commit type
fix:for a comment-only change. - DEREM-6 P3: Test name uses inconsistent terminology and reverses the mapping direction.
Further review is blocked until the author responds to or pushes fixes for these findings.
🤖 This review was automatically generated with Coder Agents.
00665ea to
622094e
Compare
|
/coder-agents-review |
|
@codex review |
|
Codex Review: Didn't find any major issues. Breezy! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
There was a problem hiding this comment.
All 6 findings from R1 addressed. The PR was rewritten from a mixed production+test change with a false revert claim into a clean, minimal test-only PR that does what it says.
The new test is genuine: openai:custom-spark mapped to openai:gpt-5.3-codex-spark via providersConfig, asserting reasoningSummary is undefined. If alias resolution broke, capModelName would be "custom-spark" (not in the blocklist), reasoningSummary would be "detailed", and the assertion would fail. 11 of 12 panel reviewers independently verified this.
One architectural note (not actionable here): enforceThinkingPolicy resolves thinking levels from the raw model regex, not through alias resolution. A custom model requesting xhigh gets silently clamped to high before buildProviderOptions runs. This is a pre-existing gap in policy.ts, outside this PR's scope. The test correctly sidesteps it by using "medium". Worth a separate ticket if alias-based thinking policy matters.
"I tried to build a case against this and couldn't. The problem is correctly understood, the solution is proportional, and the fix is at the right level." Pariston
🤖 This review was automatically generated with Coder Agents.
Summary
Add a regression test that locks OpenAI reasoning summary compatibility to the mapped alias target, so aliases that Treat as
gpt-5.3-codex-sparkcontinue to omitreasoningSummarywhile preserving encrypted reasoning content.Background
The previous PR draft described a revert that did not exist and used a test that did not actually depend on alias resolution. This update narrows the PR to the OpenAI behavior that
mappedToModelreally changes today.Implementation
Added a
buildProviderOptions()regression test foropenai:custom-sparkmapped toopenai:gpt-5.3-codex-spark. The test verifies the mapped model drives thereasoningSummarygate while leaving the rest of the OpenAI reasoning payload intact. There are no production code changes in this PR.Validation
bun test src/common/utils/ai/providerOptions.test.tsmake static-checkRisks
Low. Test-only change that documents existing alias resolution behavior.
Generated with
mux• Model:openai:gpt-5.4• Thinking:xhigh• Cost:$7.53