anthropic client + interleaved thinking support by mikasenghaas · Pull Request #788 · PrimeIntellect-ai/verifiers

mikasenghaas · 2026-01-26T21:01:42Z

Description

Based on #774. Do not merge before.

This PR adds support for Anthropic clients and interleaved thinking

Accepts AsyncOpenAI | AsyncAnthropic in public-facing environment methods
Internally we wrap those native clients with vf.Client which maps between a normalized message, response, and tool types and the client-native types
New environment setter Environment.set_interleaved_thinking which passes the thinking flag to the client (e.g. OAI clients do a best-effort parsing of reasoning content)

Tested Features

Single-turn + OAI completions

uv run vf-eval continuation-quality -n1 -r1 -d -v

Single-turn + Anthropic completions (didn't test bc didn't find a Anthropic model with completions compatiblity)

uv run vf-eval continuation-quality -n1 -r1 -d -v -m haiku

Single-turn + OAI chat completions

uv run vf-eval gsm8k -n1 -r1 -d -v

Single-turn + Anthropic completions

uv run vf-eval gsm8k -n1 -r1 -d -v -m haiku

Multi-turn + OAI chat completions

uv run vf-eval alphabet-sort -n1 -r1 -d -v

Multi-turn + Anthropic messages

uv run vf-eval alphabet-sort -n1 -r1 -d -v -m haiku

Multi-turn + Tool + OAI chat completion

uv run vf-eval wiki-search -n1 -r1 -d -v

Multi-turn + Tool + Anthropic responses

uv run vf-eval wiki-search -n1 -r1 -d -v -m haiku

Multi-turn + OAI chat completions tokens (PRIME-RL vLLM server)

uv run vf-eval alphabet-sort -n1 -r1 -v -d -m Qwen/Qwen3-4B-Instruct-2507 -b http://localhost:8000/v1

Multi-turn + OAI chat completions tokens (PRIME-RL vLLM server)

uv run vf-eval alphabet-sort -n1 -r1 -v -d -m Qwen/Qwen3-4B-Instruct-2507 -b http://localhost:8000/v1 -x '{"interleaved_rollouts": true}'

Multi-turn + Tool + Interleaved Thinking + OAI chat completions

uv run vf-eval math-python -n1 -r1 -v -d -m deepseek-reasoner

Client Interface

The vf.Client wraps a native client and asks it to implement normalizers to enable a standard way to obtain a Response. In particular, the following four methods have to be implemented

Convert the vf.Messages prompt to a client-native prompts (to_native_prompt)
Get a client-native response (get_native_response)
(Optionally) Raise and handle exceptions (e.g. overlong prompt) for client-native exception (raise_from_native_response)
Convert the client-native response to a vf.Response (from_native_response)

In principle, we could continue to use the OAI-types as our normalizing type system (this would be less breaking), however there are some features from other clients that are difficult/ impossible to bake into the OAI types, and so it seems natural to define a custom type which supports the union of all features.

So far, we implement the following clients:

OpenAICompletionClient
OpenAIChatCompletionClient
OpenAIChatCompletionTokenClient
AnthropicCompletionClient
AnthropicMessagesClient

This list should be easily extensible to more client and API types in the future, such as OAI responses.

Backwards compatibility

The public-facing environment API is fully backwards compatible (client types are extended, oai_tools is converted to tool_defs wherever possible
We use a CustomBaseModel which allows dict-like access as to not break environments which return messages (e.g. BFCL v3)

Brealing:

oai_tools is deprecated. Environments which oai_tools in setup_state break silently becaus the environment doesn't read from oai_tools anymore.

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Test improvement

Testing

All existing tests pass when running uv run pytest locally.
New tests have been added to cover the changes

Checklist

My code follows the style guidelines of this project as outlined in AGENTS.md
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
Any dependent changes have been merged and published

Additional Notes

mikasenghaas · 2026-02-14T11:31:21Z

merged in #897

mikasenghaas added 6 commits January 26, 2026 16:34

add generic client type

ead6b7c

update setup_client function

82ed8f9

integrate with eval

8d82aa9

ckpt

8c35c6b

style

5c3b78b

fixes

df5bb8e

mikasenghaas changed the base branch from main to overhaul-results-saving January 26, 2026 21:01

mikasenghaas added 13 commits January 26, 2026 21:04

revert unrelated changes

8eff0dd

unified message types

270df02

unified response types

5a4d5c6

clean up naming

4e55b9f

make anthropic client work

6be0806

auto-resolve client type

fb39821

polish

1095c62

custom tool defs

ef6ad86

fixes for tc

577a116

anthropic tool call support

90f2e47

implement interleaved thinking

8f410ad

make interleaved thinking configurable

4eec0de

cleanup

62de5ad

mikasenghaas changed the title ~~anthropic client support~~ anthropic client + interleaved thinking support Jan 27, 2026

mikasenghaas added 5 commits January 27, 2026 16:39

clearer types

7ceada7

split clients and support tokens client

94f5ace

cleanup

46b302f

ensure backwards compatible

dd2afdd

logging style

192f04a

mikasenghaas requested a review from willccbb January 28, 2026 11:13

eligotts mentioned this pull request Feb 8, 2026

abstract client + support for interleaved thinking #860

Closed

21 tasks

mikasenghaas mentioned this pull request Feb 11, 2026

unified client interface #897

Merged

22 tasks

mikasenghaas closed this Feb 14, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

anthropic client + interleaved thinking support #788

anthropic client + interleaved thinking support #788
mikasenghaas wants to merge 24 commits intooverhaul-results-savingfrom
anthropic-client

mikasenghaas commented Jan 26, 2026 •

edited by willccbb

Loading

Uh oh!

mikasenghaas commented Feb 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mikasenghaas commented Jan 26, 2026 • edited by willccbb Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Tested Features

Client Interface

Backwards compatibility

Type of Change

Testing

Checklist

Additional Notes

Uh oh!

mikasenghaas commented Feb 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

mikasenghaas commented Jan 26, 2026 •

edited by willccbb

Loading