anthropic client + interleaved thinking support #788

Closed
mikasenghaas wants to merge 24 commits into overhaul-results-saving from anthropic-client

@mikasenghaas mikasenghaas commented Jan 26, 2026

Description

Based on #774. Do not merge before #774 is merged.

This PR adds support for Anthropic clients and interleaved thinking:

  • Accepts AsyncOpenAI | AsyncAnthropic in public-facing environment methods
  • Internally, we wrap these native clients with vf.Client, which maps between the normalized message, response, and tool types and the client-native types
  • Adds a new environment setter, Environment.set_interleaved_thinking, which passes the thinking flag to the client (e.g. OAI clients do best-effort parsing of reasoning content)
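To illustrate what "best-effort parsing of reasoning content" could mean for an OAI-style client, here is a minimal sketch. It is not the code in this PR: split_reasoning is a hypothetical helper, and the field names it checks (reasoning_content, reasoning) and the inline <think> tag fallback are assumptions about what OpenAI-compatible servers commonly return.

```python
import re
from typing import Any, Optional, Tuple

# Hypothetical helper (not part of this PR): matches an inline thinking block.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(message: dict[str, Any]) -> Tuple[Optional[str], str]:
    """Return (reasoning, visible_content) for an OpenAI-style assistant message."""
    content = message.get("content") or ""

    # Assumption: some OpenAI-compatible servers put reasoning in a separate field.
    for key in ("reasoning_content", "reasoning"):
        value = message.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip(), content

    # Fallback: reasoning embedded in the content as <think>...</think>.
    match = _THINK_RE.search(content)
    if match:
        reasoning = match.group(1).strip()
        visible = _THINK_RE.sub("", content, count=1).strip()
        return reasoning, visible

    # Nothing recognizable: treat the whole message as visible content.
    return None, content
```

A parser like this can only guess, which is why the description calls it best-effort; a client with native thinking blocks would not need the fallback.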

Tested Features

  • Single-turn + OAI completions
uv run vf-eval continuation-quality -n1 -r1 -d -v 
  • Single-turn + Anthropic completions (not tested because I couldn't find an Anthropic model with completions compatibility)
uv run vf-eval continuation-quality -n1 -r1 -d -v -m haiku
  • Single-turn + OAI chat completions
uv run vf-eval gsm8k -n1 -r1 -d -v 
  • Single-turn + Anthropic messages
uv run vf-eval gsm8k -n1 -r1 -d -v -m haiku
  • Multi-turn + OAI chat completions
uv run vf-eval alphabet-sort -n1 -r1 -d -v
  • Multi-turn + Anthropic messages
uv run vf-eval alphabet-sort -n1 -r1 -d -v -m haiku
  • Multi-turn + Tool + OAI chat completion
uv run vf-eval wiki-search -n1 -r1 -d -v
  • Multi-turn + Tool + Anthropic messages
uv run vf-eval wiki-search -n1 -r1 -d -v -m haiku
  • Multi-turn + OAI chat completions tokens (PRIME-RL vLLM server)
uv run vf-eval alphabet-sort -n1 -r1 -v -d -m Qwen/Qwen3-4B-Instruct-2507 -b http://localhost:8000/v1
  • Multi-turn + OAI chat completions tokens with interleaved rollouts (PRIME-RL vLLM server)
uv run vf-eval alphabet-sort -n1 -r1 -v -d -m Qwen/Qwen3-4B-Instruct-2507 -b http://localhost:8000/v1 -x '{"interleaved_rollouts": true}'
  • Multi-turn + Tool + Interleaved Thinking + OAI chat completions
uv run vf-eval math-python -n1 -r1 -v -d -m deepseek-reasoner

Client Interface

The vf.Client wraps a native client and requires each subclass to implement normalizers, giving a standard way to obtain a Response. In particular, the following four methods have to be implemented:

  1. Convert the vf.Messages prompt to a client-native prompt (to_native_prompt)
  2. Get a client-native response (get_native_response)
  3. (Optionally) Detect client-native errors (e.g. overlong prompt) and raise the corresponding exceptions (raise_from_native_response)
  4. Convert the client-native response to a vf.Response (from_native_response)
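The four steps above can be sketched as an abstract base class. This is only an illustration under assumptions: the method names come from the list above, but the signatures, the Message and Response stand-ins, the get_response driver, and the toy EchoClient are all hypothetical and do not reflect the actual vf.Client code.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Message:  # hypothetical stand-in for a normalized vf message
    role: str
    content: str


@dataclass
class Response:  # hypothetical stand-in for vf.Response
    content: str
    reasoning: Optional[str] = None


class Client(ABC):
    """Wraps a native client and normalizes its prompts and responses."""

    def __init__(self, native_client: Any) -> None:
        self.native_client = native_client

    @abstractmethod
    def to_native_prompt(self, messages: list[Message]) -> Any: ...

    @abstractmethod
    async def get_native_response(self, prompt: Any, model: str) -> Any: ...

    def raise_from_native_response(self, response: Any) -> None:
        """Optional hook; the default does nothing."""

    @abstractmethod
    def from_native_response(self, response: Any) -> Response: ...

    async def get_response(self, messages: list[Message], model: str) -> Response:
        # Assumed driver: run the four steps in order.
        prompt = self.to_native_prompt(messages)
        native = await self.get_native_response(prompt, model)
        self.raise_from_native_response(native)
        return self.from_native_response(native)


class EchoClient(Client):
    """Toy client that upper-cases the last message instead of calling an API."""

    def to_native_prompt(self, messages: list[Message]) -> list[dict]:
        return [{"role": m.role, "content": m.content} for m in messages]

    async def get_native_response(self, prompt: list[dict], model: str) -> dict:
        return {"text": prompt[-1]["content"].upper()}

    def from_native_response(self, response: dict) -> Response:
        return Response(content=response["text"])


resp = asyncio.run(EchoClient(native_client=None).get_response([Message("user", "hi")], model="toy"))
```

Keeping the driver in the base class means a new API type only has to supply the conversions, which is presumably what makes the client list easy to extend.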

In principle, we could continue to use the OAI types as our normalizing type system (this would be less breaking). However, some features from other clients are difficult or impossible to bake into the OAI types, so it seems natural to define a custom type system which supports the union of all features.

So far, we implement the following clients:

  • OpenAICompletionClient
  • OpenAIChatCompletionClient
  • OpenAIChatCompletionTokenClient
  • AnthropicCompletionClient
  • AnthropicMessagesClient

This list should be easy to extend to more clients and API types in the future, such as OAI responses.

Backwards compatibility

  • The public-facing environment API is fully backwards compatible (client types are extended, and oai_tools is converted to tool_defs wherever possible)
  • We use a CustomBaseModel which allows dict-like access so as not to break environments which return messages (e.g. BFCL v3)
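As a rough illustration of the dict-like access mentioned above, the sketch below adds item access to a typed message. It uses a plain dataclass mixin to stay dependency-free; DictLikeMixin and AssistantMessage are hypothetical names, and the actual CustomBaseModel may be implemented quite differently.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


class DictLikeMixin:
    """Hypothetical mixin: lets dataclass instances be read and written like dicts."""

    def keys(self) -> list[str]:
        return [f.name for f in fields(self)]

    def __contains__(self, key: str) -> bool:
        return key in self.keys()

    def __getitem__(self, key: str) -> Any:
        # Only declared fields are exposed, so methods are not reachable by key.
        if key not in self.keys():
            raise KeyError(key)
        return getattr(self, key)

    def __setitem__(self, key: str, value: Any) -> None:
        if key not in self.keys():
            raise KeyError(key)
        setattr(self, key, value)

    def get(self, key: str, default: Any = None) -> Any:
        return getattr(self, key) if key in self.keys() else default


@dataclass
class AssistantMessage(DictLikeMixin):  # hypothetical normalized message
    content: str
    role: str = "assistant"
    reasoning: Optional[str] = None


msg = AssistantMessage(content="4")
msg["content"] = "5"  # old dict-style code keeps working
```

With this in place, environment code written against plain dict messages (msg["role"], msg.get("content")) and newer code using attributes (msg.role) can both operate on the same object.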

Breaking:

  • oai_tools is deprecated. Environments which set oai_tools in setup_state break silently because the environment no longer reads from oai_tools.
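For environments migrating away from oai_tools, the conversion to a normalized tool definition might look roughly like the sketch below. The ToolDef class and both helper functions are hypothetical, since the PR description does not show the shape of tool_defs; only the input and output dict layouts follow the publicly documented OpenAI Chat Completions and Anthropic Messages tool formats.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolDef:  # hypothetical normalized tool type, not the PR's actual class
    name: str
    description: str = ""
    parameters: dict[str, Any] = field(default_factory=dict)


def tool_def_from_oai(oai_tool: dict[str, Any]) -> ToolDef:
    """Convert an OpenAI-style {"type": "function", "function": {...}} tool."""
    fn = oai_tool["function"]
    return ToolDef(
        name=fn["name"],
        description=fn.get("description", ""),
        # Default to an empty object schema when no parameters are declared.
        parameters=fn.get("parameters", {"type": "object", "properties": {}}),
    )


def tool_def_to_anthropic(tool: ToolDef) -> dict[str, Any]:
    """Render the normalized tool in the Anthropic Messages tool format."""
    return {
        "name": tool.name,
        "description": tool.description,
        "input_schema": tool.parameters,
    }


oai_tool = {
    "type": "function",
    "function": {
        "name": "search",
        "description": "Search a wiki.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}
anthropic_tool = tool_def_to_anthropic(tool_def_from_oai(oai_tool))
```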

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes

@mikasenghaas mikasenghaas changed the base branch from main to overhaul-results-saving January 26, 2026 21:01
@mikasenghaas mikasenghaas changed the title from "anthropic client support" to "anthropic client + interleaved thinking support" Jan 27, 2026
@mikasenghaas

merged in #897
