gracefully fall back when model chat template doesn't support tools#771

Open
ericcurtin wants to merge 1 commit into main from add-web-search
Conversation

@ericcurtin
Contributor

Models like gemma3 use Jinja chat templates that only support alternating user/assistant roles and cannot handle the tool/assistant(tool_calls) message roles introduced by the agentic web-search loop.

When a 500 response containing "Jinja" is detected, the client now resets the message history to the original user prompt, clears the tool schemas, and retries — allowing the model to answer from training data rather than erroring out. This covers two failure modes:

  1. The model rejects the request immediately because the tools parameter causes its template to inject an incompatible system message.
  2. The model successfully emits a tool_call but then fails on the follow-up request because it can't process the resulting "tool" role message.

Adds three unit tests covering the fallback paths and the non-Jinja error case.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces tool-calling support and a graceful fallback for models whose Jinja chat templates cannot handle tool-specific message roles. The changes add OpenAI-compatible data structures for tool definitions and calls, an agentic loop in the ChatWithMessagesContext function that executes tool calls, and a retry mechanism that falls back to a plain (non-tool) chat request when a Jinja template error is detected. A WebSearchTool implementation demonstrates how an external service can be integrated as a tool, and unit tests validate the fallback logic under the relevant error conditions.


@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 1 issue

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="cmd/cli/tools/websearch.go" line_range="102-109" />
<code_context>
+		return "", fmt.Errorf("reading response: %w", err)
+	}
+
+	// Response may be SSE or plain JSON — handle both.
+	responseText := string(body)
+	for _, line := range strings.Split(responseText, "\n") {
+		if strings.HasPrefix(line, "data: ") {
+			responseText = strings.TrimPrefix(line, "data: ")
</code_context>
<issue_to_address>
**suggestion (bug_risk):** SSE handling only uses the first `data:` line, which can truncate multi-chunk responses.

This loop stops at the first `data:` line and discards any subsequent chunks, so multi-part SSE messages will be parsed using only the first fragment, causing malformed JSON or truncated content. Consider accumulating all `data:` lines into a single buffer (e.g., joined with `\n`) before unmarshaling, or using a proper SSE parser for `text/event-stream` responses.

```suggestion
	// Response may be SSE or plain JSON — handle both.
	responseText := string(body)

	// For SSE (`text/event-stream`), accumulate all `data:` lines into a single payload.
	var dataLines []string
	for _, line := range strings.Split(responseText, "\n") {
		if strings.HasPrefix(line, "data: ") {
			dataLines = append(dataLines, strings.TrimPrefix(line, "data: "))
		}
	}
	if len(dataLines) > 0 {
		responseText = strings.Join(dataLines, "\n")
	}
```
</issue_to_address>


Signed-off-by: Eric Curtin <eric.curtin@docker.com>