Add token limit fields to ProviderConfig across all SDKs #966
MackinnonBuck wants to merge 5 commits into main
Conversation
Add `maxOutputTokens`, `maxPromptTokens`, `maxContextWindowTokens`, and `modelLimitsId` to `ProviderConfig` in the Node.js, Python, .NET, and Go SDKs. These optional fields allow BYOK users to configure token limits for custom providers, matching the runtime's `ProviderConfig` (PR #5311). Also update the Python wire-format conversion to map the new snake_case fields to camelCase for the JSON-RPC wire protocol. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR adds four optional token-limit configuration fields to ProviderConfig across the Node.js, Python, Go, and .NET SDKs so BYOK/custom-provider users can override model token limits and/or specify an alternate model ID for capability-catalog limit lookup.
Changes:
- Added `maxOutputTokens`, `maxPromptTokens`, `maxContextWindowTokens`, and `modelLimitsId` to `ProviderConfig` in Node.js, Go, and .NET.
- Added corresponding snake_case fields to the Python `ProviderConfig` and mapped them to camelCase wire keys in the Python client.
- Documented the behavior/precedence of these fields via inline comments/XML docs.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Summary per file:
| File | Description |
|---|---|
| `nodejs/src/types.ts` | Extends TS `ProviderConfig` interface with optional token limit fields and `modelLimitsId`. |
| `python/copilot/session.py` | Extends Python `ProviderConfig` TypedDict with optional token-limit and model-limit lookup fields. |
| `python/copilot/client.py` | Maps new snake_case Python fields onto the expected camelCase wire names. |
| `go/types.go` | Extends Go `ProviderConfig` struct with new JSON fields (`omitempty`). |
| `dotnet/src/Types.cs` | Extends .NET `ProviderConfig` with nullable properties and JSON property names. |
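The snake_case-to-camelCase mapping that `python/copilot/client.py` performs can be sketched as follows. This is illustrative, not the PR's actual code: the helper name mirrors the `_convert_provider_to_wire_format` function mentioned in the file summary, but the body here is an assumption about its shape.

```python
# Wire-key mapping for the new ProviderConfig fields (snake_case in the
# Python SDK, camelCase on the JSON-RPC wire).
_WIRE_KEYS = {
    "max_output_tokens": "maxOutputTokens",
    "max_prompt_tokens": "maxPromptTokens",
    "max_context_window_tokens": "maxContextWindowTokens",
    "model_limits_id": "modelLimitsId",
}

def convert_provider_to_wire_format(provider: dict) -> dict:
    """Rename known snake_case keys to their camelCase wire names,
    passing unknown keys through and dropping unset (None) values."""
    wire = {}
    for key, value in provider.items():
        if value is None:
            continue
        wire[_WIRE_KEYS.get(key, key)] = value
    return wire
```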
Fixes Go naming to follow the initialism convention (`ID`, not `Id`), consistent with existing fields like `APIKey`, `BaseURL`, and `APIVersion`. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
✅ Cross-SDK Consistency Review

This PR maintains excellent consistency across all four SDK implementations. I've verified the following:

- ✅ Feature Parity: all four SDKs (Node.js/TypeScript, Python, Go, and .NET) now include the same four fields.
- ✅ Language Idiom Compliance: each SDK follows appropriate naming conventions.
- ✅ Wire Format Consistency: all SDKs serialize to the same JSON-RPC wire format using camelCase keys.
- ✅ Documentation Consistency: all implementations include consistent inline documentation.
- ✅ Optionality: all fields are correctly marked as optional.

No consistency issues found. This PR successfully maintains API parity across all language implementations. 🎉
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
```csharp
/// This is useful for fine-tuned models that share the same limits as a base model.
/// </summary>
[JsonPropertyName("modelLimitsId")]
public string? ModelLimitsId { get; set; }
```
Curious to know what others think about this name. We might want to consider extending this to also affect inferred model capabilities (streaming, tool calling, multimodality, etc.), at which point this name is a bit too specific.
We could consider something like `ModelCatalogId` or `ModelProfileId`.
```csharp
/// When set, takes precedence over the default limit resolved from the model's capability catalog entry.
/// </summary>
[JsonPropertyName("maxOutputTokens")]
public int? MaxOutputTokens { get; set; }
```
What is the scope of this? Is it per request to the LLM? Total across all operations performed by the agent until idle?
Per individual LLM API request. This value is sent as max_tokens in every call to the model provider. Each turn in the agent loop independently gets this cap, and it's not accumulated across the session. It controls how many tokens the model can generate in a single response (text, tool calls, reasoning) before control returns.
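The per-request behavior described above can be sketched in Python. This is an illustrative sketch, not the runtime's actual code; the `build_request_payload` helper and its parameters are assumptions introduced for the example.

```python
# Sketch: max_output_tokens applies per individual LLM API request. It is
# sent as max_tokens on every call and never accumulated across turns.
def build_request_payload(provider: dict, messages: list) -> dict:
    payload = {"messages": messages}
    limit = provider.get("max_output_tokens")
    if limit is not None:
        # Caps a single response (text, tool calls, reasoning); each turn
        # in the agent loop independently gets this cap.
        payload["max_tokens"] = limit
    return payload
```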
```csharp
/// When set, takes precedence over the default limit resolved from the model's capability catalog entry.
/// </summary>
[JsonPropertyName("maxPromptTokens")]
public int? MaxPromptTokens { get; set; }
```
Should this be called `MaxInputTokens` instead of `MaxPromptTokens`?
Does this include cached tokens?
Same question as above... is this about one request or across a sequence of calls?
Per-request, but not sent to the API. The runtime uses this internally to decide when to truncate or compact conversation history before each LLM call. "Prompt" here means everything sent to the model in one request: system message, full conversation history up to that point, tool definitions, and the new user message. Cached tokens are counted toward the limit.
The name matches the upstream CAPI `/models` field (`max_prompt_tokens`), though `MaxInputTokens` would also be reasonable.
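The check described above can be sketched as follows. This is an illustrative sketch of the behavior, not the runtime's actual code; the function name is a hypothetical introduced for the example.

```python
# Sketch: max_prompt_tokens is never sent to the API. The runtime uses it
# internally, per request, to decide when to truncate or compact history.
def needs_compaction(prompt_token_count: int, provider: dict) -> bool:
    # prompt_token_count covers everything sent to the model in one
    # request: system message, full conversation history, tool
    # definitions, the new user message, and cached tokens.
    limit = provider.get("max_prompt_tokens")
    return limit is not None and prompt_token_count > limit
```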
`dotnet/src/Types.cs` (Outdated)
```csharp
/// When set, takes precedence over the default limit resolved from the model's capability catalog entry.
/// </summary>
[JsonPropertyName("maxContextWindowTokens")]
public int? MaxContextWindowTokens { get; set; }
```
What happens when these Max values are hit?
Each limit is handled differently:

- `MaxOutputTokens` hit: the model stops generating and returns `stop_reason: "max_tokens"`. Currently the runtime treats the truncated response as normal (silent failure, but I'm going to propose a fix for that soon).
- `MaxPromptTokens` hit: the runtime triggers background compaction before sending the request.
- `MaxContextWindowTokens` hit: largely a fallback for `MaxPromptTokens`. The runtime prefers `max_prompt_tokens` for truncation/compaction decisions and only falls back to `max_context_window_tokens` when the former isn't set. If the API still rejects the request, the runtime adjusts its stored limit downward and retries. It's probably fine not to expose this value as public API in the SDK (we don't make it configurable for BYOK in the CLI).
Update: Removed `MaxContextWindowTokens`.
Remove `maxContextWindowTokens` from all SDKs: it is an internal runtime fallback that should not be exposed as public SDK API. Refine doc comments for `maxOutputTokens` and `maxPromptTokens` to explain what happens when each limit is hit:

- `maxOutputTokens`: sent as `max_tokens` per LLM request; the model stops generating and returns a truncated response when hit.
- `maxPromptTokens`: used by the runtime to trigger conversation compaction before sending a request when the prompt exceeds this limit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Cross-SDK Consistency Review ✅

I've reviewed this PR for consistency across all four SDK implementations. Overall, this is excellent work maintaining feature parity!

✅ What's Consistent: all four SDKs (Node.js, Python, Go, .NET) correctly implement the same three fields.

📝 Minor Documentation Note: the PR description mentions four fields, including `maxContextWindowTokens`, but only three remain after its removal. No action needed: the code is consistent across all SDKs. Just noting the discrepancy between the description and the implementation for clarity.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Adds four optional token-limit fields to `ProviderConfig` across all four language SDKs, enabling BYOK users to configure token limits for custom providers. These fields match the runtime's `ProviderConfig` type updated in PR #5311.

New fields
| Field (wire) | Node.js | Python | .NET | Go |
|---|---|---|---|---|
| `maxOutputTokens` | `maxOutputTokens?: number` | `max_output_tokens: int` | `int? MaxOutputTokens` | `MaxOutputTokens int` |
| `maxPromptTokens` | `maxPromptTokens?: number` | `max_prompt_tokens: int` | `int? MaxPromptTokens` | `MaxPromptTokens int` |
| `maxContextWindowTokens` | `maxContextWindowTokens?: number` | `max_context_window_tokens: int` | `int? MaxContextWindowTokens` | `MaxContextWindowTokens int` |
| `modelLimitsId` | `modelLimitsId?: string` | `model_limits_id: str` | `string? ModelLimitsId` | `ModelLimitsId string` |

All fields are optional. The `Max*` fields override the default limits resolved from the model's capability catalog entry. `modelLimitsId` specifies an alternate model ID for capability catalog lookup (defaults to the session's configured model ID when unset), which is useful for fine-tuned models that share limits with a base model.

Changes
- Node.js (`nodejs/src/types.ts`): added fields to the `ProviderConfig` interface
- Python (`python/copilot/session.py`): added fields to the `ProviderConfig` TypedDict
- Python (`python/copilot/client.py`): updated `_convert_provider_to_wire_format` to map the new snake_case fields to camelCase
- .NET (`dotnet/src/Types.cs`): added nullable properties with `[JsonPropertyName]` attributes
- Go (`go/types.go`): added fields with `json:"...,omitempty"` tags

Testing
All four SDKs build successfully. No generated RPC types needed updating: `ProviderConfig` is defined directly in each SDK.
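As a usage sketch, a BYOK provider configuration with the new Python fields might look like the following. The `base_url` and `api_key` keys are assumptions extrapolated from the Go fields (`BaseURL`, `APIKey`) mentioned in review; the values are placeholders.

```python
# Hypothetical BYOK ProviderConfig using the Python SDK's snake_case fields.
provider = {
    "base_url": "https://my-proxy.example.com/v1",  # assumed key, placeholder URL
    "api_key": "sk-...",                            # assumed key, placeholder credential
    "max_output_tokens": 8192,     # per-request cap, sent as max_tokens
    "max_prompt_tokens": 120_000,  # runtime compaction threshold
    "model_limits_id": "gpt-4o",   # fine-tune shares the base model's limits
}
```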