Add gpt-5.3 model support to TiktokenTokenizer #7579
Conversation
Co-authored-by: stephentoub <[email protected]>
Pull request overview
This pull request adds support for the GPT-5.3 model family to the TiktokenTokenizer, following the established pattern used for previous GPT model versions (gpt-5.2, gpt-5.1, etc.). The changes enable tokenization for both the base "gpt-5.3" model and its variants (e.g., "gpt-5.3-mini") using the O200kBase encoding.
Changes:
- Added gpt-5.3 model mappings to TiktokenTokenizer for both prefix and exact matching (see the sketch after this list)
- Added GPT5_3 static tokenizer property and comprehensive test coverage for the new model
- Included test cases for both base and mini variants in the test suite
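A minimal sketch of the prefix/exact lookup pattern those mappings follow. The identifier names here (modelToEncoding, modelPrefixToEncoding, Resolve) are illustrative stand-ins, not the actual private members of TiktokenTokenizer.cs:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class EncodingLookupSketch
{
    // Exact model name -> encoding (stand-in for the exact-name mapping).
    static readonly Dictionary<string, string> modelToEncoding = new()
    {
        ["gpt-5.3"] = "o200k_base",
    };

    // Model name prefix -> encoding (stand-in for the prefix mapping).
    static readonly (string Prefix, string Encoding)[] modelPrefixToEncoding =
    {
        ("gpt-5.3-", "o200k_base"),
    };

    // Exact match wins; otherwise fall back to the first matching prefix,
    // which is how variants like "gpt-5.3-mini" resolve.
    static string? Resolve(string modelName) =>
        modelToEncoding.TryGetValue(modelName, out string? encoding)
            ? encoding
            : modelPrefixToEncoding
                .Where(m => modelName.StartsWith(m.Prefix, StringComparison.Ordinal))
                .Select(m => m.Encoding)
                .FirstOrDefault();

    static void Main()
    {
        Console.WriteLine(Resolve("gpt-5.3"));       // o200k_base (exact match)
        Console.WriteLine(Resolve("gpt-5.3-mini"));  // o200k_base (prefix match)
        Console.WriteLine(Resolve("gpt-4") ?? "unknown");
    }
}
```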
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/Microsoft.ML.Tokenizers/Model/TiktokenTokenizer.cs | Added "gpt-5.3-" prefix mapping and "gpt-5.3" exact name mapping to model encoding arrays, both using O200kBase encoding |
| test/Microsoft.ML.Tokenizers.Tests/TiktokenTests.cs | Added GPT5_3 static property and test data entries for gpt-5.3 and gpt-5.3-mini variants in the TestAllSupportedModelNames and TestCreationUsingModel methods (sketched below) |
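A hedged sketch of what those test additions could look like, assuming the xunit theory pattern the method names imply; the actual member shapes in TiktokenTests.cs may differ, and creating a tokenizer this way fetches vocabulary data over the network:

```csharp
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;
using Xunit;

public class TiktokenGpt53Tests
{
    // Stand-in for the GPT5_3 static tokenizer property described above.
    public static Tokenizer GPT5_3 { get; } = TiktokenTokenizer.CreateForModel("gpt-5.3");

    [Theory]
    [InlineData("gpt-5.3")]
    [InlineData("gpt-5.3-mini")]
    public void TestCreationUsingModel(string modelName)
    {
        Tokenizer tokenizer = TiktokenTokenizer.CreateForModel(modelName);
        Assert.NotNull(tokenizer);

        // Round-trip a short string to confirm the tokenizer is functional.
        IReadOnlyList<int> ids = tokenizer.EncodeToIds("hello world");
        Assert.Equal("hello world", tokenizer.Decode(ids));
    }
}
```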
tarekgh left a comment
LGTM!
Codecov Report

✅ All modified and coverable lines are covered by tests.

Coverage diff against main:

| | main | #7579 | +/- |
|---|---|---|---|
| Coverage | 69.05% | 69.06% | |
| Files | 1483 | 1483 | |
| Lines | 274362 | 274365 | +3 |
| Branches | 28270 | 28270 | |
| Hits | 189466 | 189482 | +16 |
| Misses | 77510 | 77498 | -12 |
| Partials | 7386 | 7385 | -1 |
Adds support for the gpt-5.3 model family to TiktokenTokenizer using O200kBase encoding.
Changes
- Added gpt-5.3 and gpt-5.3- mappings to the model encoding dictionaries
- Added tests covering the gpt-5.3 base and mini variants

Usage
Follows the same pattern established for gpt-5.2 support.
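A minimal usage sketch, assuming the public TiktokenTokenizer.CreateForModel factory that exists for earlier model names. Note that resolving a model this way downloads the o200k_base vocabulary at runtime, which is why the blocked test run below tried to reach openaipublic.blob.core.windows.net:

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;

// Resolve the new model name to its tokenizer (O200kBase encoding).
// Variants such as "gpt-5.3-mini" resolve through the prefix mapping.
TiktokenTokenizer tokenizer = TiktokenTokenizer.CreateForModel("gpt-5.3");

IReadOnlyList<int> ids = tokenizer.EncodeToIds("Hello, world!");
Console.WriteLine($"{ids.Count} tokens: {string.Join(", ", ids)}");
```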
Warning
Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

- `dnceng.pkgs.visualstudio.com`
  - Triggering command: `/home/REDACTED/work/machinelearning/machinelearning/.dotnet/dotnet dotnet test test/Microsoft.ML.Tokenizers.Tests/Microsoft.ML.Tokenizers.Tests.csproj --filter FullyQualifiedName~TiktokenTests --no-build` (dns block)
- `openaipublic.blob.core.windows.net`
  - Triggering command: `/home/REDACTED/work/machinelearning/machinelearning/.dotnet/dotnet /home/REDACTED/work/machinelearning/machinelearning/.dotnet/dotnet exec --runtimeconfig /home/REDACTED/work/machinelearning/machinelearning/artifacts/bin/Microsoft.ML.Tokenizers.Tests/Debug/net8.0/Microsoft.ML.Tokenizers.Tests.runtimeconfig.json --depsfile /home/REDACTED/work/machinelearning/machinelearning/artifacts/bin/Microsoft.ML.Tokenizers.Tests/Debug/net8.0/Microsoft.ML.Tokenizers.Tests.deps.json /home/REDACTED/work/machinelearning/machinelearning/artifacts/bin/Microsoft.ML.Tokenizers.Tests/Debug/net8.0/testhost.dll --port 45083 --endpoint 127.0.0.1:045083 --role client --parentprocessid 6859 --telemetryoptedin false` (dns block)

If you need me to access, download, or install something from one of these locations, you can either: