Add heap-based BPE merge path for large inputs (>128 bytes) #7580
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
…p capacity Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## main #7580 +/- ##
==========================================
+ Coverage 69.05% 69.08% +0.02%
==========================================
Files 1483 1483
Lines 274362 274513 +151
Branches 28270 28285 +15
==========================================
+ Hits 189466 189648 +182
+ Misses 77510 77482 -28
+ Partials 7386 7383 -3
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Just as a heads up, I was blocked by some firewall rules while working on your feedback.
Warning: Firewall rules blocked me from connecting to one or more addresses.
I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
@copilot, can you write and run a benchmark using BenchmarkDotNet comparing the performance before and after this change? Please run it with inputs that would have a mergingBytes length of 127, 128, and 1000.
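For reference, here is how a 128-byte cutover sorts the three requested lengths. This tiny Python sketch assumes the check compares the `mergingBytes` length against 128 with a strict `>`, as the title's "(>128 bytes)" suggests; `HEAP_THRESHOLD` and `select_merge_path` are hypothetical names, not the PR's code.

```python
# Hypothetical names; the only fact taken from the PR is the ">128 bytes" cutover.
HEAP_THRESHOLD = 128


def select_merge_path(merging_bytes_length: int) -> str:
    """Return which merge strategy a strict '> 128' cutover would choose."""
    if merging_bytes_length > HEAP_THRESHOLD:
        return "heap"       # new heap-based path for large inputs
    return "quadratic"      # original O(n^2) path


for length in (127, 128, 1000):
    print(length, select_merge_path(length))
# prints:
# 127 quadratic
# 128 quadratic
# 1000 heap
```

Under that assumption, 127 and 128 both stay on the original path (128 is the largest length that does), while 1000 exercises the heap path.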
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Here are the BenchmarkDotNet results comparing before (original O(n²) algorithm only) vs. after (heap-based path for inputs >128 bytes):
Before (original algorithm for all sizes):
After (heap-based for >128 bytes):
Summary:
The speed improvement at length 1000 is significant. The memory increase comes from the PriorityQueue's internal list (pre-allocated to …).
Environment: BenchmarkDotNet v0.13.12, Ubuntu 24.04.3 LTS, Intel Xeon Platinum 8370C 2.80GHz, .NET 8.0.16.
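To make the O(n²) vs. heap-based contrast concrete, here is a minimal Python sketch of both merge strategies. It is not the PR's C# code. The rank table `ranks` (bytes to merge rank, lower merges first), the function names `merge_quadratic` and `merge_heap`, and the "lowest rank, leftmost on ties" rule are assumptions modeled on the usual tiktoken-style byte-pair merge.

```python
import heapq


def merge_quadratic(data: bytes, ranks: dict[bytes, int]) -> list[bytes]:
    """Reference O(n^2) merge: rescan every adjacent pair after each merge."""
    parts = [data[i:i + 1] for i in range(len(data))]
    while True:
        best = None  # (rank, index) of the lowest-ranked adjacent pair
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best is None or rank < best[0]):
                best = (rank, i)  # strict '<' keeps the leftmost pair on ties
        if best is None:
            return parts
        i = best[1]
        parts[i:i + 2] = [parts[i] + parts[i + 1]]


def merge_heap(data: bytes, ranks: dict[bytes, int]) -> list[bytes]:
    """Heap-based merge: O(n log n) heap operations with lazy deletion."""
    n = len(data)
    nxt = list(range(1, n + 1))   # nxt[i]: start of the part after the part at i (n = end)
    prv = list(range(-1, n - 1))  # prv[i]: start of the part before i (-1 = none)
    alive = [True] * n            # False once a part is absorbed into its left neighbor
    heap: list[tuple[int, int, int]] = []  # (rank, left_start, pair_end)

    def push(i: int) -> None:
        # Queue the pair (part at i, following part) if that byte string has a rank.
        j = nxt[i]
        if j < n:
            end = nxt[j]
            rank = ranks.get(data[i:end])
            if rank is not None:
                heapq.heappush(heap, (rank, i, end))

    for i in range(n - 1):
        push(i)

    while heap:
        _, i, end = heapq.heappop(heap)  # lowest rank first, then leftmost start
        # Skip stale entries whose pair no longer exists as two adjacent parts.
        if not alive[i] or nxt[i] >= n or nxt[nxt[i]] != end:
            continue
        alive[nxt[i]] = False  # the right part is absorbed into the left one
        nxt[i] = end
        if end < n:
            prv[end] = i
        if prv[i] >= 0:
            push(prv[i])       # new pair with the left neighbor
        push(i)                # new pair with the right neighbor

    parts, i = [], 0
    while i < n:
        parts.append(data[i:nxt[i]])
        i = nxt[i]
    return parts


ranks = {b"ab": 0, b"bc": 1, b"abc": 2, b"cd": 3}
print(merge_quadratic(b"abcd", ranks))  # [b'abc', b'd']
print(merge_heap(b"abcd", ranks))       # [b'abc', b'd']
```

Python's `heapq` has no decrease-key operation, so the sketch leaves outdated pairs in the heap and discards them when they are popped (lazy deletion). Each merge pushes at most two new entries, so the total is O(n log n) heap operations instead of an O(n) rescan after every merge. The PR's own `PriorityQueue`-based implementation may structure this differently.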
…odeLarge Co-authored-by: tarekgh <10833894+tarekgh@users.noreply.github.com>
…ecific) Co-authored-by: tarekgh <10833894+tarekgh@users.noreply.github.com>
…is only called for >128 bytes Co-authored-by: tarekgh <10833894+tarekgh@users.noreply.github.com>
…deLarge Co-authored-by: tarekgh <10833894+tarekgh@users.noreply.github.com>
Heap-based BPE Optimization for Large Inputs
Changes Made
This pull request was created from Copilot chat.