
Upgrade ONNX from 1.19 to 1.21#1207

Open
ajrasane wants to merge 8 commits into main from ajrasane/onnx_version_upgrade

Conversation


@ajrasane ajrasane commented Apr 8, 2026

What does this PR do?

Type of change: new feature

Upgrade ONNX dependency from ~=1.19.0 to ~=1.21.0. ONNX 1.20+ removed several deprecated
helper functions (float32_to_bfloat16, float32_to_float8e4m3, pack_float32_to_4bit) that
onnx_graphsurgeon 0.5.x still references at import time. This PR adds a compatibility shim
(modelopt/onnx/_onnx_compat.py) that restores these functions using ml_dtypes before any
onnx_graphsurgeon import occurs. This supersedes the partial inline fix from #1204 by also
handling float32_to_float8e4m3.
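The restored helpers can be sketched as follows. The function names match the PR description; the bodies are an assumption about how ml_dtypes would be used, not the actual contents of modelopt/onnx/_onnx_compat.py:

```python
import numpy as np
import ml_dtypes


def float32_to_bfloat16(fval: float) -> int:
    # Round the float32 value to bfloat16 and return its uint16 bit pattern.
    return int(np.asarray(fval, dtype=np.float32).astype(ml_dtypes.bfloat16).view(np.uint16))


def float32_to_float8e4m3(fval: float) -> int:
    # Round the float32 value to float8 e4m3 (fn variant) and return its uint8 bit pattern.
    return int(np.asarray(fval, dtype=np.float32).astype(ml_dtypes.float8_e4m3fn).view(np.uint8))


def patch_onnx_helper_removed_apis() -> None:
    # Reattach the helpers to onnx.helper only if ONNX 1.20+ removed them,
    # so onnx_graphsurgeon's import-time references keep resolving.
    import onnx.helper

    for name, fn in (
        ("float32_to_bfloat16", float32_to_bfloat16),
        ("float32_to_float8e4m3", float32_to_float8e4m3),
    ):
        if not hasattr(onnx.helper, name):
            setattr(onnx.helper, name, fn)
```

On ONNX 1.19.x, where the helpers still exist, the patch function is a no-op.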

Changes:

  • Bump onnx~=1.19.0 to onnx~=1.21.0 in pyproject.toml
  • Add modelopt/onnx/_onnx_compat.py compatibility shim for removed ONNX APIs
  • Import shim in modelopt/onnx/__init__.py and tests/unit/onnx/conftest.py
  • Remove usage of removed onnx.helper.pack_float32_to_4bit in test_quant_utils.py
  • Update example requirements (genai_llm, whisper) to onnx==1.21.0

TensorRT Compatibility: TRT 10.16-GA supports opsets 9–24. ModelOpt quantization modes
use opsets 19–23, all within range. ONNX 1.21 does not force opset 26.

Usage

# No API changes — the upgrade is transparent to users.
# The compatibility shim is applied automatically on import.
import modelopt.onnx

Testing

  • 469/470 ONNX unit tests pass inside nvcr.io/nvidia/tensorrt:25.06-py3 (1 pre-existing ORT CopyTensorAsync EP issue, not ONNX-related)
  • 6/6 torch_onnx integration tests pass (fp8, int8, nvfp4, mxfp8, int4_awq, auto)
  • ViT FP8 quantization via torch_onnx → TRT engine build → ImageNet eval: 85.3% top-1, 97.8% top-5
  • ViT FP8 quantization via onnx_ptq → TRT engine build succeeds
  • All pre-commit hooks pass (ruff, mypy, bandit, license headers)

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅
  • Did you write any new necessary tests?: ✅ (updated existing tests, added conftest.py for compat shim)
  • Did you update Changelog?: ❌ (dependency upgrade, no API change)

Additional Information

Related: #1204 (partial fix for float32_to_bfloat16 only — this PR supersedes it with full coverage)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Dependencies

    • Removed unpinned ONNX from example requirement files and updated the ONNX optional dependency to ~=1.21.0.
  • Refactor

    • Centralized an ONNX compatibility shim to restore missing helper APIs when needed.
  • Tests

    • Added tests for the compatibility shim, adjusted quantization tests to remove reliance on removed ONNX helpers, and ensured shim runs before related tests.


coderabbitai bot commented Apr 8, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Added a dedicated ONNX compatibility shim module and invoked it on import; removed unpinned onnx from two example requirements; bumped the onnx optional dependency to ~1.21.0; tests updated to import the shim and to stop depending on removed ONNX helper APIs.

Changes

  • Example requirements (examples/windows/onnx_ptq/genai_llm/requirements.txt, examples/windows/onnx_ptq/whisper/requirements.txt): removed the unpinned onnx entry from the example requirements.
  • Project dependency spec (pyproject.toml): updated project.optional-dependencies.onnx from onnx~=1.19.0 to onnx~=1.21.0.
  • ONNX compatibility shim (modelopt/onnx/__init__.py, modelopt/onnx/_onnx_compat.py): replaced the inline monkey-patch with an import of the new _onnx_compat.patch_onnx_helper_removed_apis(); _onnx_compat conditionally defines the missing onnx.helper helpers (float32_to_bfloat16, float32_to_float8e4m3) using numpy and ml_dtypes when available.
  • Tests, conftest & compat tests (tests/unit/onnx/conftest.py, tests/unit/onnx/test_onnx_compat.py): added conftest.py to import the shim at pytest startup; added unit tests covering shim installation, idempotency/non-overwrite, and behavior when ml_dtypes is unavailable.
  • Tests, quantization changes (tests/unit/onnx/quantization/test_quant_utils.py): removed reliance on onnx.helper.pack_float32_to_4bit; tests now compare internal implementations and assert explicit expected bytes for unsigned boundary cases.
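For reference, the removed helper's packing behavior can be approximated like this. This is a sketch under the assumption that ONNX packs the first element into the low nibble; it is not the test file's actual code:

```python
import numpy as np


def pack_float32_to_4bit(values, signed: bool) -> bytes:
    # Round, clip to the 4-bit range, and pack pairs of values into single
    # bytes with the first element in the low nibble (assumed ONNX layout).
    lo, hi = (-8, 7) if signed else (0, 15)
    ints = np.clip(np.rint(np.asarray(values, dtype=np.float32)), lo, hi).astype(np.int64)
    if ints.size % 2:
        ints = np.append(ints, 0)  # zero-pad odd-length input
    nibbles = ints & 0xF  # two's-complement nibble for negative values
    return bytes((nibbles[0::2] | (nibbles[1::2] << 4)).astype(np.uint8))


# Unsigned boundary case: 15.0 stays 0xF and 16.0 clips to 0xF, giving b'\xff'.
packed = pack_float32_to_4bit([15.0, 16.0], signed=False)
```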

Sequence Diagram(s)

sequenceDiagram
    participant Importer as Test / Module Import
    participant Shim as modelopt.onnx._onnx_compat
    participant ONNX as onnx.helper
    participant NumPy as numpy / ml_dtypes

    Importer->>Shim: import (side-effect)
    Shim->>ONNX: try import `onnx.helper`
    alt helpers missing
        Shim->>NumPy: import `numpy`, `ml_dtypes`
        NumPy-->>Shim: provide conversion dtypes/utilities
        Shim->>ONNX: define `float32_to_bfloat16` and `float32_to_float8e4m3`
    else imports fail
        Shim-->>Importer: no patch applied (silent)
    end
    Importer->>ONNX: subsequent code/tests may use helpers

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 16.67%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title accurately summarizes the main change, upgrading ONNX from 1.19 to 1.21, which is the central objective of the changeset.
  • Security Anti-Patterns ✅ Passed: the PR introduces no security anti-patterns from SECURITY.md. The compatibility shim safely patches ONNX helper functions using numpy type conversions with proper import error handling.



github-actions bot commented Apr 8, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1207/

Built to branch gh-pages at 2026-04-09 19:52 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@ajrasane ajrasane force-pushed the ajrasane/onnx_version_upgrade branch from 9356aec to 4dfe4c1 on April 8, 2026 18:44
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/_onnx_compat.py`:
- Around line 36-54: The shimmed helper functions use incomplete signatures
causing TypeError when callers pass legacy kwargs; update _float32_to_bfloat16
and _float32_to_float8e4m3 to match ONNX 1.19.x: change
_float32_to_bfloat16(value) to _float32_to_bfloat16(fval, truncate: bool =
False) and change _float32_to_float8e4m3(value, fn=True, uz=False) to
_float32_to_float8e4m3(fval, scale: float = 1.0, fn: bool = True, uz: bool =
False, saturate: bool = True); inside each function use fval, apply scale for
float8 (e.g., np.float32(fval) * scale) and accept but safely ignore
truncate/saturate if not needed, then perform the existing dtype conversion and
return the same integer representation; update the assignments
onnx.helper.float32_to_bfloat16 and onnx.helper.float32_to_float8e4m3 to
reference the renamed/updated functions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 68e21319-7789-4c62-a7c0-a6d5a73e29d5

📥 Commits

Reviewing files that changed from the base of the PR and between abf4558 and 9356aec.

📒 Files selected for processing (7)
  • examples/windows/onnx_ptq/genai_llm/requirements.txt
  • examples/windows/onnx_ptq/whisper/requirements.txt
  • modelopt/onnx/__init__.py
  • modelopt/onnx/_onnx_compat.py
  • pyproject.toml
  • tests/unit/onnx/conftest.py
  • tests/unit/onnx/quantization/test_quant_utils.py

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/__init__.py`:
- Around line 21-22: The import of the ONNX compatibility shim
(patch_onnx_helper_removed_apis) is happening before the optional-deps
try/except guard and will raise if onnx/ml_dtypes are missing; wrap the shim
import in the same optional-deps try/except (or move it inside that try block)
so ImportError is caught and handled by your normalized error reporting logic,
i.e., import modelopt.onnx._onnx_compat.patch_onnx_helper_removed_apis inside
the existing try/except that checks for onnx and ml_dtypes rather than at module
top-level.

📥 Commits

Reviewing files that changed from the base of the PR and between 9356aec and 4dfe4c1.

💤 Files with no reviewable changes (2)
  • examples/windows/onnx_ptq/whisper/requirements.txt
  • examples/windows/onnx_ptq/genai_llm/requirements.txt
✅ Files skipped from review due to trivial changes (2)
  • pyproject.toml
  • tests/unit/onnx/conftest.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/unit/onnx/quantization/test_quant_utils.py
  • modelopt/onnx/_onnx_compat.py


codecov bot commented Apr 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.34%. Comparing base (9b4f43a) to head (845fde9).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1207      +/-   ##
==========================================
+ Coverage   75.42%   77.34%   +1.92%     
==========================================
  Files         353      353              
  Lines       40603    40595       -8     
==========================================
+ Hits        30623    31398     +775     
+ Misses       9980     9197     -783     
Flag Coverage Δ
examples 44.40% <ø> (+2.63%) ⬆️
gpu 56.89% <ø> (-0.21%) ⬇️
unit 55.10% <ø> (+<0.01%) ⬆️

☔ View full report in Codecov by Sentry.


@kevalmorabia97 kevalmorabia97 left a comment


LGTM. PTAL at Coderabbit comments as well

ONNX 1.20+ removed deprecated helper functions (float32_to_bfloat16,
float32_to_float8e4m3, pack_float32_to_4bit) that onnx_graphsurgeon
0.5.x still references at import time. This adds a compatibility shim
that restores these functions using ml_dtypes before any
onnx_graphsurgeon imports occur.

Changes:
- Bump onnx~=1.19.0 to onnx~=1.21.0 in pyproject.toml
- Add modelopt/onnx/_onnx_compat.py compatibility shim for removed APIs
- Import shim in modelopt/onnx/__init__.py and test conftest.py
- Update test_quant_utils.py to remove usage of removed
  onnx.helper.pack_float32_to_4bit; validate against hardcoded
  expected values instead
- Update example requirements (genai_llm, whisper) to onnx==1.21.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
@ajrasane ajrasane force-pushed the ajrasane/onnx_version_upgrade branch from 4dfe4c1 to 68ed5a2 on April 8, 2026 19:35

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
modelopt/onnx/__init__.py (1)

21-22: ⚠️ Potential issue | 🟠 Major

Move the compatibility shim import inside the try/except guard.

The import of patch_onnx_helper_removed_apis at line 22 occurs before the dependency guard (lines 26–30). If onnx or ml_dtypes is missing, this could fail early and bypass the normalized optional-dependency error message.

As noted in the previous review, move the import inside the existing try/except block to ensure consistent error handling:

Proposed fix
-# Apply ONNX compatibility shim before any onnx_graphsurgeon imports.
-from modelopt.onnx._onnx_compat import patch_onnx_helper_removed_apis
-
 MIN_PYTHON_VERSION = (3, 10)
 
 try:
+    # Apply ONNX compatibility shim before any onnx_graphsurgeon imports.
+    from modelopt.onnx._onnx_compat import patch_onnx_helper_removed_apis
+    patch_onnx_helper_removed_apis()
     from . import quantization
     from .logging_config import configure_logging, logger
 except ImportError as e:
     raise ImportError(f"{e}\nPlease install optional ``[onnx]`` dependencies.")

As per coding guidelines: modelopt/**/*.py should "gate optional dependencies" rather than importing at module top-level before the guard.



📥 Commits

Reviewing files that changed from the base of the PR and between 4dfe4c1 and 68ed5a2.

💤 Files with no reviewable changes (2)
  • examples/windows/onnx_ptq/whisper/requirements.txt
  • examples/windows/onnx_ptq/genai_llm/requirements.txt
✅ Files skipped from review due to trivial changes (3)
  • pyproject.toml
  • tests/unit/onnx/conftest.py
  • modelopt/onnx/_onnx_compat.py

ajrasane added 2 commits April 8, 2026 20:11
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
@ajrasane ajrasane force-pushed the ajrasane/onnx_version_upgrade branch from 05c1a3e to 3fd3336 on April 8, 2026 20:47

hthadicherla commented Apr 9, 2026

Seems like this test is failing:
tests/unit/torch/quantization/test_quantize_cpu.py::test_save_restore[SimpleLinear-quant_config3] FAILED [ 72%]

But it looks unrelated to this change, since I saw multiple PRs fail at this same test yesterday.

@kevalmorabia97

tests/unit/torch/quantization/test_quantize_cpu.py::test_save_restore is now disabled on Windows in the main branch, as it's flaky

@ajrasane ajrasane enabled auto-merge (squash) April 9, 2026 15:19
@ajrasane ajrasane disabled auto-merge April 9, 2026 15:33
… onnx 1.21.0

onnx-graphsurgeon 0.6.1 no longer references the removed onnx.helper functions
(float32_to_bfloat16, float32_to_float8e4m3), so the compatibility shim and its
tests are no longer needed. Pin onnx-graphsurgeon>=0.6.1 in pyproject.toml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
@ajrasane ajrasane enabled auto-merge (squash) April 9, 2026 15:54
Collaborator


not needed anymore


@cjluo-nv cjluo-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address @kevalmorabia97 's comment before landing

Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
yeyu-nvidia added a commit that referenced this pull request Apr 9, 2026
…_template (#1225)

## Summary

- `apply_chat_template(..., return_tensors="pt")` returns a
`BatchEncoding` in transformers 4.46+, which no longer subclasses `dict`
- The old guard `isinstance(tokenized, dict)` evaluates to `False` for
`BatchEncoding`, so `input_ids` was set to the whole `BatchEncoding`
object
- Calling `.shape[1]` on a `BatchEncoding` triggers
`__getattr__("shape")` → `AttributeError`
- Fix: check `isinstance(tokenized, torch.Tensor)` instead, which
correctly handles both old transformers (plain tensor) and new
transformers (BatchEncoding)
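The guard change above can be illustrated with a minimal sketch. FakeBatchEncoding is a hypothetical stand-in for transformers.BatchEncoding, and extract_input_ids is an illustrative name, not the repository's actual function:

```python
import torch


def extract_input_ids(tokenized):
    # Old transformers return a plain tensor; transformers 4.46+ return a
    # BatchEncoding, which supports item access but no longer subclasses dict,
    # so the tensor check is the reliable branch condition.
    if isinstance(tokenized, torch.Tensor):
        return tokenized
    return tokenized["input_ids"]


class FakeBatchEncoding:
    # Hypothetical stand-in: mapping-style access without subclassing dict.
    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        return self._data[key]


ids = torch.tensor([[1, 2, 3]])
assert extract_input_ids(ids).shape[1] == 3
assert extract_input_ids(FakeBatchEncoding({"input_ids": ids})).shape[1] == 3
```

With the old `isinstance(tokenized, dict)` guard, the second call would have returned the wrapper itself, and `.shape` would then raise AttributeError.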

This is causing `test_collect_hidden_states` to fail in the speculative
decoding CI for all open PRs (#1207, #1210, #1221).

## Test plan

- [ ] `torch-pr (speculative_decoding, 26.01)` CI passes
- [ ] Verify fix handles both `torch.Tensor` return (old transformers)
and `BatchEncoding` return (new transformers 4.46+)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Ye Yu <yeyu@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
