Add Z-Image (NextDiT/Lumina2) PTQ quantization support in diffusers example#1205

Open
andrea-pilzer wants to merge 1 commit into NVIDIA:main from andrea-pilzer:feat/zimage-quantization-support

Conversation


@andrea-pilzer andrea-pilzer commented Apr 8, 2026

What does this PR do?

Type of change: New feature + Bug fix

Adds post-training quantization (PTQ) support for Z-Image and Z-Image-Turbo (Tongyi-MAI/Z-Image, Tongyi-MAI/Z-Image-Turbo) in the diffusers quantization example. Z-Image is built on the NextDiT (Lumina2) transformer backbone with a Qwen3-4B text encoder.

Also fixes a hard ImportError introduced in modelopt 0.42.0: AttentionModuleMixin moved from modelopt.torch.quantization.plugins.diffusers to modelopt.torch.quantization.plugins.diffusion.diffusers.
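For code that must run against both modelopt versions, the moved symbol can be imported through a compatibility shim. This is a sketch, not what the PR does (the PR fixes the import in place); both module paths are taken from the description above, and the final None fallback exists only so the snippet degrades gracefully where modelopt is not installed:

```python
# Compatibility shim for the AttentionModuleMixin move in modelopt 0.42.0.
try:
    # modelopt >= 0.42.0
    from modelopt.torch.quantization.plugins.diffusion.diffusers import AttentionModuleMixin
except ImportError:
    try:
        # modelopt < 0.42.0
        from modelopt.torch.quantization.plugins.diffusers import AttentionModuleMixin
    except ImportError:
        AttentionModuleMixin = None  # modelopt not installed; placeholder only
```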

Changes

  • examples/diffusers/quantization/utils.py: Fix import path; add filter_func_zimage() to skip patch embedder, final projection, conditioning MLPs, and norm layers while quantizing all attention (qkv/out) and FFN (w1/w2/w3) linears.
  • examples/diffusers/quantization/models_utils.py: Register zimage and zimage-turbo model types with HF model IDs, ZImagePipeline, and calibration defaults. num_inference_steps is intentionally omitted from inference_extra_args to avoid a TypeError: got multiple values collision with the --n-steps CLI argument.
  • modelopt/torch/export/diffusers_utils.py: Add ZImageTransformer2DModel dummy input generation in generate_diffusion_dummy_inputs(), resolving a QKV fusion warning during HF checkpoint export.
  • tests/: Add ZIMAGE_PATH constant, integration test parametrize case (skipped when MODELOPT_LOCAL_MODEL_ROOT is unset — no tiny pipe on hf-internal-testing), and 14-case unit test for filter_func_zimage.
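The skip/quantize split in utils.py can be sketched as a regex filter. The pattern below is a hypothetical reconstruction from the layer names listed in this PR, not the code as merged:

```python
import re

# Module-name patterns excluded from quantization (names taken from the PR
# description; the exact regex in the merged utils.py may differ).
_ZIMAGE_SKIP_PATTERN = re.compile(
    r".*(x_embedder|final_layer|time_caption_embed|cap_embedder|norm|pos_embed).*"
)

def filter_func_zimage(name: str) -> bool:
    """Return True for modules to skip: embedders, conditioning MLPs, norms."""
    return _ZIMAGE_SKIP_PATTERN.match(name) is not None

# Attention (qkv/out) and FFN (w1/w2/w3) linears fall through the pattern
# and therefore stay quantized.
```

Under this convention a True return disables quantization for the module, so names like layers.0.attention.qkv and layers.3.feed_forward.w1 return False (quantized) while x_embedder.proj and time_caption_embed.mlp.0 return True (skipped).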

Usage

python quantize.py \
  --model zimage \
  --model-dtype BFloat16 \
  --format fp8 \
  --batch-size 1 \
  --calib-size 128 \
  --n-steps 20 \
  --hf-ckpt-dir ./exports/zimage-fp8-hf
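The num_inference_steps omission in models_utils.py guards against Python's duplicate-argument rule. A standalone illustration of the collision, with a hypothetical function standing in for the example's pipeline call:

```python
# If --n-steps were forwarded positionally and inference_extra_args also
# carried num_inference_steps, the same parameter would be bound twice.
def run_pipeline(prompt, num_inference_steps=20, **extra_args):
    return num_inference_steps

n_steps = 20                          # value from the --n-steps CLI flag
extra = {"num_inference_steps": 30}   # the collision this PR avoids

try:
    run_pipeline("a prompt", n_steps, **extra)
except TypeError as exc:
    print(exc)  # → run_pipeline() got multiple values for argument 'num_inference_steps'
```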

Testing

Tested end-to-end on DGX Spark (GB10, 128 GB unified memory):

  • FP8 calibration over 128 samples from Gustavosta/Stable-Diffusion-Prompts (998 quantizers inserted)
  • HF checkpoint export via --hf-ckpt-dir
  • Inference via ZImagePipeline.from_pretrained() with enable_huggingface_checkpointing()

Before your PR is "Ready for review"

  • Is this change backward compatible? ✅ Purely additive — new enum values, new dict entries, new elif branch
  • Did you write any new necessary tests? ✅
  • Did you update Changelog? ✅

Additional notes

A separate issue will be filed for export_hf_checkpoint not writing quant_type to transformer/config.json — this affects NVIDIAModelOptConfig-consuming loaders in diffusers and is a separate concern in the export path.
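Until that issue is fixed, a downstream loader could detect the missing key explicitly. A small sketch, assuming the transformer/config.json layout named above (the helper itself is hypothetical):

```python
import json
from pathlib import Path

def has_quant_type(export_dir: str) -> bool:
    """Return True if the exported transformer config records its quant_type."""
    cfg = Path(export_dir) / "transformer" / "config.json"
    if not cfg.is_file():
        return False
    return "quant_type" in json.loads(cfg.read_text())
```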

Summary by CodeRabbit

Release Notes

  • New Features

    • Added Z-Image and Z-Image-Turbo model quantization support with FP8/INT8/INT4 options in diffusers examples.
    • Expanded CLI with new model options for Z-Image variants.
  • Bug Fixes

    • Corrected broken import path for quantization module in diffusers example documentation.
  • Tests

    • Added test coverage for Z-Image quantization filtering.

Add Z-Image (NextDiT/Lumina2) PTQ quantization support in diffusers example

- Fix broken AttentionModuleMixin import: path changed from
  `modelopt.torch.quantization.plugins.diffusers` to
  `modelopt.torch.quantization.plugins.diffusion.diffusers` in 0.42.0.
- Add `filter_func_zimage()` in utils.py to skip patch embedder,
  conditioning MLPs, norms, and positional embeddings while quantizing
  all attention (qkv/out) and FFN (w1/w2/w3) linears.
- Register `zimage` and `zimage-turbo` model types in models_utils.py
  with HF model IDs, ZImagePipeline, and calibration defaults.
  `num_inference_steps` is intentionally omitted from inference_extra_args
  to avoid collision with the --n-steps CLI argument.
- Add `ZImageTransformer2DModel` dummy input generation in
  `generate_diffusion_dummy_inputs()`, resolving the QKV fusion warning
  during HF checkpoint export.
- Add ZIMAGE_PATH test constant and integration test parametrize case
  (skipif when MODELOPT_LOCAL_MODEL_ROOT is unset — no tiny pipe on
  hf-internal-testing). Add 14-case unit test for filter_func_zimage.

Tested end-to-end: FP8 calibration (128 samples, Gustavosta/Stable-Diffusion-Prompts),
HF checkpoint export, and inference via ZImagePipeline.from_pretrained()
on DGX Spark (GB10, 128 GB unified memory).

Signed-off-by: andrea-pilzer <apilzer@nvidia.com>
@andrea-pilzer andrea-pilzer requested review from a team as code owners April 8, 2026 12:38

copy-pr-bot bot commented Apr 8, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Apr 8, 2026

📝 Walkthrough

This PR adds Z-Image and Z-Image-Turbo quantization support to the diffusers quantization example through model type enums, a quantization filter function, dummy input generation for Z-Image transformers, model registry entries with HuggingFace model IDs and quantization configs, test utilities, and test cases. It also corrects a broken AttentionModuleMixin import path in the changelog.

Changes

Cohort / File(s): Summary

  • Changelog & Documentation (CHANGELOG.rst): Fixed the broken import path for AttentionModuleMixin and documented new Z-Image/Z-Image-Turbo quantization support with FP8/INT8/INT4 PTQ and Z-Image dummy input generation.
  • Model Configuration & Registry (examples/diffusers/quantization/models_utils.py): Extended the ModelType enum with ZIMAGE and ZIMAGE_TURBO entries; added a model filter function mapping, HuggingFace model IDs, and calibration/inference configs (with the turbo variant adjusting guidance_scale to 1.0 and the number of steps).
  • Quantization Filter Function (examples/diffusers/quantization/utils.py): Added filter_func_zimage, which excludes Z-Image/NextDiT backbone components (x_embedder, final_layer, time_caption_embed, cap_embedder, norm weights, pos_embed) from quantization via regex matching.
  • Dummy Input Generation (modelopt/torch/export/diffusers_utils.py): Extended generate_diffusion_dummy_inputs to support ZImageTransformer2DModel with a dedicated builder that generates dummy hidden_states, timestep, and encoder_hidden_states using fixed sequence lengths and config-derived channel dimensions.
  • Test Utilities & Test Cases (tests/_test_utils/examples/models.py, tests/examples/diffusers/test_diffusers.py): Added the ZIMAGE_PATH constant for test model selection; parametrized test_diffusers_quantization with a Z-Image configuration; added test_filter_func_zimage to verify layer filtering behavior for Z-Image backbone components.
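The dummy-input builder summarized above can be outlined shape-first; every dimension below is an illustrative assumption, not a value from the PR:

```python
# Shape-only sketch of generate_diffusion_dummy_inputs for ZImageTransformer2DModel.
# Sequence lengths and channel counts are placeholders; the real builder reads
# the channel dimensions from the model config.
def zimage_dummy_input_shapes(batch: int = 1,
                              img_seq_len: int = 256,
                              txt_seq_len: int = 77,
                              in_channels: int = 16,
                              caption_dim: int = 2560) -> dict:
    return {
        "hidden_states": (batch, img_seq_len, in_channels),
        "timestep": (batch,),
        "encoder_hidden_states": (batch, txt_seq_len, caption_dim),
    }
```

Materializing tensors of these shapes (e.g. with torch.randn) gives the exporter enough to trace the transformer once, which is how the QKV fusion warning is resolved.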

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 45.45%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title accurately summarizes the main change: adding Z-Image PTQ quantization support to the diffusers example, which is the primary objective across all modified files.
  • Security Anti-Patterns ✅ Passed: All five modified files were reviewed against SECURITY.md anti-patterns. No instances of unsafe torch.load, numpy.load, hardcoded trust_remote_code, eval/exec on external input, or # nosec bypass comments were found.



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
tests/examples/diffusers/test_diffusers.py (1)

207-251: LGTM with a minor suggestion.

The test_filter_func_zimage test provides comprehensive coverage of the filter function's behavior for both skip and quantize scenarios.

Consider using a context manager or fixture to handle the sys.path modification to ensure cleanup:

♻️ Optional: Use addopts or fixture for cleaner sys.path handling
 def test_filter_func_zimage(name: str, expected: bool) -> None:
     """filter_func_zimage must skip conditioning/norm layers and quantize attention/FFN linears."""
     import sys
 
-    sys.path.insert(0, str(Path(__file__).parents[3] / "examples/diffusers/quantization"))
-    from utils import filter_func_zimage
-
-    assert filter_func_zimage(name) == expected
+    example_path = str(Path(__file__).parents[3] / "examples/diffusers/quantization")
+    sys.path.insert(0, example_path)
+    try:
+        from utils import filter_func_zimage
+        assert filter_func_zimage(name) == expected
+    finally:
+        sys.path.remove(example_path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/examples/diffusers/test_diffusers.py` around lines 207 - 251, Replace
the ad-hoc sys.path insertion in test_filter_func_zimage with a context-managed
or fixture-based approach that temporarily prepends the examples path and
restores sys.path afterwards; specifically, avoid leaving the modified sys.path
by wrapping the current sys.path.insert(0, ...) usage (or the whole import from
utils) in a context manager (or pytest fixture) that yields control to run
filter_func_zimage and then removes the inserted path, ensuring the test imports
utils without side effects on subsequent tests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1073c201-88a4-4d66-8f75-e3850c1845fe

📥 Commits

Reviewing files that changed from the base of the PR and between bd80265 and a5eec64.

📒 Files selected for processing (6)
  • CHANGELOG.rst
  • examples/diffusers/quantization/models_utils.py
  • examples/diffusers/quantization/utils.py
  • modelopt/torch/export/diffusers_utils.py
  • tests/_test_utils/examples/models.py
  • tests/examples/diffusers/test_diffusers.py
