Add Z-Image (NextDiT/Lumina2) PTQ quantization support in diffusers example#1205

Open
andrea-pilzer wants to merge 1 commit into NVIDIA:main from andrea-pilzer:feat/zimage-quantization-support

Conversation


@andrea-pilzer andrea-pilzer commented Apr 8, 2026

What does this PR do?

Type of change: New feature + Bug fix

Adds post-training quantization (PTQ) support for Z-Image and Z-Image-Turbo (Tongyi-MAI/Z-Image, Tongyi-MAI/Z-Image-Turbo) in the diffusers quantization example. Z-Image is built on the NextDiT (Lumina2) transformer backbone with a Qwen3-4B text encoder.

Also fixes a hard ImportError introduced in modelopt 0.42.0: AttentionModuleMixin moved from modelopt.torch.quantization.plugins.diffusers to modelopt.torch.quantization.plugins.diffusion.diffusers.
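For code that must run against both modelopt versions, the moved symbol can be imported through a compatibility shim. This is a sketch, not what the PR does (the PR fixes the import in place); both module paths are taken from the description above, and the final None fallback exists only so the snippet degrades gracefully where modelopt is not installed:

```python
# Compatibility shim for the AttentionModuleMixin move in modelopt 0.42.0.
try:
    # modelopt >= 0.42.0
    from modelopt.torch.quantization.plugins.diffusion.diffusers import AttentionModuleMixin
except ImportError:
    try:
        # modelopt < 0.42.0
        from modelopt.torch.quantization.plugins.diffusers import AttentionModuleMixin
    except ImportError:
        AttentionModuleMixin = None  # modelopt not installed; placeholder only
```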

Changes

  • examples/diffusers/quantization/utils.py: Fix import path; add filter_func_zimage() to skip patch embedder, final projection, conditioning MLPs, and norm layers while quantizing all attention (qkv/out) and FFN (w1/w2/w3) linears.
  • examples/diffusers/quantization/models_utils.py: Register zimage and zimage-turbo model types with HF model IDs, ZImagePipeline, and calibration defaults. num_inference_steps is intentionally omitted from inference_extra_args to avoid a TypeError: got multiple values collision with the --n-steps CLI argument.
  • modelopt/torch/export/diffusers_utils.py: Add ZImageTransformer2DModel dummy input generation in generate_diffusion_dummy_inputs(), resolving a QKV fusion warning during HF checkpoint export.
  • tests/: Add ZIMAGE_PATH constant, integration test parametrize case (skipped when MODELOPT_LOCAL_MODEL_ROOT is unset — no tiny pipe on hf-internal-testing), and 14-case unit test for filter_func_zimage.
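The skip/quantize split in utils.py can be sketched as a regex filter. The pattern below is a hypothetical reconstruction from the layer names listed in this PR, not the code as merged:

```python
import re

# Module-name patterns excluded from quantization (names taken from the PR
# description; the exact regex in the merged utils.py may differ).
_ZIMAGE_SKIP_PATTERN = re.compile(
    r".*(x_embedder|final_layer|time_caption_embed|cap_embedder|norm|pos_embed).*"
)

def filter_func_zimage(name: str) -> bool:
    """Return True for modules to skip: embedders, conditioning MLPs, norms."""
    return _ZIMAGE_SKIP_PATTERN.match(name) is not None

# Attention (qkv/out) and FFN (w1/w2/w3) linears fall through the pattern
# and therefore stay quantized.
```

Under this convention a True return disables quantization for the module, so names like layers.0.attention.qkv and layers.3.feed_forward.w1 return False (quantized) while x_embedder.proj and time_caption_embed.mlp.0 return True (skipped).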

Usage

python quantize.py \
  --model zimage \
  --model-dtype BFloat16 \
  --format fp8 \
  --batch-size 1 \
  --calib-size 128 \
  --n-steps 20 \
  --hf-ckpt-dir ./exports/zimage-fp8-hf
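The num_inference_steps omission in models_utils.py guards against Python's duplicate-argument rule. A standalone illustration of the collision, with a hypothetical function standing in for the example's pipeline call:

```python
# If --n-steps were forwarded positionally and inference_extra_args also
# carried num_inference_steps, the same parameter would be bound twice.
def run_pipeline(prompt, num_inference_steps=20, **extra_args):
    return num_inference_steps

n_steps = 20                          # value from the --n-steps CLI flag
extra = {"num_inference_steps": 30}   # the collision this PR avoids

try:
    run_pipeline("a prompt", n_steps, **extra)
except TypeError as exc:
    print(exc)  # → run_pipeline() got multiple values for argument 'num_inference_steps'
```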

Testing

Tested end-to-end on DGX Spark (GB10, 128 GB unified memory):

  • FP8 calibration over 128 samples from Gustavosta/Stable-Diffusion-Prompts (998 quantizers inserted)
  • HF checkpoint export via --hf-ckpt-dir
  • Inference via ZImagePipeline.from_pretrained() with enable_huggingface_checkpointing()

Before your PR is "Ready for review"

  • Is this change backward compatible? ✅ Purely additive — new enum values, new dict entries, new elif branch
  • Did you write any new necessary tests? ✅
  • Did you update Changelog? ✅

Additional notes

A separate issue will be filed for export_hf_checkpoint not writing quant_type to transformer/config.json — this affects NVIDIAModelOptConfig-consuming loaders in diffusers and is a separate concern in the export path.
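Until that issue is fixed, a downstream loader could detect the missing key explicitly. A small sketch, assuming the transformer/config.json layout named above (the helper itself is hypothetical):

```python
import json
from pathlib import Path

def has_quant_type(export_dir: str) -> bool:
    """Return True if the exported transformer config records its quant_type."""
    cfg = Path(export_dir) / "transformer" / "config.json"
    if not cfg.is_file():
        return False
    return "quant_type" in json.loads(cfg.read_text())
```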

Summary by CodeRabbit

Release Notes

  • New Features

    • Added Z-Image and Z-Image-Turbo model quantization support with FP8/INT8/INT4 options in diffusers examples.
    • Expanded CLI with new model options for Z-Image variants.
  • Bug Fixes

    • Corrected broken import path for quantization module in diffusers example documentation.
  • Tests

    • Added test coverage for Z-Image quantization filtering.

Add Z-Image (NextDiT/Lumina2) PTQ quantization support in diffusers example

- Fix broken AttentionModuleMixin import: path changed from
  `modelopt.torch.quantization.plugins.diffusers` to
  `modelopt.torch.quantization.plugins.diffusion.diffusers` in 0.42.0.
- Add `filter_func_zimage()` in utils.py to skip patch embedder,
  conditioning MLPs, norms, and positional embeddings while quantizing
  all attention (qkv/out) and FFN (w1/w2/w3) linears.
- Register `zimage` and `zimage-turbo` model types in models_utils.py
  with HF model IDs, ZImagePipeline, and calibration defaults.
  `num_inference_steps` is intentionally omitted from inference_extra_args
  to avoid collision with the --n-steps CLI argument.
- Add `ZImageTransformer2DModel` dummy input generation in
  `generate_diffusion_dummy_inputs()`, resolving the QKV fusion warning
  during HF checkpoint export.
- Add ZIMAGE_PATH test constant and integration test parametrize case
  (skipif when MODELOPT_LOCAL_MODEL_ROOT is unset — no tiny pipe on
  hf-internal-testing). Add 14-case unit test for filter_func_zimage.

Tested end-to-end: FP8 calibration (128 samples, Gustavosta/Stable-Diffusion-Prompts),
HF checkpoint export, and inference via ZImagePipeline.from_pretrained()
on DGX Spark (GB10, 128 GB unified memory).

Signed-off-by: andrea-pilzer <apilzer@nvidia.com>
@andrea-pilzer andrea-pilzer requested review from a team as code owners April 8, 2026 12:38

copy-pr-bot bot commented Apr 8, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Apr 8, 2026

📝 Walkthrough

This PR adds Z-Image and Z-Image-Turbo quantization support to the diffusers quantization example through model type enums, a quantization filter function, dummy input generation for Z-Image transformers, model registry entries with HuggingFace model IDs and quantization configs, test utilities, and test cases. It also corrects a broken AttentionModuleMixin import path in the changelog.

Changes

Cohort / File(s): Summary

  • Changelog & Documentation (CHANGELOG.rst): Fixed the broken import path for AttentionModuleMixin and documented new Z-Image/Z-Image-Turbo quantization support with FP8/INT8/INT4 PTQ and Z-Image dummy input generation.
  • Model Configuration & Registry (examples/diffusers/quantization/models_utils.py): Extended the ModelType enum with ZIMAGE and ZIMAGE_TURBO entries; added a model filter function mapping, HuggingFace model IDs, and calibration/inference configs (with the turbo variant adjusting guidance_scale to 1.0 and the number of steps).
  • Quantization Filter Function (examples/diffusers/quantization/utils.py): Added filter_func_zimage, which excludes Z-Image/NextDiT backbone components (x_embedder, final_layer, time_caption_embed, cap_embedder, norm weights, pos_embed) from quantization via regex matching.
  • Dummy Input Generation (modelopt/torch/export/diffusers_utils.py): Extended generate_diffusion_dummy_inputs to support ZImageTransformer2DModel with a dedicated builder that generates dummy hidden_states, timestep, and encoder_hidden_states using fixed sequence lengths and config-derived channel dimensions.
  • Test Utilities & Test Cases (tests/_test_utils/examples/models.py, tests/examples/diffusers/test_diffusers.py): Added the ZIMAGE_PATH constant for test model selection; parametrized test_diffusers_quantization with a Z-Image configuration; added test_filter_func_zimage to verify layer filtering behavior for Z-Image backbone components.
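The dummy-input builder summarized above can be outlined shape-first; every dimension below is an illustrative assumption, not a value from the PR:

```python
# Shape-only sketch of generate_diffusion_dummy_inputs for ZImageTransformer2DModel.
# Sequence lengths and channel counts are placeholders; the real builder reads
# the channel dimensions from the model config.
def zimage_dummy_input_shapes(batch: int = 1,
                              img_seq_len: int = 256,
                              txt_seq_len: int = 77,
                              in_channels: int = 16,
                              caption_dim: int = 2560) -> dict:
    return {
        "hidden_states": (batch, img_seq_len, in_channels),
        "timestep": (batch,),
        "encoder_hidden_states": (batch, txt_seq_len, caption_dim),
    }
```

Materializing tensors of these shapes (e.g. with torch.randn) gives the exporter enough to trace the transformer once, which is how the QKV fusion warning is resolved.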

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 45.45%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title accurately summarizes the main change: adding Z-Image PTQ quantization support to the diffusers example, which is the primary objective across all modified files.
  • Security Anti-Patterns ✅ Passed: All five modified files were reviewed against SECURITY.md anti-patterns. No instances of unsafe torch.load, numpy.load, hardcoded trust_remote_code, eval/exec on external input, or # nosec bypass comments were found.



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
tests/examples/diffusers/test_diffusers.py (1)

207-251: LGTM with a minor suggestion.

The test_filter_func_zimage test provides comprehensive coverage of the filter function's behavior for both skip and quantize scenarios.

Consider using a context manager or fixture to handle the sys.path modification to ensure cleanup:

♻️ Optional: Use addopts or fixture for cleaner sys.path handling
 def test_filter_func_zimage(name: str, expected: bool) -> None:
     """filter_func_zimage must skip conditioning/norm layers and quantize attention/FFN linears."""
     import sys
 
-    sys.path.insert(0, str(Path(__file__).parents[3] / "examples/diffusers/quantization"))
-    from utils import filter_func_zimage
-
-    assert filter_func_zimage(name) == expected
+    example_path = str(Path(__file__).parents[3] / "examples/diffusers/quantization")
+    sys.path.insert(0, example_path)
+    try:
+        from utils import filter_func_zimage
+        assert filter_func_zimage(name) == expected
+    finally:
+        sys.path.remove(example_path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/examples/diffusers/test_diffusers.py` around lines 207 - 251, Replace
the ad-hoc sys.path insertion in test_filter_func_zimage with a context-managed
or fixture-based approach that temporarily prepends the examples path and
restores sys.path afterwards; specifically, avoid leaving the modified sys.path
by wrapping the current sys.path.insert(0, ...) usage (or the whole import from
utils) in a context manager (or pytest fixture) that yields control to run
filter_func_zimage and then removes the inserted path, ensuring the test imports
utils without side effects on subsequent tests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1073c201-88a4-4d66-8f75-e3850c1845fe

📥 Commits

Reviewing files that changed from the base of the PR and between bd80265 and a5eec64.

📒 Files selected for processing (6)
  • CHANGELOG.rst
  • examples/diffusers/quantization/models_utils.py
  • examples/diffusers/quantization/utils.py
  • modelopt/torch/export/diffusers_utils.py
  • tests/_test_utils/examples/models.py
  • tests/examples/diffusers/test_diffusers.py
