Add Z-Image (NextDiT/Lumina2) PTQ quantization support in diffusers example#1205
andrea-pilzer wants to merge 1 commit into NVIDIA:main from …
Conversation
…xample

- Fix broken `AttentionModuleMixin` import: path changed from `modelopt.torch.quantization.plugins.diffusers` to `modelopt.torch.quantization.plugins.diffusion.diffusers` in 0.42.0.
- Add `filter_func_zimage()` in utils.py to skip the patch embedder, conditioning MLPs, norms, and positional embeddings while quantizing all attention (qkv/out) and FFN (w1/w2/w3) linears.
- Register `zimage` and `zimage-turbo` model types in models_utils.py with HF model IDs, `ZImagePipeline`, and calibration defaults. `num_inference_steps` is intentionally omitted from `inference_extra_args` to avoid collision with the `--n-steps` CLI argument.
- Add `ZImageTransformer2DModel` dummy input generation in `generate_diffusion_dummy_inputs()`, resolving the QKV fusion warning during HF checkpoint export.
- Add `ZIMAGE_PATH` test constant and an integration test parametrize case (skipif when `MODELOPT_LOCAL_MODEL_ROOT` is unset; no tiny pipe on hf-internal-testing). Add a 14-case unit test for `filter_func_zimage`.

Tested end-to-end: FP8 calibration (128 samples, Gustavosta/Stable-Diffusion-Prompts), HF checkpoint export, and inference via `ZImagePipeline.from_pretrained()` on DGX Spark (GB10, 128 GB unified memory).

Signed-off-by: andrea-pilzer <apilzer@nvidia.com>
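The skip/quantize split described above can be sketched as a simple name-based filter. This is a hypothetical sketch: the module-name patterns (`x_embedder`, `cap_embedder`, `t_embedder`, `norm`, `rope_embedder`, `final_layer`) are assumptions for illustration, not the PR's actual implementation.

```python
import re

# Hypothetical skip list: patch embedder, conditioning MLPs, norm layers,
# positional embeddings, and the final projection. Names are illustrative.
_SKIP_PATTERN = re.compile(
    r".*(x_embedder|cap_embedder|t_embedder|norm|rope_embedder|final_layer).*"
)


def filter_func_zimage(name: str) -> bool:
    """Return True if the module `name` should be SKIPPED during quantization.

    Everything not matched here — attention (qkv/out) and FFN (w1/w2/w3)
    linears — stays quantized.
    """
    return _SKIP_PATTERN.match(name) is not None
```

In the diffusers quantization example, filter functions of this shape are used to disable quantizers on matching modules after calibration; the exact patterns for Z-Image live in `examples/diffusers/quantization/utils.py`.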
📝 Walkthrough

This PR adds Z-Image and Z-Image-Turbo quantization support to the diffusers quantization example through model type enums, a quantization filter function, dummy input generation for Z-Image transformers, model registry entries with HuggingFace model IDs and quantization configs, test utilities, and test cases. It also corrects a broken import path.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning: ⚔️ Resolve merge conflicts)
🧹 Nitpick comments (1)
tests/examples/diffusers/test_diffusers.py (1)

207-251: LGTM with a minor suggestion. The `test_filter_func_zimage` test provides comprehensive coverage of the filter function's behavior for both skip and quantize scenarios. Consider using a context manager or fixture to handle the `sys.path` modification to ensure cleanup:

♻️ Optional: Use addopts or fixture for cleaner sys.path handling

```diff
 def test_filter_func_zimage(name: str, expected: bool) -> None:
     """filter_func_zimage must skip conditioning/norm layers and quantize attention/FFN linears."""
     import sys

-    sys.path.insert(0, str(Path(__file__).parents[3] / "examples/diffusers/quantization"))
-    from utils import filter_func_zimage
-
-    assert filter_func_zimage(name) == expected
+    example_path = str(Path(__file__).parents[3] / "examples/diffusers/quantization")
+    sys.path.insert(0, example_path)
+    try:
+        from utils import filter_func_zimage
+        assert filter_func_zimage(name) == expected
+    finally:
+        sys.path.remove(example_path)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/examples/diffusers/test_diffusers.py` around lines 207 - 251, Replace the ad-hoc sys.path insertion in test_filter_func_zimage with a context-managed or fixture-based approach that temporarily prepends the examples path and restores sys.path afterwards; specifically, avoid leaving the modified sys.path by wrapping the current sys.path.insert(0, ...) usage (or the whole import from utils) in a context manager (or pytest fixture) that yields control to run filter_func_zimage and then removes the inserted path, ensuring the test imports utils without side effects on subsequent tests.
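The reviewer's "context manager or fixture" suggestion can also be written as a small reusable context manager (a sketch; the helper name is illustrative):

```python
import sys
from contextlib import contextmanager


@contextmanager
def prepend_sys_path(path):
    """Temporarily prepend `path` to sys.path, restoring it on exit."""
    entry = str(path)
    sys.path.insert(0, entry)
    try:
        yield
    finally:
        sys.path.remove(entry)


# Usage inside the test would look like:
#     with prepend_sys_path(Path(__file__).parents[3] / "examples/diffusers/quantization"):
#         from utils import filter_func_zimage
```

Alternatively, pytest's built-in `monkeypatch.syspath_prepend` performs the same prepend-and-restore automatically at test teardown.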
ℹ️ Review info

⚙️ Run configuration
- Configuration used: .coderabbit.yaml
- Review profile: CHILL
- Plan: Pro
- Run ID: 1073c201-88a4-4d66-8f75-e3850c1845fe
📒 Files selected for processing (6)

- CHANGELOG.rst
- examples/diffusers/quantization/models_utils.py
- examples/diffusers/quantization/utils.py
- modelopt/torch/export/diffusers_utils.py
- tests/_test_utils/examples/models.py
- tests/examples/diffusers/test_diffusers.py
What does this PR do?
Type of change: New feature + Bug fix
Adds PTQ quantization support for Z-Image and Z-Image-Turbo (`Tongyi-MAI/Z-Image`, `Tongyi-MAI/Z-Image-Turbo`) in the diffusers quantization example. Z-Image is built on the NextDiT (Lumina2) transformer backbone with a Qwen3-4B text encoder.

Also fixes a hard `ImportError` introduced in modelopt 0.42.0: `AttentionModuleMixin` moved from `modelopt.torch.quantization.plugins.diffusers` to `modelopt.torch.quantization.plugins.diffusion.diffusers`.

Changes

- `examples/diffusers/quantization/utils.py`: Fix import path; add `filter_func_zimage()` to skip the patch embedder, final projection, conditioning MLPs, and norm layers while quantizing all attention (qkv/out) and FFN (w1/w2/w3) linears.
- `examples/diffusers/quantization/models_utils.py`: Register `zimage` and `zimage-turbo` model types with HF model IDs, `ZImagePipeline`, and calibration defaults. `num_inference_steps` is intentionally omitted from `inference_extra_args` to avoid a `TypeError: got multiple values` collision with the `--n-steps` CLI argument.
- `modelopt/torch/export/diffusers_utils.py`: Add `ZImageTransformer2DModel` dummy input generation in `generate_diffusion_dummy_inputs()`, resolving a QKV fusion warning during HF checkpoint export.
- `tests/`: Add `ZIMAGE_PATH` constant, an integration test parametrize case (skipped when `MODELOPT_LOCAL_MODEL_ROOT` is unset; no tiny pipe on hf-internal-testing), and a 14-case unit test for `filter_func_zimage`.

Usage
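The `TypeError: got multiple values` collision mentioned above is a plain Python keyword-argument rule, independent of diffusers. A minimal sketch (function and dict names are hypothetical stand-ins for the example's pipeline call and registry entry):

```python
def run_pipeline(prompt, num_inference_steps=30, **kwargs):
    """Toy stand-in for the example's pipeline invocation."""
    return num_inference_steps


# Hypothetical registry entry carrying the same key:
inference_extra_args = {"num_inference_steps": 20}

try:
    # --n-steps is forwarded as an explicit keyword, colliding with the dict:
    run_pipeline("a cat", num_inference_steps=50, **inference_extra_args)
except TypeError as exc:
    # "... got multiple values for keyword argument 'num_inference_steps'"
    print(exc)
```

Leaving `num_inference_steps` out of `inference_extra_args` lets `--n-steps` remain the single source of that value.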
Testing
Tested end-to-end on DGX Spark (GB10, 128 GB unified memory):

- FP8 calibration with `Gustavosta/Stable-Diffusion-Prompts` (998 quantizers inserted)
- HF checkpoint export via `--hf-ckpt-dir`
- Inference via `ZImagePipeline.from_pretrained()` with `enable_huggingface_checkpointing()`

Before your PR is "Ready for review"
`elif` branch

Additional notes
A separate issue will be filed for `export_hf_checkpoint` not writing `quant_type` to `transformer/config.json`; this affects `NVIDIAModelOptConfig`-consuming loaders in diffusers and is a separate concern in the export path.

Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Tests