Add NVFP4 + QAD to the Nemotron-3-Nano-30B-A3B tutorial #1660
kevalmorabia97 wants to merge 1 commit
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@           Coverage Diff           @@
##             main    #1660   +/-   ##
=======================================
  Coverage   77.31%   77.31%
=======================================
  Files         509      509
  Lines       55912    55912
=======================================
  Hits        43226    43226
  Misses      12686    12686
Stack NVFP4 (W4A4) quantization with Quantization Aware Distillation (QAD) on top of the pruned + distilled checkpoint, building on the Megatron-Bridge migration in #1601.

- Add an "NVFP4 + Quantization Aware Distillation (QAD)" section to the tutorial: NVFP4 PTQ (quantize.py) -> QAD from the BF16 teacher (distill.py with --student_megatron_path) -> export.py, recovering the NVFP4 accuracy drop.
- Add NVFP4 / NVFP4+QAD rows to the accuracy table and an NVFP4 row to the vLLM throughput table (preliminary; some cells are still `?` placeholders).
- Add the per-format NVFP4 vLLM benchmark command and NVFP4 deployment notes (FlashInfer FP4 MoE backend, Blackwell) to the tutorial and nemo_evaluator.yaml.
- Reframe the tutorial intro / CHANGELOG / pruning READMEs as NVFP4 + QAD.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
e86d2bd to c8a2de4
What does this PR do?
Type of change: documentation
Follow-up to #1601 (now merged), which migrated the Nemotron-3-Nano-30B-A3B-BF16 tutorial to the Megatron-Bridge PTQ flow. This PR stacks NVFP4 (W4A4) quantization + Quantization Aware Distillation (QAD) on top of the pruned + distilled checkpoint.
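For readers unfamiliar with the NVFP4 format mentioned above, here is a minimal sketch of block-wise "fake quantization" (quantize, then dequantize). It is illustrative only and is not the ModelOpt implementation: the function names are hypothetical, and the per-block scale is kept in full precision here, whereas real NVFP4 stores each block scale in FP8 (E4M3) alongside a per-tensor scale. Ties are broken toward the smaller magnitude for simplicity.

```python
import math

# Magnitudes representable by the 4-bit E2M1 element format used by NVFP4.
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # NVFP4 shares one scale across each block of 16 elements


def fake_quantize_block(block):
    """Quantize one block to the E2M1 grid and dequantize back to floats."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Assumption: full-precision scale mapping the largest magnitude to 6.0.
    scale = amax / E2M1_GRID[-1]
    out = []
    for x in block:
        # Pick the nearest representable magnitude (simplified rounding).
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        out.append(math.copysign(mag * scale, x))
    return out


def fake_quantize(values):
    """Apply block-wise fake quantization to a flat list of floats."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        out.extend(fake_quantize_block(values[start:start + BLOCK_SIZE]))
    return out
```

The gap between `values` and `fake_quantize(values)` is the rounding error that PTQ introduces; QAD then trains the quantized student to recover the resulting accuracy drop.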
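- Add an "NVFP4 + Quantization Aware Distillation (QAD)" section to the tutorial: NVFP4 PTQ (`quantize.py`) → QAD from the BF16 teacher (`distill.py --student_megatron_path`) → `export.py`, to recover the NVFP4 accuracy drop.
- Add NVFP4 / NVFP4+QAD rows to the accuracy table and an NVFP4 row to the vLLM throughput table (preliminary; some cells are still `?` placeholders).
- Add the per-format NVFP4 vLLM benchmark command and NVFP4 deployment notes (FlashInfer FP4 MoE backend, Blackwell) to the tutorial and `nemo_evaluator.yaml`.
- Reframe the tutorial intro / `CHANGELOG` / pruning READMEs as NVFP4 + QAD.

The QAD command uses `examples/megatron_bridge/distill.py --student_megatron_path` (QAD support); the other commands use `quantize.py` / `export.py`, which are already on `main` via #1601.

Testing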
Docs-only change; pre-commit hooks (markdownlint, RST checks) pass. Tutorial relative links resolve.
Before your PR is "Ready for review"
`CONTRIBUTING.md`: N/A
`/claude review`)

Additional Information
Builds on #1601. Placeholder `?` cells in the Results tables (NVFP4 / NVFP4+QAD accuracy, NVFP4 vLLM throughput) will be filled in with experiment results before this leaves draft.
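
As background for the QAD step this PR documents, the sketch below shows a typical logit-distillation objective: the KL divergence from the BF16 teacher's output distribution to the quantized student's. Whether `distill.py` uses exactly this loss is an assumption; the function names and the `temperature` parameter are hypothetical and chosen for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_sum for z in scaled]


def kd_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) for one token position.

    Assumption: the teacher is the BF16 model and the student is the
    NVFP4 fake-quantized model; only the student would receive gradients.
    """
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
```

The loss is zero when the student matches the teacher exactly and grows as quantization error pushes the student's distribution away from the teacher's.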