Add NVFP4 + QAD to the Nemotron-3-Nano-30B-A3B tutorial #1660

Draft
kevalmorabia97 wants to merge 1 commit into main from kmorabia/nemotron-3-nano-nvfp4-qad
@kevalmorabia97

Collaborator

What does this PR do?

Type of change: documentation

Follow-up to #1601 (now merged), which migrated the Nemotron-3-Nano-30B-A3B-BF16 tutorial to the Megatron-Bridge PTQ flow. This PR stacks NVFP4 (W4A4) quantization + Quantization Aware Distillation (QAD) on top of the pruned + distilled checkpoint.

  • Adds an "NVFP4 + Quantization Aware Distillation (QAD)" section to the tutorial: NVFP4 PTQ (quantize.py) → QAD from the BF16 teacher (distill.py --student_megatron_path) → export.py, to recover the NVFP4 accuracy drop.
  • Adds NVFP4 / NVFP4+QAD rows to the accuracy table and an NVFP4 row to the vLLM throughput table.
  • Adds the per-format NVFP4 vLLM benchmark command and NVFP4 deployment notes (FlashInfer FP4 MoE backend, Blackwell GPU) to the tutorial and nemo_evaluator.yaml.
  • Reframes the tutorial intro / CHANGELOG / pruning READMEs as NVFP4 + QAD.
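
As background for the NVFP4 PTQ step listed above, the following is a minimal sketch of block-scaled 4-bit fake quantization. It assumes the E2M1 value grid and 16-element blocks commonly described for NVFP4, and it keeps each block scale in full precision rather than FP8. `fake_quantize_fp4` is a hypothetical helper written for illustration; it is not the code path used by quantize.py.

```python
import numpy as np

# Magnitudes representable by a 4-bit E2M1 float (sign is handled separately).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK_SIZE = 16  # assumed NVFP4 block size


def fake_quantize_fp4(x: np.ndarray, block_size: int = BLOCK_SIZE) -> np.ndarray:
    """Round x to the E2M1 grid with one scale per block, then dequantize.

    Simplification (assumption): the per-block scale stays in full precision,
    whereas real NVFP4 stores it in FP8 together with a per-tensor scale.
    x.size must be divisible by block_size.
    """
    flat = x.astype(np.float64).reshape(-1, block_size)
    # Map each block's largest magnitude onto the largest grid value (6.0).
    amax = np.abs(flat).max(axis=1, keepdims=True)
    scale = np.where(amax > 0, amax / E2M1_GRID[-1], 1.0)
    scaled = np.abs(flat) / scale
    # Pick the nearest grid point for every element.
    idx = np.abs(scaled[..., None] - E2M1_GRID).argmin(axis=-1)
    dequant = np.sign(flat) * E2M1_GRID[idx] * scale
    return dequant.reshape(x.shape)


if __name__ == "__main__":
    weights = np.linspace(-3.0, 3.0, 32)
    quantized = fake_quantize_fp4(weights)
    print("max abs error:", np.abs(quantized - weights).max())
```

The gap between the original and the fake-quantized values is the kind of rounding error that shows up as an accuracy drop after PTQ, which the QAD step is meant to recover.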

Draft: the NVFP4 / NVFP4+QAD accuracy and NVFP4 vLLM throughput numbers are still being collected — several cells are ? placeholders and will be filled in before this leaves draft.

The QAD command uses examples/megatron_bridge/distill.py --student_megatron_path (QAD support); the other commands use quantize.py / export.py, which are already on main via #1601.
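
To make the QAD idea concrete, here is a hedged sketch of a logit-level distillation objective: the quantized student is trained to match the BF16 teacher's output distribution. The KL form, the `temperature` parameter, and the function names are assumptions chosen for illustration; this PR does not state which loss distill.py actually uses.

```python
import numpy as np


def log_softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def distillation_kl(
    teacher_logits: np.ndarray,
    student_logits: np.ndarray,
    temperature: float = 1.0,
) -> float:
    """Mean KL(teacher || student) over all token positions (assumed objective)."""
    t_logp = log_softmax(teacher_logits, temperature)
    s_logp = log_softmax(student_logits, temperature)
    kl = (np.exp(t_logp) * (t_logp - s_logp)).sum(axis=-1)
    return float(kl.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(4, 8))  # 4 token positions, vocabulary of 8
    student = teacher + 0.1 * rng.normal(size=(4, 8))  # perturbed "quantized" student
    print("distillation loss:", distillation_kl(teacher, student))
```

In a real QAD run the student logits would come from the fake-quantized model's forward pass, and gradients of this loss would update the student's weights while the teacher stays frozen.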

Testing

Docs-only change; pre-commit hooks (markdownlint, RST checks) pass. Tutorial relative links resolve.

Before your PR is "Ready for review"

  • Is this change backward compatible?: N/A (docs only)
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: N/A (docs only)
  • Did you update Changelog?: ✅
  • Did you get Claude approval on this PR?: ❌ (will run /claude review)

Additional Information

Builds on #1601. Placeholder ? cells in the Results tables (NVFP4 / NVFP4+QAD accuracy, NVFP4 vLLM throughput) will be filled in with experiment results before this leaves draft.

@copy-pr-bot

copy-pr-bot Bot commented Jun 9, 2026


Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Contributor

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 03b93011-50ad-4692-a182-a2d6a68fed98


@codecov

codecov Bot commented Jun 9, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.31%. Comparing base (bde162a) to head (c8a2de4).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1660   +/-   ##
=======================================
  Coverage   77.31%   77.31%           
=======================================
  Files         509      509           
  Lines       55912    55912           
=======================================
  Hits        43226    43226           
  Misses      12686    12686           
Flag Coverage Δ
unit 54.46% <ø> (ø)


Stack NVFP4 (W4A4) quantization with Quantization Aware Distillation (QAD) on top
of the pruned + distilled checkpoint, building on the Megatron-Bridge migration in #1601.

- Add an "NVFP4 + Quantization Aware Distillation (QAD)" section to the tutorial:
  NVFP4 PTQ (quantize.py) -> QAD from the BF16 teacher (distill.py with
  --student_megatron_path) -> export.py, recovering the NVFP4 accuracy drop.
- Add NVFP4 / NVFP4+QAD rows to the accuracy table and an NVFP4 row to the vLLM
  throughput table (preliminary; some cells are still `?` placeholders).
- Add the per-format NVFP4 vLLM benchmark command and NVFP4 deployment notes
  (FlashInfer FP4 MoE backend, Blackwell) to the tutorial and nemo_evaluator.yaml.
- Reframe the tutorial intro / CHANGELOG / pruning READMEs as NVFP4 + QAD.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

kevalmorabia97 force-pushed the kmorabia/nemotron-3-nano-nvfp4-qad branch from e86d2bd to c8a2de4 on June 10, 2026 10:57