docs(deployment skill): drop wrong "release predates arch" cu130 fallback by Edwardf0t1 · Pull Request #1654 · NVIDIA/Model-Optimizer

Edwardf0t1 · 2026-06-08T22:41:51Z

What does this PR do?

Type of change: Documentation (agent skill)

Removes an incorrect fallback note from the deployment skill's NVFP4-on-Blackwell (sm_103) guidance. The note claimed:

If a pinned release predates the model's arch, use cu130-nightly-<arch> instead (Qwen3.5-9B's qwen3_5 needed it).

This is a wrong inference. I verified on a B300 that the pinned release vllm/vllm-openai:v0.19.1-cu130 loads and serves the Qwen3.5-9B NVFP4 checkpoint (qwen3_5 / Qwen3_5ForConditionalGeneration) cleanly — /health 200, lists the model, generates. So the release does not predate qwen3_5; the only real requirement is the -cu130 build (sm_103 FP4 kernels), which the note already states and this PR keeps.

This mirrors the same removal in the evaluation skill (#1649) — that PR didn't touch the deployment skill, so this is the companion one-liner.

Testing

Empirical: served the checkpoint with v0.19.1-cu130 on aws-pdx (B300) → engine init succeeds, /v1/models lists it, chat completion returns text. Docs-only change otherwise.

Before your PR is "Ready for review"

Is this change backward compatible?: ✅ (docs only)
Did you write any new necessary tests?: N/A (docs)
Did you add or update any necessary documentation?: ✅ (this is the doc fix)
Did you update Changelog?: N/A (skill docs, not shipped library)

🤖 Generated with Claude Code

Summary by CodeRabbit

Documentation
- Updated vLLM deployment instructions for NVFP4 on Blackwell, simplifying documentation by removing nightly release usage guidance and reorganizing the instruction flow for improved clarity.

…back The deployment skill's NVFP4-on-Blackwell note claimed that if a pinned vLLM release predates the model's arch you must switch to `cu130-nightly-<arch>` (citing Qwen3.5-9B's `qwen3_5`). Verified on a B300 that the pinned release `vllm/vllm-openai:v0.19.1-cu130` loads and serves the Qwen3.5-9B NVFP4 checkpoint (`qwen3_5` / `Qwen3_5ForConditionalGeneration`) fine — /health 200, generates. So the fallback was a wrong inference; the only requirement is the `-cu130` (sm_103 FP4 kernels) build, which the note already states. Mirrors the same removal in the evaluation skill (PR #1649). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

coderabbitai · 2026-06-08T22:42:03Z

📝 Walkthrough

Walkthrough

Updated NVFP4 Blackwell vLLM deployment documentation by removing a sentence advising users to use cu130-nightly-<arch> for older pinned releases, transitioning directly from backend caveats to cross-check recipes.

Changes

vLLM Deployment Documentation

Layer / File(s)	Summary
Remove cu130-nightly guidance `.agents/skills/deployment/SKILL.md`	Removed outdated sentence recommending `cu130-nightly-<arch>` for vLLM releases predating Blackwell sm_103 support, simplifying the documentation flow between kernel caveats and cross-check guidance.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

NVIDIA/Model-Optimizer#1595: Earlier update to the same vLLM NVFP4-Blackwell deployment documentation around cu130-nightly-<arch> guidance for sm_103 kernel availability.

Suggested reviewers

kaix-nv
mxinO
shengliangxu

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: removing an incorrect fallback note about cu130 releases from deployment documentation for NVFP4 on Blackwell.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	PR contains only documentation changes to .agents/skills files (.md files); no Python code, modelopt package changes, or dependency additions subject to security check requirements.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch docs/deployment-drop-cu130-arch-fallback

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.agents/skills/deployment/SKILL.md:
- Around line 134-135: The NVFP4 guidance is inconsistent: SKILL.md's
"Cross-check via `recipes.vllm.ai/<org>/<model>?hardware=b300`" section removed
the `cu130-nightly-<arch>` fallback but evaluation examples still reference it;
update SKILL.md and the evaluation/example text to be consistent by either
removing all `cu130-nightly-<arch>` fallback mentions or adding an explicit note
in this section that evaluation docs/examples still reference the fallback and
are being updated in lockstep; search for the literal tokens "Cross-check via
`recipes.vllm.ai/<org>/<model>?hardware=b300`", "cu130-nightly-<arch>", and
"NVFP4" to locate and sync the wording across the skill and evaluation examples.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 37a15036-9714-4636-9db1-13ab001096d5

📥 Commits

Reviewing files that changed from the base of the PR and between 275d45d and 9d9cb36.

📒 Files selected for processing (1)

.agents/skills/deployment/SKILL.md

coderabbitai · 2026-06-08T22:44:42Z

+> Cross-check via
 > `recipes.vllm.ai/<org>/<model>?hardware=b300` (JS-rendered — fetch the raw


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align cross-skill NVFP4 guidance to avoid conflicting operator instructions.

This section now drops the cu130-nightly-<arch> fallback, but evaluation guidance/examples still include it, so users can get contradictory instructions between skills. Please sync the wording (or add an explicit note here that evaluation docs are being updated in lockstep).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/deployment/SKILL.md around lines 134 - 135, The NVFP4 guidance is inconsistent: SKILL.md's "Cross-check via `recipes.vllm.ai/<org>/<model>?hardware=b300`" section removed the `cu130-nightly-<arch>` fallback but evaluation examples still reference it; update SKILL.md and the evaluation/example text to be consistent by either removing all `cu130-nightly-<arch>` fallback mentions or adding an explicit note in this section that evaluation docs/examples still reference the fallback and are being updated in lockstep; search for the literal tokens "Cross-check via `recipes.vllm.ai/<org>/<model>?hardware=b300`", "cu130-nightly-<arch>", and "NVFP4" to locate and sync the wording across the skill and evaluation examples.

codecov · 2026-06-08T22:50:46Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.22%. Comparing base (16d562a) to head (9d9cb36).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1654   +/-   ##
=======================================
  Coverage   77.22%   77.22%           
=======================================
  Files         493      493           
  Lines       54697    54697           
=======================================
  Hits        42241    42241           
  Misses      12456    12456

Flag	Coverage Δ
unit	`54.08% <ø> (+0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot

Pull request overview

Removes an incorrect “release predates arch → use cu130-nightly-<arch>” fallback note from the deployment skill’s NVFP4-on-Blackwell (sm_103) guidance, keeping the core recommendation to use the -cu130 vLLM image tag for FP4 kernel support.

Changes:

Deleted the inaccurate cu130-nightly-<arch> fallback guidance (and the Qwen3.5-9B qwen3_5 claim) from the deployment skill note.
Preserved and slightly reflowed the surrounding NVFP4-on-Blackwell guidance, including the recipes cross-check link.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

coderabbitai Bot reviewed Jun 8, 2026

View reviewed changes

Edwardf0t1 requested a review from Copilot June 9, 2026 00:09

Copilot started reviewing on behalf of Edwardf0t1 June 9, 2026 00:09 View session

Copilot AI reviewed Jun 9, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs(deployment skill): drop wrong "release predates arch" cu130 fallback#1654

docs(deployment skill): drop wrong "release predates arch" cu130 fallback#1654
Edwardf0t1 wants to merge 1 commit into
mainfrom
docs/deployment-drop-cu130-arch-fallback

Edwardf0t1 commented Jun 8, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 8, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 8, 2026

Uh oh!

codecov Bot commented Jun 8, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		> Cross-check via
		> `recipes.vllm.ai/<org>/<model>?hardware=b300` (JS-rendered — fetch the raw

Conversation

Edwardf0t1 commented Jun 8, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Testing

Before your PR is "Ready for review"

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Jun 8, 2026

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Edwardf0t1 commented Jun 8, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 8, 2026 •

edited

Loading