docs(deployment skill): drop wrong "release predates arch" cu130 fallback#1654
docs(deployment skill): drop wrong "release predates arch" cu130 fallback#1654Edwardf0t1 wants to merge 1 commit into
Conversation
…back The deployment skill's NVFP4-on-Blackwell note claimed that if a pinned vLLM release predates the model's arch you must switch to `cu130-nightly-<arch>` (citing Qwen3.5-9B's `qwen3_5`). Verified on a B300 that the pinned release `vllm/vllm-openai:v0.19.1-cu130` loads and serves the Qwen3.5-9B NVFP4 checkpoint (`qwen3_5` / `Qwen3_5ForConditionalGeneration`) fine — /health 200, generates. So the fallback was a wrong inference; the only requirement is the `-cu130` (sm_103 FP4 kernels) build, which the note already states. Mirrors the same removal in the evaluation skill (PR #1649). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
📝 WalkthroughWalkthroughUpdated NVFP4 Blackwell vLLM deployment documentation by removing a sentence advising users to use ChangesvLLM Deployment Documentation
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~3 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.agents/skills/deployment/SKILL.md:
- Around line 134-135: The NVFP4 guidance is inconsistent: SKILL.md's
"Cross-check via `recipes.vllm.ai/<org>/<model>?hardware=b300`" section removed
the `cu130-nightly-<arch>` fallback but evaluation examples still reference it;
update SKILL.md and the evaluation/example text to be consistent by either
removing all `cu130-nightly-<arch>` fallback mentions or adding an explicit note
in this section that evaluation docs/examples still reference the fallback and
are being updated in lockstep; search for the literal tokens "Cross-check via
`recipes.vllm.ai/<org>/<model>?hardware=b300`", "cu130-nightly-<arch>", and
"NVFP4" to locate and sync the wording across the skill and evaluation examples.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 37a15036-9714-4636-9db1-13ab001096d5
📒 Files selected for processing (1)
.agents/skills/deployment/SKILL.md
| > Cross-check via | ||
| > `recipes.vllm.ai/<org>/<model>?hardware=b300` (JS-rendered — fetch the raw |
There was a problem hiding this comment.
Align cross-skill NVFP4 guidance to avoid conflicting operator instructions.
This section now drops the cu130-nightly-<arch> fallback, but evaluation guidance/examples still include it, so users can get contradictory instructions between skills. Please sync the wording (or add an explicit note here that evaluation docs are being updated in lockstep).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.agents/skills/deployment/SKILL.md around lines 134 - 135, The NVFP4
guidance is inconsistent: SKILL.md's "Cross-check via
`recipes.vllm.ai/<org>/<model>?hardware=b300`" section removed the
`cu130-nightly-<arch>` fallback but evaluation examples still reference it;
update SKILL.md and the evaluation/example text to be consistent by either
removing all `cu130-nightly-<arch>` fallback mentions or adding an explicit note
in this section that evaluation docs/examples still reference the fallback and
are being updated in lockstep; search for the literal tokens "Cross-check via
`recipes.vllm.ai/<org>/<model>?hardware=b300`", "cu130-nightly-<arch>", and
"NVFP4" to locate and sync the wording across the skill and evaluation examples.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1654 +/- ##
=======================================
Coverage 77.22% 77.22%
=======================================
Files 493 493
Lines 54697 54697
=======================================
Hits 42241 42241
Misses 12456 12456
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Removes an incorrect “release predates arch → use cu130-nightly-<arch>” fallback note from the deployment skill’s NVFP4-on-Blackwell (sm_103) guidance, keeping the core recommendation to use the -cu130 vLLM image tag for FP4 kernel support.
Changes:
- Deleted the inaccurate
cu130-nightly-<arch>fallback guidance (and the Qwen3.5-9Bqwen3_5claim) from the deployment skill note. - Preserved and slightly reflowed the surrounding NVFP4-on-Blackwell guidance, including the recipes cross-check link.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
What does this PR do?
Type of change: Documentation (agent skill)
Removes an incorrect fallback note from the deployment skill's NVFP4-on-Blackwell (sm_103) guidance. The note claimed:
This is a wrong inference. I verified on a B300 that the pinned release
vllm/vllm-openai:v0.19.1-cu130loads and serves the Qwen3.5-9B NVFP4 checkpoint (qwen3_5/Qwen3_5ForConditionalGeneration) cleanly —/health200, lists the model, generates. So the release does not predateqwen3_5; the only real requirement is the-cu130build (sm_103 FP4 kernels), which the note already states and this PR keeps.This mirrors the same removal in the evaluation skill (#1649) — that PR didn't touch the deployment skill, so this is the companion one-liner.
Testing
Empirical: served the checkpoint with
v0.19.1-cu130on aws-pdx (B300) → engine init succeeds,/v1/modelslists it, chat completion returns text. Docs-only change otherwise.Before your PR is "Ready for review"
🤖 Generated with Claude Code
Summary by CodeRabbit