[CUDNN] Update frontend to version 1.22 and add cuDNN 9.20 path for SM arch >100#2838
zmelumian972 wants to merge 2 commits into NVIDIA:main
Signed-off-by: zmelumian972 <zmelumian@gmail.com>
Signed-off-by: zmelumian972 <zmelumian@gmail.com>
Greptile Summary
This PR updates the
Confidence Score: 4/5
PR is safe to merge; the logic change is structurally correct with one edge case worth verifying: sq=1+causal+fprop support in cuDNN 9.20 on Blackwell.
The parenthesis structure and operator fix are correct. The 9.20 condition is intentionally broader than its predecessors (no seqlen or head-dim upper bound), consistent with the claimed cuDNN 9.20 capabilities for Blackwell. The only uncertainty is whether sq=1+causal/padding_causal fprop, previously excluded in the 9.10.2 path, is now supported by cuDNN 9.20 on Blackwell. All other downstream constraints (bias, mask type, qkv format, sliding window) still apply.
transformer_engine/common/fused_attn/fused_attn.cpp: verify that cuDNN 9.20 supports sq=1+causal+non-paged+fprop on Blackwell before production use.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[nvte_get_fused_attn_backend<br/>F16/BF16 path] --> B{Architecture OK?<br/>cudnn>=9.0.7 && sm>=100}
    B -- Yes --> C{head_dim % 8 == 0?}
    B -- No --> X[Other arch/version paths]
    C -- Yes --> D{head_dim <= 128?}
    D -- Yes --> PASS[Head-dim condition met]
    D -- No --> E{cudnn>=9.1/9.5 + Hopper<br/>head_dim <= 256?}
    E -- Yes --> PASS
    E -- No --> F{cudnn>=9.9 + Blackwell<br/>fprop + non-paged + sq>1?}
    F -- Yes --> PASS
    F -- No --> G{cudnn>=9.10.2<br/>fprop + various layouts?}
    G -- Yes --> PASS
    G -- No --> H{cudnn>=9.11 + Blackwell<br/>bprop + non-paged<br/>d_qk=192 d_v=128?}
    H -- Yes --> PASS
    H -- No --> I{cudnn>=9.20 + Blackwell<br/>non-paged + any sq<br/>fprop or bprop NEW}
    I -- Yes --> PASS
    I -- No --> FAIL[No head-dim match]
    PASS --> J{Hopper 9.11+ bprop<br/>bug workaround<br/>sm==90 check}
    J -- Blocked --> FAIL
    J -- OK --> K[bias / mask / format /<br/>sliding-window checks]
    K --> BACKEND[FusedAttn_F16_Arbitrary_Seqlen]
Reviews (1): Last reviewed commit: "FusedAttention: Add cudnn 9.20 path for ..."
    // 9.20: any head_dim + Blackwell + fprop/bprop + non_paged + any sq
    (sm_arch_ >= 100 && cudnn_runtime_version >= 92000 &&
     layout_group != NVTE_QKV_Layout_Group::NVTE_Paged_KV_HD_HD_HD)) &&
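The literal 92000 in the condition above appears to follow the cuDNN 9.x integer version encoding (major * 10000 + minor * 100 + patch), which would make 9.20.0 equal to 92000 and 9.10.2 equal to 91002. A minimal sketch of that encoding, assuming a hypothetical helper named encode_cudnn_version that is not part of Transformer Engine or cuDNN:

```cpp
#include <cstddef>

// Hypothetical helper (not a Transformer Engine or cuDNN API): packs a
// cuDNN 9.x version triple into the single integer form that the diff
// compares against, assuming major * 10000 + minor * 100 + patch.
constexpr std::size_t encode_cudnn_version(std::size_t major,
                                           std::size_t minor,
                                           std::size_t patch = 0) {
  return major * 10000 + minor * 100 + patch;
}

// Hypothetical convenience check: true when the runtime version is at
// least the given major.minor.patch under the same encoding.
constexpr bool cudnn_at_least(std::size_t runtime_version,
                              std::size_t major,
                              std::size_t minor,
                              std::size_t patch = 0) {
  return runtime_version >= encode_cudnn_version(major, minor, patch);
}
```

Under this assumption, cudnn_at_least(cudnn_runtime_version, 9, 20) expresses the same comparison as cudnn_runtime_version >= 92000 while keeping the version readable.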
Verify sq=1 + causal/padding_causal fprop support in cuDNN 9.20
The 9.20 condition allows any max_seqlen_q (including sq = 1) with any mask type on non-paged Blackwell layouts. The preceding 9.10.2 fprop path explicitly excluded sq = 1 + causal and sq = 1 + padding_causal on non-paged layouts:
    (max_seqlen_q == 1 && attn_mask_type != NVTE_Mask_Type::NVTE_CAUSAL_MASK &&
     attn_mask_type != NVTE_Mask_Type::NVTE_PADDING_CAUSAL_MASK)
With the 9.20 path (any sq, no mask-type restriction at the head-dim level), sq=1 + causal + non-paged + fprop on Blackwell with cuDNN ≥ 9.20 will now pass this gate, where it was previously blocked. If cuDNN 9.20 lifts this restriction for SM ≥ 100, this is correct. If not, passing this combination to the backend would produce a runtime error. Please confirm whether cuDNN 9.20 actually supports this combination on Blackwell.
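The difference between the two gates can be sketched in isolation. This is a simplified model built only from the two fragments quoted in this comment. The MaskType enum and all function names are hypothetical stand-ins, and the real condition in fused_attn.cpp contains further clauses that are not reproduced here.

```cpp
#include <cstddef>

// Hypothetical stand-in for NVTE_Mask_Type; only the cases needed here.
enum class MaskType { kNoMask, kPadding, kCausal, kPaddingCausal };

// The sq == 1 sub-clause quoted from the 9.10.2 fprop path: a query length
// of 1 is admitted only when the mask is neither causal nor padding-causal.
constexpr bool sq1_clause_9_10_2(std::size_t max_seqlen_q, MaskType mask) {
  return max_seqlen_q == 1 && mask != MaskType::kCausal &&
         mask != MaskType::kPaddingCausal;
}

// The 9.20 clause from the diff: it depends on architecture, cuDNN version,
// and paging only, not on sequence length or mask type.
constexpr bool clause_9_20(int sm_arch, std::size_t cudnn_runtime_version,
                           bool paged_kv) {
  return sm_arch >= 100 && cudnn_runtime_version >= 92000 && !paged_kv;
}

// True for sq == 1 cases that the 9.10.2 sub-clause rejects but the 9.20
// clause admits, i.e. the combination this comment asks to be verified.
constexpr bool newly_admitted(std::size_t max_seqlen_q, MaskType mask,
                              int sm_arch, std::size_t cudnn_runtime_version,
                              bool paged_kv) {
  return max_seqlen_q == 1 && !sq1_clause_9_10_2(max_seqlen_q, mask) &&
         clause_9_20(sm_arch, cudnn_runtime_version, paged_kv);
}
```

In this model, newly_admitted(1, MaskType::kCausal, 100, 92000, false) evaluates to true, while the same call with cudnn_runtime_version set to 91002 evaluates to false, which is the behavioral change in question.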
/te-ci jax L0
Summary
- Update the cudnn-frontend submodule to version 1.22 (97f6cb3b)
- Add a cuDNN 9.20 path in nvte_get_fused_attn_backend for Blackwell (SM arch >= 100) that supports any head dimension, both forward and backward passes, non-paged layouts, and any sequence length

Changes
- 3rdparty/cudnn-frontend: Bump submodule from 7b9b711c to 97f6cb3b (cuDNN frontend v1.22)
- transformer_engine/common/fused_attn/fused_attn.cpp: Add cuDNN 9.20 backend selection condition:
  - Enable the FusedAttn_F16_Arbitrary_Seqlen backend for SM >= 100 + cuDNN >= 9.20 + non-paged KV layouts
  - Change && to || to correctly OR the two Blackwell conditions

Test plan
🤖 Generated with Claude Code