
[CUDNN] Update frontend to version 1.22 and add cuDNN 9.20 path for SM arch >100 #2838

Open
zmelumian972 wants to merge 2 commits into NVIDIA:main from zmelumian972:cudnn/support_version_1.22

Conversation

@zmelumian972

@zmelumian972 zmelumian972 commented Apr 5, 2026

Summary

  • Updates the cudnn-frontend submodule to version 1.22 (97f6cb3b)
  • Adds a new cuDNN 9.20 path in nvte_get_fused_attn_backend for Blackwell (SM arch >= 100) that supports any head dimension, both forward and backward passes, non-paged layouts, and any sequence length

Changes

  • 3rdparty/cudnn-frontend: Bump submodule from 7b9b711c to 97f6cb3b (cuDNN frontend v1.22)
  • transformer_engine/common/fused_attn/fused_attn.cpp: Add cuDNN 9.20 backend selection condition:
    • Enables FusedAttn_F16_Arbitrary_Seqlen backend for SM >= 100 + cuDNN >= 9.20 + non-paged KV layouts
    • Changes the logical operator after the 9.11 condition from && to || so that the 9.11 and 9.20 Blackwell conditions are OR-ed together
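
The new 9.20 clause can be read as a standalone predicate. The following is a minimal sketch, not the code in fused_attn.cpp: `LayoutGroup`, `encode_cudnn9_version`, and `blackwell_920_clause` are hypothetical names introduced for illustration, and the version encoding (major * 10000 + minor * 100 + patch) is an assumption that is consistent with the 92000 threshold used in the PR.

```cpp
#include <cassert>

// Hypothetical stand-in for NVTE_QKV_Layout_Group; only the paged versus
// non-paged distinction matters for this sketch.
enum class LayoutGroup { NonPaged, PagedKV };

// Assumption: cuDNN 9.x runtime versions are encoded as
// major * 10000 + minor * 100 + patch, so 9.20.0 maps to 92000.
inline int encode_cudnn9_version(int major, int minor, int patch) {
  assert(major >= 9 && minor >= 0 && minor < 100 && patch >= 0 && patch < 100);
  return major * 10000 + minor * 100 + patch;
}

// Simplified model of the clause added by this PR: Blackwell or newer,
// cuDNN 9.20 or newer, and a non-paged KV layout. Head dimension and
// sequence length are deliberately absent because the clause does not
// constrain them.
inline bool blackwell_920_clause(int sm_arch, int cudnn_runtime_version,
                                 LayoutGroup layout_group) {
  return sm_arch >= 100 && cudnn_runtime_version >= 92000 &&
         layout_group != LayoutGroup::PagedKV;
}
```

Under these assumptions, `blackwell_920_clause(100, encode_cudnn9_version(9, 20, 0), LayoutGroup::NonPaged)` holds, while the same call with SM 90, with cuDNN 9.11, or with a paged layout does not.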

Test plan

  • Verify FusedAttention with cuDNN 9.20+ on Blackwell (SM >= 100) hardware
  • Confirm existing Hopper (SM 90) paths are unaffected
  • Run fused attention unit tests for paged/non-paged layouts

🤖 Generated with Claude Code

@zmelumian972 zmelumian972 force-pushed the cudnn/support_version_1.22 branch from 704b0fa to d72d8a2 Compare April 5, 2026 16:03
Signed-off-by: zmelumian972 <zmelumian@gmail.com>
Signed-off-by: zmelumian972 <zmelumian@gmail.com>
@zmelumian972 zmelumian972 force-pushed the cudnn/support_version_1.22 branch from d72d8a2 to dcef948 Compare April 5, 2026 16:05
@greptile-apps
Contributor

greptile-apps bot commented Apr 5, 2026

Greptile Summary

This PR updates the cudnn-frontend submodule to v1.22 and adds a cuDNN 9.20 backend-selection path in nvte_get_fused_attn_backend for Blackwell GPUs (SM≥100), enabling any head dimension for both forward and backward passes on non-paged KV layouts with any sequence length.

  • 3rdparty/cudnn-frontend: Bumped from 7b9b711c to 97f6cb3b (v1.22) — straightforward dependency update.
  • fused_attn.cpp: A new (sm_arch_ >= 100 && cudnn_runtime_version >= 92000 && layout_group != NVTE_Paged_KV_HD_HD_HD) branch is OR-ed with the existing 9.11 Blackwell-bprop condition. Prior to this PR, Blackwell bprop was restricted to head dims 192/128 only (9.11 path). The 9.20 path opens bprop to any head dimension (multiples of 8) on non-paged Blackwell.
  • The change from && to || on the 9.11 line is structurally necessary: it extends the OR-chain inside the head-dim group, so the group's closing ) now follows the 9.20 clause and precedes the && that introduces the Hopper 9.11+ bug workaround. The parenthesization is correct, and the Hopper-specific workaround (sm_arch_ == 90) is unaffected by the new Blackwell path.

Confidence Score: 4/5

PR is safe to merge; the logic change is structurally correct with one edge case worth verifying: sq=1+causal+fprop support in cuDNN 9.20 on Blackwell.

The parenthesis structure and operator fix are correct. The 9.20 condition is intentionally broader than predecessors (no seqlen or head-dim upper bound), consistent with the claimed cuDNN 9.20 capabilities for Blackwell. The only uncertainty is whether sq=1+causal/padding_causal fprop — previously excluded in the 9.10.2 path — is now supported by cuDNN 9.20 on Blackwell. All other downstream constraints (bias, mask type, qkv format, sliding window) still apply.

transformer_engine/common/fused_attn/fused_attn.cpp — verify cuDNN 9.20 supports sq=1+causal+non-paged+fprop on Blackwell before production use.

Important Files Changed

| Filename | Overview |
| --- | --- |
| transformer_engine/common/fused_attn/fused_attn.cpp | Adds cuDNN 9.20 backend path for Blackwell (any head_dim, fprop+bprop, non-paged, any seqlen); fixes operator to correctly OR the 9.11 and 9.20 conditions within the head-dim group |
| 3rdparty/cudnn-frontend | Submodule bump from 7b9b711c to 97f6cb3b (cuDNN frontend v1.22); straightforward dependency update |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["nvte_get_fused_attn_backend<br/>F16/BF16 path"] --> B{"Architecture OK?<br/>cudnn>=9.0.7 && sm>=100"}
    B -- Yes --> C{"head_dim % 8 == 0?"}
    B -- No --> X["Other arch/version paths"]
    C -- Yes --> D{"head_dim <= 128?"}
    D -- Yes --> PASS["Head-dim condition met"]
    D -- No --> E{"cudnn>=9.1/9.5 + Hopper<br/>head_dim <= 256?"}
    E -- Yes --> PASS
    E -- No --> F{"cudnn>=9.9 + Blackwell<br/>fprop + non-paged + sq>1?"}
    F -- Yes --> PASS
    F -- No --> G{"cudnn>=9.10.2<br/>fprop + various layouts?"}
    G -- Yes --> PASS
    G -- No --> H{"cudnn>=9.11 + Blackwell<br/>bprop + non-paged<br/>d_qk=192 d_v=128?"}
    H -- Yes --> PASS
    H -- No --> I{"cudnn>=9.20 + Blackwell<br/>non-paged + any sq<br/>fprop or bprop NEW"}
    I -- Yes --> PASS
    I -- No --> FAIL["No head-dim match"]
    PASS --> J{"Hopper 9.11+ bprop<br/>bug workaround<br/>sm==90 check"}
    J -- Blocked --> FAIL
    J -- OK --> K["bias / mask / format /<br/>sliding-window checks"]
    K --> BACKEND["FusedAttn_F16_Arbitrary_Seqlen"]

Reviews (1): Last reviewed commit: "FusedAttention: Add cudnn 9.20 path for ..."

Comment on lines +343 to +345
// 9.20: any head_dim + Blackwell + fprop/bprop + non_paged + any sq
(sm_arch_ >= 100 && cudnn_runtime_version >= 92000 &&
layout_group != NVTE_QKV_Layout_Group::NVTE_Paged_KV_HD_HD_HD)) &&
Contributor


P2 Verify sq=1 + causal/padding_causal fprop support in cuDNN 9.20

The 9.20 condition allows any max_seqlen_q (including sq = 1) with any mask type on non-paged Blackwell layouts. The preceding 9.10.2 fprop path explicitly excluded sq = 1 + causal and sq = 1 + padding_causal on non-paged layouts:

(max_seqlen_q == 1 && attn_mask_type != NVTE_Mask_Type::NVTE_CAUSAL_MASK &&
 attn_mask_type != NVTE_Mask_Type::NVTE_PADDING_CAUSAL_MASK)

With the 9.20 path (any sq, no mask-type restriction at the head-dim level), sq=1 + causal + non-paged + fprop on Blackwell/cuDNN≥9.20 will now pass this gate — where it was previously blocked. If cuDNN 9.20 lifts this restriction for SM≥100, this is correct. If not, passing this combination to the backend would produce a runtime error. Please confirm whether cuDNN 9.20 actually supports this combination on Blackwell.

@KshitijLakhani
Collaborator

/te-ci jax L0
