
Conversation

@sudhakarsingh27
Collaborator

Description

FusedAttention has supported "right"-side sliding window attention for some time. This PR adds support for SWA (left, right) with the FusedAttention backend in TE.
(changes cherry-picked from original PR: #1369)
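
An illustrative call with the new argument (a sketch only: window_size and attn_mask_type are pre-existing TE options, while the exact placement of bottom_right_diagonal in the forward signature comes from this PR and may differ in detail):

import torch
from transformer_engine.pytorch import DotProductAttention

# Sliding window of 128 tokens to the left and 0 to the right, with the mask
# diagonal aligned bottom-right -- the alignment knob this PR plumbs through.
attn = DotProductAttention(
    num_attention_heads=16,
    kv_channels=64,
    attn_mask_type="causal_bottom_right",
    window_size=(128, 0),
)

# sbhd layout: (seq, batch, heads, head_dim)
q = torch.randn(512, 2, 16, 64, dtype=torch.bfloat16, device="cuda")
k = torch.randn(512, 2, 16, 64, dtype=torch.bfloat16, device="cuda")
v = torch.randn(512, 2, 16, 64, dtype=torch.bfloat16, device="cuda")
out = attn(q, k, v, bottom_right_diagonal=True)  # new keyword added by this PR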

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

transformer_engine

  • common

    • fused_attn
      • fused_attn.cpp
        • add bottom_right_diagonal parameter to the API
        • update the backend selection filters so a sliding-window config can pick the arbitrary-seqlen fused attention backend
      • fused_attn_f16_arbitrary_seqlen.cu: add bottom_right_diagonal parameter to the API
      • fused_attn_fp8.cu: add bottom_right_diagonal parameter to the FADescriptor_v1 API
      • utils.h: add bottom_right_diagonal parameter to FADescriptor_v1 API
  • pytorch

    • transformer.py
      • plumb bottom_right_diagonal through the call stack: TransformerLayer --> SelfAttention/CrossAttention
    • attention
      • dot_product_attention
        • backends.py:
          • UnfusedDotProductAttention
            • add bottom_right_diagonal parameter to the forward API
              • why is it not used in the forward?
                • bottom_right_alignment is used in the ALiBi call instead; this should perhaps be corrected
          • FusedAttn custom module
            • add bottom_right_diagonal parameter to the forward API
          • FusedAttention module
            • plumb bottom_right_diagonal through the call stack
        • dot_product_attention.py
          • DotProductAttention
            • Plumb bottom_right_diagonal through the call stack
            • Add calculation of bottom_right_diagonal if it's None
        • utils.py
          • AttentionParams
            • [x]
          • get_attention_backend
            • update sliding window filter section
            • update attention bias filter section
      • multi_head_attention.py
        • Add bottom_right_diagonal to forward API and call
        • Add calculation of bottom_right_diagonal if it's None
    • cpp_extensions
      • fused_attn.py
        • plumb bottom_right_diagonal in fused_attn_fwd/fused_attn_bwd
    • csrc
      • extension
        • attention.cpp
          • plumb bottom_right_diagonal through the call stack: fused_attn_fwd --> nvte_fused_attn_fwd
          • same as above for bwd
      • extensions.h
        • add bottom_right_diagonal to fused_attn_fwd and fused_attn_bwd API definitions

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

…IA#1369

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
@sudhakarsingh27
Collaborator Author

/te-ci pytorch L0

@greptile-apps
Contributor

greptile-apps bot commented Dec 4, 2025

Greptile Summary

This PR adds support for left-right sliding window attention (SWA) to FusedAttention by introducing the bottom_right_diagonal parameter throughout the stack. The implementation properly threads this parameter from Python APIs through C++ extensions to CUDA kernels, enabling cuDNN to configure diagonal alignment for SWA.

Key changes:

  • Added bottom_right_diagonal field to FADescriptor_v1 struct
  • Updated backend selection filters to support SWA with arbitrary seqlen
  • Implemented diagonal alignment configuration in cuDNN graphs using set_diagonal_alignment and set_diagonal_band_right_bound
  • Expanded test coverage for SWA with multiple mask types and layouts
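
For reference, a minimal sketch (plain PyTorch, independent of TE) of what the diagonal alignment means for a sliding-window mask when the query and key/value lengths differ:

import torch

def swa_mask(s_q, s_kv, window, bottom_right_diagonal):
    """Boolean SWA mask: True = may attend. window is (left, right); -1 = unbounded."""
    left, right = window
    q_pos = torch.arange(s_q).unsqueeze(1)    # (s_q, 1)
    kv_pos = torch.arange(s_kv).unsqueeze(0)  # (1, s_kv)
    # Bottom-right alignment lines the last query row up with the last key/value
    # column; top-left alignment keeps query row i aligned with column i.
    diag = q_pos + (s_kv - s_q if bottom_right_diagonal else 0)
    allowed = torch.ones(s_q, s_kv, dtype=torch.bool)
    if left >= 0:
        allowed &= kv_pos >= diag - left
    if right >= 0:
        allowed &= kv_pos <= diag + right
    return allowed

# e.g. swa_mask(2, 4, (1, 0), True) lets query row 0 see keys {1, 2} and row 1 see keys {2, 3}.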

Critical issues identified:

  • FP8 path: Functions fused_attn_fp8_fwd_impl_v1 and fused_attn_fp8_bwd_impl_v1 hardcode bottom_right_diagonal=true instead of accepting it as a parameter, breaking configurability for FP8 attention
  • Backend selection bug: Lines 911 and 938 in utils.py incorrectly set use_flash_attention = False instead of use_flash_attention_2 = False, which disables all FlashAttention backends when only FlashAttention 2 should be disabled

Confidence Score: 3/5

  • This PR has a correct F16 implementation but contains bugs in the FP8 path and the backend selection logic
  • The F16 arbitrary seqlen implementation is correct and comprehensive with proper test coverage. However, two critical bugs significantly impact functionality: (1) FP8 attention path hardcodes bottom_right_diagonal=true, preventing users from configuring this for FP8 operations, and (2) variable name typos in backend selection incorrectly disable all FlashAttention variants instead of just v2. These bugs don't affect the main F16 path but create inconsistent behavior across backends.
  • Pay close attention to transformer_engine/common/fused_attn/fused_attn_fp8.cu (FP8 hardcoded values) and transformer_engine/pytorch/attention/dot_product_attention/utils.py (variable name typos)

Important Files Changed

  • transformer_engine/common/fused_attn/fused_attn.cpp: properly threads the bottom_right_diagonal parameter through all fused attention APIs and updates the backend filters for SWA support
  • transformer_engine/common/fused_attn/fused_attn_f16_arbitrary_seqlen.cu: correctly implements the bottom_right_diagonal parameter in cuDNN graphs using set_diagonal_alignment and set_diagonal_band_right_bound
  • transformer_engine/common/fused_attn/fused_attn_fp8.cu: FP8 functions hardcode bottom_right_diagonal to true instead of accepting it as a parameter, breaking configurability
  • transformer_engine/pytorch/attention/dot_product_attention/utils.py: variable name typos on lines 911 and 938 incorrectly disable all FlashAttention backends instead of just FlashAttention 2
  • tests/pytorch/attention/test_attention.py: adds comprehensive test coverage for SWA with multiple mask types and layouts (thd, sbhd)

Sequence Diagram

sequenceDiagram
    participant User
    participant TransformerLayer
    participant MHA as MultiheadAttention
    participant DPA as DotProductAttention
    participant Backend as FusedAttention Backend
    participant CPP as C++ Extensions
    participant CUDA as CUDA Kernels
    
    User->>TransformerLayer: forward(bottom_right_diagonal)
    TransformerLayer->>TransformerLayer: Set defaults based on mask_type
    TransformerLayer->>MHA: forward(bottom_right_diagonal)
    MHA->>MHA: Apply mask_type logic
    MHA->>DPA: forward(bottom_right_diagonal)
    DPA->>DPA: Calculate default if None
    DPA->>Backend: get_attention_backend(bottom_right_diagonal)
    Backend->>Backend: Filter backends by SWA support
    Backend-->>DPA: Selected backend
    DPA->>CPP: fused_attn_fwd(bottom_right_diagonal)
    CPP->>CUDA: nvte_fused_attn_fwd(bottom_right_diagonal)
    
    alt F16 Arbitrary Seqlen
        CUDA->>CUDA: fused_attn_arbitrary_seqlen_fwd
        CUDA->>CUDA: Set diagonal_alignment (TOP_LEFT/BOTTOM_RIGHT)
        CUDA->>CUDA: Set diagonal_band_right_bound if needed
        CUDA->>CUDA: Build cuDNN graph with SWA config
    else FP8 Path
        CUDA->>CUDA: fused_attn_fp8_fwd_impl_v1
        Note over CUDA: Hardcoded bottom_right_diagonal=true
        CUDA->>CUDA: FADescriptor_v1 with hardcoded value
    end
    
    CUDA-->>CPP: Attention output
    CPP-->>DPA: Output tensor
    DPA-->>MHA: Output tensor
    MHA-->>TransformerLayer: Output tensor
    TransformerLayer-->>User: Final output

@greptile-apps greptile-apps bot (Contributor) left a comment

Additional Comments (2)

  1. transformer_engine/pytorch/attention/dot_product_attention/dot_product_attention.py, line 1281 (link)

    logic: Trailing comma creates a single-element tuple instead of a boolean - should this be just bottom_right_alignment = attn_mask_type not in ["causal", "padding_causal"]? (A short illustration of the pitfall follows this list.)

  2. transformer_engine/pytorch/attention/dot_product_attention/dot_product_attention.py, line 1482 (link)

    style: Uses hardcoded mask type check instead of the new bottom_right_diagonal parameter for ALiBi alignment. Should this use bottom_right_diagonal parameter for consistency?

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
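
A short illustration of the pitfall the first comment points at (standalone Python, not the actual TE line):

attn_mask_type = "causal"
bottom_right_alignment = attn_mask_type not in ["causal", "padding_causal"],  # trailing comma
print(bottom_right_alignment)        # (False,) -- a one-element tuple
print(bool(bottom_right_alignment))  # True -- any non-empty tuple is truthy

bottom_right_alignment = attn_mask_type not in ["causal", "padding_causal"]
print(bottom_right_alignment)        # False -- the intended boolean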

15 files reviewed, 8 comments


Comment on lines +777 to +783
if self_attn_mask_type in {"causal", "padding_causal"}:
    bottom_right_diagonal = False
if bottom_right_diagonal is None or self_attn_mask_type in {
    "causal_bottom_right",
    "padding_causal_bottom_right",
}:
    bottom_right_diagonal = True

logic: This overrides the instance variable even when it is explicitly set in the forward call; the user's explicit choice should be preserved. Should the mask type check override an explicitly passed bottom_right_diagonal parameter, or only apply when it is None?

Comment on lines +787 to +793
if enc_dec_attn_mask_type in {"causal", "padding_causal"}:
    enc_dec_bottom_right_diagonal = False
if enc_dec_bottom_right_diagonal is None or enc_dec_attn_mask_type in {
    "causal_bottom_right",
    "padding_causal_bottom_right",
}:
    enc_dec_bottom_right_diagonal = True

logic: Same logic issue as above - mask type check overrides explicit parameter values
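
A minimal sketch of the restructuring both comments ask for (the helper name is hypothetical; the real change would live in TransformerLayer.forward): derive the mask-type default only when the caller did not pass a value.

def resolve_bottom_right_diagonal(bottom_right_diagonal, attn_mask_type):
    """Apply the mask-type default only when no explicit value was passed."""
    if bottom_right_diagonal is None:
        # "causal" / "padding_causal" imply a top-left diagonal; the
        # *_bottom_right mask types (and everything else) default to True.
        bottom_right_diagonal = attn_mask_type not in ("causal", "padding_causal")
    return bottom_right_diagonal

# resolve_bottom_right_diagonal(None, "causal")  -> False
# resolve_bottom_right_diagonal(True, "causal")  -> True  (explicit value preserved)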

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
@greptile-apps greptile-apps bot (Contributor) left a comment

15 files reviewed, no comments


@greptile-apps greptile-apps bot (Contributor) left a comment

Additional Comments (4)

  1. transformer_engine/pytorch/transformer.py, line 777-783 (link)

    logic: Conditional logic overwrites explicitly passed parameter. When self_attn_mask_type in {"causal", "padding_causal"} at line 777, this unconditionally sets bottom_right_diagonal = False, even if the user explicitly passed a different value in the forward call. The check at line 779 (if bottom_right_diagonal is None) becomes unreachable in those cases.

    Consider restructuring to only apply defaults when bottom_right_diagonal is None.

  2. transformer_engine/pytorch/transformer.py, line 787-793 (link)

    logic: Same conditional logic issue for encoder-decoder attention. The mask type check at line 787 unconditionally overwrites enc_dec_bottom_right_diagonal, ignoring explicitly passed values.

  3. transformer_engine/pytorch/attention/dot_product_attention/utils.py, line 920 (link)

    logic: Variable name mismatch: setting use_flash_attention instead of use_flash_attention_2

  4. transformer_engine/pytorch/attention/dot_product_attention/utils.py, line 947 (link)

    logic: Variable name mismatch: setting use_flash_attention instead of use_flash_attention_2

15 files reviewed, 4 comments


@sudhakarsingh27
Collaborator Author

/te-ci pytorch L0

1 similar comment
@sudhakarsingh27
Collaborator Author

/te-ci pytorch L0

@greptile-apps greptile-apps bot (Contributor) left a comment

Additional Comments (1)

  1. transformer_engine/pytorch/attention/dot_product_attention/dot_product_attention.py, line 1283 (link)

    syntax: Tuple has trailing comma causing it to be a single-element tuple instead of boolean

15 files reviewed, 1 comment


@sudhakarsingh27
Collaborator Author

/te-ci pytorch L0

@greptile-apps greptile-apps bot (Contributor) left a comment

15 files reviewed, no comments


Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
@sudhakarsingh27
Collaborator Author

/te-ci pytorch L0

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
@sudhakarsingh27
Collaborator Author

/te-ci pytorch L1

@greptile-apps greptile-apps bot (Contributor) left a comment

17 files reviewed, 4 comments


@cyanguwa
Collaborator

You may need to add the extra argument bottom_right_diagonal to the nvte_xxx C API calls in Jax too, otherwise TE-Jax won't build. Have a look at PR 2584 maybe, where the deterministic arg was added, so files such as jax/csrc/extensions/attention.cpp need to be changed as well. Thanks!

@sudhakarsingh27
Collaborator Author

/te-ci L1

@greptile-apps greptile-apps bot (Contributor) left a comment

20 files reviewed, 4 comments


@sudhakarsingh27
Collaborator Author

/te-ci L1

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
@sudhakarsingh27
Collaborator Author

/te-ci jax L1

@greptile-apps greptile-apps bot (Contributor) left a comment

20 files reviewed, 2 comments


"Disabling FlashAttention as it only supports sliding window with bottom right"
" diagonal alignment for cross-attention"
)
use_flash_attention = False

syntax: wrong variable - should be use_flash_attention_2 = False

this disables all FlashAttention backends when only FlashAttention 2 should be disabled for sliding window with cross-attention

Suggested change
use_flash_attention = False
use_flash_attention_2 = False

"Disabling FlashAttention as it only supports ALiBi with bottom right diagonal"
" alignment for cross-attention"
)
use_flash_attention = False

syntax: wrong variable - should be use_flash_attention_2 = False

this disables all FlashAttention backends when only FlashAttention 2 should be disabled for ALiBi with cross-attention

Suggested change
use_flash_attention = False
use_flash_attention_2 = False

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
@sudhakarsingh27
Collaborator Author

/te-ci jax L1

@greptile-apps greptile-apps bot (Contributor) left a comment

Additional Comments (2)

  1. transformer_engine/common/fused_attn/fused_attn_fp8.cu, line 2480-2486 (link)

    logic: The bottom_right_diagonal parameter is not being passed to fused_attn_fp8_fwd_impl_v1, even though it was added to the higher-level API. This causes the function to use hardcoded values (line 1709: true) instead of respecting the user-specified configuration.

    Add bottom_right_diagonal parameter to the function call and update the function signature to accept it.

  2. transformer_engine/common/fused_attn/fused_attn_fp8.cu, line 2575-2580 (link)

    logic: The bottom_right_diagonal parameter is not being passed to fused_attn_fp8_bwd_impl_v1. Add it to the function call to match the forward pass fix.

20 files reviewed, 6 comments


"Disabling FlashAttention as it only supports sliding window with bottom right"
" diagonal alignment for cross-attention"
)
use_flash_attention = False

syntax: wrong variable - should be use_flash_attention_2 = False

Suggested change
use_flash_attention = False
use_flash_attention_2 = False

"Disabling FlashAttention as it only supports ALiBi with bottom right diagonal"
" alignment for cross-attention"
)
use_flash_attention = False

syntax: wrong variable - should be use_flash_attention_2 = False

Suggested change
use_flash_attention = False
use_flash_attention_2 = False

Comment on lines 1709 to +1710
true,
true,

logic: hardcoded bottom_right_diagonal value (position 9 in FADescriptor_v1 initialization) bypasses the parameter mechanism. Accept bottom_right_diagonal as a function parameter and use it here instead of hardcoding true.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +2039 to 2040
true,
false,

logic: hardcoded bottom_right_diagonal value in backward pass. Accept as parameter and pass through instead of hardcoding true.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@sudhakarsingh27
Collaborator Author

/te-ci L1

@sudhakarsingh27
Collaborator Author

/te-ci L1

@cyanguwa cyanguwa (Collaborator) left a comment

Looks good to me, thanks!

@cyanguwa cyanguwa merged commit c6a92a4 into NVIDIA:main Jan 22, 2026
45 of 54 checks passed
KshitijLakhani pushed a commit that referenced this pull request Jan 27, 2026
* SWA (left, right) with FusedAttention changes cherry-picked from #1369

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix test_kv_cache failures

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* remove unnecessary comments

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* fix some more filter issues, address feedback

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* fix for local test case failures - `bottom_right_diagonal` should be calculated in `fused_attn_fwd` call as well

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* make conditions more accurate

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* add cp tests to test swa (left, right)

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove dead code and make conditions better

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix lint

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* feedback from Charlene

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* small er

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* plumb `bottom_right_diagonal` through jax

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* plumb `bottom_right_diagonal` through jax

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing fields

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* use proper mask type in CP

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
