
[CUDA] refactor fatbinary to use --image3#21054

Merged
KornevNikita merged 1 commit into intel:sycl from bratpiorka:rrudnick_cuda_fatbin
Jan 29, 2026

Conversation

@bratpiorka
Contributor

Summary

This change updates clang-linker-wrapper to invoke NVIDIA fatbinary using the --image3=... image specification format (kind + SM + file) instead of the legacy --image=profile=... format, which is no longer supported in CUDA Toolkit 13 and later. Because newer CUDA toolkits have removed the legacy --image option, clang-linker-wrapper must use --image3 to remain compatible.

Changes

  • Refactored NVPTX fatbin construction to generate fatbinary arguments in the following form:
--image3=kind=ptx,sm=<N>,file=<ptx>
--image3=kind=elf,sm=<N>,file=<cubin>
  • Updated the fatbinary helper to consume OffloadingImage objects directly.
  • Adjusted driver tests to accept both legacy and --image3 formats where applicable.

NOTE: upstream already has this change implemented in llvm/llvm-project@79d8a26

@bratpiorka bratpiorka marked this pull request as ready for review January 15, 2026 09:03
@bratpiorka bratpiorka requested review from a team as code owners January 15, 2026 09:03
@bratpiorka
Contributor Author

@intel/dpcpp-clang-driver-reviewers @intel/dpcpp-tools-reviewers The failures on BMG are not related to my change.

Comment thread clang/test/Driver/clang-linker-wrapper.cpp Outdated
Comment thread clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
// CHK-CMDS-AOT-NV-NEXT: clang{{.*}} -o [[CLANGOUT:.*]] -dumpdir a.out.nvptx64.sm_50.img. --target=nvptx64-nvidia-cuda -march={{.*}}
// CHK-CMDS-AOT-NV-NEXT: ptxas{{.*}} --output-file [[PTXASOUT:.*]] [[CLANGOUT]]
// CHK-CMDS-AOT-NV-NEXT: fatbinary{{.*}} --create [[FATBINOUT:.*]] --image=profile={{.*}},file=[[CLANGOUT]] --image=profile={{.*}},file=[[PTXASOUT]]
// CHK-CMDS-AOT-NV-NEXT: fatbinary{{.*}} --create [[FATBINOUT:.*]] --image3=kind=ptx,sm={{(compute_)?50}},file=[[CLANGOUT]] --image3=kind=elf,sm=50,file=[[PTXASOUT]]
Contributor


My understanding is that specifying compute_* compiles code for a range of NVIDIA GPUs and produces PTX, while specifying sm_* generates machine code for the exact GPU named.
Could you please explain this check: sm={{(compute_)?50}}?
It will match either sm=50 or sm=compute_50.
It makes sense to set the kind to PTX only when sm=compute_50, but the regex here would accept it for both sm=50 and sm=compute_50.
Please correct me if this is not the case.

Contributor Author


I simplified the patterns so they now only look for sm=50, which works well; I verified this with both the CUDA 12.6 and CUDA 13 toolkits.

Contributor

@srividya-sundaram srividya-sundaram left a comment


Driver specific changes LGTM

@bratpiorka bratpiorka marked this pull request as draft January 21, 2026 09:06
@bratpiorka bratpiorka marked this pull request as ready for review January 29, 2026 07:41
@bratpiorka bratpiorka force-pushed the rrudnick_cuda_fatbin branch from 68c0735 to b41feee on January 29, 2026 08:22
@bratpiorka
Contributor Author

@intel/llvm-gatekeepers please merge

@KornevNikita KornevNikita merged commit a9c5fc0 into intel:sycl Jan 29, 2026
61 of 64 checks passed

5 participants