
[CUDA] refactor fatbinary to use --image3#21054

Merged
KornevNikita merged 1 commit into intel:sycl from bratpiorka:rrudnick_cuda_fatbin
Jan 29, 2026

Conversation

@bratpiorka
Contributor

Summary

This change updates clang-linker-wrapper to invoke NVIDIA fatbinary using the --image3=... image specification format (kind + SM + file) instead of the legacy --image=profile=... format, which is no longer supported in CUDA Toolkit 13 and later. Because newer CUDA toolkits have removed the legacy --image option, clang-linker-wrapper must use --image3 to remain compatible.

Changes

  • Refactored NVPTX fatbin construction to generate fatbinary arguments in the following form:
--image3=kind=ptx,sm=<N>,file=<ptx>
--image3=kind=elf,sm=<N>,file=<cubin>
  • Updated the fatbinary helper to consume OffloadingImage objects directly.
  • Adjusted driver tests to accept both legacy and --image3 formats where applicable.

NOTE: upstream already has this change implemented in llvm/llvm-project@79d8a26

@bratpiorka bratpiorka marked this pull request as ready for review January 15, 2026 09:03
@bratpiorka bratpiorka requested review from a team as code owners January 15, 2026 09:03
@bratpiorka
Contributor Author

@intel/dpcpp-clang-driver-reviewers @intel/dpcpp-tools-reviewers The failures on BMG are not related to my change.

Comment thread clang/test/Driver/clang-linker-wrapper.cpp Outdated
Comment thread clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
// CHK-CMDS-AOT-NV-NEXT: clang{{.*}} -o [[CLANGOUT:.*]] -dumpdir a.out.nvptx64.sm_50.img. --target=nvptx64-nvidia-cuda -march={{.*}}
// CHK-CMDS-AOT-NV-NEXT: ptxas{{.*}} --output-file [[PTXASOUT:.*]] [[CLANGOUT]]
// CHK-CMDS-AOT-NV-NEXT: fatbinary{{.*}} --create [[FATBINOUT:.*]] --image=profile={{.*}},file=[[CLANGOUT]] --image=profile={{.*}},file=[[PTXASOUT]]
// CHK-CMDS-AOT-NV-NEXT: fatbinary{{.*}} --create [[FATBINOUT:.*]] --image3=kind=ptx,sm={{(compute_)?50}},file=[[CLANGOUT]] --image3=kind=elf,sm=50,file=[[PTXASOUT]]
Contributor


My understanding is that specifying compute_* compiles code for a range of NVIDIA GPUs and produces PTX, while specifying sm_* generates machine code for the exact GPU named.
Could you please explain this check: sm={{(compute_)?50}}?
It will match either sm=50 or sm=compute_50.
It makes sense to set the kind to PTX only when sm=compute_50, but the regex here would accept it for both sm=50 and sm=compute_50.
Please correct me if this is not the case.

Contributor Author


I simplified the patterns so they now only look for sm=50, which works well; I verified this with both the CUDA 12.6 and CUDA 13 toolkits.

Contributor

@srividya-sundaram srividya-sundaram left a comment


Driver specific changes LGTM

@bratpiorka bratpiorka marked this pull request as draft January 21, 2026 09:06
@bratpiorka bratpiorka marked this pull request as ready for review January 29, 2026 07:41
@bratpiorka bratpiorka force-pushed the rrudnick_cuda_fatbin branch from 68c0735 to b41feee on January 29, 2026 08:22
@bratpiorka
Contributor Author

@intel/llvm-gatekeepers please merge

@KornevNikita KornevNikita merged commit a9c5fc0 into intel:sycl Jan 29, 2026
61 of 64 checks passed

5 participants