[CUDA] refactor fatbinary to use --image3 #21054
@intel/dpcpp-clang-driver-reviewers @intel/dpcpp-tools-reviewers the failures on BMG are not related to my change.
2016b55 to ced9034
  // CHK-CMDS-AOT-NV-NEXT: clang{{.*}} -o [[CLANGOUT:.*]] -dumpdir a.out.nvptx64.sm_50.img. --target=nvptx64-nvidia-cuda -march={{.*}}
  // CHK-CMDS-AOT-NV-NEXT: ptxas{{.*}} --output-file [[PTXASOUT:.*]] [[CLANGOUT]]
- // CHK-CMDS-AOT-NV-NEXT: fatbinary{{.*}} --create [[FATBINOUT:.*]] --image=profile={{.*}},file=[[CLANGOUT]] --image=profile={{.*}},file=[[PTXASOUT]]
+ // CHK-CMDS-AOT-NV-NEXT: fatbinary{{.*}} --create [[FATBINOUT:.*]] --image3=kind=ptx,sm={{(compute_)?50}},file=[[CLANGOUT]] --image3=kind=elf,sm=50,file=[[PTXASOUT]]
My understanding is that specifying compute_* compiles code for a range of NVIDIA GPUs and produces PTX, while specifying sm_* generates machine code for the exact GPU specified.
Could you please explain this check? sm={{(compute_)?50}}
This will match either sm=50 or sm=compute_50.
It makes sense to set the kind to PTX only when sm=compute_50, but the regex here would accept kind=ptx for both sm=50 and sm=compute_50.
Please correct me if this is not the case.
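To make the matching behaviour concrete, here is a minimal sketch in Python. It assumes Python's re module treats this simple pattern the same way FileCheck's regex engine does; the helper name and the sample strings (including the file name a.s) are hypothetical and only illustrate the check.

```python
import re

# Regex body taken from the FileCheck pattern sm={{(compute_)?50}},file=...
# The optional group makes the "compute_" prefix optional.
PATTERN = re.compile(r"sm=(compute_)?50,file=")


def check_matches(text: str) -> bool:
    """Return True if the pattern is found anywhere in text."""
    return PATTERN.search(text) is not None


# Hypothetical sample arguments, not real driver output.
samples = [
    "--image3=kind=ptx,sm=50,file=a.s",
    "--image3=kind=ptx,sm=compute_50,file=a.s",
    "--image3=kind=ptx,sm=sm_50,file=a.s",
]

for sample in samples:
    print(sample, "->", check_matches(sample))
```

The first two samples match and the third does not, so the check as written cannot distinguish sm=50 from sm=compute_50 on the kind=ptx image.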
I simplified the patterns and now match only sm=50, which works well. I checked with both the CUDA 12.6 and CUDA 13 toolkits.
ced9034 to 68c0735
srividya-sundaram left a comment
Driver specific changes LGTM
68c0735 to b41feee
@intel/llvm-gatekeepers please merge
Summary
This change updates clang-linker-wrapper to invoke NVIDIA fatbinary with the --image3=... image specification format (kind, SM, and file) instead of the legacy --image=profile=... format, which is no longer supported in CUDA Toolkit 13 and later.
Because newer CUDA toolkits have removed the legacy --image option, clang-linker-wrapper must use --image3 to remain compatible with them.
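As a rough illustration of the new argument shape (not the actual clang-linker-wrapper code, which is C++), the sketch below assembles the two --image3 arguments seen in the updated test check. The function name image3_args, the output name a.fatbin, and the input names a.s and a.cubin are all hypothetical.

```python
def image3_args(arch, ptx_file, cubin_file):
    """Build the two --image3 arguments for one architecture such as 'sm_50'.

    Hypothetical helper: it mirrors the format in the updated test check,
    where both the PTX and the ELF image carry the bare SM number.
    """
    if not arch.startswith("sm_"):
        raise ValueError("expected an sm_* architecture, got %r" % arch)
    sm = arch[len("sm_"):]  # "sm_50" -> "50"
    return [
        "--image3=kind=ptx,sm=%s,file=%s" % (sm, ptx_file),
        "--image3=kind=elf,sm=%s,file=%s" % (sm, cubin_file),
    ]


# Assemble an illustrative command line; file names are placeholders.
command = ["fatbinary", "--create", "a.fatbin"] + image3_args(
    "sm_50", "a.s", "a.cubin"
)
print(" ".join(command))
```

Each image names its kind explicitly (ptx or elf), which is why the simplified test pattern can look for a plain sm=50 on both arguments.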
Changes
NOTE: upstream already has this change implemented in llvm/llvm-project@79d8a26