feat(cuda): accept a pre-linked CUBIN in load_module_bytes by haixuanTao · Pull Request #7 · dimforge/khal

haixuanTao · 2026-06-08T12:42:23Z

Summary

load_module_bytes assumed the input was PTX text. This PR makes it also accept a pre-linked CUBIN, detected by the ELF magic (\x7fELF) at the start of the input and loaded via cudarc::nvrtc::Ptx::from_binary. PTX text continues to load exactly as before.

Why

A cubin is required when a module references symbols the CUDA driver JIT cannot resolve on its own — most commonly libdevice __nv_* math (expf, fmaf, …). In that case a toolchain links libdevice into a self-contained cubin ahead of time, and the backend must load that binary rather than re-JIT the PTX. cudarc's load_module already dispatches on the kind, so the change is just selecting from_binary vs from_src.

Small, backwards-compatible (text PTX path is untouched).
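The dispatch described above might look roughly like the sketch below. It is a standalone illustration and not the code in this PR: `ModuleKind` and `classify_module_bytes` are hypothetical names, and the UTF-8 check on the PTX path is an assumption. In the real backend the two arms would hand off to `Ptx::from_binary` and `Ptx::from_src` respectively.

```rust
/// Hypothetical stand-in for the two load paths (not a cudarc type).
#[derive(Debug, PartialEq)]
enum ModuleKind<'a> {
    /// Pre-linked CUBIN: an ELF image to be loaded as a binary.
    Cubin(&'a [u8]),
    /// PTX source text to be JIT-compiled by the driver.
    PtxText(&'a str),
}

/// The four-byte ELF magic that every CUBIN starts with.
const ELF_MAGIC: &[u8] = b"\x7fELF";

/// Classify raw module bytes by their leading magic.
/// Assumption: anything that is not ELF is treated as PTX text and
/// must be valid UTF-8; otherwise the UTF-8 error is returned.
fn classify_module_bytes(bytes: &[u8]) -> Result<ModuleKind<'_>, std::str::Utf8Error> {
    if bytes.starts_with(ELF_MAGIC) {
        Ok(ModuleKind::Cubin(bytes))
    } else {
        std::str::from_utf8(bytes).map(ModuleKind::PtxText)
    }
}
```

Checking only the leading four bytes keeps the PTX path unchanged: PTX is plain text, so it would not normally begin with the `0x7f` byte, and inputs shorter than the magic fall through to the text branch.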

🤖 Generated with Claude Code

`load_module_bytes` assumed PTX text. Also accept a CUBIN, detected by the ELF magic (`\x7fELF`), loaded via `Ptx::from_binary`. This is required for modules that reference symbols the driver JIT cannot resolve on its own (e.g. libdevice `__nv_*` math), which a toolchain links into a self-contained binary ahead of time. PTX text continues to load unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

haixuanTao wants to merge 1 commit into dimforge:main from haixuanTao:upstream/cuda-load-cubin
