{2025.06}[foss/2025a,foss/2025b] Add OSU v7.5 / v7.5.1 with CUDA #1462

Open
casparvl wants to merge 2 commits into EESSI:main from casparvl:add_ucx_ucc_nccl_2025b

casparvl (Collaborator) commented Apr 7, 2026

No description provided.

casparvl changed the title from "Add UCX-CUDA, UCC-CUDA and NCCL for foss 2025b toolchain" to "{2025.06}[foss/2025b] Add OSU v7.5.1 w CUDA" on Apr 7, 2026
casparvl changed the title from "{2025.06}[foss/2025b] Add OSU v7.5.1 w CUDA" to "{2025.06}[foss/2025b] Add OSU v7.5.1 with CUDA" on Apr 7, 2026

casparvl (Collaborator, Author) commented Apr 7, 2026

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80
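
For readers unfamiliar with the command format, the comment above can be read as a list of `key:value` arguments, where `for` carries its own comma-separated `key=value` pairs. The following is a minimal sketch of that reading; `parse_bot_build` is a hypothetical helper written for illustration and is not the bot's actual parser.

```python
def parse_bot_build(comment: str) -> dict:
    """Split a 'bot: build ...' comment into its key:value arguments.

    Hypothetical helper: it only mirrors the shape of the command seen
    in this PR, not the bot's own implementation.
    """
    prefix = "bot: build"
    if not comment.startswith(prefix):
        raise ValueError("not a bot build command")
    args = {}
    for token in comment[len(prefix):].split():
        key, _, value = token.partition(":")
        args[key] = value
    # The 'for' argument is itself a comma-separated list of key=value pairs.
    if "for" in args:
        args["for"] = dict(item.split("=", 1) for item in args["for"].split(","))
    return args


cmd = ("bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf "
       "for:arch=x86_64/intel/icelake,accel=nvidia/cc80")
parsed = parse_bot_build(cmd)
print(parsed["for"]["accel"])  # prints nvidia/cc80
```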

eessi-bot-surf bot commented Apr 7, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: intel-icelake and accelerator nvidia/cc80
Building for: x86_64/intel/icelake and accelerator nvidia/cc80
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.04/pr_1462/21612445

| date | job status | comment |
| --- | --- | --- |
| Apr 07 10:50:37 UTC 2026 | submitted | job id 21612445 will be eligible to start in about 20 seconds |
| Apr 07 10:50:51 UTC 2026 | received | job awaits launch by Slurm scheduler |
| Apr 07 10:51:04 UTC 2026 | running | job 21612445 is running |
| Apr 07 11:03:27 UTC 2026 | finished | 😁 SUCCESS |
Details
✅ job output file slurm-21612445.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
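
The checklist above reads as a set of patterns that must be absent from the job output plus a set that must be present. A rough sketch of that logic follows, assuming the polarity of each pattern is as the list suggests; `check_job_output` and the sample text are made up for illustration and are not the bot's code or a real log.

```python
import re

# Patterns copied from the "Details" list; which ones must be absent and
# which must be present is inferred from that list (an assumption).
MUST_BE_ABSENT = ["FATAL:", "ERROR:", "FAILED:", "required modules missing:"]
MUST_BE_PRESENT = ["No missing installations", r".tar.* created!"]


def check_job_output(text: str) -> dict:
    """Return {pattern: True if the check passes} for each pattern."""
    results = {}
    for pat in MUST_BE_ABSENT:
        # Treated as literal strings, so they are escaped before searching.
        results[pat] = re.search(re.escape(pat), text) is None
    for pat in MUST_BE_PRESENT:
        # Treated as regular expressions, as the '.tar.*' form suggests.
        results[pat] = re.search(pat, text) is not None
    return results


# Hypothetical sample text, constructed only to exercise the patterns.
sample = (
    "== FAILED: Installation ended unsuccessfully\n"
    "No missing installations\n"
    "example.tar.zst created!\n"
)
results = check_job_output(sample)
overall_ok = all(results.values())
print(overall_ok)  # prints False, because 'FAILED:' was found
```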
Artefacts
eessi-2025.06-software-linux-x86_64-intel-icelake-accel-nvidia-cc80-17755597400.tar.zst
size: 54 MiB (57571293 bytes)
entries: 262
modules under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/modules/all
NCCL/2.27.7-GCCcore-14.3.0-CUDA-12.9.1.lua
OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1.lua
UCC-CUDA/1.4.4-GCCcore-14.3.0-CUDA-12.9.1.lua
UCX-CUDA/1.19.0-GCCcore-14.3.0-CUDA-12.9.1.lua
software under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/software
NCCL/2.27.7-GCCcore-14.3.0-CUDA-12.9.1
OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1
UCC-CUDA/1.4.4-GCCcore-14.3.0-CUDA-12.9.1
UCX-CUDA/1.19.0-GCCcore-14.3.0-CUDA-12.9.1
reprod directories under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/reprod
NCCL/2.27.7-GCCcore-14.3.0-CUDA-12.9.1/20260407_105826UTC
OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1/20260407_110212UTC
UCC-CUDA/1.4.4-GCCcore-14.3.0-CUDA-12.9.1/20260407_105955UTC
UCX-CUDA/1.19.0-GCCcore-14.3.0-CUDA-12.9.1/20260407_105521UTC
other under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80
no other files in tarball
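
The `reprod` directory names end in a UTC timestamp, so sorting them gives a rough view of the order in which the four packages were handled in this job. A small sketch, assuming the timestamp reflects when each installation was recorded (the listing itself does not say whether that is the start or the end of a build):

```python
from datetime import datetime, timezone

# Paths copied from the reprod listing of job 21612445.
reprod_dirs = [
    "NCCL/2.27.7-GCCcore-14.3.0-CUDA-12.9.1/20260407_105826UTC",
    "OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1/20260407_110212UTC",
    "UCC-CUDA/1.4.4-GCCcore-14.3.0-CUDA-12.9.1/20260407_105955UTC",
    "UCX-CUDA/1.19.0-GCCcore-14.3.0-CUDA-12.9.1/20260407_105521UTC",
]


def parse_reprod(path: str):
    """Split '<name>/<version>/<timestamp>' and parse the timestamp as UTC."""
    name, version, stamp = path.split("/")
    when = datetime.strptime(stamp, "%Y%m%d_%H%M%SUTC").replace(tzinfo=timezone.utc)
    return name, version, when


entries = sorted((parse_reprod(p) for p in reprod_dirs), key=lambda e: e[2])
build_order = [name for name, _, _ in entries]
print(build_order)
# prints ['UCX-CUDA', 'NCCL', 'UCC-CUDA', 'OSU-Micro-Benchmarks']
```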
| date | job status | comment |
| --- | --- | --- |
| Apr 07 11:03:27 UTC 2026 | test result | 😁 SUCCESS |
ReFrame Summary
[ SKIP ] (1/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node %device_type=gpu /526cd259 @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (2/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node %device_type=gpu /416eaee1 @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (3/4) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node /73a202f1 @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (4/4) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node /7f04eb2b @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ PASSED ] Ran 0/4 test case(s) from 4 check(s) (0 failure(s), 4 skipped, 0 aborted)
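
Note that the summary line reports `PASSED` even though all four cases were skipped for lack of GPUs, so no benchmark actually ran. A sketch of how one might flag such a vacuous pass when reading these summaries; the regular expression is written against the line shown here only, and `parse_summary` is a hypothetical helper, not part of ReFrame.

```python
import re

# Matches the final summary line as it appears in this PR.
SUMMARY_RE = re.compile(
    r"\[\s*(?P<status>PASSED|FAILED)\s*\]\s*"
    r"Ran (?P<ran>\d+)/(?P<total>\d+) test case\(s\) "
    r"from (?P<checks>\d+) check\(s\) "
    r"\((?P<failures>\d+) failure\(s\), (?P<skipped>\d+) skipped, "
    r"(?P<aborted>\d+) aborted\)"
)


def parse_summary(line: str) -> dict:
    """Extract the status and counters from a ReFrame summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("no ReFrame summary found")
    fields = match.groupdict()
    status = fields.pop("status")
    summary = {key: int(value) for key, value in fields.items()}
    summary["status"] = status
    # A pass with zero executed cases says nothing about the build.
    summary["vacuous"] = status == "PASSED" and summary["ran"] == 0
    return summary


line = ("[ PASSED ] Ran 0/4 test case(s) from 4 check(s) "
        "(0 failure(s), 4 skipped, 0 aborted)")
summary = parse_summary(line)
print(summary["vacuous"])  # prints True
```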
Details
✅ job output file slurm-21612445.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

casparvl (Collaborator, Author) commented Apr 7, 2026

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80

eessi-bot-surf bot commented Apr 7, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: intel-icelake and accelerator nvidia/cc80
Building for: x86_64/intel/icelake and accelerator nvidia/cc80
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.04/pr_1462/21612699

| date | job status | comment |
| --- | --- | --- |
| Apr 07 10:59:34 UTC 2026 | submitted | job id 21612699 will be eligible to start in about 20 seconds |
| Apr 07 10:59:43 UTC 2026 | received | job awaits launch by Slurm scheduler |
| Apr 07 11:01:12 UTC 2026 | running | job 21612699 is running |
| Apr 07 11:13:51 UTC 2026 | finished | 😢 FAILURE |
Details
✅ job output file slurm-21612699.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-intel-icelake-accel-nvidia-cc80-17755603700.tar.zst
size: 54 MiB (57570182 bytes)
entries: 262
modules under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/modules/all
NCCL/2.27.7-GCCcore-14.3.0-CUDA-12.9.1.lua
OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1.lua
UCC-CUDA/1.4.4-GCCcore-14.3.0-CUDA-12.9.1.lua
UCX-CUDA/1.19.0-GCCcore-14.3.0-CUDA-12.9.1.lua
software under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/software
NCCL/2.27.7-GCCcore-14.3.0-CUDA-12.9.1
OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1
UCC-CUDA/1.4.4-GCCcore-14.3.0-CUDA-12.9.1
UCX-CUDA/1.19.0-GCCcore-14.3.0-CUDA-12.9.1
reprod directories under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/reprod
NCCL/2.27.7-GCCcore-14.3.0-CUDA-12.9.1/20260407_110910UTC
OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1/20260407_111242UTC
UCC-CUDA/1.4.4-GCCcore-14.3.0-CUDA-12.9.1/20260407_111033UTC
UCX-CUDA/1.19.0-GCCcore-14.3.0-CUDA-12.9.1/20260407_110626UTC
other under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80
no other files in tarball
| date | job status | comment |
| --- | --- | --- |
| Apr 07 11:13:51 UTC 2026 | test result | 😁 SUCCESS |
ReFrame Summary
[ SKIP ] (1/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node %device_type=gpu /526cd259 @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (2/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node %device_type=gpu /416eaee1 @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (3/4) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node /73a202f1 @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (4/4) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node /7f04eb2b @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ PASSED ] Ran 0/4 test case(s) from 4 check(s) (0 failure(s), 4 skipped, 0 aborted)
Details
✅ job output file slurm-21612699.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

casparvl changed the title from "{2025.06}[foss/2025b] Add OSU v7.5.1 with CUDA" to "{2025.06}[foss/2025a,foss/2025b] Add OSU v7.5 / v7.5.1 with CUDA" on Apr 7, 2026

casparvl (Collaborator, Author) commented Apr 7, 2026

Hmm, I'll need to split off GDRCopy and install it first (in a separate PR):

== FAILED: Installation ended unsuccessfully: It seems you are trying to install a CPU-only package GDRCopy into accelerator location /cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/software/GDRCopy/2.4.4-GCCcore-14.2.0. If this is a dependency of the package you are really interested in you will need to first install the CPU-only dependencies of that package. (took 0 secs)
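
To illustrate what the error is guarding against: the installation path sits under the `accel/nvidia/cc80` prefix, but the package's version string carries no CUDA marker. The sketch below reproduces that idea with a purely name-based heuristic. This is an assumption made for illustration; the real check in the EESSI build setup may well rely on dependency information instead, and `is_cpu_only_in_accel_path` is a hypothetical name.

```python
def is_cpu_only_in_accel_path(install_path: str) -> bool:
    """Flag an install under an 'accel' prefix whose name/version lacks 'CUDA'.

    Name-based heuristic only (an assumption for this sketch).
    """
    parts = install_path.strip("/").split("/")
    if "accel" not in parts or "software" not in parts:
        return False
    # The last 'software' component is followed by <name>/<version>.
    idx = len(parts) - 1 - parts[::-1].index("software")
    if idx + 2 >= len(parts):
        return False
    name, version = parts[idx + 1], parts[idx + 2]
    return "CUDA" not in name and "CUDA" not in version


prefix = ("/cvmfs/software.eessi.io/versions/2025.06/software/linux/"
          "x86_64/intel/icelake/accel/nvidia/cc80/software")
gdrcopy_flagged = is_cpu_only_in_accel_path(prefix + "/GDRCopy/2.4.4-GCCcore-14.2.0")
nccl_flagged = is_cpu_only_in_accel_path(prefix + "/NCCL/2.27.7-GCCcore-14.3.0-CUDA-12.9.1")
print(gdrcopy_flagged, nccl_flagged)  # prints True False
```

Under this reading, GDRCopy has to be installed into the CPU-only location first, which is why it is split off into its own PR.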

casparvl (Collaborator, Author) commented Apr 7, 2026

Building GDRCopy in #1463
