Enable compatibility of GNU compilers + MKL #7139

Merged
mohanchen merged 3 commits into deepmodeling:develop from Growl1234:mkl on Mar 31, 2026

@Growl1234 Growl1234 commented Mar 25, 2026

Previously, FindMKL.cmake only detected the MKL interface libraries for Intel compilers, which caused link errors when GCC and Intel MKL were used together. This PR updates FindMKL.cmake to detect the MKL libraries designed for GCC and OpenMPI, which are essential for linking MKL correctly when building ABACUS with GCC. With this PR, the combination of GCC and Intel MKL should work.

Further fixes #3198.

@mohanchen mohanchen added the "Compile & CICD & Docs & Dependencies" label (Issues related to compiling ABACUS) Mar 26, 2026
@QuantumMisaka QuantumMisaka self-requested a review March 26, 2026 06:55
@QuantumMisaka
Collaborator

@Growl1234 Thanks for your contribution! Have you tested the gcc-mkl toolchain implemented in this PR, as well as the original gcc and intel toolchains, in any HPC environments?

@QuantumMisaka
Collaborator

QuantumMisaka commented Mar 26, 2026

@Growl1234 In a simple test on examples/03_spin_polarized/AFM, after compiling ABACUS with your gcc-mkl toolchain settings, an error occurred and the calculation could not start:

                              ABACUS v3.9.0.26

               Atomic-orbital Based Ab-initio Computation at UStc                    

                     Website: http://abacus.ustc.edu.cn/                             
               Documentation: https://abacus.deepmodeling.com/                       
                  Repository: https://github.com/abacusmodeling/abacus-develop       
                              https://github.com/deepmodeling/abacus-develop         
                      Commit: unknown

 Thu Mar 26 05:42:59 2026
 MAKE THE DIR         : OUT.ABACUS/
 RUNNING WITH DEVICE  : CPU / AMD EPYC 7B12 64-Core Processor (x2)
 WARNING: some of potential function is set to zero cause of less than 1e-30.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 Warning: the number of valence electrons in pseudopotential > 8 for Fe: [Ar] 3d6 4s2
 Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient.
 If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 UNIFORM GRID DIM     : 36 * 36 * 36
 UNIFORM GRID DIM(BIG): 9 * 9 * 9
 DONE(0.0246442  SEC) : SETUP UNITCELL
 DONE(0.0264331  SEC) : INIT K-POINTS
 ---------------------------------------------------------
 Self-consistent calculations for electrons
 ---------------------------------------------------------
 SPIN    KPOINTS         PROCESSORS  THREADS     NBASE       
 2       126             1           4           54          
 ---------------------------------------------------------
 Use Systematically Improvable Atomic bases
 ---------------------------------------------------------
 ELEMENT ORBITALS        NBASE       NATOM       XC          
 Fe      4s2p2d1f-9au    27          2           
 ---------------------------------------------------------
 Initial plane wave basis and FFT box
 ---------------------------------------------------------
 DONE(0.0458574  SEC) : INIT PLANEWAVE
 START CHARGE      : atomic
[amd-cpu:2879845] *** Process received signal ***
[amd-cpu:2879845] Signal: Segmentation fault (11)
[amd-cpu:2879845] Signal code: Address not mapped (1)
[amd-cpu:2879845] Failing at address: 0x6d3bee60
[amd-cpu:2879845] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fdd02c42520]
[amd-cpu:2879845] [ 1] /home/mikoto/toolchain-gcc-mkl-test/abacus-develop-mkl/toolchain/install/openmpi-5.0.8/lib/libmpi.so.40(PMPI_Comm_size+0x38)[0x7fdd06cabb98]
[amd-cpu:2879845] [ 2] /data/softwares/intel/oneapi/mkl/2023.2.0/lib/intel64/libmkl_blacs_intelmpi_lp64.so.2(MKLMPI_Comm_size+0x29)[0x7fdd0857a3d9]
[amd-cpu:2879845] [ 3] /data/softwares/intel/oneapi/mkl/2023.2.0/lib/intel64/libmkl_blacs_intelmpi_lp64.so.2(mkl_blacs_init+0x28)[0x7fdd08578698]
[amd-cpu:2879845] [ 4] /data/softwares/intel/oneapi/mkl/2023.2.0/lib/intel64/libmkl_blacs_intelmpi_lp64.so.2(Csys2blacs_handle+0x48b)[0x7fdd0857816b]
[amd-cpu:2879845] [ 5] abacus(+0x1af4f6)[0x60925b7654f6]
[amd-cpu:2879845] [ 6] abacus(+0x1afb85)[0x60925b765b85]
[amd-cpu:2879845] [ 7] abacus(+0x918704)[0x60925bece704]
[amd-cpu:2879845] [ 8] abacus(+0x79d71f)[0x60925bd5371f]
[amd-cpu:2879845] [ 9] abacus(+0x5a1139)[0x60925bb57139]
[amd-cpu:2879845] [10] abacus(+0x5a0c6a)[0x60925bb56c6a]
[amd-cpu:2879845] [11] abacus(+0x5a0e20)[0x60925bb56e20]
[amd-cpu:2879845] [12] abacus(+0xdce76)[0x60925b692e76]
[amd-cpu:2879845] [13] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7fdd02c29d90]
[amd-cpu:2879845] [14] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fdd02c29e40]
[amd-cpu:2879845] [15] abacus(+0xdcd05)[0x60925b692d05]
[amd-cpu:2879845] *** End of error message ***
Segmentation fault (core dumped)

My test was done with OneAPI 2023.2 on an AMD 7B12 machine.

@Growl1234
Author

Thank you! The issue is in how the CMake file links the MPI interface of MKL. It looks for the following library, which only works with MPICH or Intel MPI:

find_library(MKL_BLACS_INTELMPI NAMES mkl_blacs_intelmpi_lp64 HINTS ${MKLROOT}/lib ${MKLROOT}/lib/intel64)

I'll check on it and apply the fix.
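
The backtrace above shows the mismatch directly: `PMPI_Comm_size` in OpenMPI's `libmpi.so.40` is reached from `libmkl_blacs_intelmpi_lp64.so.2`. A minimal shell sketch for spotting this combination in `ldd` output is given here. The function name `check_blacs_mpi` and its argument convention are hypothetical and are not part of ABACUS or its toolchain.

```shell
# Read `ldd`-style output on stdin and report which MKL BLACS layer is linked.
# $1 is the expected MPI family: "openmpi" or "intelmpi"
# (the "intelmpi" layer is also the one used with MPICH).
# Returns non-zero when the linked BLACS layer does not match.
check_blacs_mpi() {
    expected="$1"
    libs=$(cat)
    case "$libs" in
        *libmkl_blacs_openmpi_lp64*)  found="openmpi" ;;
        *libmkl_blacs_intelmpi_lp64*) found="intelmpi" ;;
        *)                            found="none" ;;
    esac
    echo "BLACS layer: $found (expected: $expected)"
    [ "$found" = "$expected" ]
}
```

A possible use, assuming `abacus` is on the `PATH`, is `ldd "$(command -v abacus)" | check_blacs_mpi openmpi`, which would fail for the binary that produced the backtrace above.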

@Growl1234 Growl1234 force-pushed the mkl branch 2 times, most recently from 944383b to 1cfdce5 March 26, 2026 11:53
@Growl1234
Author

It seems to work on my machine now. Would you mind doing some further testing?

@QuantumMisaka
Collaborator

@Growl1234 Thanks for your contribution! I've tested your PR on my AMD 7B12 machine with a 16-core MPI run on an Fe5C2(510) surface system with 120 atoms. Tests show that ABACUS 3.9.0.24 compiled with the gcc-mkl toolchain has a much faster SCF process (35 s vs. 42 s) than 3.10.1 compiled with the gcc toolchain.

Good contribution! Could you link the earlier issue #3198 and summarize how you fixed the problem? Thanks!

@QuantumMisaka
Collaborator

Also, could you help me check whether the patch in install_cereal.sh is still needed for Intel OneAPI >2025.2? I had to remove the line that applies the patch in order to install ABACUS via the gcc-mkl toolchain. If the patch is no longer needed, you can simply drop it.

@QuantumMisaka
Collaborator

@Growl1234 You can also remove the gcc-mkl issue from the toolchain README in this PR.

@QuantumMisaka
Collaborator

One problem: if users load intelmpi in their environment, the gcc-mkl toolchain tends to pick up the wrong MPI. I've seen your new commits; do they fix this problem?

@Growl1234
Author

Growl1234 commented Mar 30, 2026

> Tests show that ABACUS 3.9.0.24 compiled with the gcc-mkl toolchain has a much faster SCF process (35 s vs. 42 s) than 3.10.1 compiled with the gcc toolchain.

That's a surprise!

> Also, could you help me check whether the patch in install_cereal.sh is still needed for Intel OneAPI >2025.2? I had to remove the line that applies the patch in order to install ABACUS via the gcc-mkl toolchain. If the patch is no longer needed, you can simply drop it.

It seems to be related to Cereal itself. I would suggest pinning a hash that corresponds to a specific commit rather than directly pulling the master branch; if you agree, I can submit another PR later. In my view, this patch no longer fits the current Cereal and should be removed.

> One problem: if users load intelmpi in their environment, the gcc-mkl toolchain tends to pick up the wrong MPI. I've seen your new commits; do they fix this problem?

My new commits only rebased my fork onto the latest commits in the upstream/develop branch.

Regarding the toolchain scripts, users can simply add --with-intelmpi=system at the end, which should replace the --with-openmpi=install option:

./install_abacus_toolchain_new.sh --with-intelmpi=system

However, the toolchain currently requires Intel MPI to be used together with the Intel compilers.

In fact, Intel MPI and MPICH share the same BLACS library (libmkl_blacs_intelmpi_lp64), so I don't think there should be problems (at least not directly related to the CMake file). Therefore, if you want to fix it, simply removing or adjusting the behavior behind this error message should be enough. (I don't have Intel MPI installed, so I'm not going to work on it.)
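
The pairing described here can be written down as a small lookup. This is only an illustrative sketch, not the logic in FindMKL.cmake. The function name `mkl_blacs_lib` and the family labels `openmpi`, `intelmpi`, and `mpich` are hypothetical. The library names themselves are the ones that appear in this thread and in MKL.

```shell
# Map an MPI family name to the MKL BLACS library built against its ABI.
# Intel MPI and MPICH share one library; OpenMPI needs its own.
mkl_blacs_lib() {
    case "$1" in
        openmpi)        echo "mkl_blacs_openmpi_lp64" ;;
        intelmpi|mpich) echo "mkl_blacs_intelmpi_lp64" ;;
        *)              echo "unknown MPI family: $1" >&2; return 1 ;;
    esac
}
```

For example, `mkl_blacs_lib mpich` and `mkl_blacs_lib intelmpi` print the same name, while `mkl_blacs_lib openmpi` prints the OpenMPI-specific one.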

Collaborator

@QuantumMisaka QuantumMisaka left a comment

LGTM

@Growl1234
Author

> Also, could you help me check whether the patch in install_cereal.sh is still needed for Intel OneAPI >2025.2? I had to remove the line that applies the patch in order to install ABACUS via the gcc-mkl toolchain. If the patch is no longer needed, you can simply drop it.

@QuantumMisaka I'll deal with this issue in a new PR #7193.

Collaborator

@mohanchen mohanchen left a comment

LGTM

@mohanchen
Collaborator

Thanks for your contribution! @Growl1234

@mohanchen mohanchen merged commit 577ae5f into deepmodeling:develop Mar 31, 2026
15 checks passed
@Growl1234 Growl1234 deleted the mkl branch March 31, 2026 14:05
Labels

Compile & CICD & Docs & Dependencies Issues related to compiling ABACUS

Development

Successfully merging this pull request may close these issues.

Bug: CANNOT install ABACUS via gcc-mkl toolchain

3 participants