@eschnett
Contributor

No description provided.

@eschnett
Contributor Author

We could enable libblastrampoline as well.

@eschnett
Contributor Author

eschnett commented Oct 19, 2025

I see this warning:

┌ Warning: /home/eschnetter/src/Yggdrasil/P/PETSc/build/aarch64-linux-gnu-libgfortran5-mpi+mpich/5H0hzGRj/aarch64-linux-gnu-libgfortran5-cxx11-mpi+mpich/destdir/lib/petsc/single_complex_Int64/lib/libpetscsingle_complex_Int64.so.3.24.0 contains std::string values!  This causes incompatibilities across the GCC 4/5 version boundary.  To remedy this, you must build a tarball for both GCC 4 and GCC 5.  To do this, immediately after your `platforms` definition in your `build_tarballs.jl` file, add the line:
│
│     platforms = expand_cxxstring_abis(platforms)
└ @ BinaryBuilder.Auditor ~/.julia/packages/BinaryBuilder/3eUfg/src/auditor/compiler_abi.jl:259

Should we do this?

EDIT: It seems that this comes from a YAML string parsing library. See #12343.

EDIT: Nope, PETSc uses C++ now, and it calls to_string. expand_cxxstring_abis it is. (See e.g. the directory src/sys/objects/device/interface.) It seems we can disable C++, though, if we want to.
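
For reference, a minimal sketch of where the line from the warning would sit in a `build_tarballs.jl`. The platform list here (`supported_platforms()`) is only a placeholder for illustration, not the list the PETSc recipe actually uses:

```julia
using BinaryBuilder

# Placeholder platform list for illustration only; the real recipe defines its own.
platforms = supported_platforms()

# Emit one build per C++ string ABI (cxx03 and cxx11), as the auditor suggests.
platforms = expand_cxxstring_abis(platforms)
```

The expansion has to come right after the `platforms` definition, before any later step consumes the list.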

@eschnett
Contributor Author

There is a comment

    # Current default build, equivalent to Float64_Real_Int32
    LibraryProduct("libpetsc_double_real_Int64", :libpetsc, "\$libdir/petsc/double_real_Int64/lib")

This comment is clearly inconsistent with the code. I assume it is the comment that is wrong?

@ViralBShah
Member

Yes, the comment is clearly wrong.

cc @boriskaus

@ViralBShah
Member

It should be ok to turn on LBT support, but we should probably then make sure the forwards are set up in PETSc.jl (through a dependency on OpenBLAS32.jl).

@boriskaus
Contributor

yes the comment is wrong

@eschnett eschnett marked this pull request as draft October 20, 2025 00:09
@eschnett
Contributor Author

I have added LBT support. Please review.

@eschnett eschnett marked this pull request as ready for review October 21, 2025 12:11
@ViralBShah
Member

It appears good to me - but it's a pretty big PR. I believe it should be ok to merge this and quickly test it through PETSc.jl.

@boriskaus Would you be ok with that approach?

@boriskaus
Contributor

This is indeed a very large PR and it is difficult to say if it'll work as expected.

In the past I've found it useful to test the package by essentially running the precompiled and packaged PETSc examples, which is done here:

https://github.com/boriskaus/test_PETSc_jll

I've seen many cases where the BinaryBuilder builds were ok, but the binaries didn't actually run in the end (at least on the systems provided by GitHub Actions).

Is it possible to use the compiled binaries of this PR for such testing? If yes, how do I set this up?

@ViralBShah
Member

I do not know if there is an easy way to test these. You can manually download the build artifacts from buildkite.

@DilumAluthge @giordano Is there an existing mechanism to test build artifacts in a package's CI before it is registered here?

@giordano
Member

@boriskaus
Contributor

@eschnett would it be possible for you to compile a local version, say in your github account, so I can link the tests to that?

@eschnett
Contributor Author

I often have problems with the "push local version to github" setup. It's also quite frightening that this will refuse to update an existing repository so that I have to actually delete the repository for every trial run. I'm building it locally (--deploy=local) which is much simpler, and I can then run tests locally (after a develop PETSc_jll).
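
Roughly, the local workflow looks like the sketch below. It assumes the recipe was already built with `julia build_tarballs.jl --deploy=local` and that the generated package landed in Julia's dev directory (usually `~/.julia/dev`); the `jll_dir` name is just for illustration:

```julia
using Pkg

# Assumption: `julia build_tarballs.jl --deploy=local` was run beforehand and
# wrote the JLL package into Julia's dev directory (usually ~/.julia/dev).
jll_dir = joinpath(Pkg.devdir(), "PETSc_jll")

if isdir(jll_dir)
    # Use the locally built JLL instead of the registered one.
    Pkg.develop(path=jll_dir)
else
    @warn "No local PETSc_jll found; run the build with --deploy=local first" jll_dir
end
```

This only exercises the platform of the local machine, which is the trade-off against a GitHub-hosted build.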

@boriskaus
Contributor

I experienced similar issues, but I only have to delete the latest release, rather than the repository, for this to work.

The advantage of running it on github actions is that it tests several operating systems at once.

@eschnett
Contributor Author

The local PETSc_jll seems to work.

I do not understand how to use test_PETSc_jll; would I need to rebuild it using the new PETSc_jll?

I can use PETSc.jl with the new PETSc_jll.jl and solve a simple linear system.

@boriskaus
Contributor

in test_PETSc_jll, we would simply have to update the PETSc_jll version; after that, CI will run the tests automatically on the different operating systems.
I'm currently building a version of PETSc_jll 3.24 that has your changes for my local github repo. As soon as that is done, I'll test.
Since I need to leave soonish, that may not get done today, though.

@boriskaus
Contributor

> I do not understand how to use test_PETSc_jll; would I need to rebuild it using the new PETSc_jll?

you would have to modify this line and point it to the correct PETSc_jll package you want to test (ideally in a local github). If that change is pushed to github, CI will run the tests on windows, mac and linux. You can also point it to your local installation, which implies that only your OS is tested.

> I can use PETSc.jl with the new PETSc_jll.jl and solve a simple linear system.

that is already very encouraging!

@eschnett
Contributor Author

I see segfaults (even with the current released versions of PETSc, and with SCALAPACK32_jll pinned to 2.2.1) on Julia 1.12. Earlier Julia versions are working. This is x86-64 linux gnu with MPICH.

@boriskaus
Contributor

boriskaus commented Oct 24, 2025

See https://github.com/boriskaus/test_PETSc_jll/actions/runs/18791153434 for CI results

There is indeed an issue with suitesparse

@ViralBShah
Member

ViralBShah commented Oct 24, 2025

Do you have a dependency on OpenBLAS32.jl added in PETSc.jl? Since this version of PETSc is compiled with LBT, we need forwards for the LP64 symbols (which is what the OpenBLAS32 __init__ does). I would expect it to fail without that.
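
A quick way to check whether LP64 forwards are in place is to ask libblastrampoline what it has loaded. This is only a sketch using the `LinearAlgebra.BLAS` helpers from the standard library; the commented-out forwarding call is an assumption about how one would set it up by hand, mirroring what the OpenBLAS32 `__init__` is described as doing above:

```julia
using LinearAlgebra

# Ask libblastrampoline which backends it currently forwards to.
config = LinearAlgebra.BLAS.lbt_get_config()
for lib in config.loaded_libs
    # `interface` is :lp64 (32-bit integers) or :ilp64 (64-bit integers).
    println(lib.libname, " => ", lib.interface)
end

# True if at least one LP64 backend is already loaded.
has_lp64 = any(lib -> lib.interface == :lp64, config.loaded_libs)

# Assumption: with OpenBLAS32_jll installed, a manual forward would look like this.
# using OpenBLAS32_jll
# has_lp64 || LinearAlgebra.BLAS.lbt_forward(OpenBLAS32_jll.libopenblas_path)
```

If `has_lp64` is `false` in the process that loads PETSc, calls into LP64 BLAS/LAPACK symbols have nowhere to go.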

@boriskaus
Contributor

Adding OpenBLAS32.jl as dependency does not resolve the issue, as you can see here

Interestingly, suitesparse works on Mac. On linux, it sometimes seems to do a few iterations before segfaulting.

@boriskaus
Contributor

boriskaus commented Oct 29, 2025

I've locally tested this latest build, and unfortunately suitesparse still fails on linux.
Unless I set up something wrong, of course.

edit: on 1.12, suitesparse works on Mac; not on linux. Obviously, linux is the most important platform for this package...

@ViralBShah
Member

ViralBShah commented Oct 29, 2025

I wonder if there is an issue with the suitesparse build or something else with LBT. It looks like the crash is here: https://github.com/boriskaus/test_PETSc_jll/actions/runs/18888228913/job/53909074280#step:7:190

[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: The line numbers in the error traceback are not always exact.
or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
[0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
[0] #1 MatLUFactorNumeric_UMFPACK() at /workspace/srcdir/petsc-3.24.0/src/mat/impls/aij/seq/umfpack/umfpack.c:173
[0] #2 MatLUFactorNumeric() at /workspace/srcdir/petsc-3.24.0/src/mat/interface/matrix.c:3336
[0] #3 PCSetUp_LU() at /workspace/srcdir/petsc-3.24.0/src/ksp/pc/impls/factor/lu/lu.c:121
[0] #4 PCSetUp() at /workspace/srcdir/petsc-3.24.0/src/ksp/pc/interface/precon.c:1120
[0] #5 KSPSetUp() at /workspace/srcdir/petsc-3.24.0/src/ksp/ksp/interface/itfunc.c:429
[0] #6 KSPSolve_Private() at /workspace/srcdir/petsc-3.24.0/src/ksp/ksp/interface/itfunc.c:839
[0] #7 KSPSolve() at /workspace/srcdir/petsc-3.24.0/src/ksp/ksp/interface/itfunc.c:1090
[0] #8 SNESSolve_NEWTONLS() at /workspace/srcdir/petsc-3.24.0/src/snes/impls/ls/ls.c:220
[0] #9 SNESSolve() at /workspace/srcdir/petsc-3.24.0/src/snes/interface/snes.c:4906
[0] #10 main() at ex19.c:152

Cc @imciner2

@boriskaus
Contributor

This is indeed within suitesparse. Interestingly, it often works for a few iterations before it crashes:

  2 SNES Function norm < 1.e-11
Number of SNES iterations = 2
lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
  0 SNES Function norm 0.239155
    0 KSP Residual norm 0.235858
    1 KSP Residual norm < 1.e-11
  1 SNES Function norm 6.81968e-05
    0 KSP Residual norm 2.30906e-05
    1 KSP Residual norm < 1.e-11
  2 SNES Function norm < 1.e-11
Number of SNES iterations = 2
lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
  0 SNES Function norm 0.0406612
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: The line numbers in the error traceback are not always exact.
or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
[0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------

@ViralBShah
Member

The only thing that comes to mind is that the libraries in SuiteSparse_jll and SuiteSparse32_jll perhaps have the same names, and the wrong one is getting called somehow.

@boriskaus
Contributor

interestingly, the current version of PETSc_jll (version 3.22.0) works fine on all platforms including with SuiteSparse and julia 1.12, as you can see here.
If it's an issue with the SuiteSparse_jll libraries, I would have expected this to crash in these cases as well.

As far as I can tell the versions of all packages are the same between the 3.22 and the 3.24 tests.

@eschnett
Contributor Author

The versions of SCALAPACK32_jll differ between PETSc_jll 3.22 and 3.24. The old SCALAPACK32_jll links directly against OpenBLAS32_jll, the new version links against libblastrampoline_jll and tells libblastrampoline_jll to forward the calls to OpenBLAS32_jll.

@boriskaus
Contributor

boriskaus commented Oct 30, 2025

failing PETSc_jll 3.24:

  [aabda75e] SCALAPACK32_jll v2.2.2+0
  [4536629a] ~ OpenBLAS_jll  v0.3.29+0
  [458c3c95] + OpenSSL_jll v3.5.1+0
  [83775a58] + Zlib_jll v1.3.1+2
  [8e850b90] + libblastrampoline_jll v5.15.0+0

working PETSc_jll 3.22:

  [aabda75e] + SCALAPACK32_jll v2.2.2+0
  [4536629a] ~ OpenBLAS_jll  v0.3.29+0
  [458c3c95] + OpenSSL_jll v3.5.1+0
  [83775a58] + Zlib_jll v1.3.1+2
  [8e850b90] + libblastrampoline_jll v5.15.0+0

This is what the CI seems to be using. There is thus no difference in the version of SCALAPACK32_jll that is being employed.

4 participants