provide a NEON version of arm/sgemm by notaz · Pull Request #5800 · OpenMathLib/OpenBLAS

notaz · 2026-05-05T20:12:21Z

Surprisingly, OpenBLAS lacks NEON-optimized kernels for armv7, even though the build enables NEON by default: it passes -mfpu=neon to the compiler, so the compiler is free to use NEON instructions wherever it can.

The speedup on Cortex-A76 is significant (roughly 3.3x). Before:

 M= 200, N= 200, K= 200 :     9262.97 MFlops   0.001727 sec

after:

 M= 200, N= 200, K= 200 :    30223.64 MFlops   0.000529 sec
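For readers who want to sanity-check the figures, here is a minimal sketch of how an MFlops rate and a speedup ratio can be derived. It assumes the usual 2*M*N*K operation count for GEMM (one multiply and one add per term); the helper names `gemm_flops`, `gemm_mflops` and `speedup` are made up for this illustration and are not part of OpenBLAS.

```c
#include <assert.h>

/* Floating-point operations for an (M x K) * (K x N) product,
 * assuming the conventional 2*M*N*K count (an assumption of this sketch). */
static double gemm_flops(int m, int n, int k)
{
    return 2.0 * (double)m * (double)n * (double)k;
}

/* MFlops rate for a run that took `seconds` of wall time. */
static double gemm_mflops(int m, int n, int k, double seconds)
{
    assert(seconds > 0.0);
    return gemm_flops(m, n, k) / seconds * 1e-6;
}

/* Ratio of two rates, e.g. the "after" rate over the "before" rate. */
static double speedup(double mflops_before, double mflops_after)
{
    assert(mflops_before > 0.0);
    return mflops_after / mflops_before;
}
```

With the reported rates, `speedup(9262.97, 30223.64)` comes out at about 3.26. Recomputing the rate from the printed times gives slightly different values than the printed MFlops, presumably because the times are rounded to six decimals.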

The before/after figures above are from benchmark/sgemm.goto.

Conveniently, the registers were already allocated suitably for vector operation, so the conversion from vfpv3 was rather straightforward. Prefetching was left out because it doesn't help on Cortex-A76 and in fact hurts slightly.
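To illustrate the general idea of moving from scalar VFP multiply-accumulates to NEON vector ones, here is a small sketch in C intrinsics. It is not the kernel from this PR (which is hand-written assembly). The function name `sketch_sgemm_4x4`, the 4x4 tile size and the packed layout described in the comment are all assumptions made for the example. When NEON is not available at compile time it falls back to an equivalent scalar loop.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1
#else
#define SKETCH_HAVE_NEON 0
#endif

/* Hypothetical 4x4 micro-kernel: C += alpha * A * B.
 * Assumed layout (not taken from the PR):
 *   a: k groups of 4 floats, group p holds rows 0..3 of column p of A
 *   b: k groups of 4 floats, group p holds columns 0..3 of row p of B
 *   c: column-major 4x4 tile with leading dimension ldc
 */
void sketch_sgemm_4x4(size_t k, float alpha, const float *a, const float *b,
                      float *c, size_t ldc)
{
    assert(ldc >= 4);
#if SKETCH_HAVE_NEON
    /* One accumulator vector per column of the C tile. */
    float32x4_t acc0 = vdupq_n_f32(0.0f);
    float32x4_t acc1 = vdupq_n_f32(0.0f);
    float32x4_t acc2 = vdupq_n_f32(0.0f);
    float32x4_t acc3 = vdupq_n_f32(0.0f);
    for (size_t p = 0; p < k; p++) {
        float32x4_t av = vld1q_f32(a + 4 * p);   /* 4 rows of A */
        float32x4_t bv = vld1q_f32(b + 4 * p);   /* 4 columns of B */
        float32x2_t blo = vget_low_f32(bv);
        float32x2_t bhi = vget_high_f32(bv);
        /* Each call performs 4 multiply-accumulates at once. */
        acc0 = vmlaq_lane_f32(acc0, av, blo, 0);
        acc1 = vmlaq_lane_f32(acc1, av, blo, 1);
        acc2 = vmlaq_lane_f32(acc2, av, bhi, 0);
        acc3 = vmlaq_lane_f32(acc3, av, bhi, 1);
    }
    /* Scale by alpha and add into the existing C tile. */
    vst1q_f32(c + 0 * ldc, vmlaq_n_f32(vld1q_f32(c + 0 * ldc), acc0, alpha));
    vst1q_f32(c + 1 * ldc, vmlaq_n_f32(vld1q_f32(c + 1 * ldc), acc1, alpha));
    vst1q_f32(c + 2 * ldc, vmlaq_n_f32(vld1q_f32(c + 2 * ldc), acc2, alpha));
    vst1q_f32(c + 3 * ldc, vmlaq_n_f32(vld1q_f32(c + 3 * ldc), acc3, alpha));
#else
    /* Scalar fallback: acc[j][i] is column j, row i of the tile. */
    float acc[4][4] = {{0}};
    for (size_t p = 0; p < k; p++)
        for (int j = 0; j < 4; j++)
            for (int i = 0; i < 4; i++)
                acc[j][i] += a[4 * p + i] * b[4 * p + j];
    for (int j = 0; j < 4; j++)
        for (int i = 0; i < 4; i++)
            c[j * ldc + i] += alpha * acc[j][i];
#endif
}
```

The scalar branch needs sixteen separate multiply-accumulates per step of k, while the NEON branch needs four vector ones. That is the kind of saving a vectorized kernel aims for, though the actual gain depends on the core and on how the real kernel is scheduled.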

notaz added 4 commits May 5, 2026 22:36

convert labels to local labels for arm/sgemm

d7aeae8

Non-local labels interfere with profiling. Same thing was done for arm64 in commit a0128aa.

only save the required registers for arm/sgemm

cd276c2

According to ARM AAPCS (Procedure Call Standard) 5.1.2.1, only registers s16-s31 must be preserved across subroutine calls; registers s0-s15 do not need to be preserved.
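As a rough illustration of what that rule means for a kernel prologue, the sketch below works out which VFP registers a routine has to preserve and how much stack that takes. The bitmask encoding and the helper names `must_preserve` and `save_area_bytes` are invented for this example. The only fact relied on is the one stated above (s16-s31 are callee-saved), together with the fact that s16-s31 alias d8-d15.

```c
#include <stdint.h>

/* Bit i set means register s<i> is written by the routine
 * (an encoding chosen for this sketch only). */
#define AAPCS_CALLEE_SAVED_S 0xFFFF0000u   /* s16-s31, i.e. d8-d15 */

/* Subset of the used registers that must be saved and restored. */
static uint32_t must_preserve(uint32_t used_s)
{
    return used_s & AAPCS_CALLEE_SAVED_S;
}

/* Stack bytes needed if saving is done in whole d registers:
 * d<n> overlays s<2n> and s<2n+1>, and each d register is 8 bytes. */
static unsigned save_area_bytes(uint32_t used_s)
{
    uint32_t m = must_preserve(used_s);
    unsigned d_regs = 0;
    for (unsigned d = 8; d < 16; d++) {
        if (m & (3u << (2 * d)))
            d_regs++;
    }
    return d_regs * 8u;
}
```

A routine that touches all of s0-s31 therefore needs to save 64 bytes (d8-d15) rather than 128, and one that stays within s0-s15 needs to save nothing.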

rename arm32 sgemm_kernel to indicate neon support

fc9d7c7

notaz wants to merge 4 commits into OpenMathLib:develop from notaz:armv7_sgemm
