optimize crc32c for riscv64 with Zbc carry-less multiplication #3312

Merged
zyearn merged 4 commits into apache:master from Felix-Gong:riscv-crc32c-zbc on May 31, 2026

Felix-Gong commented May 28, 2026

Summary

  • Add hardware-accelerated CRC32C for RISC-V using the Zbc (carry-less multiplication) extension
  • 128-bit folding with 4-way parallelism and Barrett reduction
  • 3-4x speedup over the table-based baseline at 4KB and above (2.2x at 64B), ~1.1 GB/s throughput on 1MB data

Key changes

  • rv_clmul() / rv_clmulh() inline assembly wrappers for clmul/clmulh instructions
  • 128-bit fold with 4-way parallel processing (64 bytes/iteration)
  • Barrett reduction for final 128-bit to 32-bit conversion
  • Runtime CPU feature detection via /proc/cpuinfo
  • Compile-time guard: #ifdef __riscv_zbc
  • CMake option: WITH_RISCV_ZBC (default OFF)
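
As background for the wrappers listed above, here is a portable C++ model of what the Zbc `clmul` and `clmulh` instructions compute on RV64, followed by one plausible shape for the inline-assembly wrappers. The `_ref` function names and the exact asm constraints are illustrative assumptions and are not taken from the merged code.

```cpp
#include <cassert>
#include <cstdint>

// Portable model of `clmul` on RV64: the low 64 bits of the 128-bit
// carry-less product of a and b (partial products combined with XOR).
inline uint64_t clmul_ref(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (int i = 0; i < 64; ++i) {
        if ((b >> i) & 1) r ^= a << i;
    }
    return r;
}

// Portable model of `clmulh`: the high 64 bits of the same product.
// The loop starts at 1 because bit 0 of b contributes nothing above bit 63.
inline uint64_t clmulh_ref(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (int i = 1; i < 64; ++i) {
        if ((b >> i) & 1) r ^= a >> (64 - i);
    }
    return r;
}

// Hypothetical wrappers, compiled only when the toolchain targets RV64 with
// Zbc enabled (assumption: the PR's wrappers look broadly like this).
#if defined(__riscv) && (__riscv_xlen == 64) && defined(__riscv_zbc)
inline uint64_t rv_clmul(uint64_t a, uint64_t b) {
    uint64_t r;
    __asm__("clmul %0, %1, %2" : "=r"(r) : "r"(a), "r"(b));
    return r;
}
inline uint64_t rv_clmulh(uint64_t a, uint64_t b) {
    uint64_t r;
    __asm__("clmulh %0, %1, %2" : "=r"(r) : "r"(a), "r"(b));
    return r;
}
#endif
```

A software model like this can also serve as a cross-check for the hardware path on machines without Zbc, at a large cost in speed.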

Test plan

  • RFC 3720 Section B.4 standard vectors (5 test cases)
  • CRC Extend composition test
  • Mask/Unmask operations
  • Large data correctness: 64B ~ 1MB against bitwise reference (9 test cases)
  • Performance benchmark: multiple sizes from 64B to 1MB
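
The kind of bitwise reference that the test plan compares against can be sketched as follows. This is a minimal illustration, not the PR's test code: the function names are hypothetical, and only the four 32-byte patterns from RFC 3720 B.4 plus an Extend composition check are shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Bit-at-a-time CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
// `crc` is the finalized CRC of the bytes seen so far (0 for none).
inline uint32_t Crc32cBitwiseExtend(uint32_t crc, const void* data, size_t len) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    uint32_t state = crc ^ 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        state ^= p[i];
        for (int k = 0; k < 8; ++k) {
            // Shift right by one; XOR in the polynomial if the dropped bit was 1.
            state = (state >> 1) ^ (0x82F63B78u & (0u - (state & 1u)));
        }
    }
    return state ^ 0xFFFFFFFFu;
}

inline uint32_t Crc32cBitwise(const void* data, size_t len) {
    return Crc32cBitwiseExtend(0u, data, len);
}

// Checks the 32-byte patterns from RFC 3720 B.4 and Extend composition.
inline void CheckCrc32cReference() {
    unsigned char buf[32];
    std::memset(buf, 0x00, sizeof(buf));
    assert(Crc32cBitwise(buf, sizeof(buf)) == 0x8A9136AAu);
    std::memset(buf, 0xFF, sizeof(buf));
    assert(Crc32cBitwise(buf, sizeof(buf)) == 0x62A8AB43u);
    for (int i = 0; i < 32; ++i) buf[i] = static_cast<unsigned char>(i);
    assert(Crc32cBitwise(buf, sizeof(buf)) == 0x46DD794Eu);
    for (int i = 0; i < 32; ++i) buf[i] = static_cast<unsigned char>(31 - i);
    assert(Crc32cBitwise(buf, sizeof(buf)) == 0x113FDB5Cu);
    // Extending the CRC of a prefix must equal the CRC of the whole input.
    assert(Crc32cBitwiseExtend(Crc32cBitwise("hello ", 6), "world", 5) ==
           Crc32cBitwise("hello world", 11));
}
```

Comparing an accelerated implementation against such a reference across many lengths exercises the chunked main loop as well as the tail handling.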

Performance

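| Size | Software (us) | Clmul (us) | Speedup | Clmul MB/s |
|------|---------------|------------|---------|------------|
| 64B  | 0.2           | 0.1        | 2.2x    | 610        |
| 4KB  | 11.3          | 2.9        | 3.9x    | 1347       |
| 1MB  | 2885.1        | 888.6      | 3.2x    | 1125       |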

Implement hardware-accelerated CRC32C for RISC-V using the Zbc
(carry-less multiplication) extension. The implementation uses
128-bit folding with 4-way parallelism and Barrett reduction,
following the approach from Hadoop PR #8371.

Key changes:
- Add rv_clmul/rv_clmulh inline assembly wrappers
- Implement 128-bit fold with 4-way parallel processing (64 bytes/iter)
- Add Barrett reduction for final 128-bit to 32-bit conversion
- Runtime CPU feature detection via /proc/cpuinfo
- Compile-time guard: #ifdef __riscv_zbc
- CMake option: WITH_RISCV_ZBC (default OFF)

Performance: 3-4x speedup over table-based 8-byte unrolled baseline,
~1.1 GB/s throughput on 1MB data.
Correctness: Verified against RFC 3720 B.4 and bitwise reference.

Signed-off-by: Felix-Gong <gongxiaofei24@iscas.ac.cn>

isSSE42() is only called from within #ifdef __SSE4_2__ blocks,
but the function definition was unconditional, causing
-Wunused-function errors on non-x86 builds with -Werror.

Signed-off-by: Felix-Gong <gongxiaofei24@iscas.ac.cn>
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

wwbmmm commented May 30, 2026

LGTM

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comment thread src/butil/crc32c.cc Outdated
Comment thread src/butil/crc32c.cc
Comment thread src/butil/crc32c.cc

// Hardware-accelerated CRC32C using RISC-V Zbc carry-less multiplication.
// Processes data in 64-byte chunks with 128-bit folding, then Barrett reduces.
static uint32_t rv_crc32c_clmul(uint32_t crc, const char* buf, size_t len) {

The Zbc code is protected by RISC-V+Zbc preprocessor guards. It is only built and executed on compatible 64-bit RISC-V hardware, and will not be compiled on non-RISC-V CI environments. Forcing this path elsewhere is impractical and unnecessary.
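
To illustrate how a compile-time guard and a runtime check can be combined, here is a minimal sketch. It assumes `/proc/cpuinfo` exposes an `isa` line whose multi-letter extensions are separated by underscores (for example `rv64imafdc_zicsr_zbb_zbc`); the function names are hypothetical and the PR's actual parsing may differ.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Strips leading and trailing whitespace.
inline std::string Trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// True if "zbc" appears as a whole '_'-separated token of the ISA string.
inline bool IsaHasZbc(const std::string& isa) {
    std::string::size_type start = 0;
    while (start <= isa.size()) {
        std::string::size_type end = isa.find('_', start);
        if (end == std::string::npos) end = isa.size();
        if (isa.compare(start, end - start, "zbc") == 0) return true;
        start = end + 1;
    }
    return false;
}

// Scans cpuinfo-style "key : value" lines for an "isa" entry listing zbc.
inline bool CpuinfoReportsZbc(std::istream& in) {
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) continue;
        if (Trim(line.substr(0, colon)) != "isa") continue;
        if (IsaHasZbc(Trim(line.substr(colon + 1)))) return true;
    }
    return false;
}

// The hardware path is considered only when the compiler targeted Zbc AND
// the running CPU reports it; every other build returns false.
inline bool HasZbcAtRuntime() {
#if defined(__riscv) && defined(__riscv_zbc)
    std::ifstream f("/proc/cpuinfo");
    return f && CpuinfoReportsZbc(f);
#else
    return false;
#endif
}
```

Keeping the parser separate from the file access means it can be unit-tested on any CI host, even though the accelerated path itself only builds for RISC-V with Zbc.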

Felix-Gong and others added 2 commits May 31, 2026 14:15
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

rv_crc32c_clmul was missing the standard CRC32C pre/post XOR
conversion. ExtendImpl does crc ^ 0xFFFFFFFF at entry and
result ^ 0xFFFFFFFF at exit, but rv_crc32c_clmul did neither,
causing wrong CRC values on RISC-V with Zbc extension.

Signed-off-by: Felix-Gong <gongxiaofei24@iscas.ac.cn>
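
The pre/post XOR convention described in the commit above can be illustrated with a small sketch. The raw kernel here is a bitwise stand-in for a folding kernel, and both function names are hypothetical; the point is only that a kernel operating on the internal register must be wrapped before it can accept and return finalized CRC values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for a raw kernel: advances the internal CRC32C register over buf
// with no pre/post conditioning (bitwise here; a clmul kernel in practice).
inline uint32_t crc32c_raw_update(uint32_t state, const unsigned char* buf, size_t len) {
    for (size_t i = 0; i < len; ++i) {
        state ^= buf[i];
        for (int k = 0; k < 8; ++k) {
            state = (state >> 1) ^ (0x82F63B78u & (0u - (state & 1u)));
        }
    }
    return state;
}

// Entry point that takes and returns finalized CRC values: it inverts the
// incoming CRC to recover the register and inverts the result on the way out.
inline uint32_t crc32c_extend(uint32_t crc, const unsigned char* buf, size_t len) {
    return crc32c_raw_update(crc ^ 0xFFFFFFFFu, buf, len) ^ 0xFFFFFFFFu;
}
```

Without the inversions, a zero register fed zero bytes stays zero, so the unwrapped kernel returns 0 for the all-zero RFC 3720 vector instead of 0x8A9136AA.
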
zyearn merged commit 49a15c2 into apache:master on May 31, 2026
15 checks passed