optimize crc32c for riscv64 with Zbc carry-less multiplication #3312
Merged
Implement hardware-accelerated CRC32C for RISC-V using the Zbc (carry-less multiplication) extension. The implementation uses 128-bit folding with 4-way parallelism and Barrett reduction, following the approach from Hadoop PR #8371.

Key changes:
- Add rv_clmul/rv_clmulh inline assembly wrappers
- Implement 128-bit fold with 4-way parallel processing (64 bytes/iter)
- Add Barrett reduction for final 128-bit to 32-bit conversion
- Runtime CPU feature detection via /proc/cpuinfo
- Compile-time guard: #ifdef __riscv_zbc
- CMake option: WITH_RISCV_ZBC (default OFF)

Performance: 3-4x speedup over the table-based, 8-byte unrolled baseline; ~1.1 GB/s throughput on 1 MB of data.

Correctness: Verified against RFC 3720 B.4 and a bitwise reference.

Signed-off-by: Felix-Gong <gongxiaofei24@iscas.ac.cn>
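For readers unfamiliar with Zbc, the following is a minimal sketch of what the `rv_clmul`/`rv_clmulh` wrappers listed above could look like. The function names come from the description, but the bodies are an assumption, not the merged code: the inline-assembly branch is compiled only when the toolchain defines `__riscv_zbc`, and the portable branch is an illustrative software model of the same instruction semantics so the sketch builds on any host.

```cpp
#include <cstdint>

// Low 64 bits of the 128-bit carry-less product a * b (Zbc `clmul`).
static inline uint64_t rv_clmul(uint64_t a, uint64_t b) {
#if defined(__riscv) && defined(__riscv_zbc) && (__riscv_xlen == 64)
  uint64_t r;
  __asm__("clmul %0, %1, %2" : "=r"(r) : "r"(a), "r"(b));
  return r;
#else
  // Illustrative fallback (assumption, not part of the PR):
  // XOR-accumulate shifted copies of a, one per set bit of b.
  uint64_t r = 0;
  for (int i = 0; i < 64; ++i) {
    if ((b >> i) & 1) r ^= a << i;
  }
  return r;
#endif
}

// High 64 bits of the 128-bit carry-less product a * b (Zbc `clmulh`).
static inline uint64_t rv_clmulh(uint64_t a, uint64_t b) {
#if defined(__riscv) && defined(__riscv_zbc) && (__riscv_xlen == 64)
  uint64_t r;
  __asm__("clmulh %0, %1, %2" : "=r"(r) : "r"(a), "r"(b));
  return r;
#else
  // Bit i of b contributes the bits of a that overflow past bit 63.
  // i starts at 1 because a << 0 never overflows.
  uint64_t r = 0;
  for (int i = 1; i < 64; ++i) {
    if ((b >> i) & 1) r ^= a >> (64 - i);
  }
  return r;
#endif
}
```

Taken together, the pair yields the full 128-bit product that a fold step needs: `rv_clmulh(a, b)` is the upper half and `rv_clmul(a, b)` the lower half.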
isSSE42() is only called from within #ifdef __SSE4_2__ blocks, but the function definition was unconditional, causing -Wunused-function errors on non-x86 builds with -Werror. Signed-off-by: Felix-Gong <gongxiaofei24@iscas.ac.cn>
LGTM
// Hardware-accelerated CRC32C using RISC-V Zbc carry-less multiplication.
// Processes data in 64-byte chunks with 128-bit folding, then Barrett reduces.
static uint32_t rv_crc32c_clmul(uint32_t crc, const char* buf, size_t len) {
The Zbc code is protected by RISC-V+Zbc preprocessor guards. It is only built and executed on compatible 64-bit RISC-V hardware, and will not be compiled on non-RISC-V CI environments. Forcing this path elsewhere is impractical and unnecessary.
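As a rough illustration of how the compile-time guard and the `/proc/cpuinfo` check from the description might combine, here is a sketch. All helper names (`IsaStringHasZbc`, `CpuinfoHasZbc`, `RuntimeHasZbc`) are hypothetical, and the parsing assumes an `isa` line of the form `isa : rv64imafdc_zicsr_zbc`, with multi-letter extensions separated by underscores; the actual code in the PR may parse the file differently.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: true if an ISA string such as
// "rv64imafdc_zicsr_zba_zbb_zbc" lists the zbc extension.
static bool IsaStringHasZbc(const std::string& isa) {
  std::stringstream ss(isa);
  std::string tok;
  while (std::getline(ss, tok, '_')) {
    // Drop trailing whitespace so the last token still matches.
    while (!tok.empty() && (tok.back() == ' ' || tok.back() == '\t' ||
                            tok.back() == '\r' || tok.back() == '\n')) {
      tok.pop_back();
    }
    if (tok == "zbc") return true;
  }
  return false;
}

// Hypothetical helper: scan cpuinfo-style text for an "isa" line with zbc.
static bool CpuinfoHasZbc(std::istream& in) {
  std::string line;
  while (std::getline(in, line)) {
    if (line.compare(0, 3, "isa") != 0) continue;  // only "isa ..." lines
    std::string::size_type colon = line.find(':');
    if (colon == std::string::npos) continue;
    if (IsaStringHasZbc(line.substr(colon + 1))) return true;
  }
  return false;
}

// The Zbc path is considered only when the compiler targets Zbc;
// on every other build this is a constant false.
static bool RuntimeHasZbc() {
#if defined(__riscv) && defined(__riscv_zbc)
  static const bool has = [] {
    std::ifstream f("/proc/cpuinfo");
    return CpuinfoHasZbc(f);
  }();
  return has;
#else
  return false;
#endif
}
```

Keeping the parser separate from the file access means the parsing logic can be unit-tested on any CI host by feeding it a string stream, even though the accelerated path itself only builds on RISC-V.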
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
rv_crc32c_clmul was missing the standard CRC32C pre/post XOR conversion. ExtendImpl does crc ^ 0xFFFFFFFF at entry and result ^ 0xFFFFFFFF at exit, but rv_crc32c_clmul did neither, causing wrong CRC values on RISC-V with Zbc extension. Signed-off-by: Felix-Gong <gongxiaofei24@iscas.ac.cn>
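The conversion this commit describes can be illustrated with a bitwise reference. The sketch below is not the PR's code: `Crc32cRawBitwise` and `Crc32cExtendSketch` are hypothetical names, and the bit-at-a-time loop stands in for the accelerated fold. It uses the reflected Castagnoli polynomial `0x82F63B78` and shows where the entry and exit XORs sit around the raw register update.

```cpp
#include <cstddef>
#include <cstdint>

// Raw register update, one bit at a time, with no pre/post XOR.
// 0x82F63B78 is the reflected CRC32C (Castagnoli) polynomial.
static uint32_t Crc32cRawBitwise(uint32_t state, const char* buf, size_t len) {
  for (size_t i = 0; i < len; ++i) {
    state ^= static_cast<uint8_t>(buf[i]);
    for (int k = 0; k < 8; ++k) {
      // Mask is all ones when the low bit is set, zero otherwise.
      state = (state >> 1) ^ (0x82F63B78u & (0u - (state & 1u)));
    }
  }
  return state;
}

// Hypothetical wrapper applying the conversion described in the commit:
// XOR with 0xFFFFFFFF at entry and again at exit.
static uint32_t Crc32cExtendSketch(uint32_t crc, const char* buf, size_t len) {
  uint32_t state = crc ^ 0xFFFFFFFFu;
  state = Crc32cRawBitwise(state, buf, len);
  return state ^ 0xFFFFFFFFu;
}
```

With the XORs in place, `Crc32cExtendSketch(0, "123456789", 9)` gives the standard CRC32C check value `0xE3069283`, and extending in two calls gives the same result as one call because the exit XOR of the first call cancels the entry XOR of the second. Without them, 32 zero bytes starting from state 0 hash to 0 rather than the RFC 3720 B.4 value `0x8A9136AA`.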
Summary
Key changes
- rv_clmul()/rv_clmulh() inline assembly wrappers for clmul/clmulh instructions
- Runtime CPU feature detection via /proc/cpuinfo
- Compile-time guard: #ifdef __riscv_zbc
- CMake option: WITH_RISCV_ZBC (default OFF)
Test plan
Performance