TurboQuant: Block Decomposition #8139
Draft
connortsui20 wants to merge 3 commits into
Merging this PR will degrade performance by 14.6%

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | WallTime | cuda/bitpacked_u8/unpack/3bw[100M] | 299.6 µs | 350.8 µs | -14.6% |
Comparing ct/tq-block (962c7fc) with develop (73454db)
…ation
Add the block-decomposition surface: an optional power-of-two `block_sizes` on TurboQuantConfig/Metadata (proto tag 6, validated for non-empty/power-of-two/>=MIN_BLOCK_SIZE), per-block centroid tables, and a per-block `derive_block_seed`. MIN_DIMENSION lowered to 64 (= MIN_BLOCK_SIZE). The pipeline rewrite follows in the next commit (crate not yet buildable here).
Signed-off-by: Connor Tsui <connor@spiraldb.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Replace the single-block pipeline with a per-block one: each block gets its own L2 norm, SORF
rotation, and centroid table, stored as an outer struct of per-block {norms, codes}. Overspilling
blocks are zero-padded; encode rejects non-finite norms and overflowing dimensions, and decode
returns clean errors (not panics) for out-of-range codes and invalid stored norms.
Signed-off-by: Connor Tsui <connor@spiraldb.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Migrate the suite to the block layout and add round-trip, fidelity (per-block and whole-vector MSE), seed-distinctness, multi-block null/zero-norm, malformed-metadata/code/norm, and NaN/Inf coverage. Refresh the crate docs and add a multi-block file round-trip.
Signed-off-by: Connor Tsui <connor@spiraldb.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Summary
Tracking issue: #7830
Closes: #7245
Adds block decomposition to Vortex.
Note to reviewers: This PR rewrites a large part of the vortex-turboquant logic, so it is probably better to review the crate holistically than to read only the diffs.
Since TurboQuant requires input vectors to be padded to power-of-2 dimensions, vector dimensions that are far away from those powers of 2 can lose a large amount of compression space to padding.
By decomposing vectors into blocks whose sizes are powers of 2, we can reduce the amount of padding necessary. As a simple example, we might want to compress 768-dimension vectors into 2 blocks of 512 and 256. Or if we had 1408-dimension vectors, we would want to compress them into 3 blocks of 1024, 256, and 128. If we had 1500-dimension vectors, we could use 2 blocks of 1024 and 512, which leaves 36 dimensions of padding in the last block.
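The padding arithmetic in the examples above can be checked with a small sketch. The helper names `padding_for` and `single_block_padding` are hypothetical and are not part of the vortex-turboquant API; the validation rules (non-empty, power-of-two, at least `MIN_BLOCK_SIZE = 64`) are taken from the commit message in this PR.

```rust
/// Smallest allowed block size, per the commit message in this PR.
const MIN_BLOCK_SIZE: u32 = 64;

/// Hypothetical helper (not the crate's API): number of zero-padded
/// dimensions when a `dim`-dimensional vector is covered by `block_sizes`.
/// Returns `None` if the sizes are invalid or do not cover `dim`.
fn padding_for(dim: u32, block_sizes: &[u32]) -> Option<u32> {
    if block_sizes.is_empty() {
        return None;
    }
    let mut total: u32 = 0;
    for &size in block_sizes {
        // Every block must be a power of two and at least MIN_BLOCK_SIZE.
        if !size.is_power_of_two() || size < MIN_BLOCK_SIZE {
            return None;
        }
        total = total.checked_add(size)?;
    }
    // `None` when the blocks are too small to hold the vector.
    total.checked_sub(dim)
}

/// Padding when the whole vector is one block rounded up to the next
/// power of two (the pre-decomposition behavior described above).
fn single_block_padding(dim: u32) -> Option<u32> {
    let padded = dim.checked_next_power_of_two()?.max(MIN_BLOCK_SIZE);
    Some(padded - dim)
}
```

Under these assumptions, `padding_for(1500, &[1024, 512])` gives `Some(36)`, while `single_block_padding(1500)` gives `Some(548)` because a single block would have to be 2048 wide. For 768 and 1408 dimensions the block layouts above need no padding at all, versus 256 and 640 padded dimensions for a single block.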
Each block is then TurboQuant-quantized independently, which means it gets its own L2 norm and codes child array (since TQ requires everything to be normalized). This has the added benefit of distributing "energy" better: a large peak in some dimensions is normalized only within its own block, so it does not shrink the normalized values in the other blocks and cause an "energy imbalance" across the vector.
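A minimal sketch of the per-block normalization step, to make the "energy" point concrete. The function `normalize_blocks` is hypothetical and is not the crate's implementation: it skips the SORF rotation and the centroid lookup entirely, and the usage below uses tiny block sizes that ignore `MIN_BLOCK_SIZE` purely for readability.

```rust
/// Hypothetical sketch (not the crate's code): split `vector` into
/// consecutive blocks of the given sizes, zero-pad past the end of the
/// input, and normalize each block by its own L2 norm.
/// Returns one `(norm, unit_block)` pair per block.
fn normalize_blocks(vector: &[f32], block_sizes: &[usize]) -> Vec<(f32, Vec<f32>)> {
    let mut out = Vec::with_capacity(block_sizes.len());
    let mut offset = 0;
    for &size in block_sizes {
        // Copy this block's slice of the input; the remainder stays zero.
        let mut block = vec![0.0f32; size];
        let start = offset.min(vector.len());
        let end = (offset + size).min(vector.len());
        block[..end - start].copy_from_slice(&vector[start..end]);
        offset += size;

        // Each block gets its own L2 norm.
        let norm = block.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 {
            for x in &mut block {
                *x /= norm;
            }
        }
        // A zero-norm block is left as all zeros.
        out.push((norm, block));
    }
    out
}
```

For the input `[3.0, 4.0, 100.0]` with block sizes `[2, 2]`, the first block normalizes to roughly `[0.6, 0.8]` with norm 5, and the second to `[1.0, 0.0]` with norm 100. With a single whole-vector norm of about 100.1, the first two components would instead shrink to about 0.03 and 0.04, which is the imbalance the per-block norms avoid.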
Storage array
The encoded array is now an outer `Struct` whose fields are `block_0`, `block_1`, ..., `block_{N-1}`, one per block in `metadata.blocks`. Each inner block is the same as the old `Struct { norms, codes }` shape.
I tried to model this as a scalar function or even something strange internally but I wasn't able to figure out something that was cleaner than this.
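The field layout can be pictured with a plain-Rust stand-in. `BlockColumns`, `block_field_names`, and `empty_outer_struct` are hypothetical names for illustration only; the real array is a Vortex struct array, not these types.

```rust
/// Plain-Rust stand-in for one inner block (not a Vortex type).
#[derive(Debug, Default, PartialEq)]
struct BlockColumns {
    /// One L2 norm per row.
    norms: Vec<f32>,
    /// Quantization codes for this block.
    codes: Vec<u8>,
}

/// Field names of the outer struct: `block_0` .. `block_{N-1}`.
fn block_field_names(num_blocks: usize) -> Vec<String> {
    (0..num_blocks).map(|i| format!("block_{i}")).collect()
}

/// Pair each field name with an empty inner block, in block order.
fn empty_outer_struct(block_sizes: &[u32]) -> Vec<(String, BlockColumns)> {
    block_field_names(block_sizes.len())
        .into_iter()
        .map(|name| (name, BlockColumns::default()))
        .collect()
}
```

With block sizes `[1024, 256, 128]`, this sketch yields three fields named `block_0`, `block_1`, and `block_2`, each holding its own `norms` and `codes`.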
Just as before, row validity lives on the outer struct and is authoritative; each inner block's struct validity, and each `norms`/`codes` child's validity, must "cover" the layer above it, meaning every row the parent marks valid must also be marked valid by the child. (A child is allowed to be valid in more rows than its parent, so for example a `NonNullable` inner block sitting under a `Nullable` outer struct is legal.)
API Changes
Existing TurboQuant arrays carried no backwards-compatibility guarantee, so this PR breaks compatibility with them. A number of functions and types now require a `Vec<u32>` of block sizes (in some places optional, where omitting it defaults to the next power of 2).
Testing
Adds a bunch of new unit tests and rstest cases covering correctness as well as MSE distortion.