TurboQuant: Block Decomposition #8139

Draft
connortsui20 wants to merge 3 commits into develop from ct/tq-block


connortsui20 commented May 28, 2026

Summary

Tracking issue: #7830

Closes: #7245

Adds block decomposition to Vortex.

Note to reviewers: This PR essentially rewrites a huge amount of the vortex-turboquant logic, so I think it might be a good idea to just review the crate holistically instead of just looking at the diffs.

Since TurboQuant requires input vectors to be padded to power-of-2 dimensions, vectors whose dimension is far from the next power of 2 waste a large share of the encoded size on padding. For example, a 768-dimension vector padded to 1024 carries 256 dimensions of padding.

By decomposing vectors into blocks whose sizes are powers of 2, we can reduce the amount of padding necessary. As a simple example, we might want to compress 768-dimension vectors as 2 blocks of 512 and 256. Or if we had 1408-dimension vectors, we would want to compress them as 3 blocks of 1024, 256, and 128. If we have 1500-dimension vectors, then we could use 2 blocks of 1024 and 512, which leaves 36 dimensions of padding in the last block.
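The padding arithmetic in these examples can be checked with a small sketch. `padding_for` is a hypothetical helper, not part of the vortex-turboquant API; the validation rules (non-empty, power-of-two, at least `MIN_BLOCK_SIZE` = 64) are taken from the first commit message on this PR.

```rust
/// Minimum block size, assumed to be 64 per the commit message on this PR.
const MIN_BLOCK_SIZE: u32 = 64;

/// Hypothetical helper: the number of zero-padded dimensions needed to fit a
/// `dimension`-wide vector into the given blocks.
///
/// Returns `None` if the block list is empty, a block is not a power of two,
/// a block is smaller than `MIN_BLOCK_SIZE`, the total overflows `u32`, or the
/// blocks are too small to hold the vector.
fn padding_for(dimension: u32, block_sizes: &[u32]) -> Option<u32> {
    if block_sizes.is_empty() {
        return None;
    }
    let mut total: u32 = 0;
    for &size in block_sizes {
        if !size.is_power_of_two() || size < MIN_BLOCK_SIZE {
            return None;
        }
        total = total.checked_add(size)?;
    }
    // `checked_sub` yields `None` when the blocks cannot hold the vector.
    total.checked_sub(dimension)
}
```

Under these assumptions, `padding_for(768, &[1024])` gives the single-block baseline of 256 padded dimensions, while `padding_for(768, &[512, 256])` gives 0.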

Each block is then TurboQuant-quantized independently, which means it gets its own L2 norm and codes child array (since TQ requires everything to be normalized). This has the added benefit of distributing "energy" better - higher peaks across different dimensions will be normalized within their own block so we do not have an "energy imbalance" in any vector blocks.
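As a rough illustration of per-block normalization, the sketch below splits a vector into contiguous blocks, zero-pads the tail, and normalizes each block by its own L2 norm. It assumes blocks are contiguous slices of the input, and `split_and_normalize` is a hypothetical name; the rotation and quantization steps are omitted. The toy block size of 4 is for readability only and would be below the crate's minimum.

```rust
/// Hypothetical sketch: split `vector` into contiguous blocks of the given
/// sizes, zero-pad any block that runs past the end of the vector, and return
/// each block's L2 norm together with the unit-normalized block.
fn split_and_normalize(vector: &[f32], block_sizes: &[usize]) -> Vec<(f32, Vec<f32>)> {
    let mut blocks = Vec::with_capacity(block_sizes.len());
    let mut offset = 0usize;
    for &size in block_sizes {
        // Clamp the source range so a block past the end is all padding.
        let start = offset.min(vector.len());
        let end = (offset + size).min(vector.len());
        let mut block = vec![0.0f32; size];
        block[..end - start].copy_from_slice(&vector[start..end]);

        // Each block gets its own norm, independent of the other blocks.
        let norm = block.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 {
            for x in block.iter_mut() {
                *x /= norm;
            }
        }
        blocks.push((norm, block));
        offset += size;
    }
    blocks
}
```

With the input `[3.0, 4.0, 0.0, 0.0, 100.0, 0.0]` and two blocks of 4, the first block keeps norm 5 and the second keeps norm 100, so the large value in the second block does not flatten the first block's components after normalization.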

Storage array

The encoded array is now an outer Struct whose fields are block_0, block_1, ..., block_{N-1}, one per block in metadata.blocks. Each inner block is the same as the old Struct { norms, codes } shape.
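The shape described above can be pictured with plain Rust types. This is only a mental model, not the Vortex array API: `BlockColumns`, `EncodedLayout`, and `block_field_names` are hypothetical names, and the element types of `norms` and `codes` are assumptions.

```rust
/// Hypothetical model of one inner block: the old `Struct { norms, codes }` shape.
struct BlockColumns {
    /// One L2 norm per row (element type assumed).
    norms: Vec<f32>,
    /// Quantization codes for every row, flattened (element type assumed).
    codes: Vec<u8>,
}

/// Hypothetical model of the outer struct: one named field per block.
struct EncodedLayout {
    /// Row validity on the outer struct; `None` models a non-nullable array.
    validity: Option<Vec<bool>>,
    /// Fields `block_0`, `block_1`, ..., `block_{N-1}` in order.
    blocks: Vec<(String, BlockColumns)>,
}

/// Field names for `num_blocks` blocks, following the naming in the description.
fn block_field_names(num_blocks: usize) -> Vec<String> {
    (0..num_blocks).map(|i| format!("block_{i}")).collect()
}

/// Build an empty layout with one field per entry in `block_sizes`.
fn empty_layout(block_sizes: &[u32]) -> EncodedLayout {
    let blocks = block_field_names(block_sizes.len())
        .into_iter()
        .map(|name| (name, BlockColumns { norms: Vec::new(), codes: Vec::new() }))
        .collect();
    EncodedLayout { validity: None, blocks }
}
```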

I tried to model this as a scalar function or even something strange internally but I wasn't able to figure out something that was cleaner than this.

Just as before, row validity lives on the outer struct and is authoritative; each inner block's struct validity, and each norms/codes child's validity, must "cover" the layer above it, meaning every row the parent marks valid must also be marked valid by the child. (A child is allowed to be valid in more rows than its parent, so for example a NonNullable inner block sitting under a Nullable outer struct is legal.)
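The "cover" rule can be stated as a small predicate over validity masks. This is a sketch with a hypothetical `covers` function, where `None` stands in for a non-nullable array; it does not use the actual Vortex validity types.

```rust
/// Hypothetical check for the "cover" rule: every row the parent marks valid
/// must also be marked valid by the child.
///
/// `None` models a non-nullable array (every row valid); `Some(mask)` holds
/// one bool per row, `true` meaning valid.
fn covers(parent: Option<&[bool]>, child: Option<&[bool]>) -> bool {
    match (parent, child) {
        // A non-nullable child is valid everywhere, so it covers any parent.
        (_, None) => true,
        // A non-nullable parent needs the child to be valid in every row.
        (None, Some(c)) => c.iter().all(|&valid| valid),
        // Otherwise check row by row: parent valid implies child valid.
        (Some(p), Some(c)) => {
            p.len() == c.len() && p.iter().zip(c).all(|(&pv, &cv)| !pv || cv)
        }
    }
}
```

A child that is valid in more rows than its parent still passes, which matches the NonNullable-under-Nullable case described above.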

API Changes

TurboQuant arrays had no backwards-compatibility guarantee, so this is a breaking change. A number of functions and types now take a Vec<u32> of block sizes; in some places the argument is optional, and omitting it defaults to the next power of 2.
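A minimal sketch of how an optional block-size argument might resolve, assuming the default is a single block sized to the next power of 2 at or above the dimension. `resolve_block_sizes` is a hypothetical name and signature, not the crate's API, and it performs no validation of caller-supplied sizes.

```rust
/// Hypothetical resolution of an optional block-size list.
///
/// Explicit sizes are passed through unchanged. When omitted, this sketch
/// assumes a single block of the next power of two at or above `dimension`.
/// Returns `None` if that power of two does not fit in a `u32`.
fn resolve_block_sizes(dimension: u32, block_sizes: Option<Vec<u32>>) -> Option<Vec<u32>> {
    match block_sizes {
        Some(sizes) => Some(sizes),
        None => dimension.checked_next_power_of_two().map(|size| vec![size]),
    }
}
```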

Testing

Adds a number of new unit tests and rstest cases that cover correctness as well as MSE distortion.
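For reference, the distortion metric such tests measure can be sketched as below. This is the generic mean squared error between an original vector and its reconstruction; the `mse` helper is hypothetical and the thresholds used in the actual test suite are not shown here.

```rust
/// Mean squared error between an original vector and its reconstruction,
/// accumulated in `f64` to limit rounding error.
///
/// Returns `None` for empty input or mismatched lengths.
fn mse(original: &[f32], reconstructed: &[f32]) -> Option<f64> {
    if original.is_empty() || original.len() != reconstructed.len() {
        return None;
    }
    let sum_sq: f64 = original
        .iter()
        .zip(reconstructed)
        .map(|(&a, &b)| {
            let diff = f64::from(a) - f64::from(b);
            diff * diff
        })
        .sum();
    Some(sum_sq / original.len() as f64)
}
```

The same function applies to one block's slice or to the whole vector, depending on which slices are passed in.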

connortsui20 added the changelog/feature label May 28, 2026
connortsui20 mentioned this pull request May 28, 2026
connortsui20 force-pushed the ct/tq-block branch 2 times, most recently from 48cb4e7 to 666d582 May 29, 2026 13:14

codspeed-hq Bot commented May 29, 2026

Merging this PR will degrade performance by 14.6%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

❌ 1 regressed benchmark
✅ 1274 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- | --- |
| WallTime | cuda/bitpacked_u8/unpack/3bw[100M] | 299.6 µs | 350.8 µs | -14.6% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing ct/tq-block (962c7fc) with develop (73454db)


connortsui20 force-pushed the ct/tq-block branch 4 times, most recently from 512ae34 to 7752b04 May 29, 2026 17:02
…ation

Add the block-decomposition surface: an optional power-of-two `block_sizes` on
TurboQuantConfig/Metadata (proto tag 6, validated for non-empty/power-of-two/>=MIN_BLOCK_SIZE),
per-block centroid tables, and a per-block `derive_block_seed`. MIN_DIMENSION lowered to 64
(= MIN_BLOCK_SIZE). The pipeline rewrite follows in the next commit (crate not yet buildable here).

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Replace the single-block pipeline with a per-block one: each block gets its own L2 norm, SORF
rotation, and centroid table, stored as an outer struct of per-block {norms, codes}. Overspilling
blocks are zero-padded; encode rejects non-finite norms and overflowing dimensions, and decode
returns clean errors (not panics) for out-of-range codes and invalid stored norms.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Migrate the suite to the block layout and add round-trip, fidelity (per-block and whole-vector
MSE), seed-distinctness, multi-block null/zero-norm, malformed-metadata/code/norm, and NaN/Inf
coverage. Refresh the crate docs and add a multi-block file round-trip.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Development

Successfully merging this pull request may close these issues.

TurboQuant rotation bias for non-power-of-2 dimensions