perf: Vectorize get_chunk_slice for faster sharded writes #3713
Open
mkitti wants to merge 20 commits into zarr-developers:main from
Add benchmarks that clear the _morton_order LRU cache before each iteration to measure the full Morton computation cost:
- test_sharded_morton_indexing: 512-4096 chunks per shard
- test_sharded_morton_indexing_large: 32768 chunks per shard

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
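As a rough illustration of why the cache is cleared before each iteration: with a warm LRU cache, a benchmark only measures a dictionary lookup, not the ordering computation itself. The sketch below is a minimal standalone example, not the PR's benchmark code. `morton_order_demo`, `_morton_key`, and `time_uncached` are hypothetical names, and the bit-interleaving used here is an assumed stand-in for zarr's `_morton_order`.

```python
import itertools
import time
from functools import lru_cache


def _morton_key(coord: tuple[int, ...], n_bits: int) -> int:
    """Interleave the bits of each coordinate into one Z-order key."""
    key = 0
    ndim = len(coord)
    for bit in range(n_bits):
        for dim, c in enumerate(coord):
            key |= ((c >> bit) & 1) << (bit * ndim + dim)
    return key


@lru_cache(maxsize=16)
def morton_order_demo(shape: tuple[int, ...]) -> tuple[tuple[int, ...], ...]:
    """Hypothetical cached helper: all coordinates of `shape`, sorted by key."""
    n_bits = max((s - 1).bit_length() for s in shape)
    coords = itertools.product(*(range(s) for s in shape))
    return tuple(sorted(coords, key=lambda c: _morton_key(c, n_bits)))


def time_uncached(shape: tuple[int, ...], repeats: int = 3) -> float:
    """Time the full computation by clearing the cache before every run."""
    timings = []
    for _ in range(repeats):
        morton_order_demo.cache_clear()  # force a cache miss on the next call
        start = time.perf_counter()
        morton_order_demo(shape)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # 8 x 8 x 8 = 512 chunks per shard, the low end of the benchmarked range.
    print(f"uncached: {time_uncached((8, 8, 8)):.6f} s")
```

Without the `cache_clear()` call, every iteration after the first would hit the cache and the reported time would not reflect the Morton computation.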
Add vectorized methods to _ShardIndex and _ShardReader for batch chunk slice lookups, reducing per-chunk function call overhead when writing to shards.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
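The general idea of a batch lookup can be sketched with NumPy fancy indexing. This is an illustrative sketch under stated assumptions, not the code in this PR: the function names `get_chunk_slice_scalar` and `get_chunk_slices_batch` are hypothetical, and the index layout (one `(offset, nbytes)` pair of `uint64` values per chunk, with both set to 2^64 - 1 for an empty chunk) follows the sharding index format.

```python
import numpy as np

# Both fields of an empty chunk's index entry hold this sentinel value.
MAX_UINT_64 = np.uint64(np.iinfo(np.uint64).max)


def get_chunk_slice_scalar(index: np.ndarray, coord: tuple[int, ...]):
    """Per-chunk lookup: one Python call per chunk (hypothetical baseline)."""
    offset, nbytes = index[coord]
    if offset == MAX_UINT_64 and nbytes == MAX_UINT_64:
        return None  # empty chunk
    return int(offset), int(offset) + int(nbytes)


def get_chunk_slices_batch(index: np.ndarray, coords: np.ndarray):
    """Batch lookup for an (n, ndim) array of chunk coordinates.

    Returns (starts, ends, valid); starts and ends are zeroed where
    valid is False.
    """
    entries = index[tuple(coords.T)]  # fancy indexing, result shape (n, 2)
    starts = entries[:, 0].copy()
    lengths = entries[:, 1]
    valid = ~((starts == MAX_UINT_64) & (lengths == MAX_UINT_64))
    ends = starts + lengths  # wraps for empty entries; masked just below
    starts[~valid] = 0
    ends[~valid] = 0
    return starts, ends, valid


if __name__ == "__main__":
    # A 2 x 2 shard with two stored chunks and two empty ones.
    index = np.full((2, 2, 2), MAX_UINT_64, dtype=np.uint64)
    index[0, 0] = (0, 10)
    index[1, 1] = (10, 5)
    coords = np.array([[0, 0], [0, 1], [1, 1]])
    print(get_chunk_slices_batch(index, coords))
```

The batch version does one indexing operation and a handful of array operations regardless of how many chunks are requested, which is where the saving over a per-chunk Python loop would come from.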
Summary
This PR adds vectorized methods to `_ShardIndex` and `_ShardReader` for batch chunk slice lookups, significantly reducing per-chunk function call overhead when writing to shards.

Changes
New Methods
- `_ShardIndex.get_chunk_slices_vectorized`: Batch lookup of chunk slices using NumPy vectorized operations instead of per-chunk Python calls.
- `_ShardReader.to_dict_vectorized`: Build a chunk dictionary using vectorized lookup instead of iterating with individual `get()` calls.

Modified Code Path
In `_encode_partial_single`, the per-chunk lookup is replaced with the vectorized approach.
Benchmark Results
Single Chunk Write to Large Shard
Writing a single 1x1x1 chunk to a shard with 32³ chunks (using `test_sharded_morton_write_single_chunk` from PR #3712):

Profile Breakdown
- `get_chunk_slice` + `_localize_chunk`
- `to_dict_vectorized` loop

Checklist
- `docs/user-guide/*.md`
- `changes/`