Reverse order scans #3
Open
ch-sc wants to merge 4 commits into massive-com:develop from
Summary
Reverse order scans are an optimization for queries like `ORDER BY timestamp DESC LIMIT n` where the data is ordered by `timestamp ASC`. Such read patterns appear constantly in time-series workloads where callers want the most recent rows. With the current implementation, users have to fall back on naive approaches: fully scan a Vortex file, then either buffer all rows and reverse the output, or sort all rows of the file. This is unnecessarily expensive.

If files are already written in sorted order, a scan in the opposite direction can be answered by iterating chunks from last to first and reversing the rows within each chunk, which avoids both sorting and buffering. This PR implements this by reversing ranges in the scan layer and reversing the Vortex array representation.
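The idea can be illustrated with a minimal sketch that uses plain `Vec`s in place of Vortex chunks. The function name `newest_first` and the `Vec<Vec<i64>>` layout are assumptions made for illustration only and are not part of this PR.

```rust
/// Toy model: `chunks` stands in for the chunks of one file, each sorted
/// ascending and ordered ascending across chunks. This is NOT the Vortex API.
///
/// Returns the `limit` largest values in descending order without sorting
/// and without buffering the whole input.
fn newest_first(chunks: &[Vec<i64>], limit: usize) -> Vec<i64> {
    chunks
        .iter()
        // Global reversal: visit the last chunk first.
        .rev()
        // Per-chunk reversal: walk each chunk's rows back to front.
        .flat_map(|chunk| chunk.iter().rev().copied())
        // The iterator chain is lazy, so earlier chunks are never touched
        // once `limit` rows have been produced.
        .take(limit)
        .collect()
}
```

Because the iterator is lazy, a call such as `newest_first(&chunks, 4)` only reads as many trailing chunks as it needs to produce four rows.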
Implementation
The work spans two layers: the scan orchestration layer (`vortex-layout`) and the array encoding layer (`vortex-array`).

Scan layer (`vortex-layout`)

`ScanBuilder` gains a `with_reversed(bool)` builder method. When set:

- `RepeatedScan::execute` collects the chunk ranges and iterates them in reverse order (last chunk first). This is the global reversal: chunk order is flipped for free by reversing a `Vec` of ranges.
- The `map_fn` closure wraps the user-supplied function to call `array.reverse()` on each chunk before passing it downstream. This is the per-chunk reversal: row order within each chunk is flipped.

Reversed scans are always ordered (they produce a strict global sequence), so `ordered = true` is implied.

Array layer (`vortex-array`): `ReversedArray`

`ReversedArray` is a new lazy wrapper encoding. It is constructed by `ArrayRef::reverse()` and immediately runs through the optimizer. The optimizer fires structural reduce rules at construction time, before any data is read.
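To make the construction-time rewriting concrete, here is a toy model of a lazy reversed wrapper whose constructor applies reduce rules before any data is touched. The `Array` enum and the free function `reverse` are hypothetical stand-ins, not the types in `vortex-array`. The three rewrites modelled are the ones this PR lists under its reduce rules.

```rust
/// Hypothetical, simplified array tree. The real encodings live in
/// `vortex-array` and look different.
#[derive(Debug, Clone, PartialEq)]
enum Array {
    Primitive(Vec<i64>),
    Dict { codes: Box<Array>, values: Vec<String> },
    Chunked(Vec<Array>),
    Reversed(Box<Array>),
}

/// Build a reversed view of `array`, applying structural rewrites eagerly.
/// No element data is read or copied here; only the tree shape changes.
fn reverse(array: Array) -> Array {
    match array {
        // Reversed(Reversed(x)) => x
        Array::Reversed(inner) => *inner,
        // Reversed(Dict(codes, values)) => Dict(Reversed(codes), values):
        // only the codes are reversed, the values are left untouched.
        Array::Dict { codes, values } => Array::Dict {
            codes: Box::new(reverse(*codes)),
            values,
        },
        // Reversed(Chunked([c0, ..., cn])) => Chunked([reverse(cn), ..., reverse(c0)]):
        // flip the chunk order and recurse so the rules fire per chunk.
        Array::Chunked(chunks) => {
            Array::Chunked(chunks.into_iter().rev().map(reverse).collect())
        }
        // No rule applies: stay lazy and wrap.
        other => Array::Reversed(Box::new(other)),
    }
}
```

In this model, reversing a dictionary array twice yields the original tree again, because the double-reversal rule fires on the codes child.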
Reduce rules:

- `Reversed(Reversed(x))` → `x`
- `Reversed(Dict(codes, values))` → `Dict(Reversed(codes), values)`
- `Reversed(Chunked([c₀, c₁, …, cₙ]))` → `Chunked([reverse(cₙ), …, reverse(c₁), reverse(c₀)])` (each chunk is `Reversed` and re-optimized recursively)

The Dict rule is the most important one. Reversing a `Dict` means reversing only the codes, not the values.

Execute kernels:

- `BitBuffer::value_unchecked`: O(n), no intermediate allocation
- `field.reverse()` on each child: per-field optimizer rules still fire
- `take(reversed_indices)`

API Changes
New surface in `vortex-array`:

- `ArrayRef::reverse() -> VortexResult<ArrayRef>`: reverse any array lazily
- `Reversed` / `ReversedArray`: the new encoding type (public, can be pattern-matched)
- `ReverseReduce` trait + `ReverseReduceAdaptor` struct: extension point for custom encodings

New surface in `vortex-layout`:

- `ScanBuilder::with_reversed(bool) -> Self`
- `ScanBuilder::reversed() -> bool`

No breaking changes. All changes are additive.
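As a rough illustration of how a builder flag of this shape can drive both reversals, the following toy `ScanBuilder` mirrors the two method signatures listed above. Everything else in it is hypothetical and simplified: the fields, the `new` constructor, the `ordered` accessor, and an `execute` that works on a plain `i64` slice. It is a sketch of the pattern, not the code in `vortex-layout`.

```rust
use std::ops::Range;

/// Toy stand-in for a scan builder. Field names are hypothetical.
struct ScanBuilder {
    chunk_ranges: Vec<Range<usize>>,
    reversed: bool,
    ordered: bool,
}

impl ScanBuilder {
    fn new(chunk_ranges: Vec<Range<usize>>) -> Self {
        Self { chunk_ranges, reversed: false, ordered: false }
    }

    /// Same shape as `with_reversed(bool) -> Self` in the PR description.
    fn with_reversed(mut self, reversed: bool) -> Self {
        self.reversed = reversed;
        // A reversed scan produces a strict global sequence, so it is ordered.
        if reversed {
            self.ordered = true;
        }
        self
    }

    fn reversed(&self) -> bool {
        self.reversed
    }

    fn ordered(&self) -> bool {
        self.ordered
    }

    /// Apply `map_fn` to each chunk of `rows`, honouring the reversed flag.
    fn execute<T, F>(&self, rows: &[i64], map_fn: F) -> Vec<T>
    where
        F: Fn(Vec<i64>) -> T,
    {
        let mut ranges = self.chunk_ranges.clone();
        if self.reversed {
            // Global reversal: flip the order of the ranges themselves.
            ranges.reverse();
        }
        ranges
            .into_iter()
            .map(|range| {
                let mut chunk = rows[range].to_vec();
                if self.reversed {
                    // Per-chunk reversal, applied before the user function runs.
                    chunk.reverse();
                }
                map_fn(chunk)
            })
            .collect()
    }
}
```

With ranges `0..2` and `2..5` over five ascending rows, a reversed scan in this model hands the user function the chunk covering `2..5` first, with its rows already in descending order.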
Testing
`vortex-array/src/arrays/reversed/tests.rs` covers 13 cases for `PrimitiveArray`, `BoolArray`, `DictArray`, `StructArray`, and `ChunkedArray`.