Reverse order scans #3

Open
ch-sc wants to merge 4 commits into massive-com:develop from ch-sc:reverse-order-scans

ch-sc (Collaborator) commented May 6, 2026

Summary

Reverse order scans are an optimization for queries like ORDER BY timestamp DESC LIMIT n where the data is ordered by timestamp ASC. Such read patterns are common in time-series workloads, where callers want the most recent rows. With the current implementation, users have to fall back on naive approaches: fully scan a Vortex file and then either buffer all rows and reverse the output, or sort all rows of the file. Both are unnecessarily expensive.

If files are already written in sorted order, a scan in the opposite direction can be answered by iterating chunks from last to first and reversing the rows within each chunk, which avoids both sorting and buffering. This PR implements this by reversing the chunk ranges in the scan layer and adding a reversed Vortex array representation.
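To make the idea concrete, here is a minimal standalone sketch that uses plain `Vec<i64>` chunks instead of Vortex arrays. The function name `latest_rows` is hypothetical and not part of this PR; it only illustrates why no sort or full buffer is needed when the chunks are already in ascending order.

```rust
/// Return the `limit` largest rows from ascending-sorted chunks, in
/// descending order, without sorting or buffering the whole input.
/// Hypothetical helper for illustration only.
fn latest_rows(chunks: &[Vec<i64>], limit: usize) -> Vec<i64> {
    chunks
        .iter()
        .rev() // global reversal: visit the last chunk first
        .flat_map(|chunk| chunk.iter().rev().copied()) // per-chunk reversal
        .take(limit) // stop as soon as the limit is satisfied
        .collect()
}

fn main() {
    // Three chunks, globally sorted ascending.
    let chunks = vec![vec![1, 2, 3], vec![4, 5], vec![6, 7, 8, 9]];
    let newest = latest_rows(&chunks, 5);
    assert_eq!(newest, vec![9, 8, 7, 6, 5]);
    println!("{newest:?}");
}
```

Because the iterator is lazy, chunks that lie entirely before the limit is reached (here `[1, 2, 3]`) are never touched.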

Implementation

The work spans two layers: the scan orchestration layer (vortex-layout) and the array encoding layer (vortex-array).

Scan layer (vortex-layout)

ScanBuilder gains a with_reversed(bool) builder method. When set:

  • RepeatedScan::execute collects the chunk ranges and iterates them in reverse order (last chunk first). This is the global reversal — chunk order is flipped for free by reversing a Vec of ranges.
  • The map_fn closure wraps the user-supplied function to call array.reverse() on each chunk before passing it downstream. This is the per-chunk reversal — row order within each chunk is flipped.

Reversed scans are always ordered (they produce a strict global sequence), so ordered = true is implied.
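The two reversals can be sketched in isolation as follows. `ToyScan` is a hypothetical stand-in written for this illustration; it is not the vortex-layout `ScanBuilder` and only mirrors the shape described above (a `with_reversed` flag, a `Vec` of chunk ranges, and a wrapped `map_fn`).

```rust
use std::ops::Range;

/// Hypothetical stand-in for a scan over an in-memory slice.
struct ToyScan<'a> {
    data: &'a [i64],
    chunk_len: usize,
    reversed: bool,
}

impl<'a> ToyScan<'a> {
    fn new(data: &'a [i64], chunk_len: usize) -> Self {
        assert!(chunk_len > 0, "chunk_len must be non-zero");
        Self { data, chunk_len, reversed: false }
    }

    fn with_reversed(mut self, reversed: bool) -> Self {
        self.reversed = reversed;
        self
    }

    /// Split the row space into chunk ranges, as a forward scan would.
    fn ranges(&self) -> Vec<Range<usize>> {
        (0..self.data.len())
            .step_by(self.chunk_len)
            .map(|start| start..(start + self.chunk_len).min(self.data.len()))
            .collect()
    }

    fn execute<T>(&self, map_fn: impl Fn(Vec<i64>) -> T) -> Vec<T> {
        let mut ranges = self.ranges();
        if self.reversed {
            ranges.reverse(); // global reversal: flip the Vec of ranges
        }
        ranges
            .into_iter()
            .map(|range| {
                let mut chunk = self.data[range].to_vec();
                if self.reversed {
                    chunk.reverse(); // per-chunk reversal before the user function
                }
                map_fn(chunk)
            })
            .collect()
    }
}

fn main() {
    let data = [10, 20, 30, 40, 50];
    let out = ToyScan::new(&data, 2).with_reversed(true).execute(|chunk| chunk);
    assert_eq!(out, vec![vec![50], vec![40, 30], vec![20, 10]]);
}
```

Note that the short trailing chunk comes out first in a reversed scan, so chunk boundaries stay the same as in the forward direction.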

Array layer (vortex-array) — ReversedArray

ReversedArray is a new lazy wrapper encoding. It is constructed by ArrayRef::reverse() and immediately runs through the optimizer. The optimizer fires structural reduce rules at construction time, before any data is read:

Reduce rules:

| Pattern | Result | Cost |
| --- | --- | --- |
| Reversed(Reversed(x)) | x | Zero: both wrappers cancel |
| Reversed(Dict(codes, values)) | Dict(Reversed(codes), values) | Reverse only the codes array; values dictionary reused |
| Reversed(Chunked([c₀, c₁, …, cₙ])) | Chunked([reverse(cₙ), …, reverse(c₁), reverse(c₀)]) | Chunk order flipped; each chunk wrapped in Reversed and re-optimized recursively |

The Dict rule is the most important one. Reversing a Dict means reversing only the codes, not the values.
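The three rules can be modelled on a toy expression tree. The `Toy` enum and the free function `reverse` below are assumptions made for this sketch and do not reflect the actual vortex-array types or optimizer; they only show how the rewrites fire structurally without touching any data.

```rust
/// Toy array tree for illustration; not the vortex-array representation.
#[derive(Debug, Clone, PartialEq)]
enum Toy {
    Primitive(Vec<i64>),
    Dict { codes: Box<Toy>, values: Vec<String> },
    Chunked(Vec<Toy>),
    Reversed(Box<Toy>),
}

/// Wrap `array` in `Reversed`, applying the structural rules eagerly.
fn reverse(array: Toy) -> Toy {
    match array {
        // Reversed(Reversed(x)) => x
        Toy::Reversed(inner) => *inner,
        // Reversed(Dict(codes, values)) => Dict(Reversed(codes), values)
        Toy::Dict { codes, values } => Toy::Dict {
            codes: Box::new(reverse(*codes)),
            values, // dictionary values are moved through untouched
        },
        // Reversed(Chunked([c0, .., cn])) => Chunked([reverse(cn), .., reverse(c0)])
        Toy::Chunked(chunks) => {
            Toy::Chunked(chunks.into_iter().rev().map(reverse).collect())
        }
        // No rule applies: stay lazy and keep the wrapper.
        other => Toy::Reversed(Box::new(other)),
    }
}

fn main() {
    let p = Toy::Primitive(vec![1, 2, 3]);
    // Double reversal cancels.
    assert_eq!(reverse(reverse(p.clone())), p);

    // Only the codes are wrapped; the values are reused as-is.
    let dict = Toy::Dict {
        codes: Box::new(Toy::Primitive(vec![0, 1, 0])),
        values: vec!["a".into(), "b".into()],
    };
    let expected = Toy::Dict {
        codes: Box::new(Toy::Reversed(Box::new(Toy::Primitive(vec![0, 1, 0])))),
        values: vec!["a".into(), "b".into()],
    };
    assert_eq!(reverse(dict), expected);
}
```

In this model the Dict rewrite never inspects `values`, which is the property the Dict rule relies on.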

Execute kernels:

| Canonical type | Path |
| --- | --- |
| Primitive | Iterates the typed buffer backwards: O(n), sequential, auto-vectorizable |
| Bool | Reads bits in reverse via BitBuffer::value_unchecked: O(n), no intermediate allocation |
| Struct | Calls field.reverse() on each child; per-field optimizer rules still fire |
| All others | Falls back to take(reversed_indices) |
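A rough standalone sketch of the primitive, bool, and fallback paths is shown below. It operates on plain slices, and the helper names (`reverse_primitive`, `reverse_bits`, `take`, `reverse_via_take`) are hypothetical. The LSB-first bit packing is an assumption made for the example and may not match the actual BitBuffer layout.

```rust
/// Primitive path: walk the typed buffer backwards.
fn reverse_primitive<T: Copy>(values: &[T]) -> Vec<T> {
    values.iter().rev().copied().collect()
}

/// Bool path: read bit `len - 1 - i` and write it at position `i`.
/// Assumes an LSB-first packed layout (an assumption of this sketch).
fn reverse_bits(packed: &[u8], len: usize) -> Vec<u8> {
    assert!(len <= packed.len() * 8, "len exceeds buffer capacity");
    let mut out = vec![0u8; (len + 7) / 8];
    for i in 0..len {
        let src = len - 1 - i;
        let bit = (packed[src / 8] >> (src % 8)) & 1;
        out[i / 8] |= bit << (i % 8);
    }
    out
}

/// Generic gather, standing in for a `take` kernel.
fn take<T: Clone>(values: &[T], indices: &[usize]) -> Vec<T> {
    indices.iter().map(|&i| values[i].clone()).collect()
}

/// Fallback path: materialize descending indices and gather.
fn reverse_via_take<T: Clone>(values: &[T]) -> Vec<T> {
    let reversed_indices: Vec<usize> = (0..values.len()).rev().collect();
    take(values, &reversed_indices)
}

fn main() {
    assert_eq!(reverse_primitive(&[1u32, 2, 3]), vec![3, 2, 1]);

    // Bits (LSB first) 1,1,0,0 become 0,0,1,1.
    assert_eq!(reverse_bits(&[0b0000_0011], 4), vec![0b0000_1100]);

    let strings = vec!["a".to_string(), "b".to_string(), "c".to_string()];
    assert_eq!(reverse_via_take(&strings), vec!["c", "b", "a"]);
}
```

The fallback allocates an index vector of the same length as the input, which is why the dedicated kernels are preferable where they exist.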

API Changes

New surface in vortex-array:

  • ArrayRef::reverse() -> VortexResult<ArrayRef> — reverse any array lazily
  • Reversed / ReversedArray — the new encoding type (public, can be pattern-matched)
  • ReverseReduce trait + ReverseReduceAdaptor struct — extension point for custom encodings

New surface in vortex-layout:

  • ScanBuilder::with_reversed(bool) -> Self
  • ScanBuilder::reversed() -> bool

No breaking changes. All changes are additive.

Testing

vortex-array/src/arrays/reversed/tests.rs covers 13 cases for PrimitiveArray, BoolArray, DictArray, StructArray, and ChunkedArray.

ch-sc added 4 commits May 4, 2026 12:20
…on.io>

I, Christoph Schulze <christoph.schulze@polygon.io>, hereby add my Signed-off-by to this commit: 0e64d5e
I, Christoph Schulze <christoph.schulze@polygon.io>, hereby add my Signed-off-by to this commit: 96a951e

Signed-off-by: Christoph Schulze <christoph.schulze@polygon.io>