perf: vectorized RANGE frame boundary search for a single primitive column#23204
Draft
Dandandan wants to merge 1 commit into
…olumn

RANGE window frame boundaries are computed once per row by `WindowFrameStateRange::calculate_index_of_row`, which calls `search_in_slice`. That scan invokes `get_row_at_idx` at every probed index, allocating a `Vec<ScalarValue>` and running a dynamic `ScalarValue` comparison (`compare_rows` -> `try_cmp`) per probe. The scan is amortized O(n) per partition, so this per-probe allocation + enum dispatch dominates RANGE frame evaluation.

This adds a fast path for the common case of a single primitive integer/float ORDER BY column: the column is downcast once to `PrimitiveArray<T>` and scanned over native values, reproducing exactly the predicate of the generic path (`compare_rows` for one column, including every NULLS FIRST/LAST x ASC/DESC combination and the float total ordering that `ScalarValue::try_cmp` uses via `ArrowNativeTypeOp::compare`).

The boundary-target arithmetic is left entirely unchanged, so decimal/temporal/overflow/underflow semantics are identical; only the comparison scan is specialized. The generic `ScalarValue` path remains the fallback for multi-column frames and non-primitive types (decimal/temporal, whose scale/units would not match a raw native comparison).

A differential test asserts the native scan returns the same boundary index as the generic scan for every position, target, and sort-option combination, including nulls, NaN, duplicates, and signed zero.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Which issue does this PR close?
Rationale for this change
RANGE window-frame boundaries are computed once per row by `WindowFrameStateRange::calculate_index_of_row`, which calls `search_in_slice`. That scan calls `get_row_at_idx` at every probed index, allocating a `Vec<ScalarValue>` and running a dynamic `ScalarValue` comparison (`compare_rows` → `try_cmp`) per probe. Since the scan amortizes to O(n) per partition, this per-probe heap allocation + enum dispatch dominates RANGE frame evaluation. (ROWS frames already use allocation-free integer arithmetic; this closes part of that gap.)

What changes are included in this PR?
A fast path in `calculate_index_of_row` for the common case of a single primitive integer/float ORDER BY column:

- The column is downcast once to `PrimitiveArray<T>` and scanned over native values, reproducing the generic predicate exactly: `compare_rows` for one column, including every NULLS FIRST/LAST × ASC/DESC combination and the float total ordering that `ScalarValue::try_cmp` uses (via `ArrowNativeTypeOp::compare`, which is `total_cmp` for floats).
- The boundary-target arithmetic (`add_checked`/`sub_checked`, overflow-to-edge, unsigned-underflow) is left entirely unchanged, so decimal/temporal/overflow/underflow semantics are identical; only the comparison scan is specialized.
- The generic `ScalarValue` path remains the fallback for multi-column frames, for non-primitive types (decimal/temporal, whose scale/units would not match a raw native comparison), and for any column/target type mismatch.

GROUPS frames are left for a follow-up (they use a different group-boundary mechanism).
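To make the shape of the native scan concrete, here is a minimal, dependency-free sketch. It is not the code in this PR: `SortOpts`, `compare_native`, and `scan_boundary` are hypothetical names, a slice of `Option<f64>` stands in for a nullable `PrimitiveArray<T>`, and the `strict` flag is an assumed way of choosing between a strictly-before and a before-or-equal predicate. The one thing it takes from the description above is the comparison rule: null placement follows NULLS FIRST/LAST, and non-null floats use the total order, reversed for DESC.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the per-column sort options
/// (direction and null placement).
#[derive(Clone, Copy)]
struct SortOpts {
    descending: bool,
    nulls_first: bool,
}

/// Compare two nullable values as a one-column row comparison would:
/// null placement depends only on `nulls_first`; non-null values use the
/// float total order, reversed for DESC.
fn compare_native(a: Option<f64>, b: Option<f64>, opts: SortOpts) -> Ordering {
    match (a, b) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) => {
            if opts.nulls_first { Ordering::Less } else { Ordering::Greater }
        }
        (Some(_), None) => {
            if opts.nulls_first { Ordering::Greater } else { Ordering::Less }
        }
        (Some(x), Some(y)) => {
            let ord = x.total_cmp(&y);
            if opts.descending { ord.reverse() } else { ord }
        }
    }
}

/// Walk forward from `low` and return the first index in `low..high` whose
/// value no longer sorts before `target` (strictly before when `strict` is
/// true, before-or-equal otherwise). Returns `high` if no such index exists.
fn scan_boundary(
    values: &[Option<f64>],
    target: Option<f64>,
    opts: SortOpts,
    low: usize,
    high: usize,
    strict: bool,
) -> usize {
    let mut idx = low;
    while idx < high {
        let ord = compare_native(values[idx], target, opts);
        let advance = if strict {
            ord == Ordering::Less
        } else {
            ord != Ordering::Greater
        };
        if !advance {
            break;
        }
        idx += 1;
    }
    idx
}
```

Each probe in this sketch is a plain value comparison with no allocation and no enum dispatch, which is the property the fast path is after. Because `total_cmp` is used, `-0.0` sorts before `0.0` and NaN has a defined position, so the scan stays well defined on the edge cases listed in the testing section.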
Are these changes tested?
Yes:
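- A differential test asserts the native scan returns the same boundary index as the generic `search_in_slice` path for every position, target, and sort-option combination, including nulls, NaN, duplicates, and signed zero.
- The `window` sqllogictest suite passes (all 6 files).

Are there any user-facing changes?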
No — results are identical; this is a performance-only change.
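The claim that results are identical rests on differential testing, and the general idea can be sketched without any DataFusion dependency. The example below is an illustration under stated assumptions, not the test in this PR: all names are hypothetical, `Option<f64>` stands in for a nullable primitive column, and the reference implementation is a binary search (`partition_point`) rather than the generic `ScalarValue` scan. It enumerates every sort-option combination, target, and start position over data containing nulls, NaN, duplicates, and signed zero, and checks that a forward linear scan and the reference agree.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for per-column sort options.
#[derive(Clone, Copy)]
struct SortOpts {
    descending: bool,
    nulls_first: bool,
}

/// Total comparison for nullable floats under the given sort options.
fn cmp_native(a: Option<f64>, b: Option<f64>, o: SortOpts) -> Ordering {
    match (a, b) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) => {
            if o.nulls_first { Ordering::Less } else { Ordering::Greater }
        }
        (Some(_), None) => {
            if o.nulls_first { Ordering::Greater } else { Ordering::Less }
        }
        (Some(x), Some(y)) => {
            let ord = x.total_cmp(&y);
            if o.descending { ord.reverse() } else { ord }
        }
    }
}

/// Scan under test: advance from `low` while values sort strictly before `target`.
fn linear_scan(values: &[Option<f64>], target: Option<f64>, o: SortOpts, low: usize) -> usize {
    let mut i = low;
    while i < values.len() && cmp_native(values[i], target, o) == Ordering::Less {
        i += 1;
    }
    i
}

/// Independent reference: binary search over the sorted suffix `values[low..]`.
fn reference(values: &[Option<f64>], target: Option<f64>, o: SortOpts, low: usize) -> usize {
    low + values[low..].partition_point(|v| cmp_native(*v, target, o) == Ordering::Less)
}

/// Compare both implementations on every (options, target, start) case.
/// Panics on the first mismatch; otherwise returns the number of cases checked.
fn run_differential() -> usize {
    let mut data = vec![
        None,
        Some(f64::NAN),
        Some(-0.0),
        Some(0.0),
        Some(1.0),
        Some(1.0),
        Some(f64::NEG_INFINITY),
        None,
    ];
    let mut cases = 0;
    for descending in [false, true] {
        for nulls_first in [false, true] {
            let o = SortOpts { descending, nulls_first };
            // Sort with the same comparator so the input matches the ORDER BY.
            data.sort_by(|a, b| cmp_native(*a, *b, o));
            let targets = data.clone();
            for &target in &targets {
                for low in 0..=data.len() {
                    assert_eq!(
                        linear_scan(&data, target, o, low),
                        reference(&data, target, o, low)
                    );
                    cases += 1;
                }
            }
        }
    }
    cases
}
```

Exhaustive enumeration over a small, deliberately awkward data set is cheap and covers the combinations where a specialized comparison is most likely to diverge from the general one.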