perf: vectorized RANGE frame boundary search for a single primitive column #23204

Draft
Dandandan wants to merge 1 commit into apache:main from Dandandan:perf/window-range-groups-typed-search

@Dandandan

Contributor

Which issue does this PR close?

  • Closes #.

Rationale for this change

RANGE window-frame boundaries are computed once per row by WindowFrameStateRange::calculate_index_of_row, which calls search_in_slice. That scan calls get_row_at_idx at every probed index, allocating a Vec<ScalarValue> and running a dynamic ScalarValue comparison (compare_rows -> try_cmp) per probe. Because the scan amortizes to O(n) probes per partition, the per-probe heap allocation and enum dispatch dominate RANGE frame evaluation. (ROWS frames already use allocation-free integer arithmetic; this closes part of that gap.)
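
The O(n) amortization is consistent with each boundary search resuming where the previous row's search stopped, so each boundary pointer only ever moves forward. A minimal std-only sketch of that idea, assuming an ascending, null-free `i64` column, a non-negative `delta`, and a symmetric `RANGE BETWEEN delta PRECEDING AND delta FOLLOWING` frame. The name `range_frames` and the saturating arithmetic are illustrative assumptions, not DataFusion code:

```rust
/// For each row of an ascending-sorted column, return the half-open frame
/// `[start, end)` for RANGE BETWEEN delta PRECEDING AND delta FOLLOWING.
/// Hypothetical helper; saturating ops stand in for the real overflow handling.
fn range_frames(sorted: &[i64], delta: i64) -> Vec<(usize, usize)> {
    let mut frames = Vec::with_capacity(sorted.len());
    // Both pointers persist across rows and never move backwards,
    // so each one advances at most `sorted.len()` times in total.
    let (mut start, mut end) = (0usize, 0usize);
    for &v in sorted {
        let lo = v.saturating_sub(delta);
        let hi = v.saturating_add(delta);
        // Frame start: first index whose value is >= lo.
        while start < sorted.len() && sorted[start] < lo {
            start += 1;
        }
        // Frame end: first index whose value is > hi.
        while end < sorted.len() && sorted[end] <= hi {
            end += 1;
        }
        frames.push((start, end));
    }
    frames
}
```

For `range_frames(&[1, 2, 4, 7, 8], 1)` this yields `[(0, 2), (0, 2), (2, 3), (3, 5), (3, 5)]`. The total number of comparisons is linear in the column length, which is why the cost of a single probe matters so much.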

What changes are included in this PR?

A fast path in calculate_index_of_row for the common case of a single primitive integer/float ORDER BY column:

  • The column is downcast once to PrimitiveArray<T> and scanned over native values, reproducing the generic predicate exactly — compare_rows for one column, including every NULLS FIRST/LAST × ASC/DESC combination and the float total ordering that ScalarValue::try_cmp uses (via ArrowNativeTypeOp::compare, which is total_cmp for floats).
  • The boundary-target arithmetic (add_checked/sub_checked, overflow-to-edge, unsigned-underflow) is left entirely unchanged, so decimal/temporal/overflow/underflow semantics are identical — only the comparison scan is specialized.
  • The generic ScalarValue path remains the fallback for multi-column frames, for non-primitive types (decimal/temporal, whose scale/units would not match a raw native comparison), and for any column/target type mismatch.

GROUPS frames are left for a follow-up (they use a different group-boundary mechanism).
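
The shape of a single-column native scan can be sketched without Arrow. This is a simplified std-only model, not the code in this PR: `SortOpts`, `compare_one`, and `scan_boundary` are hypothetical names, `Option<f64>` stands in for a nullable `PrimitiveArray<T>` slot, and the null-placement rules are an assumption about how the generic one-column comparison behaves.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for Arrow's sort options.
#[derive(Clone, Copy)]
struct SortOpts {
    descending: bool,
    nulls_first: bool,
}

/// Compare one nullable value against a nullable target.
/// Nulls are placed by `nulls_first` regardless of direction;
/// non-null floats use the IEEE 754 total order, flipped for DESC.
fn compare_one(value: Option<f64>, target: Option<f64>, opts: SortOpts) -> Ordering {
    match (value, target) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) => {
            if opts.nulls_first { Ordering::Less } else { Ordering::Greater }
        }
        (Some(_), None) => {
            if opts.nulls_first { Ordering::Greater } else { Ordering::Less }
        }
        (Some(v), Some(t)) => {
            if opts.descending { t.total_cmp(&v) } else { v.total_cmp(&t) }
        }
    }
}

/// Linear scan from `start`. Advances while the row sorts strictly before
/// the target (`strict = true`) or before-or-equal (`strict = false`),
/// and returns the first index that fails the predicate.
fn scan_boundary(
    values: &[Option<f64>],
    target: Option<f64>,
    opts: SortOpts,
    strict: bool,
    start: usize,
) -> usize {
    let mut idx = start;
    while idx < values.len() {
        let ord = compare_one(values[idx], target, opts);
        let advance = if strict { ord.is_lt() } else { ord.is_le() };
        if !advance {
            break;
        }
        idx += 1;
    }
    idx
}
```

With an ASC NULLS FIRST column `[None, Some(1.0), Some(2.0), Some(2.0), Some(5.0)]` and target `Some(2.0)`, the strict scan stops at index 2 and the non-strict scan at index 4. No per-probe allocation is involved, and `total_cmp` keeps `-0.0` ordered before `0.0`, matching a total ordering rather than `==` on floats.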

Are these changes tested?

Yes:

  • A new differential unit test asserts the native scan returns the same boundary index as the generic search_in_slice path for every position, target, and sort-option combination, including nulls, NaN, duplicates, and signed zero.
  • Full window sqllogictest suite passes (all 6 files).
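
The differential approach can be illustrated with a toy harness. This sketch is not the test added in this PR: `Scalar`, `generic_scan`, `native_scan`, and `run_differential` are hypothetical names, and the "generic" path is only a model of a scan that materializes a row per probe. The column is deliberately left unsorted, since agreement between the two scans should not depend on input order.

```rust
use std::cmp::Ordering;

#[derive(Clone, Copy)]
struct SortOpts {
    descending: bool,
    nulls_first: bool,
}

/// Hypothetical dynamic scalar, standing in for a `ScalarValue`-like enum.
#[derive(Clone, Copy)]
enum Scalar {
    Null,
    Float64(f64),
}

fn to_scalar(v: Option<f64>) -> Scalar {
    v.map_or(Scalar::Null, Scalar::Float64)
}

/// Reference path: materializes a one-element row per probe, then compares dynamically.
fn generic_scan(col: &[Option<f64>], target: Scalar, o: SortOpts, strict: bool, start: usize) -> usize {
    let mut idx = start;
    while idx < col.len() {
        let row = vec![to_scalar(col[idx])]; // per-probe heap allocation
        let ord = match (row[0], target) {
            (Scalar::Null, Scalar::Null) => Ordering::Equal,
            (Scalar::Null, _) => if o.nulls_first { Ordering::Less } else { Ordering::Greater },
            (_, Scalar::Null) => if o.nulls_first { Ordering::Greater } else { Ordering::Less },
            (Scalar::Float64(v), Scalar::Float64(t)) => {
                if o.descending { t.total_cmp(&v) } else { v.total_cmp(&t) }
            }
        };
        let advance = if strict { ord.is_lt() } else { ord.is_le() };
        if !advance {
            break;
        }
        idx += 1;
    }
    idx
}

/// Fast path: counts leading rows that satisfy the predicate, directly on native values.
/// Requires `start <= col.len()`.
fn native_scan(col: &[Option<f64>], target: Option<f64>, o: SortOpts, strict: bool, start: usize) -> usize {
    let null_side = if o.nulls_first { Ordering::Less } else { Ordering::Greater };
    let advances = |v: Option<f64>| {
        let ord = match (v, target) {
            (None, None) => Ordering::Equal,
            (None, Some(_)) => null_side,
            (Some(_), None) => null_side.reverse(),
            (Some(v), Some(t)) if o.descending => t.total_cmp(&v),
            (Some(v), Some(t)) => v.total_cmp(&t),
        };
        if strict { ord.is_lt() } else { ord.is_le() }
    };
    start + col[start..].iter().take_while(|v| advances(**v)).count()
}

/// Returns (cases checked, mismatches) over every target, sort option, side, and start.
fn run_differential() -> (usize, usize) {
    // Edge cases: null, signed zeros, duplicates, infinity, NaN.
    let col = [None, Some(-0.0), Some(0.0), Some(1.0), Some(1.0), Some(f64::INFINITY), Some(f64::NAN)];
    let (mut checked, mut mismatches) = (0, 0);
    for &target in &col {
        for descending in [false, true] {
            for nulls_first in [false, true] {
                for strict in [false, true] {
                    for start in 0..=col.len() {
                        let o = SortOpts { descending, nulls_first };
                        let fast = native_scan(&col, target, o, strict, start);
                        let slow = generic_scan(&col, to_scalar(target), o, strict, start);
                        checked += 1;
                        if fast != slow {
                            mismatches += 1;
                        }
                    }
                }
            }
        }
    }
    (checked, mismatches)
}
```

Here `run_differential()` checks 7 targets × 4 sort-option combinations × 2 boundary sides × 8 start positions = 448 cases and reports zero mismatches. Exhaustive enumeration over a small edge-case corpus is cheap and tends to catch null-placement or direction bugs that hand-picked cases miss.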

Are there any user-facing changes?

No — results are identical; this is a performance-only change.

…olumn

RANGE window frame boundaries are computed once per row by
`WindowFrameStateRange::calculate_index_of_row`, which calls
`search_in_slice`. That scan invokes `get_row_at_idx` at every probed
index, allocating a `Vec<ScalarValue>` and running a dynamic
`ScalarValue` comparison (`compare_rows` -> `try_cmp`) per probe. The
scan is amortized O(n) per partition, so this per-probe allocation +
enum dispatch dominates RANGE frame evaluation.

This adds a fast path for the common case of a single primitive
integer/float ORDER BY column: the column is downcast once to
`PrimitiveArray<T>` and scanned over native values, reproducing exactly
the predicate of the generic path (`compare_rows` for one column,
including every NULLS FIRST/LAST x ASC/DESC combination and the float
total ordering that `ScalarValue::try_cmp` uses via
`ArrowNativeTypeOp::compare`). The boundary-target arithmetic is left
entirely unchanged, so decimal/temporal/overflow/underflow semantics are
identical; only the comparison scan is specialized. The generic
`ScalarValue` path remains the fallback for multi-column frames and
non-primitive types (decimal/temporal, whose scale/units would not match
a raw native comparison).

A differential test asserts the native scan returns the same boundary
index as the generic scan for every position, target, and sort-option
combination, including nulls, NaN, duplicates, and signed zero.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions bot added the logical-expr (Logical plan and expressions) label on Jun 26, 2026