Skip to content

perf: vectorized output for first_value/last_value/nth_value over a window frame#23205

Draft
Dandandan wants to merge 1 commit into
apache:mainfrom
Dandandan:perf/window-nth-value-vectorized
Draft

perf: vectorized output for first_value/last_value/nth_value over a window frame#23205
Dandandan wants to merge 1 commit into
apache:mainfrom
Dandandan:perf/window-nth-value-vectorized

Conversation

@Dandandan

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

  • Closes #.

Rationale for this change

On the non-streaming WindowAggExec path, frame-using window functions go through StandardWindowExpr::evaluate, which calls PartitionEvaluator::evaluate once per row — boxing a ScalarValue per row and reassembling with ScalarValue::iter_to_array. For first_value / last_value / nth_value the per-row result is just a positional gather from the input column, so the ScalarValue boxing is pure overhead that a single take kernel can replace.

What changes are included in this PR?

  • Two opt-in PartitionEvaluator methods: supports_evaluate_all_with_frame_ranges (default false) and evaluate_all_with_frame_ranges (default not_impl_err!). Existing evaluators are unaffected.
  • StandardWindowExpr::evaluate uses them when the evaluator opts in (computing the frame ranges up front, then one vectorized call), and otherwise keeps the existing per-row loop verbatim.
  • NthValueEvaluator implements them for the common IGNORE NULLS == false case: it builds a single UInt32 gather index from the per-row frame ranges (null for empty frames / out-of-range Nth) and produces the whole output column with one arrow::compute::take. The IGNORE NULLS path (which walks the null bitmap per row) deliberately stays on the scalar path.

Scope note: this only affects the non-streaming WindowAggExec path (e.g. frames ending in UNBOUNDED FOLLOWING). The streaming BoundedWindowAggExec path is untouched and could be addressed separately.

Are these changes tested?

Yes:

  • A new differential unit test asserts the vectorized result matches the per-row evaluate output element-for-element, covering NULLs, empty frames, and first/last/nth(2)/nth(-1).
  • The full window sqllogictest suite passes (all 6 files).

Are there any user-facing changes?

No behavior change. This adds two defaulted methods to the public PartitionEvaluator trait (a backwards-compatible API addition) — may warrant the api change label.

…rame

On the non-streaming `WindowAggExec` path, frame-using window functions
go through `StandardWindowExpr::evaluate`, which calls
`PartitionEvaluator::evaluate` once per row, boxing a `ScalarValue` per
row and reassembling with `ScalarValue::iter_to_array`. For
first_value/last_value/nth_value the per-row result is just a positional
gather from the input column, so this boxing is pure overhead.

This adds two opt-in `PartitionEvaluator` methods —
`supports_evaluate_all_with_frame_ranges` (default `false`) and
`evaluate_all_with_frame_ranges` — and wires `StandardWindowExpr::evaluate`
to use them when supported, falling back to the per-row loop otherwise
(so behavior is unchanged for every other evaluator).

`NthValueEvaluator` implements them for the common `IGNORE NULLS == false`
case: it builds a single `UInt32` gather index from the per-row frame
ranges (null for empty frames / out-of-range Nth) and produces the whole
output column with one `arrow::compute::take`. The `IGNORE NULLS` path,
which walks the null bitmap per row, stays on the scalar path.

A differential unit test asserts the vectorized result matches the
per-row `evaluate` output element-for-element (including NULLs, empty
frames, and first/last/nth±) and the `window` sqllogictest suite passes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions github-actions Bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates functions Changes to functions implementation labels Jun 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant