perf: vectorized output for first_value/last_value/nth_value over a window frame#23205
Draft
Dandandan wants to merge 1 commit into
Draft
perf: vectorized output for first_value/last_value/nth_value over a window frame#23205Dandandan wants to merge 1 commit into
Dandandan wants to merge 1 commit into
Conversation
…rame On the non-streaming `WindowAggExec` path, frame-using window functions go through `StandardWindowExpr::evaluate`, which calls `PartitionEvaluator::evaluate` once per row, boxing a `ScalarValue` per row and reassembling with `ScalarValue::iter_to_array`. For first_value/last_value/nth_value the per-row result is just a positional gather from the input column, so this boxing is pure overhead. This adds two opt-in `PartitionEvaluator` methods — `supports_evaluate_all_with_frame_ranges` (default `false`) and `evaluate_all_with_frame_ranges` — and wires `StandardWindowExpr::evaluate` to use them when supported, falling back to the per-row loop otherwise (so behavior is unchanged for every other evaluator). `NthValueEvaluator` implements them for the common `IGNORE NULLS == false` case: it builds a single `UInt32` gather index from the per-row frame ranges (null for empty frames / out-of-range Nth) and produces the whole output column with one `arrow::compute::take`. The `IGNORE NULLS` path, which walks the null bitmap per row, stays on the scalar path. A differential unit test asserts the vectorized result matches the per-row `evaluate` output element-for-element (including NULLs, empty frames, and first/last/nth±) and the `window` sqllogictest suite passes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Rationale for this change
On the non-streaming
WindowAggExecpath, frame-using window functions go throughStandardWindowExpr::evaluate, which callsPartitionEvaluator::evaluateonce per row — boxing aScalarValueper row and reassembling withScalarValue::iter_to_array. Forfirst_value/last_value/nth_valuethe per-row result is just a positional gather from the input column, so theScalarValueboxing is pure overhead that a singletakekernel can replace.What changes are included in this PR?
PartitionEvaluatormethods:supports_evaluate_all_with_frame_ranges(defaultfalse) andevaluate_all_with_frame_ranges(defaultnot_impl_err!). Existing evaluators are unaffected.StandardWindowExpr::evaluateuses them when the evaluator opts in (computing the frame ranges up front, then one vectorized call), and otherwise keeps the existing per-row loop verbatim.NthValueEvaluatorimplements them for the commonIGNORE NULLS == falsecase: it builds a singleUInt32gather index from the per-row frame ranges (null for empty frames / out-of-rangeNth) and produces the whole output column with onearrow::compute::take. TheIGNORE NULLSpath (which walks the null bitmap per row) deliberately stays on the scalar path.Scope note: this only affects the non-streaming
WindowAggExecpath (e.g. frames ending inUNBOUNDED FOLLOWING). The streamingBoundedWindowAggExecpath is untouched and could be addressed separately.Are these changes tested?
Yes:
evaluateoutput element-for-element, covering NULLs, empty frames, andfirst/last/nth(2)/nth(-1).windowsqllogictest suite passes (all 6 files).Are there any user-facing changes?
No behavior change. This adds two defaulted methods to the public
PartitionEvaluatortrait (a backwards-compatible API addition) — may warrant theapi changelabel.