feat: add inner_product scalar function by crm26 · Pull Request #21861 · apache/datafusion

crm26 · 2026-04-26T22:32:51Z

Which issue does this PR close?

Part of #21536 — split of #21371 into one-function-per-PR.

Rationale for this change

Adds inner_product(array1, array2) — the dot product of two equal-length numeric arrays, returning Float64. Computed as sum(array1[i] * array2[i]).

What changes are included in this PR?

Mirrors the structural pattern of merged #21542 (cosine_distance):

Same coerce_types for List/LargeList/FixedSizeList of any numeric inner type, with widening to LargeList when any input is LargeList (following the pattern from #21704, "fix: array_concat widens container variant for mixed List/LargeList inputs")
Same NULL semantics: bare NULL → NULL, NULL row → NULL, NULL element in list → NULL
Same Arrow-idiomatic implementation: single as_float64_array(list_array.values()) downcast, slice by value_offsets(), iterate via ScalarBuffer<f64>
No alias, no shared module — standalone, inline math
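
To make the "slice by value_offsets()" point concrete, here is a minimal sketch of the flat-values-plus-offsets layout. It is not the PR's code: plain slices stand in for the Arrow values buffer and offset buffer, the name `dot_rows` is hypothetical, validity (NULL) tracking is left out, and the offsets are assumed to be well-formed.

```rust
/// Dot product per row over a flattened list layout: row `i` of each input
/// occupies `values[offsets[i]..offsets[i + 1]]`.
/// Illustrative only: plain slices stand in for Arrow buffers, NULL tracking
/// is omitted, and offsets are assumed to be in bounds and non-decreasing.
fn dot_rows(
    a_values: &[f64],
    a_offsets: &[usize],
    b_values: &[f64],
    b_offsets: &[usize],
) -> Result<Vec<f64>, String> {
    if a_offsets.len() != b_offsets.len() {
        return Err("inputs must have the same number of rows".to_string());
    }
    let mut out: Vec<f64> = Vec::with_capacity(a_offsets.len().saturating_sub(1));
    // Each adjacent pair of offsets delimits one row in the flat values buffer.
    for (a_win, b_win) in a_offsets.windows(2).zip(b_offsets.windows(2)) {
        let a = &a_values[a_win[0]..a_win[1]];
        let b = &b_values[b_win[0]..b_win[1]];
        if a.len() != b.len() {
            return Err(format!("row length mismatch: {} vs {}", a.len(), b.len()));
        }
        // An empty row folds to 0.0, the sum of an empty set.
        out.push(a.iter().zip(b).fold(0.0_f64, |acc, (x, y)| acc + x * y));
    }
    Ok(out)
}
```

The values buffer is borrowed once and every row is just a sub-slice of it, which is the property the "single downcast, slice by offsets" bullet describes.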

The arithmetic is the only semantic divergence from cosine_distance:

dot += a*b (no magnitude or normalization)
Empty arrays return 0.0 (sum of empty set), not NULL
No zero-magnitude special case (inner_product([0,0], [1,2]) returns 0, which is well-defined for inner product)
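
The per-row semantics listed above can be sketched in plain Rust as follows. This is an illustration, not the merged implementation: `Option` models SQL NULL, the function name is reused only for readability, and the order of the NULL check versus the length check is an assumption, since the description does not say which wins when both apply.

```rust
/// Per-row semantics of inner_product as described in the PR text.
/// `None` models SQL NULL for a whole row or for a single element.
/// Assumption: a NULL row is detected before the length check.
fn inner_product(
    a: Option<&[Option<f64>]>,
    b: Option<&[Option<f64>]>,
) -> Result<Option<f64>, String> {
    // NULL row on either side -> NULL result.
    let (a, b) = match (a, b) {
        (Some(a), Some(b)) => (a, b),
        _ => return Ok(None),
    };
    if a.len() != b.len() {
        return Err(format!("length mismatch: {} vs {}", a.len(), b.len()));
    }
    // Starts at 0.0, so two empty arrays yield 0.0 rather than NULL.
    let mut dot = 0.0_f64;
    for (x, y) in a.iter().zip(b) {
        match (x, y) {
            (Some(x), Some(y)) => dot += x * y,
            // Any NULL element makes the whole row NULL.
            _ => return Ok(None),
        }
    }
    Ok(Some(dot))
}
```

With this sketch, `[0, 0]` against `[1, 2]` gives `0.0` with no special case, because the loop never divides by a magnitude.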

Are these changes tested?

Yes. SLT covers:

Orthogonal, identical, opposite, general non-trivial vectors
Single zero vector, both zero vectors
Bare NULL in either or both positions
NULL element inside a list (returns NULL for that row)
Mismatched lengths (error)
LargeList inputs
Mixed (List, LargeList) in both orders
(FixedSizeList, FixedSizeList) and (FixedSizeList, LargeList)
Float32 and Int64 inner type coercion
Multi-row query with NULL row propagation
Empty arrays (returns 0)
No-args error
Return-type assertion (Float64)

Are there any user-facing changes?

New scalar function inner_product, documented in docs/source/user-guide/sql/scalar_functions.md.

Jefffrey · 2026-04-27T03:27:41Z

+)]
+#[derive(Debug, PartialEq, Eq, Hash)]
+pub struct InnerProduct {
+    signature: Signature,


Should we add a dot_product alias?

crm26 · May 1, 2026

Thanks @Jefffrey. I added dot_product as an alias in ef9895005, with SLT coverage for both a constant-args case and a multi-row-with-NULL case. Doc regeneration picked up the alias automatically (an "#### Aliases" block under inner_product, plus a top-level "### dot_product" stub marked "Alias of").

alamb · 2026-04-27T20:17:35Z

I had a thought about adding new functions:

Add vector distance, array math, and array aggregate functions #21536 (comment)

Jefffrey · 2026-05-02T01:51:47Z

I think once merge conflict is fixed we should be good to merge this

alamb · 2026-05-03T12:00:04Z

I merged up to resolve a conflict

alamb · 2026-05-03T12:00:16Z

Thanks @crm26 and @Jefffrey

## Which issue does this PR close?

Part of apache#21536 (split of apache#21371 into one-function-per-PR). Third in the series after apache#21542 (cosine_distance) and apache#21861 (inner_product).

## Rationale for this change

Adds `array_normalize(array)`, the L2-normalized version of a numeric input vector. Computed as `array[i] / sqrt(sum(array[i]^2))` per element. Returns the same shape as the input (`List<Float64>` or `LargeList<Float64>`). Aliased as `list_normalize` to match the `array_X`/`list_X` convention used across the crate.

## What changes are included in this PR?

Coercion shell mirrors the merged cosine_distance/inner_product pattern:

- `coerce_types` accepts `List`/`LargeList`/`FixedSizeList` of any numeric inner type, plus bare `NULL`. After coercion the inner function only sees `List(Float64)` or `LargeList(Float64)`.
- Per-row L2 norm computed inline (no shared module), using a single `as_float64_array(list_array.values())` downcast plus `value_offsets()` slicing, with no per-row downcasts.
- Manual list builder: `Vec<f64>` for values, `Vec<O>` for offsets, `NullBuffer` for row validity.

Per-row semantics:

- NULL row → NULL output
- NULL element in list → NULL row
- Empty list → empty list (no division-by-zero hazard)
- Zero magnitude → NULL row (consistent with cosine_distance's zero-magnitude → NULL)
- Otherwise → divide each element by `sqrt(sum-of-squares)`

## Are these changes tested?

Yes. SLT covers:

- 3-4-5 right triangle, 3D vector, already-unit-axis, single non-zero component, negative components
- Bare `NULL` input, NULL element in list, zero vector, empty array
- `LargeList`, `FixedSizeList` (via coercion), `Float32` and `Int64` inner types, integer literals
- Multi-row query mixing normal / NULL row / zero-vector row / null-element row
- Plan error for non-list input
- No-args error
- Return-type assertion (`List(Float64)`)
- `list_normalize` alias coverage (constant + multi-row with NULL)

## Are there any user-facing changes?

New scalar function `array_normalize` (alias `list_normalize`), documented in `docs/source/user-guide/sql/scalar_functions.md`.
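
As a rough illustration of the per-row rules in the `array_normalize` description, the sketch below models one row with `Option` standing in for SQL NULL. The name `l2_normalize` is hypothetical and nothing here is taken from the actual PR code; it only mirrors the stated semantics.

```rust
/// One row of array_normalize as described: NULL row or NULL element -> NULL,
/// empty list -> empty list, zero magnitude -> NULL, otherwise divide each
/// element by the L2 norm. `None` models SQL NULL.
fn l2_normalize(row: Option<&[Option<f64>]>) -> Option<Vec<f64>> {
    let row = row?; // NULL row -> NULL
    let mut values = Vec::with_capacity(row.len());
    let mut sum_sq = 0.0_f64;
    for v in row {
        let v = (*v)?; // NULL element -> NULL row
        sum_sq += v * v;
        values.push(v);
    }
    if values.is_empty() {
        // Nothing to divide, so no division-by-zero hazard.
        return Some(values);
    }
    let norm = sum_sq.sqrt();
    if norm == 0.0 {
        return None; // zero magnitude -> NULL row
    }
    Some(values.into_iter().map(|v| v / norm).collect())
}
```

For the 3-4-5 case mentioned in the test list, `[3, 4]` has norm 5 and normalizes to approximately `[0.6, 0.8]`.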

Adds `array_add(array1, array2)` returning the element-wise sum of two numeric arrays. Aliased as `list_add`. Follows the per-function split pattern established by cosine_distance (apache#21542), inner_product (apache#21861), and array_normalize (apache#22013) per tracking issue apache#21536.

Semantics:

- NULL row in either input -> NULL row out
- NULL element at position i in either input -> NULL element at i out (per-element propagation, divergent from inner_product which nulls the whole row; chosen because output is a list, not a scalar)
- Length mismatch between rows -> exec_err
- Empty arrays -> empty array

Supports List, LargeList, and FixedSizeList inputs; numeric element types are coerced to Float64. If any input is LargeList, both sides are widened to LargeList for homogeneous runtime dispatch. Uses OffsetBufferBuilder + NullBufferBuilder per the pattern adopted in array_normalize round 1.
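
The difference between per-element and whole-row NULL propagation is easiest to see in code. The following is a hedged sketch of one row of `array_add` as described above, with `Option` modelling SQL NULL; it is not the code from that PR, and checking for a NULL row before checking lengths is an assumption.

```rust
/// One row of array_add as described: NULL row -> NULL row, NULL element at
/// position i -> NULL at position i, length mismatch -> error.
/// Assumption: the NULL-row check runs before the length check.
fn array_add(
    a: Option<&[Option<f64>]>,
    b: Option<&[Option<f64>]>,
) -> Result<Option<Vec<Option<f64>>>, String> {
    let (a, b) = match (a, b) {
        (Some(a), Some(b)) => (a, b),
        _ => return Ok(None),
    };
    if a.len() != b.len() {
        return Err(format!("length mismatch: {} vs {}", a.len(), b.len()));
    }
    let out = a
        .iter()
        .zip(b)
        .map(|(x, y)| match (x, y) {
            (Some(x), Some(y)) => Some(x + y),
            // Only this position becomes NULL; the rest of the row survives.
            _ => None,
        })
        .collect();
    Ok(Some(out))
}
```

Compare this with inner_product, where a single NULL element has to null the whole row because the output is one scalar.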

@alamb

## Which issue does this PR close?

Partial of apache#21536: `array_scale` (the list+scalar arithmetic function in the vector math series).

## Rationale for this change

Continues the per-function split requested by @alamb on apache#21536. Three sibling PRs already merged: `cosine_distance` (apache#21542), `inner_product` (apache#21861), `array_normalize` (apache#22013). `array_add` is in flight as apache#22459 by @SubhamSinghal.

Adds element-wise scalar multiplication for numeric arrays, returning a list of the same shape. Aliased as `list_scale` to match the `array_X` / `list_X` precedent in this crate.

## What changes are included in this PR?

- New scalar UDF `array_scale(array, scalar)` in `datafusion/functions-nested/src/array_scale.rs`
- Module wire-up + registration in `datafusion/functions-nested/src/lib.rs`
- SLT tests at `datafusion/sqllogictest/test_files/array_scale.slt`
- Auto-generated function docs entry in `docs/source/user-guide/sql/scalar_functions.md`

**Signature:** first arg `List/LargeList/FixedSizeList<numeric>`, second arg numeric scalar. Both coerce to `Float64`. Same list-widening rules as the binary-op siblings.

**NULL semantics:**

- NULL row in array → NULL row out
- NULL scalar → NULL row out (whole-row, because the scalar applies uniformly)
- NULL element at position `i` → NULL element at `i` out (per-element propagation)
- Empty array → empty array

**Builders:** uses `OffsetBufferBuilder` + `NullBufferBuilder` per the pattern adopted in the round-1 review of apache#22013.

## Are these changes tested?

Yes. `array_scale.slt` covers:

- Happy paths (positive, negative, zero, fractional, single-element)
- NULL propagation at all three levels (NULL row, NULL scalar, NULL element)
- All list type variants (`List`, `LargeList`, `FixedSizeList`)
- Numeric inner type coercion (Float32, Int64, integer literals)
- Multi-row queries with both constant-scalar broadcast and per-row column scalar
- Error paths (non-numeric scalar, non-list first arg, wrong arity)
- Empty array
- `list_scale` alias

## Are there any user-facing changes?

Yes: new SQL scalar function `array_scale(array, scalar)` and its alias `list_scale`. Documented in `docs/source/user-guide/sql/scalar_functions.md`.
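
The three levels of NULL propagation described for `array_scale` can be sketched for a single row like this. It is an illustration under the same modelling assumption as before (`Option` for SQL NULL), not the PR's implementation.

```rust
/// One row of array_scale as described: NULL row or NULL scalar -> NULL row,
/// NULL element -> NULL at that position, empty array -> empty array.
fn array_scale(array: Option<&[Option<f64>]>, scalar: Option<f64>) -> Option<Vec<Option<f64>>> {
    // Whole-row NULL if either the row or the scalar is NULL.
    let (array, scalar) = (array?, scalar?);
    // Per-element propagation: a NULL element stays NULL, others are scaled.
    Some(array.iter().map(|v| v.map(|x| x * scalar)).collect())
}
```

A NULL scalar nulls the whole row because it applies to every element, whereas a NULL element only affects its own position.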

Adds `array_sum(array)` returning the sum of elements in a numeric array. Aliased as `list_sum`. Part of the per-function split sequence on tracking issue apache#21536, following the pattern of the already-merged PRs in this series (cosine_distance apache#21542, inner_product apache#21861, array_normalize apache#22013, array_scale apache#22466).

Semantics:

- NULL row in array -> NULL row out
- NULL elements are skipped (SQL aggregate convention; matches PostgreSQL array_sum, DuckDB list_sum, Spark aggregate). A row whose every element is NULL yields NULL.
- Empty array -> 0.0 (additive identity, matches SQL SUM over no rows conceptually, and DuckDB list_sum([]) = 0)

Input is List/LargeList/FixedSizeList of any numeric type; elements are coerced to Float64. Output is Float64.
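
The distinction between an empty array (0.0) and an all-NULL array (NULL) is subtle, so here is a small sketch of one row of `array_sum` as the description states it. `Option` models SQL NULL, and the code is illustrative rather than taken from the open PR.

```rust
/// One row of array_sum as described: NULL row -> NULL, NULL elements are
/// skipped, an all-NULL row -> NULL, and an empty array -> 0.0.
fn array_sum(row: Option<&[Option<f64>]>) -> Option<f64> {
    let row = row?; // NULL row -> NULL
    if row.is_empty() {
        return Some(0.0); // additive identity for an empty array
    }
    let mut sum = 0.0_f64;
    let mut saw_value = false;
    // `flatten` yields only the non-NULL elements.
    for v in row.iter().flatten() {
        sum += v;
        saw_value = true;
    }
    // A non-empty row with no non-NULL element yields NULL.
    if saw_value { Some(sum) } else { None }
}
```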

github-actions Bot added the documentation (Improvements or additions to documentation), sqllogictest (SQL Logic Tests (.slt)), and functions (Changes to functions implementation) labels Apr 26, 2026

Jefffrey reviewed Apr 27, 2026


feat: add inner_product scalar function

ef98950

crm26 force-pushed the feat/inner-product branch from 8c05259 to ef98950 Compare April 29, 2026 21:45

Jefffrey approved these changes Apr 30, 2026


Merge branch 'main' into feat/inner-product

ab67bce

alamb added 2 commits May 3, 2026 07:57

Merge remote-tracking branch 'apache/main' into feat/inner-product

0042d83

Fixup docs

72a2fc6

alamb enabled auto-merge May 3, 2026 12:00

alamb added this pull request to the merge queue May 3, 2026

Merged via the queue into apache:main with commit 9a29e33 May 3, 2026
36 checks passed

crm26 mentioned this pull request May 5, 2026

feat: add array_normalize scalar function #22013

Merged

This was referenced May 20, 2026

feat: add vector distance and array math functions #21371

Closed

Add vector distance, array math, and array aggregate functions #21536

Open

crm26 mentioned this pull request May 22, 2026

feat: add array_scale scalar function #22466

Merged

crm26 mentioned this pull request May 26, 2026

feat: add array_sum scalar function #22542

Open

alamb merged 4 commits into apache:main from crm26:feat/inner-product
