feat: support sort_array expression by grorge123 · Pull Request #3706 · apache/datafusion-comet

grorge123 · 2026-03-15T03:50:57Z

Which issue does this PR close?

Closes #3159.

Rationale for this change

Currently, Comet does not support the sort_array expression, so queries that use sort_array(...) fall back to Spark. This PR adds sort_array support so that these queries can run natively.

The SortArray expression sorts the elements of an array in either ascending or descending order.

What changes are included in this PR?

- Add CometSortArray in arrays.scala to serialize Spark's SortArray as DataFusion's array_sort.
- Register SortArray in QueryPlanSerde.scala.
- Preserve Spark's sort semantics:
  - sort_array(arr) / sort_array(arr, true) -> ascending with NULLS FIRST
  - sort_array(arr, false) -> descending with NULLS LAST
- Mark floating-point array sorting as Incompatible only when spark.comet.exec.strictFloatingPoint=true.
- Explicitly reject unsupported nested complex types, such as array<array<struct<...>>>, at planning time so that they fall back cleanly to Spark instead of failing at runtime.
- Update the supported-expressions documentation in spark_expressions_support.md.
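Signed zeros are one plausible reason floating-point sorting needs the strictFloatingPoint gate; the PR text does not state the exact cause. A minimal sketch, assuming Spark compares doubles the way its `SQLOrderingUtil.compareDoubles` helper does (numerically equal values, including `-0.0` and `0.0`, compare as equal, and `NaN` is the largest value) while the native side uses an IEEE 754 total order such as Rust's `f64::total_cmp` (where `-0.0 < 0.0`). `java.lang.Double.compare` stands in for the total order here, and `sparkStyle` / `totalOrder` are hypothetical names.

```scala
object FloatOrderingSketch {
  // Spark-like comparison (assumption: mirrors SQLOrderingUtil.compareDoubles).
  // -0.0 == 0.0 is true for primitive doubles, so they compare as equal;
  // otherwise Double.compare is used, which places NaN above every other value.
  val sparkStyle: Ordering[Double] = new Ordering[Double] {
    def compare(x: Double, y: Double): Int =
      if (x == y) 0 else java.lang.Double.compare(x, y)
  }

  // Total order standing in for the native comparator: -0.0 sorts before 0.0.
  val totalOrder: Ordering[Double] = new Ordering[Double] {
    def compare(x: Double, y: Double): Int = java.lang.Double.compare(x, y)
  }

  def main(args: Array[String]): Unit = {
    val input = Seq(1.0, 0.0, Double.NaN, -0.0)
    // Seq.sorted is stable, so elements that compare as equal keep their input order.
    println(input.sorted(sparkStyle).mkString("[", ", ", "]")) // [0.0, -0.0, 1.0, NaN]
    println(input.sorted(totalOrder).mkString("[", ", ", "]")) // [-0.0, 0.0, 1.0, NaN]
  }
}
```

Both orders agree on where NaN goes; they differ only in the relative placement of `-0.0` and `0.0`, which is the kind of divergence covered by the NaN / -0.0 / 0.0 test case.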

How are these changes tested?

Added SQL-file coverage in sort_array.sql for:
- array
- array
- array including NaN, -0.0, and 0.0
- array<decimal(10,0)>
- array
- array<struct<...>>
- array<array>
- array literal case
- empty arrays
- null arrays
- explicit ascending / descending paths
- literal and table-column inputs

Reference: https://github.com/apache/spark/blob/04b821c69e85be5f51a1270b3a9a4155afdb5334/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala#L706-L760

andygrove · 2026-03-16T13:16:49Z

spark/src/main/scala/org/apache/comet/serde/arrays.scala

+        true
+      case ArrayType(elementType, _) =>
+        canRank(elementType, nestedInArray = true)
+      case StructType(fields) if !nestedInArray =>


Could you add a comment explaining why there is a restriction around structs in arrays?

grorge123 · 2026-03-16

Sure, I have added it. I also found that NullType has a similar problem, and I have fixed it.
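The diff excerpt shows only part of the planning-time type check. A minimal sketch of the same recursive shape, written against a hypothetical, dependency-free type ADT (`SqlType`, `IntT`, `StringT`, `ArrT`, `StructT` are stand-ins, not Spark's `DataType` classes). It assumes the leading `true` in the excerpt belongs to a case for primitive types, that the check starts at the array's element type with `nestedInArray = false`, and that struct fields inherit the caller's flag. The NullType fix mentioned in the reply is not shown in the excerpt, so it is left out.

```scala
object CanRankSketch {
  // Hypothetical stand-ins for Spark's DataType hierarchy.
  sealed trait SqlType
  case object IntT extends SqlType
  case object StringT extends SqlType
  final case class ArrT(element: SqlType) extends SqlType
  final case class StructT(fields: Seq[SqlType]) extends SqlType

  // Mirrors the shape of the diff excerpt: arrays recurse with nestedInArray = true,
  // and structs are accepted only when they are not inside an inner array.
  def canRank(dt: SqlType, nestedInArray: Boolean = false): Boolean = dt match {
    case IntT | StringT    => true
    case ArrT(elementType) => canRank(elementType, nestedInArray = true)
    case StructT(fields) if !nestedInArray =>
      fields.forall(f => canRank(f, nestedInArray)) // assumption: flag is passed through
    case _ => false // e.g. a struct inside an inner array: fall back to Spark
  }

  // sort_array's input is an array; the check is applied to its element type.
  def supportsSortArray(input: ArrT): Boolean = canRank(input.element)
}
```

With these assumptions, array<struct<...>> and array<array<int>> pass, while array<array<struct<...>>> is rejected during planning, matching the fallback behavior described in the PR.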

andygrove

LGTM, with one question. Thanks @grorge123!

andygrove · 2026-03-20T14:15:58Z

@grorge123 Could you add a microbenchmark for this expression so that we can see how it performs relative to Spark? This could be a separate PR. See https://github.com/apache/datafusion-comet/tree/main/spark/src/test/scala/org/apache/spark/sql/benchmark for current benchmarks.

grorge123 · 2026-03-22T02:40:20Z

OK, I will open a separate PR to add it.

grorge123 · 2026-03-29T01:05:00Z

Hi @andygrove, just a follow-up on this PR.
Please let me know if there is anything else I should add or revise here. Thanks!

spark/src/main/scala/org/apache/comet/serde/arrays.scala

0lai0 · 2026-03-29T14:18:32Z

Thanks @grorge123! LGTM

spark/src/test/resources/sql-tests/expressions/array/sort_array.sql

spark/src/main/scala/org/apache/comet/serde/arrays.scala

hsiang-c · 2026-04-13T21:54:50Z

spark/src/test/resources/sql-tests/expressions/array/sort_array.sql

+SELECT sort_array(arr, true) FROM test_sort_array_int
+
+query
+SELECT sort_array(arr, false) FROM test_sort_array_int


👍 This covers both cases mentioned in Spark's comment:

Null elements will be placed at the beginning of the returned array in ascending order or at the end of the returned array in descending order.
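The null placement described in that comment can be modeled in plain Scala. A minimal sketch, assuming elements are represented as `Option[Int]` with `None` standing for a SQL NULL element, and a `None` input standing for a NULL array; `sortArray` is a hypothetical helper, not Spark's or Comet's implementation.

```scala
object SortArraySketch {
  // Models sort_array(arr, asc): NULL elements (None) come first when ascending
  // and last when descending; non-null elements use the natural Int ordering.
  // A NULL input array (None) yields a NULL result.
  def sortArray(arr: Option[Seq[Option[Int]]], asc: Boolean = true): Option[Seq[Option[Int]]] =
    arr.map { elems =>
      val (nulls, values) = elems.partition(_.isEmpty)
      if (asc) nulls ++ values.sortBy(_.get)
      else values.sortBy(_.get)(Ordering[Int].reverse) ++ nulls
    }
}
```

For example, `sortArray(Some(Seq(Some(3), None, Some(1))))` gives `Some(List(None, Some(1), Some(3)))`, and passing `asc = false` gives `Some(List(Some(3), Some(1), None))`, which are the two cases exercised by the `sort_array(arr, true)` and `sort_array(arr, false)` queries.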

spark/src/test/resources/sql-tests/expressions/array/sort_array.sql

feat: support sort_array expression

adb4e57

andygrove reviewed Mar 16, 2026

fix: null type in nested array

0c3a13d

grorge123 mentioned this pull request Mar 22, 2026

feat: add sort_array benchmark #3758

Merged

0lai0 reviewed Mar 29, 2026

spark/src/main/scala/org/apache/comet/serde/arrays.scala (outdated, resolved)

fix: reduce redundant match

d47f856

grorge123 requested a review from andygrove April 1, 2026 12:24

hsiang-c reviewed Apr 10, 2026

spark/src/test/resources/sql-tests/expressions/array/sort_array.sql (outdated, resolved)

hsiang-c reviewed Apr 10, 2026

spark/src/test/resources/sql-tests/expressions/array/sort_array.sql (outdated, resolved)

hsiang-c reviewed Apr 10, 2026

spark/src/main/scala/org/apache/comet/serde/arrays.scala (resolved)

refactor: reuse sort checker in supportedSortType

2a08799

hsiang-c reviewed Apr 13, 2026

spark/src/main/scala/org/apache/comet/serde/arrays.scala (resolved)

hsiang-c reviewed Apr 13, 2026

test: add non-boolean case

3721201

parthchandra reviewed Apr 15, 2026

spark/src/test/resources/sql-tests/expressions/array/sort_array.sql (resolved)

test: add date, timestamp, and binary case

8fea480

grorge123 wants to merge 6 commits into apache:main from grorge123:sort_array

5 participants