Conversation
| true | ||
| case ArrayType(elementType, _) => | ||
| canRank(elementType, nestedInArray = true) | ||
| case StructType(fields) if !nestedInArray => |
There was a problem hiding this comment.
could you add a comment explaining why there is a restriction around structs in arrays?
There was a problem hiding this comment.
Sure, I have added it. Besides, I found nulltype has a similar problem, I have fixed it.
andygrove
left a comment
There was a problem hiding this comment.
LGTM, with one question. Thanks @grorge123!
|
@grorge123 Could you add a microbenchmark for this expression so that we can see how it performs relative to Spark? This could be a separate PR. See https://github.com/apache/datafusion-comet/tree/main/spark/src/test/scala/org/apache/spark/sql/benchmark for current benchmarks. |
|
Ok, I will raise another PR to add it. |
|
Hi @andygrove, just a follow-up on this PR. |
|
Thank @grorge123 ! LGTM |
spark/src/test/resources/sql-tests/expressions/array/sort_array.sql
Outdated
Show resolved
Hide resolved
spark/src/test/resources/sql-tests/expressions/array/sort_array.sql
Outdated
Show resolved
Hide resolved
| SELECT sort_array(arr, true) FROM test_sort_array_int | ||
|
|
||
| query | ||
| SELECT sort_array(arr, false) FROM test_sort_array_int |
There was a problem hiding this comment.
👍 This covers both cases mentioned in Spark's comment:
Null elements will be placed at the beginning of the returned array in ascending order or at the end of the returned array in descending order.
Which issue does this PR close?
Closes #3159.
Rationale for this change
Currently, comet does not support
sort_arrayexpression, so usingsort_array(...)would fall back to Spark. This PR adds sort_array support to achieve native acceleration.The SortArray expression sorts the elements of an array in either ascending or descending order.
What changes are included in this PR?
How are these changes tested?
- array
- array
- array including NaN, -0.0, and 0.0
- array<decimal(10,0)>
- array
- array<struct<...>>
- array<array>
- array literal case
- empty arrays
- null arrays
- explicit ascending / descending paths
- literal and table-column inputs
Reference: https://github.com/apache/spark/blob/04b821c69e85be5f51a1270b3a9a4155afdb5334/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala#L706-L760