Use concat_elements_dyn from arrow-rs#23211
Conversation
|
@alamb I kept this as draft for now, but while reviewing the changes I noticed the implementations from DataFusion are almost identical to the versions in the current release of |
|
On further inspection it turns out that apache/arrow-rs#9876 added the missing parts to |
concat_elements_binary_view_array and concat_elements_string_view_array from arrow-rsconcat_elements_dyn from arrow-rs
The plan is working! |
|
run benchmarks string_concat |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing string_concat (b1d3f05) to 1fd29c9 (merge-base) diff using: string_concat File an issue against this benchmark runner |
|
(I will be pretty stoked if we get more features and faster performance by deleting code 😎 ) |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagestring_concat — base (merge-base)
string_concat — branch
File an issue against this benchmark runner |
I was a little bit too optimistic there. While the arrow implementation can in theory do
|
|
Benchmark results do show a significant slowdown, so there must be something I overlooked. Let's wait for the arrow changes to land and then reevaluate. |
|
I did the experiment of running the benchmark with the |
…ntation of the same function
…yn` (#10222) # Which issue does this PR close? None; relates to apache/datafusion#23211 # Rationale for this change `concat_elements_fixed_size_binary` supports concatenation of `FixedSizeBinary(n)` and `FixedSizeBinary(m)`, but the guard clause in `concat_elements_dyn` prevents this from actually being possible with `dyn Array`. # What changes are included in this PR? Adjust the guard clause in `concat_elements_dyn` to allow concatenation of mixed `FixedSizeBinary` types. # Are these changes tested? - Added an extra test case for mixed `FixedSizeBinary` specifically - Adjusted the existing unit tests to use `concat_elements_dyn`. This maintains coverage of the functions that were being called (since they're still called indirectly) while increasing the coverage `concat_elements_dyn` # Are there any user-facing changes? Yes, the pre conditions of the function are relaxed. This should not be a breaking change.
|
#21883 introduced string_concat operand coercion specifically for |
Which issue does this PR close?
Rationale for this change
apache/arrow-rs#9876 added
ByteViewandFixedSizeBinarysupport toconcat_elements_dyninarrow-rs. As a consequence the extended implementation in DataFusion can now be replaced by a call to thearrow-rsimplementation.What changes are included in this PR?
kernels::concat_elements_utf8viewandkernels::concat_elements_binary_view_arraybinary::concat_elementswith a call toarrow::compute::kernels::concat_elements::concat_elements_dynAre these changes tested?
arrow-rsAre there any user-facing changes?
No, two
pubfunctions have been removed fromkernel, butkernelitself is notpub.