Skip to content

Use concat_elements_dyn from arrow-rs#23211

Draft
pepijnve wants to merge 1 commit into
apache:mainfrom
pepijnve:string_concat
Draft

Use concat_elements_dyn from arrow-rs#23211
pepijnve wants to merge 1 commit into
apache:mainfrom
pepijnve:string_concat

Conversation

@pepijnve

@pepijnve pepijnve commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

apache/arrow-rs#9876 added ByteView and FixedSizeBinary support to concat_elements_dyn in arrow-rs. As a consequence the extended implementation in DataFusion can now be replaced by a call to the arrow-rs implementation.

What changes are included in this PR?

  • Remove the kernels::concat_elements_utf8view and kernels::concat_elements_binary_view_array
  • Replace implementation of binary::concat_elements with a call to arrow::compute::kernels::concat_elements::concat_elements_dyn

Are these changes tested?

  • Usage is covered by existing tests
  • The kernels themselves are tested in arrow-rs

Are there any user-facing changes?

No, two pub functions have been removed from kernel, but kernel itself is not pub.

@github-actions github-actions Bot added the physical-expr Changes to the physical-expr crates label Jun 26, 2026
@pepijnve

Copy link
Copy Markdown
Contributor Author

@alamb I kept this as draft for now, but while reviewing the changes I noticed the implementations from DataFusion are almost identical to the versions in the current release of arrow-rs. Could you run the string_concat benchmark? If there is no performance difference, perhaps this can already be merged.

@pepijnve

pepijnve commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

On further inspection it turns out that apache/arrow-rs#9876 added the missing parts to concat_elements_dyn that DataFusion had. I've broadened this PR a bit to remove the custom concat_elements_dyn entirely instead. The one in arrow is actually more capable at this point since it also supports FixedBinary.

@pepijnve pepijnve changed the title Use concat_elements_binary_view_array and concat_elements_string_view_array from arrow-rs Use concat_elements_dyn from arrow-rs Jun 26, 2026
@alamb

alamb commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

DataFusion had. I've broadened this PR a bit to remove the custom concat_elements_dyn entirely instead. The one in arrow is actually more capable at this point since it also supports FixedBinary.

The plan is working!

@alamb

alamb commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

run benchmarks string_concat

@adriangbot

Copy link
Copy Markdown

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4812851016-718-42bd4 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing string_concat (b1d3f05) to 1fd29c9 (merge-base) diff using: string_concat
Results will be posted here when complete


File an issue against this benchmark runner

@alamb

alamb commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

(I will be pretty stoked if we get more features and faster performance by deleting code 😎 )

@adriangbot

Copy link
Copy Markdown

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected
Details

group                              HEAD                                   string_concat
-----                              ----                                   -------------
concat_utf8view/concat/nulls_0     1.00     59.5±0.12µs        ? ?/sec    1.35     80.6±2.29µs        ? ?/sec
concat_utf8view/concat/nulls_10    1.00     67.2±0.10µs        ? ?/sec    1.30     87.5±0.38µs        ? ?/sec
concat_utf8view/concat/nulls_50    1.00     32.6±0.10µs        ? ?/sec    1.20     39.1±0.17µs        ? ?/sec

Resource Usage

string_concat — base (merge-base)

Metric Value
Wall time 240.1s
Peak memory 25.9 MiB
Avg memory 2.3 MiB
CPU user 35.9s
CPU sys 0.0s
Peak spill 0 B

string_concat — branch

Metric Value
Wall time 245.1s
Peak memory 29.1 MiB
Avg memory 2.7 MiB
CPU user 37.0s
CPU sys 0.0s
Peak spill 0 B

File an issue against this benchmark runner

@pepijnve

Copy link
Copy Markdown
Contributor Author

more features

I was a little bit too optimistic there. While the arrow implementation can in theory do FixedBinary(n) || FixedBinary(m) => FixedBinary(n + m), there's a left.data_type() == right.data_type() guard that gets in the way of actually doing so. More PRs to prepare.

string_concat_coercion prevents hitting that code path at the moment by coercing the two FixedBinary types to variable length Binary.

@pepijnve

Copy link
Copy Markdown
Contributor Author

Benchmark results do show a significant slowdown, so there must be something I overlooked. Let's wait for the arrow changes to land and then reevaluate.

@pepijnve

Copy link
Copy Markdown
Contributor Author

I did the experiment of running the benchmark with the concat_elements_dyn implementation from arrow-rs main. That shows the speedup we were aiming for rather than a regression.

Jefffrey pushed a commit to apache/arrow-rs that referenced this pull request Jun 27, 2026
…yn` (#10222)

# Which issue does this PR close?

None; relates to apache/datafusion#23211

# Rationale for this change

`concat_elements_fixed_size_binary` supports concatenation of
`FixedSizeBinary(n)` and `FixedSizeBinary(m)`, but the guard clause in
`concat_elements_dyn` prevents this from actually being possible with
`dyn Array`.

# What changes are included in this PR?

Adjust the guard clause in `concat_elements_dyn` to allow concatenation
of mixed `FixedSizeBinary` types.

# Are these changes tested?

- Added an extra test case for mixed `FixedSizeBinary` specifically
- Adjusted the existing unit tests to use `concat_elements_dyn`. This
maintains coverage of the functions that were being called (since
they're still called indirectly) while increasing the coverage
`concat_elements_dyn`

# Are there any user-facing changes?

Yes, the pre conditions of the function are relaxed. This should not be
a breaking change.
@pepijnve

Copy link
Copy Markdown
Contributor Author

#21883 introduced string_concat operand coercion specifically for FixedSizeBinary to Binary. This causes DataFusion to not use the FixedSizeBinary code path that's now enabled by apache/arrow-rs#10222. Is there a reason to keep this or would it be preferable to make || return FixedSizeBinary(l + r)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Replace custom ByteView concat kernel with the implementation from arrow-rs

3 participants