perf: Optimize array_to_string(), support more types#20553

Open
neilconway wants to merge 10 commits into apache:main from neilconway:neilc/optimize-array-to-string
Conversation

@neilconway
Contributor

@neilconway neilconway commented Feb 25, 2026

Which issue does this PR close?

Rationale for this change

array_to_string performed many unnecessary allocations. Rewriting the function to avoid them reduces runtime by roughly 50-75% in the benchmarks below.

What changes are included in this PR?

  • Add benchmark for array_to_string
  • Move nested functions to top-level
  • Get rid of some unnecessary macros
  • Borrow instead of cloning on recursion
  • Reuse a single String buffer across rows via buf.clear(), instead of allocating fresh
  • Write directly to the output buffer, instead of allocating via x.to_string() + push_str
  • Specialized logic for writing values of different types; this adds a dependency on the itoa crate, but it yields significant speedups over a generic write! approach
  • Add support for arrays of decimal and all the datetime types
  • Improve docs
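
As a rough illustration of three of the items above (borrowing on recursion, reusing one buffer via buf.clear(), and writing directly into the output buffer), here is a minimal std-only sketch. It is not the code in this PR: the Value enum, write_value, and join_rows are hypothetical names, the sketch assumes nested lists are flattened into a single delimited string, and it uses write! where the PR uses specialized per-type writers.

```rust
use std::fmt::Write;

/// Hypothetical stand-in for a (possibly nested) list element.
enum Value {
    Int(i64),
    List(Vec<Value>),
}

/// Appends `v` to `buf`, recursing into nested lists by reference (no clones).
/// `first` tracks whether a delimiter is needed before the next scalar.
fn write_value(buf: &mut String, v: &Value, delim: &str, first: &mut bool) {
    match v {
        Value::Int(n) => {
            if !*first {
                buf.push_str(delim);
            }
            *first = false;
            // Format straight into the buffer instead of `n.to_string()` + `push_str`.
            write!(buf, "{}", n).expect("writing to a String cannot fail");
        }
        Value::List(items) => {
            for item in items {
                write_value(buf, item, delim, first);
            }
        }
    }
}

/// Joins every row with `delim` and hands each result to `emit`.
/// One scratch buffer is reused for all rows.
fn join_rows(rows: &[Value], delim: &str, mut emit: impl FnMut(&str)) {
    let mut buf = String::new();
    for row in rows {
        buf.clear(); // drops the contents but keeps the allocated capacity
        let mut first = true;
        write_value(&mut buf, row, delim, &mut first);
        emit(&buf);
    }
}
```

The emit callback stands in for appending to an output string builder; after the first few rows the scratch buffer typically stops reallocating because clear() retains its capacity.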

Are these changes tested?

Yes, and benchmarked.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the documentation, sqllogictest, and functions labels Feb 25, 2026
@neilconway
Contributor Author

neilconway commented Feb 25, 2026

Note to reviewers: it might be easier to just read the new code, rather than deciphering the diff. I had to rewrite most of the existing logic.

Also, this PR includes the commit from #20536 to avoid merge conflicts down the road.

@neilconway
Contributor Author

Benchmarks:

  ┌──────────────────┬──────────┬───────────┬─────────────┐
  │    Benchmark     │ Original │ Optimized │ Improvement │
  ├──────────────────┼──────────┼───────────┼─────────────┤
  │ int64/5          │ 221.8 µs │ 94.0 µs   │ -57.7%      │
  ├──────────────────┼──────────┼───────────┼─────────────┤
  │ int64/20         │ 544.7 µs │ 245.2 µs  │ -55.2%      │
  ├──────────────────┼──────────┼───────────┼─────────────┤
  │ int64/100        │ 1972 µs  │ 1009 µs   │ -49.0%      │
  ├──────────────────┼──────────┼───────────┼─────────────┤
  │ string/5         │ 242.1 µs │ 63.9 µs   │ -73.8%      │
  ├──────────────────┼──────────┼───────────┼─────────────┤
  │ string/20        │ 521.1 µs │ 133.6 µs  │ -74.5%      │
  ├──────────────────┼──────────┼───────────┼─────────────┤
  │ string/100       │ 1773 µs  │ 502.2 µs  │ -71.7%      │
  ├──────────────────┼──────────┼───────────┼─────────────┤
  │ nested_int64/5   │ 197.0 µs │ 105.1 µs  │ -46.5%      │
  ├──────────────────┼──────────┼───────────┼─────────────┤
  │ nested_int64/20  │ 468.9 µs │ 254.0 µs  │ -45.8%      │
  ├──────────────────┼──────────┼───────────┼─────────────┤
  │ nested_int64/100 │ 1847 µs  │ 1018 µs   │ -45.1%      │
  └──────────────────┴──────────┴───────────┴─────────────┘

@neilconway
Contributor Author

Updated benchmarks: I added special cases for integers and floats, and also added a benchmark for floats.

  ┌──────────────┬────────────┬─────────┬───────────┬─────────────┐
  │  Benchmark   │ Array Size │  Main   │ Optimized │ Improvement │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ int64        │ 5          │ 229 µs  │ 80 µs     │ -65%        │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ int64        │ 20         │ 546 µs  │ 155 µs    │ -72%        │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ int64        │ 100        │ 1.94 ms │ 614 µs    │ -68%        │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ float64      │ 5          │ 617 µs  │ 132 µs    │ -79%        │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ float64      │ 20         │ 1.91 ms │ 431 µs    │ -77%        │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ float64      │ 100        │ 8.17 ms │ 2.05 ms   │ -75%        │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ string       │ 5          │ 254 µs  │ 68 µs     │ -73%        │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ string       │ 20         │ 532 µs  │ 157 µs    │ -71%        │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ string       │ 100        │ 1.79 ms │ 521 µs    │ -71%        │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ nested int64 │ 5          │ 206 µs  │ 80 µs     │ -61%        │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ nested int64 │ 20         │ 468 µs  │ 165 µs    │ -65%        │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ nested int64 │ 100        │ 1.83 ms │ 627 µs    │ -66%        │
  └──────────────┴────────────┴─────────┴───────────┴─────────────┘

@neilconway
Contributor Author

I think adding dependencies on itoa and ryu is justified, because the incremental speedup is significant (e.g., 1009 µs -> 614 µs for int64/100). We could potentially use these crates elsewhere (e.g., format_string), but I'll worry about that later.
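
For readers unfamiliar with why a dedicated integer formatter can beat a generic write!, the sketch below shows the general idea using only the standard library: digits are produced into a small stack array and appended to the output in one push_str, with no intermediate String and no formatting machinery. This is a simplified illustration, not itoa's actual implementation (which is more heavily optimized), and push_i64 is a hypothetical name.

```rust
/// Appends the decimal form of `v` to `out` without allocating a temporary String.
/// Hypothetical helper; a simplified illustration of itoa-style formatting.
fn push_i64(out: &mut String, v: i64) {
    // i64::MIN needs 19 digits plus a sign, so 20 bytes always suffice.
    let mut tmp = [0u8; 20];
    let mut pos = tmp.len();
    // `unsigned_abs` avoids overflow when `v` is i64::MIN.
    let mut n = v.unsigned_abs();
    // Emit digits from least to most significant, filling the array backwards.
    loop {
        pos -= 1;
        tmp[pos] = b'0' + (n % 10) as u8;
        n /= 10;
        if n == 0 {
            break;
        }
    }
    if v < 0 {
        pos -= 1;
        tmp[pos] = b'-';
    }
    // Only ASCII digits and '-' were written, so the slice is valid UTF-8.
    let digits = std::str::from_utf8(&tmp[pos..]).expect("ASCII is valid UTF-8");
    out.push_str(digits);
}
```

Calling push_i64(&mut buf, -42) appends the same text that -42i64.to_string() would produce, but writes it straight into the caller's buffer.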

@neilconway
Contributor Author

neilconway commented Feb 25, 2026

It turns out that ryu's output differs in a few cosmetic ways from Rust's default float formatting. For example, ryu returns "1.0" for the floating-point number 1, whereas previously we returned "1". There will presumably be other differences in various FP corner cases (e.g., scientific notation). Ryu's output is not "wrong" (it claims to emit the "shortest decimal representation that round-trips correctly"), but it differs from what we've historically displayed.

For the time being, I've reverted the change to use ryu; we now display floating point numbers using write!. That is significantly slower (roughly 2x) for floats, but still much better than the original implementation. I'm curious if anyone has a point of view on whether using ryu is the right approach here or not.
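
The write!-based fallback described here can be sketched with the standard library alone. The helper below is hypothetical (join_f64 is not a name from this PR); it relies on the fact that Rust's Display formatting prints the f64 value 1.0 as "1", which is the historical output being preserved.

```rust
use std::fmt::Write;

/// Joins `values` with `delim` using the standard library's `Display` formatting.
/// Hypothetical helper for illustration only.
fn join_f64(buf: &mut String, values: &[f64], delim: &str) {
    for (i, v) in values.iter().enumerate() {
        if i > 0 {
            buf.push_str(delim);
        }
        // `Display` prints 1.0 as "1", matching the previous behavior.
        write!(buf, "{}", v).expect("writing to a String cannot fail");
    }
}
```

Because the value is still formatted directly into the caller's buffer, this keeps the allocation savings while leaving the displayed text unchanged.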

Updated benchmark numbers:

  ┌──────────────┬────────────┬─────────┬───────────┬─────────────┐
  │  Benchmark   │ Array Size │  Main   │ Optimized │ Improvement │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ int64        │ 5          │ 229 µs  │ 72 µs     │ -69%        │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ int64        │ 20         │ 546 µs  │ 159 µs    │ -71%        │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ int64        │ 100        │ 1.94 ms │ 620 µs    │ -68%        │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ float64      │ 5          │ 617 µs  │ 266 µs    │ -57%        │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ float64      │ 20         │ 1.91 ms │ 909 µs    │ -52%        │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ float64      │ 100        │ 8.17 ms │ 4.32 ms   │ -47%        │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ string       │ 5          │ 254 µs  │ 65 µs     │ -75%        │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ string       │ 20         │ 532 µs  │ 128 µs    │ -76%        │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ string       │ 100        │ 1.79 ms │ 486 µs    │ -73%        │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ nested int64 │ 5          │ 206 µs  │ 81 µs     │ -60%        │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ nested int64 │ 20         │ 468 µs  │ 166 µs    │ -65%        │
  ├──────────────┼────────────┼─────────┼───────────┼─────────────┤
  │ nested int64 │ 100        │ 1.83 ms │ 626 µs    │ -66%        │
  └──────────────┴────────────┴─────────┴───────────┴─────────────┘

@neilconway neilconway changed the title from "perf: Optimize array_to_string(), support decimal arrays" to "perf: Optimize array_to_string(), support more types" Feb 26, 2026
Labels

  • documentation: Improvements or additions to documentation
  • functions: Changes to functions implementation
  • sqllogictest: SQL Logic Tests (.slt)

Development

Successfully merging this pull request may close these issues.

  • array_to_string() does not support decimal arrays
  • Optimize array_to_string to avoid unnecessary allocations

2 participants