Remove UDAF manual Debug impls and simplify signatures #19727

Merged

Jefffrey merged 1 commit into apache:main from Jefffrey:refactor-udaf-sigs

Jan 21, 2026

Contributor

Jefffrey commented Jan 10, 2026

Which issue does this PR close?

Part of Refactor away usage of NUMERICS/INTEGERS in datafusion/expr-common/src/type_coercion/aggregates.rs #18092

Rationale for this change

The main value here is ensuring that UDAFs encode the types they actually accept in their signature, instead of declaring a wider signature and casting internally to the types they support. Also includes some drive-by refactoring to remove manual Debug impls.

What changes are included in this PR?

See rationale.

Are these changes tested?

Existing tests.

Are there any user-facing changes?

No.


Remove UDAF manual Debug impls and simplify signatures

dc5a922

github-actions bot added the functions label

Jefffrey commented


datafusion/functions-aggregate/src/approx_percentile_cont_with_weight.rs

    
        approx_percentile_cont: ApproxPercentileCont,
    }
    impl Debug for ApproxPercentileContWithWeight {

Contributor Author

Jefffrey Jan 10, 2026

The only advantage I see to these manual impls is sometimes they add the name field; I personally don't see those as useful enough to need a separate impl so opting for derive to remove code where possible.
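A minimal sketch of the trade-off described above, using hypothetical stand-in structs (`DerivedUdaf` and `ManualUdaf` are illustrative names, not DataFusion types). The manual impl mainly buys an extra `name` field in the output, at the cost of more code to maintain:

```rust
use std::fmt;

// Hypothetical stand-in for a UDAF struct; not the real DataFusion type.
// The field is only read through the derived Debug impl, hence the allow.
#[allow(dead_code)]
#[derive(Debug)]
struct DerivedUdaf {
    signature: String,
}

// Same data, but with a hand-written Debug impl that injects a `name` field.
struct ManualUdaf {
    signature: String,
}

impl fmt::Debug for ManualUdaf {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("ManualUdaf")
            // Extra field that does not exist on the struct itself.
            .field("name", &"manual_udaf")
            .field("signature", &self.signature)
            .finish()
    }
}
```

Formatting either struct with `{:?}` prints its fields; the only difference in this sketch is the additional `name` entry from the manual impl.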

datafusion/functions-aggregate/src/covariance.rs

Comment on lines +82 to +85

    
    signature: Signature::exact(
        vec![DataType::Float64, DataType::Float64],
        Volatility::Immutable,
    ),

Contributor Author

Jefffrey Jan 10, 2026

This way type coercion handles the casting for us, so we can now remove the code that casts internally in the accumulators.
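To illustrate the idea, here is a toy model of an exact signature, written with the standard library only. The `DataType` enum, `ExactSignature`, and `coerce` are simplified assumptions made for this sketch and are not the DataFusion API or its actual coercion rules. The point is that the signature declares the accepted types once, and the planner-side check either tells the caller what to cast to or rejects the call, so the accumulator only ever sees `Float64`:

```rust
// Toy data types; a simplified assumption, not DataFusion's DataType.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Float32,
    Float64,
    Utf8,
}

impl DataType {
    fn is_numeric(self) -> bool {
        matches!(self, DataType::Int32 | DataType::Float32 | DataType::Float64)
    }
}

// Toy "exact" signature: the function accepts exactly these types.
struct ExactSignature {
    expected: Vec<DataType>,
}

impl ExactSignature {
    // Returns the types each argument should be cast to, or an error when
    // an argument cannot be coerced. In this sketch, any numeric type may
    // be cast to any other numeric type (an assumption for illustration).
    fn coerce(&self, args: &[DataType]) -> Result<Vec<DataType>, String> {
        if args.len() != self.expected.len() {
            return Err(format!(
                "expected {} arguments, got {}",
                self.expected.len(),
                args.len()
            ));
        }
        for (arg, want) in args.iter().zip(&self.expected) {
            let castable = arg == want || (arg.is_numeric() && want.is_numeric());
            if !castable {
                return Err(format!("cannot coerce {:?} to {:?}", arg, want));
            }
        }
        Ok(self.expected.clone())
    }
}
```

With a check like this in front of the function, a separate `is_numeric` guard inside `return_type` and per-batch casts inside the accumulator become redundant.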

datafusion/functions-aggregate/src/covariance.rs

    
    }
    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
        if !arg_types[0].is_numeric() {

Contributor Author

Jefffrey Jan 10, 2026

We should be confident enough in our signature code to guard this for us, which also promotes consistency across our UDFs.

datafusion/functions-aggregate/src/covariance.rs

    
            arr2.next()
        } else {
            None
    for (value1, value2) in values1.iter().zip(values2) {

Contributor Author

Jefffrey Jan 10, 2026

Little driveby refactor to make iteration cleaner
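As a rough sketch of the zipped iteration shape, here is a standalone version using plain `Option<f64>` slices instead of Arrow arrays. `CovarianceState` and its field names are hypothetical, and the online update formula is a standard streaming covariance update assumed for illustration, not copied from the accumulator:

```rust
// Hypothetical accumulator state for a streaming covariance computation.
#[derive(Debug, Default)]
struct CovarianceState {
    count: u64,
    mean1: f64,
    mean2: f64,
    // Running sum of co-deviations.
    algo_const: f64,
}

impl CovarianceState {
    // Zips both inputs and skips any pair in which either side is null.
    fn update_batch(&mut self, values1: &[Option<f64>], values2: &[Option<f64>]) {
        for (value1, value2) in values1.iter().zip(values2) {
            let (v1, v2) = match (value1, value2) {
                (Some(a), Some(b)) => (*a, *b),
                _ => continue,
            };
            self.count += 1;
            let n = self.count as f64;
            let delta1 = v1 - self.mean1;
            self.mean1 += delta1 / n;
            let delta2 = v2 - self.mean2;
            self.mean2 += delta2 / n;
            // Uses the old mean for the first input and the new mean for the second.
            self.algo_const += delta1 * (v2 - self.mean2);
        }
    }

    // Sample covariance; None when fewer than two pairs have been seen.
    fn sample_covariance(&self) -> Option<f64> {
        if self.count < 2 {
            None
        } else {
            Some(self.algo_const / (self.count - 1) as f64)
        }
    }
}
```

Zipping the two iterators keeps the pairs aligned without any manual `next()` bookkeeping, which is what makes the loop body shorter.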

datafusion/functions-aggregate/src/covariance.rs

    
    let means1 = downcast_value!(states[1], Float64Array);
    let means2 = downcast_value!(states[2], Float64Array);
    let cs = downcast_value!(states[3], Float64Array);
    let counts = as_uint64_array(&states[0])?;

Contributor Author

Jefffrey Jan 10, 2026

I started using as_float64_array above so decided to make the whole file consistent in which way it handles downcasting

Jefffrey marked this pull request as ready for review

January 10, 2026 08:08

alamb approved these changes


Contributor

alamb left a comment

Thank you @Jefffrey -- this looks like a really nice cleanup to me

datafusion/functions-aggregate/src/covariance.rs

    
        if value1.is_none() || value2.is_none() {
            continue;
        }
    let values1 = as_float64_array(&values[0])?;

Contributor

alamb Jan 20, 2026

this is certainly much nicer

datafusion/functions-aggregate/src/covariance.rs

    
        let value1 = unwrap_or_internal_err!(value1);
        let value2 = unwrap_or_internal_err!(value2);
    for (value1, value2) in values1.iter().zip(values2) {

Contributor

alamb Jan 20, 2026

This is much nicer

datafusion/functions-aggregate/src/variance.rs

    
    ) -> Result<()> {
        assert_eq!(values.len(), 3, "two arguments to merge_batch");
        // first batch is counts, second is partial means, third is partial m2s
        let partial_counts = downcast_value!(values[0], UInt64Array);

Contributor

alamb Jan 20, 2026

I wonder if we should deprecate downcast_value! and direct people to the as_* methods? It seems strange to have two ways to do the same thing.

Contributor Author

Jefffrey Jan 21, 2026

I have definitely found it a little confusing all the ways we can do downcasting, for example:

While they all do have slightly different behaviours (with how they treat erroring and such), it would help to be consistent 🤔

We could make downcast_value! deprecated and eventually try to make it private to common

Contributor

alamb Jan 21, 2026

Yeah, the biggest difference as I recall is that the ones in arrow panic, and the ones in DataFusion return errors.
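The behavioural difference mentioned above can be sketched with the standard library's `std::any::Any`. Both helper functions here are hypothetical and only model the two styles; they are not the arrow or DataFusion functions:

```rust
use std::any::Any;

// Error-returning style: a type mismatch becomes an Err the caller can
// propagate with `?`.
fn as_f64_values(array: &dyn Any) -> Result<&Vec<f64>, String> {
    array
        .downcast_ref::<Vec<f64>>()
        .ok_or_else(|| "expected Vec<f64>".to_string())
}

// Panicking style: a type mismatch aborts the current thread's work.
fn as_f64_values_or_panic(array: &dyn Any) -> &Vec<f64> {
    array
        .downcast_ref::<Vec<f64>>()
        .expect("expected Vec<f64>")
}
```

In query-engine code the error-returning form is usually easier to work with, since a mismatch surfaces as an ordinary error rather than a panic, though which one a codebase standardizes on is a design choice.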

Jefffrey added this pull request to the merge queue

Merged via the queue into apache:main with commit 10a1d4e

28 checks passed

Contributor Author

Jefffrey commented Jan 21, 2026

Thanks @alamb
