-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Remove UDAF manual Debug impls and simplify signatures #19727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| approx_percentile_cont: ApproxPercentileCont, | ||
| } | ||
|
|
||
| impl Debug for ApproxPercentileContWithWeight { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only advantage I see to these manual impls is sometimes they add the name field; I personally don't see those as useful enough to need a separate impl so opting for derive to remove code where possible.
| signature: Signature::exact( | ||
| vec![DataType::Float64, DataType::Float64], | ||
| Volatility::Immutable, | ||
| ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This way type coercion handles casting for us; we can now remove the code that does casting internally in the accumulators for us
| } | ||
|
|
||
| fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> { | ||
| if !arg_types[0].is_numeric() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be more confident in our signature code to guard this for us, to promote consistency across our UDFs
| arr2.next() | ||
| } else { | ||
| None | ||
| for (value1, value2) in values1.iter().zip(values2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Little driveby refactor to make iteration cleaner
| let means1 = downcast_value!(states[1], Float64Array); | ||
| let means2 = downcast_value!(states[2], Float64Array); | ||
| let cs = downcast_value!(states[3], Float64Array); | ||
| let counts = as_uint64_array(&states[0])?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started using as_float64_array above so decided to make the whole file consistent in which way it handles downcasting
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Jefffrey -- this looks like a really nice cleanup to me
| if value1.is_none() || value2.is_none() { | ||
| continue; | ||
| } | ||
| let values1 = as_float64_array(&values[0])?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is certainly much nicer
|
|
||
| let value1 = unwrap_or_internal_err!(value1); | ||
| let value2 = unwrap_or_internal_err!(value2); | ||
| for (value1, value2) in values1.iter().zip(values2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is much nicer
| ) -> Result<()> { | ||
| assert_eq!(values.len(), 3, "two arguments to merge_batch"); | ||
| // first batch is counts, second is partial means, third is partial m2s | ||
| let partial_counts = downcast_value!(values[0], UInt64Array); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should deprecate downcast value and direct people to the as_* methods ? It seems strange to have two ways to do something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have definitely found it a little confusing all the ways we can do downcasting, for example:
- https://docs.rs/datafusion/latest/datafusion/common/cast/fn.as_primitive_array.html
- https://docs.rs/datafusion/latest/datafusion/common/macro.downcast_value.html
- https://docs.rs/arrow/latest/arrow/array/trait.AsArray.html#method.as_primitive
- https://docs.rs/arrow/latest/arrow/array/trait.AsArray.html#tymethod.as_primitive_opt
- https://docs.rs/arrow/latest/arrow/array/fn.as_primitive_array.html
- Manual
downcast_ref
While they all do have slightly different behaviours (with how they treat erroring and such), it would help to be consistent 🤔
We could make downcast_value! deprecated and eventually try to make it private to common
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the biggest difference as I recall is taht ones in arrow panic, and the ones in DataFusion retun errors
|
Thanks @alamb |
Which issue does this PR close?
NUMERICS/INTEGERSindatafusion/expr-common/src/type_coercion/aggregates.rs#18092Rationale for this change
Main value add here is ensure UDAFs encode their actual accepted types in their signature instead of internally casting to the actual types they support from a wider signature. Also doing some driveby refactoring of removing manual Debug impls.
What changes are included in this PR?
See rationale.
Are these changes tested?
Existing tests.
Are there any user-facing changes?
No.