deps: [iceberg] upgrade DataFusion to 51, Arrow to 57, Iceberg to latest, MSRV to 1.88 #2729
…date. Remove datafusion-sql dependency to improve build times.
Marking as draft since this is just for testing until DataFusion 51.0.0 crates are available.
Codecov Report ❌ Patch coverage is
Additional details and impacted files

| Coverage Diff | main | #2729 | +/- |
|---|---|---|---|
| Coverage | 56.12% | 59.35% | +3.23% |
| Complexity | 976 | 1369 | +393 |
| Files | 119 | 167 | +48 |
| Lines | 11743 | 15334 | +3591 |
| Branches | 2251 | 2545 | +294 |
| Hits | 6591 | 9102 | +2511 |
| Misses | 4012 | 4945 | +933 |
| Partials | 1140 | 1287 | +147 |
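The Coverage Diff figures above are internally consistent: in each column, Hits + Misses + Partials equals Lines, and Coverage is Hits divided by Lines. The dependency-free Rust sketch below recomputes them. It assumes Codecov's default rounding (round down to two decimal places, with partially covered lines not counted as covered), which explains why 6591 / 11743 (about 56.127%) is reported as 56.12% rather than 56.13%. `CoverageCounts` and `format_pct` are illustrative names, not part of any Codecov API.

```rust
/// Line counts from one column of a Codecov "Coverage Diff" table.
struct CoverageCounts {
    hits: u64,
    misses: u64,
    partials: u64,
}

impl CoverageCounts {
    /// Every tracked line is exactly one of: hit, miss, or partial.
    fn lines(&self) -> u64 {
        self.hits + self.misses + self.partials
    }

    /// Coverage in hundredths of a percent, truncated toward zero.
    /// Assumes Codecov's default rounding (down, two decimal places),
    /// where partial lines do not count as covered.
    fn coverage_hundredths(&self) -> u64 {
        let lines = self.lines();
        if lines == 0 {
            return 0;
        }
        self.hits * 10_000 / lines
    }
}

/// Render hundredths of a percent as e.g. "56.12%".
fn format_pct(hundredths: u64) -> String {
    format!("{}.{:02}%", hundredths / 100, hundredths % 100)
}

fn main() {
    let base = CoverageCounts { hits: 6591, misses: 4012, partials: 1140 };
    let head = CoverageCounts { hits: 9102, misses: 4945, partials: 1287 };

    println!("main:  {} lines, {}", base.lines(), format_pct(base.coverage_hundredths()));
    println!("#2729: {} lines, {}", head.lines(), format_pct(head.coverage_hundredths()));
    println!(
        "delta: +{}",
        format_pct(head.coverage_hundredths() - base.coverage_hundredths())
    );
}
```

Integer arithmetic is used on purpose so the truncation matches the reported two-decimal values exactly, without floating-point rounding surprises.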
# Conflicts: # native/Cargo.lock
DF 51.0.0 crates are available, but we're currently planning to review and merge this after the Comet 0.12.0 release. Keeping this as a draft for now.
# Conflicts: # native/Cargo.lock # native/Cargo.toml
@mbutrovich are you planning to continue on this?
Yep, we're blocked on apache/iceberg-rust#1899.
# Conflicts: # native/Cargo.lock
apache/iceberg-rust#1921 merged, so I'm hoping CI will go green now and we can get this reviewed.
…ed columns (#1922)

## Which issue does this PR close?

See #1824 (comment) and #1914 (comment).

## What changes are included in this PR?

This restores the pre-#1824 behavior of the `constants_map` function in `record_batch_transformer.rs`: `NULL` values are skipped instead of being inserted into the constants map. This lets the column projection rules default missing partition values to `NULL`.

## Are these changes tested?

A new test, plus running the entire Iceberg Java suite via DataFusion Comet in apache/datafusion-comet#2729.
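To make the restored behavior concrete, here is a simplified, dependency-free Rust sketch. `PartitionValue`, `constants_map`, and `project_field` are hypothetical stand-ins, not iceberg-rust's actual types or signatures: partition values that are `NULL` (modeled as `None`) are left out of the constants map, so a lookup for that field falls through to the default, which this sketch models as `NULL`.

```rust
use std::collections::HashMap;

/// Illustrative partition value type (NOT iceberg-rust's `Literal`/`Datum`).
#[derive(Debug, Clone, PartialEq)]
enum PartitionValue {
    Int(i64),
    Str(String),
}

/// Build a field-id -> constant map, skipping NULL (`None`) partition values
/// instead of inserting them.
fn constants_map(partition: &[(i32, Option<PartitionValue>)]) -> HashMap<i32, PartitionValue> {
    partition
        .iter()
        .filter_map(|(field_id, value)| value.clone().map(|v| (*field_id, v)))
        .collect()
}

/// Simplified projection rule: a field with a constant gets that constant;
/// a field absent from the map falls through to the default, modeled as NULL (`None`).
fn project_field(constants: &HashMap<i32, PartitionValue>, field_id: i32) -> Option<PartitionValue> {
    constants.get(&field_id).cloned()
}

fn main() {
    let partition = vec![
        (1, Some(PartitionValue::Int(2024))),
        (2, None), // NULL partition value: not inserted into the map
        (3, Some(PartitionValue::Str("us-east".to_string()))),
    ];
    let constants = constants_map(&partition);
    for field_id in [1, 2, 3] {
        println!("field {field_id}: {:?}", project_field(&constants, field_id));
    }
}
```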
# Conflicts: # native/spark-expr/src/math_funcs/internal/make_decimal.rs
    file_source,
)
-    .with_projection(Some(projection_vector))
+    .with_projection_indices(Some(projection_vector))
FYI: this method caused an issue with projection pushdown in Q13.
Decimal128(precision, scale) => {
    // In Spark, the result type is DECIMAL(min(38, precision + 4), min(38, scale + 4)).
    // Ref: https://github.com/apache/spark/blob/fcf636d9eb8d645c24be3db2d599aba2d7e2955a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala#L66
    let new_precision = DECIMAL128_MAX_PRECISION.min(*precision + 4);
nice
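The Spark rule quoted in the code comment widens both precision and scale by 4 and caps each at 38. Below is a dependency-free Rust sketch of just that type rule. `avg_decimal_result_type` is an illustrative name, not Comet's function, and the two constants are redefined locally with the same values as arrow's `DECIMAL128_MAX_PRECISION` and `DECIMAL128_MAX_SCALE` (both 38).

```rust
/// Same values as arrow's DECIMAL128_MAX_PRECISION / DECIMAL128_MAX_SCALE,
/// redefined here so the sketch has no dependencies.
const DECIMAL128_MAX_PRECISION: u8 = 38;
const DECIMAL128_MAX_SCALE: i8 = 38;

/// Result type of Spark's `avg` over DECIMAL(precision, scale):
/// DECIMAL(min(38, precision + 4), min(38, scale + 4)).
fn avg_decimal_result_type(precision: u8, scale: i8) -> (u8, i8) {
    // saturating_add avoids overflow panics on out-of-range inputs.
    let new_precision = DECIMAL128_MAX_PRECISION.min(precision.saturating_add(4));
    let new_scale = DECIMAL128_MAX_SCALE.min(scale.saturating_add(4));
    (new_precision, new_scale)
}

fn main() {
    for (p, s) in [(10u8, 2i8), (38, 10), (36, 36)] {
        let (np, ns) = avg_decimal_result_type(p, s);
        println!("avg(DECIMAL({p}, {s})) -> DECIMAL({np}, {ns})");
    }
}
```

For example, `avg` over `DECIMAL(10, 2)` yields `DECIMAL(14, 6)`, while `DECIMAL(38, 10)` is already at maximum precision and yields `DECIMAL(38, 14)`.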
 */
private def partitionValueToJson(fieldTypeStr: String, value: Any): JValue = {
  fieldTypeStr match {
    case t if t.startsWith("timestamp") =>
Any chance the type name could arrive in a different case? `startsWith` is case-sensitive.
From looking at the Iceberg code, it doesn't look like it. They're all lowercase.
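The review question comes down to exact versus case-insensitive prefix matching. Since Iceberg's type strings are lowercase (e.g. `timestamp`, `timestamptz`), the exact check is enough. The Rust sketch below contrasts it with a defensive ASCII-case-insensitive variant; both function names are hypothetical, and the first mirrors the Scala `t.startsWith("timestamp")` guard.

```rust
/// Prefix checked by the Scala `case t if t.startsWith("timestamp")` guard.
const TIMESTAMP_PREFIX: &str = "timestamp";

/// Exact-case prefix test, matching the Scala code's behavior.
fn is_timestamp_exact(field_type: &str) -> bool {
    field_type.starts_with(TIMESTAMP_PREFIX)
}

/// Defensive variant that ignores ASCII case. `str::get` returns None when the
/// string is too short (or the cut is not on a char boundary), so it never panics.
fn is_timestamp_ignore_case(field_type: &str) -> bool {
    field_type
        .get(..TIMESTAMP_PREFIX.len())
        .map_or(false, |prefix| prefix.eq_ignore_ascii_case(TIMESTAMP_PREFIX))
}

fn main() {
    for t in ["timestamp", "timestamptz", "Timestamp", "date"] {
        println!(
            "{t}: exact={}, ignore_case={}",
            is_timestamp_exact(t),
            is_timestamp_ignore_case(t)
        );
    }
}
```

The exact check keeps the match simple and allocation-free; the case-insensitive variant only pays off if type strings could come from a source that does not normalize case.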
// Schema selection logic:
// 1. If hasDeletes=true: Use taskSchema (file-specific schema) because
//    delete files reference specific schema versions and we need exact schema
//    matching for MOR.
Suggested change: `// matching for MOR.` → `// matching for merge-on-read.`
comphead left a comment:
Thanks @mbutrovich, epic PR!
I ran local benchmarks with scan modes
Which issue does this PR close?
Closes #2719.
Rationale for this change
What changes are included in this PR?
… `CometIcebergNativeScan` to consolidate redundant code.
How are these changes tested?
Existing tests (including the Iceberg suite), plus two new `CometIcebergNativeSuite` tests that I added while debugging issues during the upgrade.