Skip to content

Conversation

@mbutrovich
Copy link
Contributor

@mbutrovich mbutrovich commented Nov 7, 2025

Which issue does this PR close?

Closes #2719.

Rationale for this change

What changes are included in this PR?

  • Bump DataFusion to 51 and adjust for API changes
  • Bump Arrow 57 and adjust for API changes
  • Bump Iceberg to latest now that it has DF 51 and Arrow 57
  • Bump Rust MSRV to 1.88
  • Refactor partition value serialization in CometIcebergNativeScan to consolidate redundant code

How are these changes tested?

Existing tests (including Iceberg suite), plus two new CometIcebergNativeSuite tests I added while debugging issues upgrading.

…date. Remove datafusion-sql dependency to improve build times.
@mbutrovich mbutrovich marked this pull request as draft November 7, 2025 16:45
@mbutrovich
Copy link
Contributor Author

Marking as draft since this is just for testing until DataFusion 51.0.0 crates are available.

@codecov-commenter
Copy link

codecov-commenter commented Nov 7, 2025

Codecov Report

❌ Patch coverage is 50.00000% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.35%. Comparing base (f09f8af) to head (de5ed17).
⚠️ Report is 753 commits behind head on main.

Files with missing lines Patch % Lines
.../comet/serde/operator/CometIcebergNativeScan.scala 47.54% 27 Missing and 5 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2729      +/-   ##
============================================
+ Coverage     56.12%   59.35%   +3.23%     
- Complexity      976     1369     +393     
============================================
  Files           119      167      +48     
  Lines         11743    15334    +3591     
  Branches       2251     2545     +294     
============================================
+ Hits           6591     9102    +2511     
- Misses         4012     4945     +933     
- Partials       1140     1287     +147     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mbutrovich mbutrovich changed the title chore: upgrade to DataFusion 51.0.0 and Arrow-rs 57.0.0 deps: upgrade to DataFusion 51.0.0 and Arrow-rs 57.0.0 Nov 7, 2025
@mbutrovich
Copy link
Contributor Author

mbutrovich commented Nov 19, 2025

DF 51.0.0 crates are available, but we're currently planning to review and merge this after the Comet 0.12.0 release. Keeping this as a draft for now.

# Conflicts:
#	native/Cargo.lock
#	native/Cargo.toml
@comphead
Copy link
Contributor

comphead commented Dec 8, 2025

@mbutrovich are you planning to continue on this?

@mbutrovich
Copy link
Contributor Author

@mbutrovich are you planning to continue on this?

Yep, we're blocked on apache/iceberg-rust#1899

@mbutrovich mbutrovich changed the title deps: upgrade to DataFusion 51.0.0 and Arrow-rs 57.0.0 deps: [iceberg] upgrade to DataFusion 51.0.0 and Arrow-rs 57.0.0 Dec 9, 2025
@mbutrovich mbutrovich changed the title deps: [iceberg] upgrade to DataFusion 51.0.0 and Arrow-rs 57.0.0 deps: [iceberg] upgrade to DataFusion 51 and Arrow-rs 57 Dec 9, 2025
@mbutrovich mbutrovich changed the title deps: [iceberg] upgrade to DataFusion 51 and Arrow-rs 57 deps: [iceberg] upgrade to DataFusion 51 and Arrow 57 Dec 9, 2025
@mbutrovich
Copy link
Contributor Author

apache/iceberg-rust#1921 merged so I'm hoping CI will go green now and we can get this reviewed.

@mbutrovich mbutrovich changed the title deps: [iceberg] upgrade to DataFusion 51 and Arrow 57 deps: [iceberg] upgrade DataFusion to 51, Arrow to 57, Iceberg to latest Dec 10, 2025
liurenjie1024 pushed a commit to apache/iceberg-rust that referenced this pull request Dec 11, 2025
…ed columns (#1922)

## Which issue does this PR close?

See
#1824 (comment)
and
#1914 (comment)

## What changes are included in this PR?

This restores the behavior in `record_batch_transformer.rs`'s
`constants_map` function to pre-#1824 behavior where `NULL`s are not
inserted into the constants map, and instead are just skipped. This
allows the column projection rules for missing partition values to
default to `NULL`.

## Are these changes tested?


New test, and running the entire Iceberg Java suite via DataFusion Comet
in apache/datafusion-comet#2729.
@mbutrovich mbutrovich marked this pull request as ready for review December 11, 2025 12:35
# Conflicts:
#	native/spark-expr/src/math_funcs/internal/make_decimal.rs
@mbutrovich mbutrovich added bug Something isn't working dependencies Pull requests that update a dependency file and removed bug Something isn't working labels Dec 11, 2025
@mbutrovich mbutrovich changed the title deps: [iceberg] upgrade DataFusion to 51, Arrow to 57, Iceberg to latest deps: [iceberg] upgrade DataFusion to 51, Arrow to 57, Iceberg to latest, MSRV to 1.88 Dec 11, 2025
file_source,
)
.with_projection(Some(projection_vector))
.with_projection_indices(Some(projection_vector))
Copy link
Contributor

@comphead comphead Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI It was an issue with Q13 projection pushdown with this method

apache/datafusion#19094 (review)

Decimal128(precision, scale) => {
// In the spark, the result type is DECIMAL(min(38,precision+4), min(38,scale+4)).
// Ref: https://github.com/apache/spark/blob/fcf636d9eb8d645c24be3db2d599aba2d7e2955a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala#L66
let new_precision = DECIMAL128_MAX_PRECISION.min(*precision + 4);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

*/
private def partitionValueToJson(fieldTypeStr: String, value: Any): JValue = {
fieldTypeStr match {
case t if t.startsWith("timestamp") =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any chance type name can be case sensitive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From looking at the Iceberg code, doesn't look like it. They're all lowercase.

// Schema selection logic:
// 1. If hasDeletes=true: Use taskSchema (file-specific schema) because
// delete files reference specific schema versions and we need exact schema
// matching for MOR.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// matching for MOR.
// matching for merge-on-read.

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mbutrovich epic PR!

@mbutrovich mbutrovich merged commit 97767c2 into apache:main Dec 11, 2025
172 of 173 checks passed
@andygrove
Copy link
Member

I ran local benchmarks with scan modes auto and native_datafusion and did not see any change from main branch

@mbutrovich mbutrovich deleted the df51 branch December 12, 2025 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies Pull requests that update a dependency file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Upgrade to DataFusion 51.0.0

4 participants