[AURON #2250] Add native Hudi scan coverage for partitioned COW tables #2251

Merged
slfan1989 merged 1 commit into apache:master from weimingdiit:test/hudi-partitioned-cow-native-scan
May 17, 2026

@weimingdiit (Contributor) commented May 10, 2026

Which issue does this PR close?

Closes #2250

Rationale for this change

Partitioned COW Hudi tables are a common use case, but native Hudi scan coverage currently focuses on simple non-partitioned COW tables and fallback cases. Adding partitioned table coverage helps catch regressions around partition pruning and mixed partition/data filters.

What changes are included in this PR?

Add a test for a partitioned COW Hudi table stored as Parquet. The test covers:

  • Verify full scan query result correctness.
  • Verify partition pruning with a partition filter.
  • Verify combined partition filter and data filter.
  • Verify the Hudi convert provider can convert the scan to native Parquet scan.
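The correctness property behind the checks listed above can be sketched with a toy model. This is not the test added in this PR and uses neither Spark nor Hudi; the rows, the `dt` partition column, and the helpers `full_scan` and `pruned_scan` are all hypothetical names chosen for illustration. The idea is simply that a scan restricted to the matching partition, with an optional data filter, should return the same rows as filtering a full scan.

```python
from collections import defaultdict

# Hypothetical rows: (id, price, dt), where `dt` plays the partition column.
rows = [
    (1, 5.0, "2026-05-01"),
    (2, 20.0, "2026-05-01"),
    (3, 30.0, "2026-05-02"),
]

# Group rows by partition value, much as a partitioned table groups files
# into one directory per partition value.
partitions = defaultdict(list)
for row in rows:
    partitions[row[2]].append(row)


def full_scan():
    """Read every partition."""
    return [r for part in partitions.values() for r in part]


def pruned_scan(dt, min_price=None):
    """Read only the matching partition, then apply the optional data filter."""
    selected = list(partitions.get(dt, []))
    if min_price is not None:
        selected = [r for r in selected if r[1] > min_price]
    return selected


# The pruned scan must agree with filtering the full scan.
expected = [r for r in full_scan() if r[2] == "2026-05-01" and r[1] > 10]
assert pruned_scan("2026-05-01", min_price=10) == expected
```

A real suite would additionally inspect the physical plan to confirm the scan was converted to a native Parquet scan, which this toy model does not attempt.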

Are there any user-facing changes?

No. This PR only adds test coverage.

How was this patch tested?

  • Added Hudi scan support suite coverage.

@weimingdiit weimingdiit marked this pull request as ready for review May 12, 2026 02:12
@slfan1989 slfan1989 requested a review from Copilot May 13, 2026 01:42
@slfan1989 (Contributor)

@weimingdiit I've taken a quick look and +1. Let's check Copilot's review results.

Copilot AI (Contributor) left a comment

Pull request overview

Adds additional native scan test coverage for Apache Hudi partitioned COW (Copy-on-Write) Parquet tables in the Auron Spark integration test suite, targeting regressions around partition pruning and mixed partition/data filter handling.

Changes:

  • Adds a new test that creates a partitioned Hudi COW table, inserts data across partitions, and validates query correctness.
  • Verifies provider-based conversion to native Parquet scan for: full scan, partition-filtered scan, and combined partition+data filtered scan.
  • Asserts partition filters and data filters are routed to the expected FileSourceScanExec filter buckets.
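The routing of predicates into filter buckets can be illustrated with a small standalone sketch. It is loosely modeled on the usual rule (a conjunct that references only partition columns is a partition filter, and one that references no partition columns is a data filter) and is an assumption for illustration, not Spark's or Auron's actual code. `Predicate`, `split_filters`, and the column names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    sql: str            # human-readable form, used only for display
    columns: frozenset  # names of the columns the predicate references


def split_filters(predicates, partition_columns):
    """Route each conjunct to a bucket based on the columns it references."""
    partition_columns = frozenset(partition_columns)
    partition_filters, data_filters, residual = [], [], []
    for p in predicates:
        if p.columns and p.columns <= partition_columns:
            # References partition columns only: usable for pruning.
            partition_filters.append(p)
        elif not (p.columns & partition_columns):
            # References no partition column: evaluated against file data.
            data_filters.append(p)
        else:
            # Mixes both kinds of column: left for evaluation after the scan.
            residual.append(p)
    return partition_filters, data_filters, residual


# Hypothetical table partitioned by `dt`, with data columns `id` and `price`.
predicates = [
    Predicate("dt = '2026-05-01'", frozenset({"dt"})),
    Predicate("price > 10", frozenset({"price"})),
]
part, data, rest = split_filters(predicates, ["dt"])
```

With these inputs, `part` holds the `dt` predicate, `data` holds the `price` predicate, and `rest` is empty, which mirrors the kind of assertion the review summary describes.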

@weimingdiit weimingdiit force-pushed the test/hudi-partitioned-cow-native-scan branch 3 times, most recently from 5ecdfe5 to 32e4d9b Compare May 14, 2026 14:43
… tables

Signed-off-by: weimingdiit <weimingdiit@gmail.com>
@weimingdiit weimingdiit force-pushed the test/hudi-partitioned-cow-native-scan branch from 32e4d9b to 1a76d59 Compare May 15, 2026 03:46
@slfan1989 slfan1989 merged commit f05c9da into apache:master May 17, 2026
123 checks passed
@slfan1989 (Contributor)

@weimingdiit Thanks for the contribution! Merged. @yew1eb Thanks for the review!

4 participants