[AMORO-4044] Fix partition mapping bug in TableEntriesScan for tables with evolved PartitionSpec#4085
Conversation
|
Hi @majin1102 @zhoujinsong, could you please review this PR? This fix addresses a partition mapping bug in |
… with evolved PartitionSpec The entries metadata table returns a unified super-struct partition containing fields from all PartitionSpecs. buildDataFile() and buildDeleteFile() passed this directly to withPartition(), causing partition mismatch after spec evolution. Fix by projecting the unified partition to the spec-specific type using StructProjection. Signed-off-by: Jiwon Park <jpark92@outlook.kr>
61c7c21 to
7c4f02b
Compare
|
We encountered the same issue internally and have tested this branch — all our internal tests passed. LGTM from the testing perspective. One minor pre-existing issue I noticed (not introduced by this PR): the |
|
Hi @lintingbin, would you mind reviewing this PR? It addresses the partition mismatch bug reported in #4044 — specifically fixing Thanks! |
|
#4047 This PR is the fix proposed by our colleague, but I feel your solution is a better approach. We've already tested it internally on your branch, and everything looks good. I think it's ready to merge. @xxubai @zhoujinsong Please take a look at this. |
| return lazyIndexOfDataFileType.get(fieldName); | ||
| } | ||
|
|
||
| private StructLike projectPartition(PartitionSpec spec, StructLike partition) { |
There was a problem hiding this comment.
It seems this function duplicates the logic in PartitionUtil.coercePartition(...). Maybe we can reuse that method directly.
There was a problem hiding this comment.
Thanks for the suggestion! At first glance they do look similar, but the projection direction is actually reversed.
coercePartition projects spec-specific → unified (input: spec.partitionType(), output: partitionType):
StructProjection.createAllowMissing(spec.partitionType(), partitionType);projectPartition here projects unified → spec-specific (input: unifiedPartitionType(), output: spec.partitionType()):
StructProjection.createAllowMissing(unifiedPartitionType(), spec.partitionType());In TableEntriesScan, the partition data is read from the entries metadata table, which uses a unified partition schema (Partitioning.partitionType(table)). But DataFiles.builder(spec).withPartition() expects the spec-specific layout. So we need the unified → spec direction, which is the opposite of what coercePartition does.
| StructLike partition = | ||
| fileRecord.get(dataFileFieldIndex(DataFile.PARTITION_NAME), StructLike.class); | ||
| builder.withPartition(partition); | ||
| builder.withPartition(projectPartition(spec, partition)); |
There was a problem hiding this comment.
| builder.withPartition(projectPartition(spec, partition)); | |
| builder.withPartition(PartitionUtil.coercePartition(unifiedPartitionType(), spec, partition)); |
| StructLike partition = | ||
| fileRecord.get(dataFileFieldIndex(DataFile.PARTITION_NAME), StructLike.class); | ||
| builder.withPartition(partition); | ||
| builder.withPartition(projectPartition(spec, partition)); |
There was a problem hiding this comment.
| builder.withPartition(projectPartition(spec, partition)); | |
| builder.withPartition(PartitionUtil.coercePartition(unifiedPartitionType(), spec, partition)); |
| return lazyIndexOfDataFileType.get(fieldName); | ||
| } | ||
|
|
||
| private StructLike projectPartition(PartitionSpec spec, StructLike partition) { |
Why are the changes needed?
The entries metadata table returns a unified super-struct partition containing fields from all PartitionSpecs. In
TableEntriesScan, bothbuildDataFile()andbuildDeleteFile()passed this unified partition directly towithPartition(), which causes a partition struct mismatch for files written under a non-latest PartitionSpec. This is the root cause of #4044.This PR fixes the issue at the
TableEntriesScanlevel by projecting the unified partition to the spec-specific partition type usingStructProjection.createAllowMissing(), so that all callers ofTableEntriesScanget correct partition values regardless of spec evolution.Close #4044.
Brief change log
projectPartition()method that usesStructProjection.createAllowMissing()to project the unified partition struct to the file's spec-specific partition typebuildDataFile()andbuildDeleteFile()Partitioning.partitionType(table)for reuseHow was this patch tested?
Documentation
no
not applicable