[Draft] Export partition to apache iceberg#1618
[Draft] Export partition to apache iceberg#1618arthurpassos wants to merge 31 commits intoantalya-26.1from
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5b0e833565
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| { | ||
| auto * object_storage = dynamic_cast<StorageObjectStorageCluster *>(dest_storage.get()); | ||
|
|
||
| auto * iceberg_metadata = dynamic_cast<IcebergMetadata *>(object_storage->getExternalMetadata(query_context)); |
There was a problem hiding this comment.
Guard IcebergMetadata casts with USE_AVRO
IcebergMetadata is defined only under #if USE_AVRO in IcebergMetadata.h, but this new dynamic_cast<IcebergMetadata *> is compiled unconditionally. In non-AVRO builds (USE_AVRO=0), this translation unit (and the same pattern added in StorageReplicatedMergeTree.cpp) cannot compile, so the change breaks optional-AVRO build configurations.
Useful? React with 👍 / 👎.
| const String sidecar_path = replaceFileExtensionWithAvro( | ||
| filename_generator.convertMetadataPathToStoragePath(path)); |
There was a problem hiding this comment.
Use storage paths directly when reading export sidecars
The export path list is populated from filename.path_in_storage, but commit now treats each entry as a metadata path and calls convertMetadataPathToStoragePath before reading sidecars. With write_full_path_in_iceberg_metadata=1, table_dir is an URI prefix (for example s3://...) while these entries are plain storage paths (/...), so the conversion throws and EXPORT PARTITION cannot commit.
Useful? React with 👍 / 👎.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6c6194814d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (!dest_storage->isDataLake()) | ||
| { | ||
| if (query_to_string(source_metadata_ptr->getPartitionKeyAST()) != query_to_string(destination_metadata_ptr->getPartitionKeyAST())) | ||
| throw Exception(ErrorCodes::BAD_ARGUMENTS, "Tables have different partition key"); |
There was a problem hiding this comment.
Restore partition-spec validation for direct Iceberg export
Skipping the partition-key check for all data lakes here removes the only compatibility guard in the EXPORT PART path, but no replacement validation is added (unlike EXPORT PARTITION). When the source part’s partition tuple does not match the destination Iceberg spec (for example, source unpartitioned but destination partitioned), partition_values can be empty or shorter than the destination partition field list, and generateManifestFile later indexes partition_values[i] for each destination field, which can crash or corrupt the commit attempt.
Useful? React with 👍 / 👎.
| else | ||
| pv_arr->add(field.safeGet<Int64>()); |
There was a problem hiding this comment.
Preserve non-integer partition values in manifest JSON
This serialization path coerces every non-string partition value to Int64, which breaks valid Iceberg partition values such as Float64, Decimal*, and other non-integer Field types. In those cases export can throw during serialization (bad safeGet<Int64>()) or lose value fidelity before commit, even though the downstream Iceberg writer supports these types. As a result, partition exports to Iceberg fail or write incorrect partition metadata for non-integer partition transforms.
Useful? React with 👍 / 👎.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
...
Documentation entry for user-facing changes
...
CI/CD Options
Exclude tests:
Regression jobs to run: