Skip to content

[Draft] Export partition to apache iceberg#1618

Open
arthurpassos wants to merge 31 commits intoantalya-26.1from
export_partition_iceberg
Open

[Draft] Export partition to apache iceberg#1618
arthurpassos wants to merge 31 commits intoantalya-26.1from
export_partition_iceberg

Conversation

@arthurpassos
Copy link
Copy Markdown
Collaborator

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

...

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 6, 2026

Workflow [PR], commit [6a19f92]

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5b0e833565

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

{
auto * object_storage = dynamic_cast<StorageObjectStorageCluster *>(dest_storage.get());

auto * iceberg_metadata = dynamic_cast<IcebergMetadata *>(object_storage->getExternalMetadata(query_context));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Guard IcebergMetadata casts with USE_AVRO

IcebergMetadata is defined only under #if USE_AVRO in IcebergMetadata.h, but this new dynamic_cast<IcebergMetadata *> is compiled unconditionally. In non-AVRO builds (USE_AVRO=0), this translation unit (and the same pattern added in StorageReplicatedMergeTree.cpp) cannot compile, so the change breaks optional-AVRO build configurations.

Useful? React with 👍 / 👎.

Comment on lines +1780 to +1781
const String sidecar_path = replaceFileExtensionWithAvro(
filename_generator.convertMetadataPathToStoragePath(path));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Use storage paths directly when reading export sidecars

The export path list is populated from filename.path_in_storage, but commit now treats each entry as a metadata path and calls convertMetadataPathToStoragePath before reading sidecars. With write_full_path_in_iceberg_metadata=1, table_dir is an URI prefix (for example s3://...) while these entries are plain storage paths (/...), so the conversion throws and EXPORT PARTITION cannot commit.

Useful? React with 👍 / 👎.

@arthurpassos
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6c6194814d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +6574 to +6577
if (!dest_storage->isDataLake())
{
if (query_to_string(source_metadata_ptr->getPartitionKeyAST()) != query_to_string(destination_metadata_ptr->getPartitionKeyAST()))
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Tables have different partition key");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore partition-spec validation for direct Iceberg export

Skipping the partition-key check for all data lakes here removes the only compatibility guard in the EXPORT PART path, but no replacement validation is added (unlike EXPORT PARTITION). When the source part’s partition tuple does not match the destination Iceberg spec (for example, source unpartitioned but destination partitioned), partition_values can be empty or shorter than the destination partition field list, and generateManifestFile later indexes partition_values[i] for each destination field, which can crash or corrupt the commit attempt.

Useful? React with 👍 / 👎.

Comment on lines +8390 to +8391
else
pv_arr->add(field.safeGet<Int64>());
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve non-integer partition values in manifest JSON

This serialization path coerces every non-string partition value to Int64, which breaks valid Iceberg partition values such as Float64, Decimal*, and other non-integer Field types. In those cases export can throw during serialization (bad safeGet<Int64>()) or lose value fidelity before commit, even though the downstream Iceberg writer supports these types. As a result, partition exports to Iceberg fail or write incorrect partition metadata for non-integer partition transforms.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants