Conversation

@arthurpassos
Collaborator

@arthurpassos arthurpassos commented Dec 15, 2025

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

  • Add export_merge_tree_part_max_bytes_per_file and export_merge_tree_part_max_rows_per_file to allow splitting large Parquet files.
  • Preserve the entire settings object in the manifest as opposed to only format settings
  • Fix thread safety issues in system.exports table
  • Collapse the two try/catch blocks in ExportsList into a single one, for two reasons: 1. since file names are now generated on the fly, a file-already-exists exception can be thrown during streaming, and we need to catch it to honor the file-already-exists policy; 2. less code.
  • Preserve max_bytes and max_rows settings in export partition manifest
  • Update the ZooKeeper export partition manifest to accommodate multiple file paths for a given part

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

@github-actions

github-actions bot commented Dec 15, 2025

Workflow [PR], commit [8a3179e]


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@arthurpassos arthurpassos added antalya port-antalya PRs to be ported to all new Antalya releases antalya-25.8 labels Dec 15, 2025
@arthurpassos
Collaborator Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review


Comment on lines +62 to +65
const auto path_without_extension = base_path.substr(0, pos);
const auto file_format_extension = "." + file_format;

return path_without_extension + index_string + file_format_extension;


P1 Badge Default exports renamed and bypass existing files

The new multi-file sink always appends a numeric suffix to the target path (file_paths.size() + 1 before the extension), so even when no byte/row limits are configured (defaults are 0) a single-part export now writes to <base>.1.parquet instead of the original <base>.parquet. Because StorageObjectStorage::import now unconditionally uses this sink, rerunning an export with the default skip policy after an earlier run (which created <base>.parquet) will not detect the existing file and will generate a second copy under the new name. That breaks the idempotent behavior of exports and results in duplicate data after an upgrade.
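The suffixing behavior being described can be sketched as follows. This is a hypothetical reconstruction based only on the quoted snippet, not the actual ClickHouse code; the function name and signature are illustrative:

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: a 1-based index derived from how many files the sink
// has already written is inserted before the extension, so even the very
// first file of an export gets a ".1" suffix.
std::string generateNewFilePath(const std::string & base_path,
                                const std::string & file_format,
                                size_t files_written_so_far)
{
    const auto pos = base_path.find_last_of('.');
    const auto path_without_extension = base_path.substr(0, pos);
    const auto index_string = "." + std::to_string(files_written_so_far + 1);
    const auto file_format_extension = "." + file_format;

    return path_without_extension + index_string + file_format_extension;
}
```

With files_written_so_far = 0 this yields `<base>.1.parquet`, which is why the skip policy no longer detects a pre-existing `<base>.parquet`.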

Useful? React with 👍 / 👎.

Collaborator Author


This is true, but also ok. Users should be aware of this.

Collaborator Author


@mkmkme fyi

if (manifest.completion_callback)
manifest.completion_callback(MergeTreePartExportManifest::CompletionCallbackResult::createFailure(e));

throw;
Collaborator Author


I honestly don't remember why this was here, maybe I did not need it at all?

@arthurpassos
Collaborator Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review


Comment on lines 198 to 199
if (manifest.completion_callback)
manifest.completion_callback(MergeTreePartExportManifest::CompletionCallbackResult::createSuccess((*exports_list_entry)->destination_file_path));
manifest.completion_callback(MergeTreePartExportManifest::CompletionCallbackResult::createSuccess((*exports_list_entry)->destination_file_paths.front()));


P1 Badge Completion callback drops split export file paths

When a part is split into multiple files via export_merge_tree_part_max_bytes_per_file/_max_rows_per_file, only the first generated path is reported in the completion callback (destination_file_paths.front()). The partition scheduler stores that single path in ZooKeeper and ExportPartitionUtils::getExportedPaths uses it to build the commit file, so any additional files produced for the same part are omitted from the committed path list. In replicated partition exports this leaves the final commit file incomplete and downstream consumers that rely on it will miss most of the exported data.
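One way to address this would be for the callback result to carry every destination path rather than only the first. A minimal sketch, assuming the struct can simply hold the full vector (the type and field names below mirror the quoted snippet but are otherwise hypothetical):

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: the completion result carries every destination file
// path produced for the part, instead of only destination_file_paths.front(),
// so the partition scheduler can store the full list in ZooKeeper.
struct CompletionCallbackResult
{
    bool success = false;
    std::vector<std::string> destination_file_paths;

    static CompletionCallbackResult createSuccess(std::vector<std::string> paths)
    {
        return {true, std::move(paths)};
    }
};
```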


Collaborator Author


I thought I could ignore export partition for this feature, but I suppose I will have to do something about it

Collaborator Author


I have updated the code so that partition exports store an array of destination file paths for each part, but I am not preserving the query settings in ZooKeeper - meaning the two settings introduced in this PR can't be used with export partition, only with part export.

Later on I will fix it, don't have time now.

Collaborator Author


Ok, implemented it as well, just don't have the time to write a test

@arthurpassos
Collaborator Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

const auto exported_paths = ExportPartitionUtils::getExportedPaths(log, zk, entry_path);
if (exported_paths.size() != manifest.parts.size())
{
LOG_INFO(log, "ExportPartition: Skipping {}: exported paths size does not match parts size, this is a BUG", entry_path);

P1 Badge Fix exported path count check for multi-file parts

The new multi-file export path is never committed when a part is split across several files (e.g., using export_merge_tree_part_max_bytes_per_file/_max_rows_per_file). getExportedPaths now returns every file written for each processed part, but here the code compares that flattened file count to manifest.parts.size() (number of parts) and bails if they differ. As soon as any part writes more than one file, this check always fails and the scheduler never marks the export as completed or writes the commit file. The guard needs to validate per-part entries rather than total file count so multi-file exports can finish.
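The per-part validation the review asks for could look roughly like this. A sketch under the assumption that exported paths can be keyed by part name; all names here are hypothetical:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: instead of comparing the flattened file count against
// the number of parts, check that every part in the manifest has at least
// one exported file recorded before committing.
bool allPartsExported(
    const std::vector<std::string> & manifest_parts,
    const std::map<std::string, std::vector<std::string>> & exported_paths_by_part)
{
    for (const auto & part : manifest_parts)
    {
        auto it = exported_paths_by_part.find(part);
        if (it == exported_paths_by_part.end() || it->second.empty())
            return false;
    }
    return true;
}
```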


{
const auto exported_paths = ExportPartitionUtils::getExportedPaths(log, zk, entry_path);

if (exported_paths.size() != manifest.parts.size())
Collaborator Author


We no longer have the 1:1 merge-tree-part-to-parquet-file relation, so we can't check this anymore. We also don't know the number of files beforehand. Not strictly necessary to have this, though.

@arthurpassos
Collaborator Author

Current CI/CD failure on 03572_export_merge_tree_part_to_object_storage:

DB::Exception: Creating columns of type LowCardinality(Int64) is prohibited by default due to expected negative impact on performance. It can be enabled with the `allow_suspicious_low_cardinality_types` setting. (SUSPICIOUS_TYPE_FOR_LOW_CARDINALITY) (version 25.8.12.20683.altinityantalya) (from [::ffff:127.0.0.1]:54270) (comment: 03572_export_merge_tree_part_to_object_storage.sh) (query 1, line 1) (in query: SELECT count(__table1._file) AS `count(_file)` FROM s3Cluster('default_parallel_replicas', s3_conn, filename = 'big_destination_max_bytes_26172/**', format = 'One', structure = '`dummy` UInt8, `year` LowCardinality(Int64)') AS __table1), Stack trace (when copying this message, always include the lines below):
2025-12-16 19:05:35 

should be fixed by this missing backport: #1231

@arthurpassos
Collaborator Author

@codex review

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@mkmkme
Collaborator

mkmkme commented Dec 16, 2025

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review


Comment on lines 75 to 79
const auto exported_paths = ExportPartitionUtils::getExportedPaths(log, zk, entry_path);

if (exported_paths.size() != manifest.parts.size())
{
LOG_INFO(log, "ExportPartition: Skipping {}: exported paths size does not match parts size, this is a BUG", entry_path);
return;
}

LOG_INFO(log, "ExportPartition: Exported paths size matches parts size, commit the export");
destination_storage->commitExportPartitionTransaction(manifest.transaction_id, manifest.partition_id, exported_paths, context);

LOG_INFO(log, "ExportPartition: Committed export, mark as completed");


P1 Badge Commit partition export without verifying processed parts

After splitting support was added, commit() now calls commitExportPartitionTransaction and marks the export completed without any validation of exported_paths. This function is invoked from ExportPartitionManifestUpdatingTask::tryCleanup when the processing directory is empty; if ZooKeeper state is stale or parts never actually exported, exported_paths can be empty or incomplete, yet the code will still write a commit file and mark the export as completed, silently losing parts. There should still be a check that the number of processed parts matches the manifest before committing.


Collaborator Author


Addressed

@mkmkme
Collaborator

mkmkme commented Dec 17, 2025

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review


Comment on lines +61 to +65
const auto path_without_extension = base_path.substr(0, pos);
const auto file_format_extension = "." + file_format;

return path_without_extension + index_string + file_format_extension;
}


P1 Badge Preserve original export file name when split limits unset

Exports now always rewrite the destination path by appending a numeric suffix (generateNewFilePath adds .1 even before any size/row limit is exceeded). With the default export_merge_tree_part_max_bytes_per_file=0 and export_merge_tree_part_max_rows_per_file=0, a single‑file export that used to write …/part_checksum.parquet will now write …/part_checksum.1.parquet. That silently breaks idempotency and the export_merge_tree_part_file_already_exists_policy because existing files at the original path are no longer detected (a repeat export will happily create a new suffixed file instead of skipping/overwriting).


Collaborator Author


You've already mentioned that man... we have been through it

Collaborator


There's a chance it did not consider the context of the previous review, or it created a new context because this time I summoned it.

Comment on lines 168 to +172
manifest.max_threads = json->getValue<size_t>("max_threads");
manifest.parallel_formatting = json->getValue<bool>("parallel_formatting");
manifest.parquet_parallel_encoding = json->getValue<bool>("parquet_parallel_encoding");

manifest.max_bytes_per_file = json->getValue<size_t>("max_bytes_per_file");
manifest.max_rows_per_file = json->getValue<size_t>("max_rows_per_file");


P1 Badge Partition export manifests from earlier replicas fail to parse

ExportReplicatedMergeTreePartitionManifest::fromJsonString now unconditionally reads max_bytes_per_file and max_rows_per_file via getValue, which throws if the fields are absent. ZooKeeper entries created before this change don’t contain those keys, so after upgrading a node any pending export_partition task will raise during manifest parsing and never be scheduled/committed. The parser needs a default when the new fields are missing to remain backward compatible with existing manifests.
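A backward-compatible parse could fall back to 0 (the "no limit" default) when the new keys are absent. A minimal sketch, where a std::map stands in for the parsed JSON object and the helper name is hypothetical:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch: read the new manifest fields with a default instead of
// throwing when an older ZooKeeper entry does not contain them. The map is a
// stand-in for the parsed JSON object.
size_t getValueOrDefault(const std::map<std::string, size_t> & json,
                         const std::string & key,
                         size_t fallback = 0)
{
    auto it = json.find(key);
    return it == json.end() ? fallback : it->second;
}
```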


Collaborator Author


This is not a problem. Nobody is using the export partition feature for now, and even if they were, the zookeeper entries would have expired (unless a big TTL was set, which I doubt)

mkmkme
mkmkme previously approved these changes Dec 18, 2025
Collaborator

@mkmkme mkmkme left a comment


LGTM. Left a couple of comments, but all of them are optional. Good job!

create_time: 2025-11-19 09:09:11
part_name: 20251016-365_1_1_0
destination_file_path: table_root/eventDate=2025-10-16/retention=365/20251016-365_1_1_0_17B2F6CD5D3C18E787C07AE3DAF16EB1.parquet
destination_file_paths: ['table_root/eventDate=2025-10-16/retention=365/20251016-365_1_1_0_17B2F6CD5D3C18E787C07AE3DAF16EB1.parquet']
Collaborator


If I understood correctly, with this issue being merged, this file name will most likely look like ...EB1.1.parquet. Maybe change it here in the doc to be clearer?

if (exported_paths.size() != manifest.parts.size())
if (exported_paths.empty())
{
LOG_INFO(log, "ExportPartition: No exported paths found, will not commit export. This might be a bug");
Collaborator


This one and the one below definitely deserve a LOG_WARNING

{
auto new_path = generateNewFilePath();

/// sounds like bad design, but callers might decide to ignore the exception, and if we throw it before the callback
Collaborator


Nitpick: worth adding a TODO here

{
if (pure_storage)
return pure_storage->import(file_name, block_with_partition_values, destination_file_path, overwrite_if_exists, format_settings_, context);
return pure_storage->import(file_name, block_with_partition_values, new_file_path_callback, overwrite_if_exists, max_bytes_per_file, max_rows_per_file, format_settings_, context);
Collaborator


std::move for new_file_path_callback here and below?

Collaborator Author


Actually, it is probably better if I take it by const-ref. I don't know why I am passing it by value, probably autocomplete laziness.

Collaborator


Works for me :)

@mkmkme
Collaborator

mkmkme commented Dec 18, 2025

Let's see if Codex takes into account the thumbs down emojis from the previous invocations.

@codex review the PR, take into account previously discarded comments.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review


Comment on lines +90 to +92
const auto paths_in_destination_array = json->getArray("paths_in_destination");
for (size_t i = 0; i < paths_in_destination_array->size(); ++i)
entry.paths_in_destination.emplace_back(paths_in_destination_array->getElement<String>(static_cast<unsigned int>(i)));


P1 Badge Handle legacy processed-part manifest entries

fromJsonString now assumes every processed entry contains a paths_in_destination array and immediately dereferences it. Existing ZooKeeper nodes written by previous versions stored a single path_in_destination string, so after an upgrade getArray will return null (or throw), crashing ExportPartitionUtils::getExportedPaths when it reads legacy entries and preventing pending exports from ever committing. Please keep parsing the old field or guard against missing arrays before dereferencing.
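The compatibility fallback the review suggests could be sketched like this, with plain maps standing in for the parsed JSON entry (all names here mirror the field names in the review but the helper itself is hypothetical):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: prefer the new "paths_in_destination" array, but fall
// back to the legacy single "path_in_destination" string written by older
// versions, so getExportedPaths can still read pre-upgrade entries.
std::vector<std::string> parseDestinationPaths(
    const std::map<std::string, std::vector<std::string>> & array_fields,
    const std::map<std::string, std::string> & string_fields)
{
    if (auto it = array_fields.find("paths_in_destination"); it != array_fields.end())
        return it->second;

    if (auto it = string_fields.find("path_in_destination"); it != string_fields.end())
        return {it->second};

    return {};
}
```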


Collaborator


sigh we've been through that multiple times

@zvonand zvonand merged commit c24224b into antalya-25.8 Dec 18, 2025
137 checks passed
@Selfeer Selfeer added the verified Verified by QA label Dec 23, 2025
@DimensionWieldr
Collaborator

DimensionWieldr commented Dec 23, 2025

Added a regression test for this PR: max_bytes_per_file_and_rows_per_file

It tests both export_merge_tree_part_max_bytes_per_file and export_merge_tree_part_max_rows_per_file settings, checking that the s3 table destinations have all the expected rows after the export completes, as well as the correct number of split parquet files for the large part.

The test passes :)
