split large parquet files on part export, preserve entire settings object in part export #1229
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
```cpp
const auto path_without_extension = base_path.substr(0, pos);
const auto file_format_extension = "." + file_format;

return path_without_extension + index_string + file_format_extension;
```
Default exports renamed and bypass existing files
The new multi-file sink always appends a numeric suffix to the target path (file_paths.size() + 1 before the extension), so even when no byte/row limits are configured (defaults are 0) a single-part export now writes to <base>.1.parquet instead of the original <base>.parquet. Because StorageObjectStorage::import now unconditionally uses this sink, rerunning an export with the default skip policy after an earlier run (which created <base>.parquet) will not detect the existing file and will generate a second copy under the new name. That breaks the idempotent behavior of exports and results in duplicate data after an upgrade.
This is true, but also ok. Users should be aware of this.
@mkmkme fyi
```cpp
if (manifest.completion_callback)
    manifest.completion_callback(MergeTreePartExportManifest::CompletionCallbackResult::createFailure(e));

throw;
```
I honestly don't remember why this was here, maybe I did not need it at all?
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
```diff
  if (manifest.completion_callback)
-     manifest.completion_callback(MergeTreePartExportManifest::CompletionCallbackResult::createSuccess((*exports_list_entry)->destination_file_path));
+     manifest.completion_callback(MergeTreePartExportManifest::CompletionCallbackResult::createSuccess((*exports_list_entry)->destination_file_paths.front()));
```
Completion callback drops split export file paths
When a part is split into multiple files via export_merge_tree_part_max_bytes_per_file/_max_rows_per_file, only the first generated path is reported in the completion callback (destination_file_paths.front()). The partition scheduler stores that single path in ZooKeeper and ExportPartitionUtils::getExportedPaths uses it to build the commit file, so any additional files produced for the same part are omitted from the committed path list. In replicated partition exports this leaves the final commit file incomplete and downstream consumers that rely on it will miss most of the exported data.
I thought I could ignore export partition for this feature, but I suppose I will have to do something about it
I have updated the code so that partition exports store an array of destination file paths for each part, but I am not preserving the query settings in ZooKeeper, meaning the two settings introduced in this PR can't be used with export partition, only with part export.
I will fix it later; I don't have time now.
Ok, implemented it as well, I just don't have the time to write a test.

@codex review
💡 Codex Review
ClickHouse/src/Storages/MergeTree/ExportPartitionUtils.cpp, lines 75 to 79 in b7f6b57:

```cpp
const auto exported_paths = ExportPartitionUtils::getExportedPaths(log, zk, entry_path);
if (exported_paths.size() != manifest.parts.size())
{
    LOG_INFO(log, "ExportPartition: Skipping {}: exported paths size does not match parts size, this is a BUG", entry_path);
```
The new multi-file export path is never committed when a part is split across several files (e.g., using export_merge_tree_part_max_bytes_per_file/_max_rows_per_file). getExportedPaths now returns every file written for each processed part, but here the code compares that flattened file count to manifest.parts.size() (number of parts) and bails if they differ. As soon as any part writes more than one file, this check always fails and the scheduler never marks the export as completed or writes the commit file. The guard needs to validate per-part entries rather than total file count so multi-file exports can finish.
```cpp
{
    const auto exported_paths = ExportPartitionUtils::getExportedPaths(log, zk, entry_path);

    if (exported_paths.size() != manifest.parts.size())
```
We no longer have the 1:1 merge tree part to parquet file relation, so we can't check this anymore. We also don't know the number of files beforehand. This check isn't strictly necessary, though.

Current CI/CD failure should be fixed by this missing backport: #1231
@codex review
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
```cpp
const auto exported_paths = ExportPartitionUtils::getExportedPaths(log, zk, entry_path);

if (exported_paths.size() != manifest.parts.size())
{
    LOG_INFO(log, "ExportPartition: Skipping {}: exported paths size does not match parts size, this is a BUG", entry_path);
    return;
}

LOG_INFO(log, "ExportPartition: Exported paths size matches parts size, commit the export");
destination_storage->commitExportPartitionTransaction(manifest.transaction_id, manifest.partition_id, exported_paths, context);

LOG_INFO(log, "ExportPartition: Committed export, mark as completed");
```
Commit partition export without verifying processed parts
After splitting support was added, commit() now calls commitExportPartitionTransaction and marks the export completed without any validation of exported_paths. This function is invoked from ExportPartitionManifestUpdatingTask::tryCleanup when the processing directory is empty; if ZooKeeper state is stale or parts never actually exported, exported_paths can be empty or incomplete, yet the code will still write a commit file and mark the export as completed, silently losing parts. There should still be a check that the number of processed parts matches the manifest before committing.
Addressed
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
```cpp
const auto path_without_extension = base_path.substr(0, pos);
const auto file_format_extension = "." + file_format;

return path_without_extension + index_string + file_format_extension;
}
```
Preserve original export file name when split limits unset
Exports now always rewrite the destination path by appending a numeric suffix (generateNewFilePath adds .1 even before any size/row limit is exceeded). With the default export_merge_tree_part_max_bytes_per_file=0 and export_merge_tree_part_max_rows_per_file=0, a single‑file export that used to write …/part_checksum.parquet will now write …/part_checksum.1.parquet. That silently breaks idempotency and the export_merge_tree_part_file_already_exists_policy because existing files at the original path are no longer detected (a repeat export will happily create a new suffixed file instead of skipping/overwriting).
You've already mentioned that man... we have been through it
There's a chance it did not consider the context of the earlier review, or it created a new context because this time I summoned it explicitly.
```cpp
manifest.max_threads = json->getValue<size_t>("max_threads");
manifest.parallel_formatting = json->getValue<bool>("parallel_formatting");
manifest.parquet_parallel_encoding = json->getValue<bool>("parquet_parallel_encoding");

manifest.max_bytes_per_file = json->getValue<size_t>("max_bytes_per_file");
manifest.max_rows_per_file = json->getValue<size_t>("max_rows_per_file");
```
Partition export manifests from earlier replicas fail to parse
ExportReplicatedMergeTreePartitionManifest::fromJsonString now unconditionally reads max_bytes_per_file and max_rows_per_file via getValue, which throws if the fields are absent. ZooKeeper entries created before this change don’t contain those keys, so after upgrading a node any pending export_partition task will raise during manifest parsing and never be scheduled/committed. The parser needs a default when the new fields are missing to remain backward compatible with existing manifests.
This is not a problem. Nobody is using the export partition feature for now, and even if they were, the ZooKeeper entries would have expired (unless a big TTL was set, which I doubt).
mkmkme left a comment
LGTM. Left a couple of comments, but all of them are optional. Good job!
```diff
  create_time: 2025-11-19 09:09:11
  part_name: 20251016-365_1_1_0
- destination_file_path: table_root/eventDate=2025-10-16/retention=365/20251016-365_1_1_0_17B2F6CD5D3C18E787C07AE3DAF16EB1.parquet
+ destination_file_paths: ['table_root/eventDate=2025-10-16/retention=365/20251016-365_1_1_0_17B2F6CD5D3C18E787C07AE3DAF16EB1.parquet']
```
If I understood correctly, once this PR is merged, this file name will most likely look like ...EB1.1.parquet. Maybe change it here in the doc to be clearer?
```diff
- if (exported_paths.size() != manifest.parts.size())
+ if (exported_paths.empty())
  {
      LOG_INFO(log, "ExportPartition: No exported paths found, will not commit export. This might be a bug");
```
This one and the one below definitely deserve a LOG_WARNING
```cpp
{
    auto new_path = generateNewFilePath();

    /// sounds like bad design, but callers might decide to ignore the exception, and if we throw it before the callback
```
Nitpick: worth adding a TODO here
```diff
  {
      if (pure_storage)
-         return pure_storage->import(file_name, block_with_partition_values, destination_file_path, overwrite_if_exists, format_settings_, context);
+         return pure_storage->import(file_name, block_with_partition_values, new_file_path_callback, overwrite_if_exists, max_bytes_per_file, max_rows_per_file, format_settings_, context);
```
std::move for new_file_path_callback here and below?
Actually, it is probably better if I take it by const-ref. I don't know why I am passing it by value, probably due to autocomplete laziness.
Works for me :)
Let's see if Codex takes into account the thumbs-down emojis from the previous invocations. @codex review the PR, take into account previously discarded comments.
💡 Codex Review
Here are some automated review suggestions for this pull request.
```cpp
const auto paths_in_destination_array = json->getArray("paths_in_destination");
for (size_t i = 0; i < paths_in_destination_array->size(); ++i)
    entry.paths_in_destination.emplace_back(paths_in_destination_array->getElement<String>(static_cast<unsigned int>(i)));
```
Handle legacy processed-part manifest entries
fromJsonString now assumes every processed entry contains a paths_in_destination array and immediately dereferences it. Existing ZooKeeper nodes written by previous versions stored a single path_in_destination string, so after an upgrade getArray will return null (or throw), crashing ExportPartitionUtils::getExportedPaths when it reads legacy entries and preventing pending exports from ever committing. Please keep parsing the old field or guard against missing arrays before dereferencing.
sigh we've been through that multiple times
Added a regression test for this PR: max_bytes_per_file_and_rows_per_file. It tests both settings. The test passes :)
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add settings `export_merge_tree_part_max_bytes_per_file` and `export_merge_tree_part_max_rows_per_file` to be able to split large parquet files.