Conversation
…s only support a single namespace, so for every namespace a call to getTables is made; ThreadPoolCallbackRunnerLocal is used to run these in a thread pool (tryGetTableMetadata). The regional endpoint https://s3.{region}.amazonaws.com is injected so that requests don't go to bucket.s3.amazonaws.com. The s3://bucket-name/ prefix is pre-stripped, leaving just the relative path metadata/00001-....json, so downstream code constructs the correct S3 key. Missing/empty credentials are detected and the catalog's own IAM credentials from credentials_provider->GetAWSCredentials() are injected.
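The two normalizations described above (prefix stripping and regional endpoint injection) can be sketched with stdlib-only helpers; the function names here are placeholders for illustration, not the PR's actual code:

```cpp
#include <string>

// Hypothetical helpers sketching the two normalizations described above;
// the names are placeholders, not the PR's actual functions.

// Strip the "s3://<bucket>/" prefix so only the bucket-relative key remains
// (e.g. "metadata/00001-....json"), which downstream code uses as the S3 key.
inline std::string stripBucketPrefix(const std::string & location)
{
    const std::string scheme = "s3://";
    if (location.compare(0, scheme.size(), scheme) != 0)
        return location;                      // not an s3:// URI, leave untouched
    const auto slash = location.find('/', scheme.size());
    if (slash == std::string::npos)
        return "";                            // bucket only, no key part
    return location.substr(slash + 1);
}

// Build the regional endpoint so requests go to https://s3.<region>.amazonaws.com
// rather than the virtual-hosted <bucket>.s3.amazonaws.com form.
inline std::string regionalS3Endpoint(const std::string & region)
{
    return "https://s3." + region + ".amazonaws.com";
}
```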
301f82e to dbcf541
AI audit note: This review comment was generated by AI (gpt-5.3-codex). Audit update for PR #1617 (Added support for s3 tables / S3 Tables Iceberg REST catalog): Confirmed defects: no confirmed defects in reviewed scope. Coverage summary:
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dbcf541b8d
```cpp
/// Iceberg REST catalog for Amazon S3 Tables (SigV4, signing name `s3tables`).
/// https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-tables-integrating-open-source.html
class S3TablesCatalog final : public RestCatalog
```
Override S3 Tables drop to request purge
S3TablesCatalog inherits RestCatalog::dropTable, which hardcodes ?purgeRequested=False in the DELETE URL; AWS S3 Tables documents that purge=false is rejected with 400 Bad Request, so DROP TABLE against catalog_type='s3tables' will fail consistently. Please override the drop path for this catalog to send a purge-enabled delete request.
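The fix this comment asks for amounts to building the DELETE URL with the purge flag enabled. A minimal sketch, assuming a hypothetical URL-building helper (not the PR's actual code):

```cpp
#include <string>

// Sketch of the requested override: the base RestCatalog hardcodes
// ?purgeRequested=False in the DELETE URL, but AWS S3 Tables rejects
// purge=false with 400 Bad Request, so the S3 Tables path must send True.
// buildDropTableUrl is a hypothetical helper for illustration.
inline std::string buildDropTableUrl(
    const std::string & prefix,
    const std::string & ns,
    const std::string & table,
    bool purge_requested)
{
    return prefix + "/namespaces/" + ns + "/tables/" + table
        + "?purgeRequested=" + (purge_requested ? "True" : "False");
}
```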
```diff
 {
-    if (settings[DatabaseDataLakeSetting::catalog_type].value == DB::DatabaseDataLakeCatalogType::GLUE)
+    if (settings[DatabaseDataLakeSetting::catalog_type].value == DB::DatabaseDataLakeCatalogType::GLUE
+        || settings[DatabaseDataLakeSetting::catalog_type].value == DB::DatabaseDataLakeCatalogType::S3_TABLES)
```
Enforce non-empty warehouse for s3tables
validateSettings() now groups S3_TABLES with Glue and skips the warehouse check, but this catalog still relies on warehouse for loadConfig() (/v1/config?warehouse=...) and for deriving config.prefix when the server does not return one; allowing an empty warehouse defers this to runtime failures on namespace/table requests instead of rejecting invalid configuration at CREATE DATABASE time.
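The tightened check the comment suggests could look like the following sketch; the enum values and function shape are hypothetical stand-ins for the PR's validateSettings() logic:

```cpp
#include <stdexcept>
#include <string>

// Sketch of the suggested validateSettings() tightening: keep the Glue
// exemption, but still require a non-empty warehouse for S3 Tables, which
// needs it for /v1/config?warehouse=... and for deriving config.prefix.
// Enum and function shape are hypothetical.
enum class CatalogType { REST, GLUE, S3_TABLES };

inline void validateWarehouse(CatalogType type, const std::string & warehouse)
{
    if (type != CatalogType::GLUE && warehouse.empty())
        throw std::invalid_argument("`warehouse` must not be empty for this catalog type");
}
```

This rejects the invalid configuration at CREATE DATABASE time rather than deferring the failure to later namespace/table requests.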
```cpp
LOG_DEBUG(log, "Requesting: {}", url.toString());
// ...
return DB::BuilderRWBufferFromHTTP(url)
```
I see the base class implementation has some sort of retry mechanism, why not have it here as well?
```cpp
try
{
    return create_buffer(false);
}
catch (const DB::HTTPException & e)
{
    const auto status = e.getHTTPStatus();
    if (update_token_if_expired
        && (status == Poco::Net::HTTPResponse::HTTPStatus::HTTP_UNAUTHORIZED
            || status == Poco::Net::HTTPResponse::HTTPStatus::HTTP_FORBIDDEN))
    {
        return create_buffer(true);
    }
    throw;
}
```
```cpp
const auto & context = getContext();
// ...
Poco::URI url(endpoint, /* enable_url_encoding */ false);
```
/// enable_url_encoding=false allows using tables whose names contain encoded sequences, like 'foo%2Fbar'
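The failure mode behind this suggestion is easy to show concretely: if the URL were (re-)encoded, a name that already contains a percent-escape would be double-encoded and no longer match server-side. A toy stdlib-only encoder (not Poco's actual one) illustrates this:

```cpp
#include <string>

// Toy percent-encoder, for illustration only: it escapes '%' and '/', which is
// exactly what re-encoding an already-escaped table name would do, turning
// "foo%2Fbar" into "foo%252Fbar".
inline std::string percentEncode(const std::string & s)
{
    std::string out;
    for (char c : s)
    {
        if (c == '%')
            out += "%25";
        else if (c == '/')
            out += "%2F";
        else
            out += c;
    }
    return out;
}
```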
| } | ||
|
|
||
| void signRequestWithAWSV4( |
Nit: perhaps name it something like signHeaders, since the output of this method is a list of headers, not a request?
```cpp
static constexpr bool sign_body = true;
if (!signer.SignRequest(request, region.c_str(), service.c_str(), sign_body))
    throw DB::Exception(DB::ErrorCodes::LOGICAL_ERROR, "AWS SigV4 signing failed");
```
LOGICAL_ERROR will crash ClickHouse, it should be something else
```cpp
Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy::Always,
/* urlEscapePath = */ false);
```
```cpp
config = loadConfig();
```
This is kind of dangerous. You've made createReadBuffer virtual, and loadConfig calls it. If S3TablesCatalog::createReadBuffer relies on members that are initialized after loadConfig, it is UB
Looking at the code, there doesn't seem to be an imminent problem, but i would at least add a comment about this?
I don't see a way around this "problem" tbh
Maybe this solves the problem? https://github.com/Altinity/ClickHouse/pull/1617/changes#r3053334453
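For context on the concern in this thread, the C++ dispatch rule can be demonstrated with a self-contained toy (not the PR's classes): a virtual call made while the base constructor runs uses the base override, because the derived part of the object is not constructed yet.

```cpp
#include <string>

// Toy illustration of virtual dispatch during base construction: if something
// like loadConfig() runs in the base constructor and calls the virtual
// createReadBuffer(), the derived override is never reached at that point.
struct BaseCatalog
{
    std::string buffer_source;
    BaseCatalog() { buffer_source = createReadBuffer(); }   // virtual call in ctor
    virtual ~BaseCatalog() = default;
    virtual std::string createReadBuffer() { return "base"; }
};

struct S3TablesLike : BaseCatalog
{
    std::string createReadBuffer() override { return "derived"; }
};
```

(If the virtual were pure, the same call would be outright undefined behavior; here it silently binds to the base version.)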
```cpp
    return true;
}
// ...
DB::HTTPHeaderEntries S3TablesCatalog::getAuthHeaders(bool /* update_token */) const
```
Why not use this method to implement the signing instead of overwriting the createReadBuffer method?
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Support for S3 Tables (Iceberg REST catalog), closes ClickHouse#95340
Documentation entry for user-facing changes:
Support for S3 Tables (Iceberg serverless REST catalog) using SigV4 authentication. Can be enabled with `allow_experimental_database_s3_tables`.
CI/CD Options
Exclude tests:
Regression jobs to run: