
feat: introduce Result Service using Lakekeeper as REST catalog for Iceberg - catalog migration #4272

Open
mengw15 wants to merge 14 commits into apache:main from mengw15:Lakekeeper-catalog-migration

Conversation

mengw15 (Contributor) commented Mar 9, 2026

What changes were proposed in this PR?

This is PR 1 of a decomposed series from #4242, focusing on the core Iceberg catalog migration to support Lakekeeper as a
REST catalog.

Scala changes:

  • IcebergUtil.scala: added createRestCatalog() for REST catalog connections with S3FileIO (MinIO), and namespace auto-creation for all catalog types
  • IcebergCatalogInstance.scala: updated singleton to support REST catalog type selection
  • IcebergTableWriter.scala: updated for REST catalog compatibility
  • StorageConfig.scala / EnvironmentalVariable.scala: added REST catalog configuration (URI, warehouse name, region, S3
    bucket) and environment variable support
  • storage.conf: added REST catalog config section (default remains postgres for backward compatibility)
  • build.sbt: added iceberg-aws, AWS SDK dependencies, and Netty version override for Arrow compatibility
  • PythonWorkflowWorker.scala / ComputingUnitManagingResource.scala: propagate REST catalog config to Python workers and
    computing units

Python changes:

  • iceberg_catalog_instance.py / iceberg_utils.py: added REST catalog support via PyIceberg
  • storage_config.py: added REST catalog configuration parsing
  • texera_run_python_worker.py: accept REST catalog config from Scala side
  • requirements.txt: upgraded PyIceberg (0.8.1 → 0.9.0), added s3fs/aiobotocore for S3 access
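
The PyIceberg-side configuration described above can be sketched as follows. The helper name and all concrete values (endpoints, warehouse name, credentials) are illustrative assumptions, not the PR's actual code; the property keys follow PyIceberg's REST catalog conventions.

```python
# Sketch: map flat storage-config values onto PyIceberg REST catalog
# properties for a Lakekeeper catalog backed by MinIO (S3 API).
def build_rest_catalog_properties(
    uri: str,
    warehouse: str,
    s3_endpoint: str,
    s3_access_key: str,
    s3_secret_key: str,
    s3_region: str,
) -> dict:
    """Build the property dict PyIceberg expects for a REST catalog."""
    return {
        "uri": uri,                       # Lakekeeper's REST endpoint
        "warehouse": warehouse,           # warehouse registered in Lakekeeper
        "s3.endpoint": s3_endpoint,       # MinIO endpoint used by the S3 FileIO
        "s3.access-key-id": s3_access_key,
        "s3.secret-access-key": s3_secret_key,
        "s3.region": s3_region,
    }

props = build_rest_catalog_properties(
    uri="http://localhost:8181/catalog",
    warehouse="texera-warehouse",
    s3_endpoint="http://localhost:9000",
    s3_access_key="minio",
    s3_secret_key="minio-secret",
    s3_region="us-east-1",
)
# With a running Lakekeeper instance, the catalog would then be loaded via:
#   from pyiceberg.catalog import load_catalog
#   catalog = load_catalog("rest", **props)
```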

Database:

  • texera_lakekeeper.sql: schema for Lakekeeper's backing database

Note: This PR keeps postgres as the default catalog type in storage.conf. Switching to REST catalog will be enabled
in subsequent deployment PRs.
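
The PR description lists the new REST catalog settings (URI, warehouse name, region, S3 bucket) but not the file itself. A plausible shape of the new storage.conf section, with postgres kept as the default, might look like the following; every key name and value here is an illustrative assumption:

```hocon
storage {
  iceberg-catalog {
    # "postgres" remains the default; "rest" would enable Lakekeeper
    type = "postgres"
    rest {
      uri = "http://lakekeeper:8181/catalog"
      warehouse = "texera-warehouse"
      region = "us-east-1"
      s3-bucket = "texera-iceberg"
    }
  }
}
```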

Any related issues, documentation, discussions?

Part of #4126. Subsequent PRs will cover:

  • Lakekeeper bootstrap script
  • Single-node deployment
  • Kubernetes deployment
  • CI integration

How was this PR tested?

Manual

Was this PR authored or co-authored using generative AI tooling?

co-authored with Claude

@bobbai00 bobbai00 self-requested a review March 17, 2026 22:06
bobbai00 (Contributor) left a comment


Left some comments.

I closed your original PR. Please describe the milestones (your PR's plan) in the issue and update the description of the current PR.

cached_property==1.5.2
psutil==5.9.0
tzlocal==2.1
s3fs==2025.9.0
Contributor

The latest version is 2026.2.0. Can you try using the latest version?

Contributor Author

These three libraries have version compatibility constraints, and they also need to stay compatible with boto3. If we try to update them, some other libraries may also need to be updated accordingly.

tzlocal==2.1
s3fs==2025.9.0
aiobotocore==2.25.1
botocore==1.40.53
Contributor

Ditto for these two libraries.

Contributor Author

These three libraries have version compatibility constraints, and they also need to stay compatible with boto3. If we try to update them, some other libraries may also need to be updated accordingly.

if (buffer.nonEmpty) {
// Create a unique file path using the writer's identifier and the filename index
val filepath = Paths.get(table.location()).resolve(s"${writerIdentifier}_${filenameIdx}")
// Handle S3 URIs (s3://) differently from local file paths to preserve URI format
Contributor

This logic is very ad hoc. Can you avoid the if condition on the file path's prefix?

Contributor

Try to use a universal logic for the file path.

Contributor Author

Simplified to use string concatenation for all URI schemes. Would suggest testing on Windows as well, since we've had path-related issues on Windows before.
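
The fix described above (Scala in the actual PR) can be sketched in Python: building the data-file path by plain string concatenation keeps an "s3://" scheme intact, whereas a filesystem path API would typically collapse the double slash. The function name is illustrative, not the PR's actual code.

```python
# Join a table location and a file name without mangling URI schemes.
# Works uniformly for s3:// URIs and plain local paths.
def resolve_data_file(table_location: str, file_name: str) -> str:
    """Append file_name to table_location with exactly one separator."""
    return table_location.rstrip("/") + "/" + file_name

assert resolve_data_file("s3://bucket/warehouse/tbl", "w1_0") == "s3://bucket/warehouse/tbl/w1_0"
assert resolve_data_file("/tmp/warehouse/tbl/", "w1_0") == "/tmp/warehouse/tbl/w1_0"
```

Note that forward slashes are used unconditionally, which is one reason the reviewer's suggestion to verify this on Windows (where native paths use backslashes) is worth following up.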

TableProperties.COMMIT_MIN_RETRY_WAIT_MS -> StorageConfig.icebergTableCommitMinRetryWaitMs.toString
)

val namespace = Namespace.of(tableNamespace)
Contributor

What is the purpose of this check?

Contributor Author

This ensures the namespace exists before creating a table. REST catalogs (like Lakekeeper) require the namespace to be explicitly created first, unlike the Postgres JDBC catalog which auto-creates it.
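
The idempotent "create the namespace if it does not already exist" step can be sketched as follows. The catalog here is a minimal stand-in interface, not PyIceberg or the PR's actual Scala code.

```python
# Sketch: ensure a namespace exists before table creation, as REST
# catalogs such as Lakekeeper require. Safe to call repeatedly.
class NamespaceAlreadyExistsError(Exception):
    pass

def ensure_namespace(catalog, namespace: str) -> None:
    """Create the namespace, tolerating the case where it already exists."""
    try:
        catalog.create_namespace(namespace)
    except NamespaceAlreadyExistsError:
        pass  # already present: nothing to do

class _StubCatalog:
    """Tiny in-memory catalog used only to demonstrate the helper."""
    def __init__(self):
        self.namespaces = set()
    def create_namespace(self, ns):
        if ns in self.namespaces:
            raise NamespaceAlreadyExistsError(ns)
        self.namespaces.add(ns)

cat = _StubCatalog()
ensure_namespace(cat, "results")
ensure_namespace(cat, "results")  # second call is a no-op
```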

@mengw15 mengw15 requested a review from bobbai00 March 18, 2026 00:02

Labels

common, ddl-change (Changes to the TexeraDB DDL), dependencies (Pull requests that update a dependency file), engine, python, service
