
Commit b5ed379

update paimon

1 parent f92fa2c

2 files changed: 115 additions & 57 deletions


website/docs/maintenance/tiered-storage/lakehouse-storage.md

Lines changed: 2 additions & 2 deletions
@@ -35,7 +35,7 @@ datalake.paimon.metastore: filesystem
 datalake.paimon.warehouse: /tmp/paimon
 ```
 
-Fluss processes Paimon configurations by removing the `datalake.paimon.` prefix and then use the remaining configuration (without the prefix `datalake.paimon.`) to create the Paimon catalog. Checkout the [Paimon documentation](https://paimon.apache.org/docs/1.3/maintenance/configurations/) for more details on the available configurations.
+Fluss processes Paimon configurations by removing the `datalake.paimon.` prefix and then uses the remaining configuration (without the prefix) to create the Paimon catalog. Check out the [Paimon documentation](https://paimon.apache.org/docs/$PAIMON_VERSION_SHORT$/maintenance/configurations/) for more details on the available configurations.
 
 For example, if you want to configure to use Hive catalog, you can configure like following:
 ```yaml
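The prefix stripping described in the changed paragraph can be illustrated with the quickstart settings shown in the surrounding context; a minimal sketch (the resulting mapping is inferred from the description, not taken from the commit):

```yaml
# Fluss server configuration: every key under datalake.paimon.* is handed to Paimon
datalake.paimon.metastore: filesystem
datalake.paimon.warehouse: /tmp/paimon

# After Fluss strips the datalake.paimon. prefix, the Paimon catalog is created with:
#   metastore: filesystem
#   warehouse: /tmp/paimon
```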
@@ -66,7 +66,7 @@ Then, you must start the datalake tiering service to tier Fluss's data to the la
 - Put [fluss-lake-paimon jar](https://repo1.maven.org/maven2/org/apache/fluss/fluss-lake-paimon/$FLUSS_VERSION$/fluss-lake-paimon-$FLUSS_VERSION$.jar) into `${FLINK_HOME}/lib`
 - Put [paimon-bundle jar](https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-bundle/$PAIMON_VERSION$/paimon-bundle-$PAIMON_VERSION$.jar) into `${FLINK_HOME}/lib`
 - [Download](https://flink.apache.org/downloads/) pre-bundled Hadoop jar `flink-shaded-hadoop-2-uber-*.jar` and put into `${FLINK_HOME}/lib`
-- Put Paimon's [filesystem jar](https://paimon.apache.org/docs/1.3/project/download/) into `${FLINK_HOME}/lib`, if you use s3 to store paimon data, please put `paimon-s3` jar into `${FLINK_HOME}/lib`
+- Put Paimon's [filesystem jar](https://paimon.apache.org/docs/$PAIMON_VERSION_SHORT$/project/download/) into `${FLINK_HOME}/lib`; if you use S3 to store Paimon data, put the `paimon-s3` jar into `${FLINK_HOME}/lib`
 - The other jars that Paimon may require, for example, if you use HiveCatalog, you will need to put hive related jars
 
 
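Once the jars are in place, the tiering service mentioned in the hunk context is submitted as a regular Flink job. A minimal sketch, assuming a local Fluss cluster on `localhost:9123` and the filesystem warehouse from the example above; the exact arguments follow the `datalake.*` naming used in the configuration and should be treated as illustrative:

```shell
# Submit the datalake tiering service to the Flink cluster (paths and addresses are illustrative)
${FLINK_HOME}/bin/flink run \
  /path/to/fluss-flink-tiering-$FLUSS_VERSION$.jar \
  --fluss.bootstrap.servers localhost:9123 \
  --datalake.format paimon \
  --datalake.paimon.metastore filesystem \
  --datalake.paimon.warehouse /tmp/paimon
```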
website/docs/quickstart/lakehouse.md

Lines changed: 113 additions & 55 deletions
@@ -32,12 +32,39 @@ mkdir fluss-quickstart-paimon
 cd fluss-quickstart-paimon
 ```
 
-2. Create a `docker-compose.yml` file with the following content:
+2. Create directories and download required jars:
+
+```shell
+mkdir -p lib opt
+
+# Flink connectors
+wget -O lib/flink-faker-0.5.3.jar https://github.com/knaufk/flink-faker/releases/download/v0.5.3/flink-faker-0.5.3.jar
+wget -O "lib/fluss-flink-1.20-$FLUSS_DOCKER_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-1.20/$FLUSS_DOCKER_VERSION$/fluss-flink-1.20-$FLUSS_DOCKER_VERSION$.jar"
+wget -O "lib/paimon-flink-1.20-$PAIMON_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/paimon/paimon-flink-1.20/$PAIMON_VERSION$/paimon-flink-1.20-$PAIMON_VERSION$.jar"
+
+# Fluss lake plugin
+wget -O "lib/fluss-lake-paimon-$FLUSS_DOCKER_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/fluss/fluss-lake-paimon/$FLUSS_DOCKER_VERSION$/fluss-lake-paimon-$FLUSS_DOCKER_VERSION$.jar"
+
+# Paimon bundle jar
+wget -O "lib/paimon-bundle-$PAIMON_VERSION$.jar" "https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-bundle/$PAIMON_VERSION$/paimon-bundle-$PAIMON_VERSION$.jar"
+
+# Hadoop bundle jar
+wget -O lib/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-10.0/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar
+
+# Tiering service
+wget -O "opt/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-tiering/$FLUSS_DOCKER_VERSION$/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar"
+```
 
+:::info
+You can add more jars to this `lib` directory based on your requirements:
+- **Cloud storage support**: For AWS S3 integration with Paimon, add the corresponding [paimon-s3](https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-s3/$PAIMON_VERSION$/paimon-s3-$PAIMON_VERSION$.jar)
+- **Other catalog backends**: Add jars needed for alternative Paimon catalog implementations (e.g., Hive, JDBC)
+:::
+
+3. Create a `docker-compose.yml` file with the following content:
 
 ```yaml
 services:
-  #begin Fluss cluster
   coordinator-server:
     image: apache/fluss:$FLUSS_DOCKER_VERSION$
     command: coordinatorServer
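If you want the S3 support mentioned in the added info block, the extra jar can be fetched the same way as the others; a minimal sketch using the URL given in that note:

```shell
# Optional: Paimon S3 filesystem support (only needed if the warehouse lives on S3)
wget -O "lib/paimon-s3-$PAIMON_VERSION$.jar" "https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-s3/$PAIMON_VERSION$/paimon-s3-$PAIMON_VERSION$.jar"
```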
@@ -54,6 +81,7 @@ services:
         datalake.paimon.warehouse: /tmp/paimon
     volumes:
       - shared-tmpfs:/tmp/paimon
+      - shared-tmpfs:/tmp/fluss
   tablet-server:
     image: apache/fluss:$FLUSS_DOCKER_VERSION$
     command: tabletServer
@@ -72,37 +100,50 @@
         datalake.paimon.warehouse: /tmp/paimon
     volumes:
       - shared-tmpfs:/tmp/paimon
+      - shared-tmpfs:/tmp/fluss
   zookeeper:
     restart: always
     image: zookeeper:3.9.2
-  #end
-  #begin Flink cluster
   jobmanager:
-    image: apache/fluss-quickstart-flink:1.20-$FLUSS_DOCKER_VERSION$
+    image: flink:1.20-scala_2.12-java17
     ports:
       - "8083:8081"
-    command: jobmanager
+    entrypoint: ["/bin/bash", "-c"]
+    command: >
+      "sed -i 's/exec $(drop_privs_cmd)//g' /docker-entrypoint.sh &&
+      cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true;
+      cp /tmp/opt/*.jar /opt/flink/opt/ 2>/dev/null || true;
+      /docker-entrypoint.sh jobmanager"
     environment:
       - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
     volumes:
       - shared-tmpfs:/tmp/paimon
+      - shared-tmpfs:/tmp/fluss
+      - ./lib:/tmp/jars
+      - ./opt:/tmp/opt
   taskmanager:
-    image: apache/fluss-quickstart-flink:1.20-$FLUSS_DOCKER_VERSION$
+    image: flink:1.20-scala_2.12-java17
     depends_on:
       - jobmanager
-    command: taskmanager
+    entrypoint: ["/bin/bash", "-c"]
+    command: >
+      "sed -i 's/exec $(drop_privs_cmd)//g' /docker-entrypoint.sh &&
+      cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true;
+      cp /tmp/opt/*.jar /opt/flink/opt/ 2>/dev/null || true;
+      /docker-entrypoint.sh taskmanager"
     environment:
       - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        taskmanager.numberOfTaskSlots: 10
        taskmanager.memory.process.size: 2048m
-        taskmanager.memory.framework.off-heap.size: 256m
     volumes:
       - shared-tmpfs:/tmp/paimon
-  #end
+      - shared-tmpfs:/tmp/fluss
+      - ./lib:/tmp/jars
+      - ./opt:/tmp/opt
 
 volumes:
   shared-tmpfs:
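The new `entrypoint`/`command` override copies the downloaded jars from the mounted `./lib` and `./opt` directories into Flink's classpath before the normal entrypoint starts the process. Once the containers are up, a quick check like the following should list the Fluss and Paimon jars (exact file names depend on the versions you downloaded):

```shell
# List the jars the jobmanager picked up from ./lib
docker compose exec jobmanager ls /opt/flink/lib
```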
@@ -116,11 +157,7 @@ The Docker Compose environment consists of the following containers:
 - **Fluss Cluster:** a Fluss `CoordinatorServer`, a Fluss `TabletServer` and a `ZooKeeper` server.
 - **Flink Cluster**: a Flink `JobManager` and a Flink `TaskManager` container to execute queries.
 
-**Note:** The `apache/fluss-quickstart-flink` image is based on [flink:1.20.3-java17](https://hub.docker.com/layers/library/flink/1.20-java17/images/sha256:296c7c23fa40a9a3547771b08fc65e25f06bc4cfd3549eee243c99890778cafc) and
-includes the [fluss-flink](engine-flink/getting-started.md), [paimon-flink](https://paimon.apache.org/docs/1.3/flink/quick-start/) and
-[flink-connector-faker](https://flink-packages.org/packages/flink-faker) to simplify this guide.
-
-3. To start all containers, run:
+4. To start all containers, run:
 ```shell
 docker compose up -d
 ```
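To confirm that the five services defined in the compose file (coordinator-server, tablet-server, zookeeper, jobmanager, taskmanager) are running, a quick check:

```shell
# Show the status of the services defined in docker-compose.yml
docker compose ps
```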
@@ -312,23 +349,69 @@ Congratulations, you are all set!
 
 First, use the following command to enter the Flink SQL CLI Container:
 ```shell
-docker compose exec jobmanager ./sql-client
+docker compose exec jobmanager ./bin/sql-client.sh
 ```
 
-**Note**:
-To simplify this guide, three temporary tables have been pre-created with `faker` connector to generate data.
-You can view their schemas by running the following commands:
+To simplify this guide, we will create three temporary tables with the `faker` connector to generate data:
+
+```sql title="Flink SQL"
+CREATE TEMPORARY TABLE source_order (
+    `order_key` BIGINT,
+    `cust_key` INT,
+    `total_price` DECIMAL(15, 2),
+    `order_date` DATE,
+    `order_priority` STRING,
+    `clerk` STRING
+) WITH (
+    'connector' = 'faker',
+    'rows-per-second' = '10',
+    'number-of-rows' = '10000',
+    'fields.order_key.expression' = '#{number.numberBetween ''0'',''100000000''}',
+    'fields.cust_key.expression' = '#{number.numberBetween ''0'',''20''}',
+    'fields.total_price.expression' = '#{number.randomDouble ''3'',''1'',''1000''}',
+    'fields.order_date.expression' = '#{date.past ''100'' ''DAYS''}',
+    'fields.order_priority.expression' = '#{regexify ''(low|medium|high){1}''}',
+    'fields.clerk.expression' = '#{regexify ''(Clerk1|Clerk2|Clerk3|Clerk4){1}''}'
+);
+```
 
 ```sql title="Flink SQL"
-SHOW CREATE TABLE source_customer;
+CREATE TEMPORARY TABLE source_customer (
+    `cust_key` INT,
+    `name` STRING,
+    `phone` STRING,
+    `nation_key` INT NOT NULL,
+    `acctbal` DECIMAL(15, 2),
+    `mktsegment` STRING,
+    PRIMARY KEY (`cust_key`) NOT ENFORCED
+) WITH (
+    'connector' = 'faker',
+    'number-of-rows' = '200',
+    'fields.cust_key.expression' = '#{number.numberBetween ''0'',''20''}',
+    'fields.name.expression' = '#{funnyName.name}',
+    'fields.nation_key.expression' = '#{number.numberBetween ''1'',''5''}',
+    'fields.phone.expression' = '#{phoneNumber.cellPhone}',
+    'fields.acctbal.expression' = '#{number.randomDouble ''3'',''1'',''1000''}',
+    'fields.mktsegment.expression' = '#{regexify ''(AUTOMOBILE|BUILDING|FURNITURE|MACHINERY|HOUSEHOLD){1}''}'
+);
 ```
 
 ```sql title="Flink SQL"
-SHOW CREATE TABLE source_order;
+CREATE TEMPORARY TABLE `source_nation` (
+    `nation_key` INT NOT NULL,
+    `name` STRING,
+    PRIMARY KEY (`nation_key`) NOT ENFORCED
+) WITH (
+    'connector' = 'faker',
+    'number-of-rows' = '100',
+    'fields.nation_key.expression' = '#{number.numberBetween ''1'',''5''}',
+    'fields.name.expression' = '#{regexify ''(CANADA|JORDAN|CHINA|UNITED|INDIA){1}''}'
+);
 ```
 
 ```sql title="Flink SQL"
-SHOW CREATE TABLE source_nation;
+-- drop records silently if a null value would have to be inserted into a NOT NULL column
+SET 'table.exec.sink.not-null-enforcer'='DROP';
 ```
 
 </TabItem>
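After creating the three `faker` tables, a quick sanity check inside the SQL CLI confirms that rows are being generated; a minimal sketch (any of the three source tables works):

```sql title="Flink SQL"
-- peek at a few generated customers; press Q to leave the result view once rows appear
SELECT * FROM source_customer LIMIT 5;
```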
@@ -635,10 +718,6 @@ CREATE TABLE datalake_enriched_orders (
 ```
 
 Next, perform streaming data writing into the **datalake-enabled** table, `datalake_enriched_orders`:
-```sql title="Flink SQL"
--- switch to streaming mode
-SET 'execution.runtime-mode' = 'streaming';
-```
 
 ```sql title="Flink SQL"
 -- insert tuples into datalake_enriched_orders
@@ -674,9 +753,15 @@ The data for the `datalake_enriched_orders` table is stored in Fluss (for real-t
 When querying the `datalake_enriched_orders` table, Fluss uses a union operation that combines data from both Fluss and Paimon to provide a complete result set -- combines **real-time** and **historical** data.
 
 If you wish to query only the data stored in Paimon—offering high-performance access without the overhead of unioning data—you can use the `datalake_enriched_orders$lake` table by appending the `$lake` suffix.
-This approach also enables all the optimizations and features of a Flink Paimon table source, including [system table](https://paimon.apache.org/docs/1.3/concepts/system-tables/) such as `datalake_enriched_orders$lake$snapshots`.
+This approach also enables all the optimizations and features of a Flink Paimon table source, including [system tables](https://paimon.apache.org/docs/$PAIMON_VERSION_SHORT$/concepts/system-tables/) such as `datalake_enriched_orders$lake$snapshots`.
 
 To query the snapshots directly from Paimon, use the following SQL:
+
+```sql title="Flink SQL"
+-- use tableau result mode
+SET 'sql-client.execution.result-mode' = 'tableau';
+```
+
 ```sql title="Flink SQL"
 -- switch to batch mode
 SET 'execution.runtime-mode' = 'batch';
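Querying only the tiered data through the `$lake` suffix described in this hunk looks like the following; a minimal sketch, run in the batch mode set just above:

```sql title="Flink SQL"
-- read only what has been tiered to Paimon, skipping the union with the Fluss log
SELECT COUNT(*) FROM datalake_enriched_orders$lake;
```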
@@ -726,33 +811,7 @@ The result looks like:
 ```
 You can execute the real-time analytics query multiple times, and the results will vary with each run as new data is continuously written to Fluss in real-time.
 
-Finally, you can use the following command to view the files stored in Paimon:
-```shell
-docker compose exec taskmanager tree /tmp/paimon/fluss.db
-```
-
-**Sample Output:**
-```shell
-/tmp/paimon/fluss.db
-└── datalake_enriched_orders
-    ├── bucket-0
-    │   ├── changelog-aef1810f-85b2-4eba-8eb8-9b136dec5bdb-0.orc
-    │   └── data-aef1810f-85b2-4eba-8eb8-9b136dec5bdb-1.orc
-    ├── manifest
-    │   ├── manifest-aaa007e1-81a2-40b3-ba1f-9df4528bc402-0
-    │   ├── manifest-aaa007e1-81a2-40b3-ba1f-9df4528bc402-1
-    │   ├── manifest-list-ceb77e1f-7d17-4160-9e1f-f334918c6e0d-0
-    │   ├── manifest-list-ceb77e1f-7d17-4160-9e1f-f334918c6e0d-1
-    │   └── manifest-list-ceb77e1f-7d17-4160-9e1f-f334918c6e0d-2
-    ├── schema
-    │   └── schema-0
-    └── snapshot
-        ├── EARLIEST
-        ├── LATEST
-        └── snapshot-1
-```
-
-The files adhere to Paimon's standard format, enabling seamless querying with other engines such as [Spark](https://paimon.apache.org/docs/1.3/spark/quick-start/) and [Trino](https://paimon.apache.org/docs/1.3/ecosystem/trino/).
+The files adhere to Paimon's standard format, enabling seamless querying with other engines such as [Spark](https://paimon.apache.org/docs/$PAIMON_VERSION_SHORT$/spark/quick-start/) and [Trino](https://paimon.apache.org/docs/$PAIMON_VERSION_SHORT$/ecosystem/trino/).
 
 </TabItem>
 
@@ -776,7 +835,6 @@ SET 'sql-client.execution.result-mode' = 'tableau';
 SET 'execution.runtime-mode' = 'batch';
 ```
 
-
 ```sql title="Flink SQL"
 -- query snapshots in iceberg
 SELECT snapshot_id, operation FROM datalake_enriched_orders$lake$snapshots;
