Contributor

Zhou-lemon commented Dec 30, 2025

What problem does this PR solve?

Issue Number: #58199

Problem Summary: This PR implements the rewrite_manifests procedure for Iceberg tables in Apache Doris. The procedure rewrites a table's manifest files to reduce metadata overhead. This matters most for large Iceberg tables, where many small manifest files can slow query planning.

ALTER TABLE catalog.test_db.my_table EXECUTE rewrite_manifests();

+---------------------------+----------------------------+
| rewritten_manifests_count | total_data_manifests_count |
+---------------------------+----------------------------+
|                         3 |                          3 |
+---------------------------+----------------------------+
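When scripting a check around the procedure, the result grid above can be turned into named values. The following is a minimal sketch that works only on the printed text shown in this description; `parse_result_grid` and `GRID` are hypothetical names introduced for illustration and are not part of Doris or of this PR.

```python
def parse_result_grid(text: str) -> list[dict[str, str]]:
    """Parse a MySQL-style ASCII result grid into one dict per data row."""
    rows = [
        # Drop the outer pipes, then split the remaining text into cells.
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip the +----+ border lines
    ]
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


# Output copied verbatim from the PR description.
GRID = """\
+---------------------------+----------------------------+
| rewritten_manifests_count | total_data_manifests_count |
+---------------------------+----------------------------+
|                         3 |                          3 |
+---------------------------+----------------------------+
"""

result = parse_result_grid(GRID)[0]
print(result)
```

Values are kept as strings so the helper makes no assumption about column types; convert with `int(...)` where a numeric comparison is needed.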

Release note

Feature: Implement the rewrite_manifests procedure for Iceberg tables

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Dec 30, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Zhou-lemon
Contributor Author

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/94) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/94) 🎉
Increment coverage report
Complete coverage report

@Zhou-lemon
Contributor Author

run performance

@doris-robot

TPC-H: Total hot run time: 35722 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit dd2070513296588d1b88b9fd7d0356f9799a6271, data reload: false

------ Round 1 ----------------------------------
q1	17636	4256	4037	4037
q2	2042	367	235	235
q3	10140	1296	736	736
q4	10227	899	328	328
q5	7862	2124	1925	1925
q6	223	170	142	142
q7	968	804	656	656
q8	9306	1452	1158	1158
q9	7014	5136	5198	5136
q10	6847	1812	1405	1405
q11	505	320	303	303
q12	737	752	575	575
q13	17788	3796	3105	3105
q14	293	307	297	297
q15	583	504	510	504
q16	699	693	627	627
q17	716	743	645	645
q18	8097	7733	8132	7733
q19	1223	1042	666	666
q20	417	405	268	268
q21	4505	4199	4234	4199
q22	1175	1074	1042	1042
Total cold run time: 109003 ms
Total hot run time: 35722 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4327	4227	4373	4227
q2	316	426	313	313
q3	2217	2984	2469	2469
q4	1398	1896	1435	1435
q5	4434	4433	4188	4188
q6	225	170	133	133
q7	1988	1909	1766	1766
q8	2627	2455	2375	2375
q9	6966	7220	7064	7064
q10	2455	2681	2147	2147
q11	532	455	454	454
q12	662	694	586	586
q13	3365	3814	3046	3046
q14	278	277	267	267
q15	534	484	488	484
q16	612	659	609	609
q17	1081	1278	1360	1278
q18	7327	7217	7352	7217
q19	878	887	880	880
q20	1872	1962	1787	1787
q21	4571	4321	4147	4147
q22	1092	1047	994	994
Total cold run time: 49757 ms
Total hot run time: 47866 ms
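The TPC-H report does not label its columns. Reading them as one cold run, two hot runs, and the faster of the two hot runs is an inference from the numbers, not something the report states. Under that assumption, the Round 1 totals can be reproduced with the sketch below; `ROUND_1` and `summarize` are hypothetical names used only for this illustration.

```python
# Round 1 rows copied from the report: query name, then the first three
# timing columns in ms. Interpreting them as (cold, hot 1, hot 2) is an
# assumption; the report's fourth column is recomputed here as min(hot 1, hot 2).
ROUND_1 = """\
q1 17636 4256 4037
q2 2042 367 235
q3 10140 1296 736
q4 10227 899 328
q5 7862 2124 1925
q6 223 170 142
q7 968 804 656
q8 9306 1452 1158
q9 7014 5136 5198
q10 6847 1812 1405
q11 505 320 303
q12 737 752 575
q13 17788 3796 3105
q14 293 307 297
q15 583 504 510
q16 699 693 627
q17 716 743 645
q18 8097 7733 8132
q19 1223 1042 666
q20 417 405 268
q21 4505 4199 4234
q22 1175 1074 1042
"""


def summarize(report: str) -> tuple[int, int]:
    """Return (total cold ms, total hot ms) for a block of report rows."""
    cold_total = 0
    hot_total = 0
    for line in report.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or malformed lines
        cold, hot1, hot2 = (int(value) for value in fields[1:4])
        cold_total += cold
        hot_total += min(hot1, hot2)  # best of the two hot runs
    return cold_total, hot_total


cold_total, hot_total = summarize(ROUND_1)
print(f"Total cold run time: {cold_total} ms")  # 109003
print(f"Total hot run time: {hot_total} ms")    # 35722
```

Both sums match the totals printed for Round 1, which supports the column reading. The cold run is excluded from the "best hot" value: in Round 2, q17 reports 1278 even though its first column (1081) is smaller.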

@doris-robot

TPC-DS: Total hot run time: 175094 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit dd2070513296588d1b88b9fd7d0356f9799a6271, data reload: false

query5	4385	595	448	448
query6	358	255	230	230
query7	4228	467	277	277
query8	348	252	242	242
query9	8757	2650	2648	2648
query10	476	366	326	326
query11	15183	15058	14977	14977
query12	176	115	113	113
query13	1258	500	383	383
query14	6152	3017	2761	2761
query14_1	2674	2630	2645	2630
query15	208	193	176	176
query16	1024	520	478	478
query17	1117	720	604	604
query18	2495	441	350	350
query19	240	229	198	198
query20	132	118	116	116
query21	216	138	119	119
query22	3863	4081	3851	3851
query23	16114	15477	15421	15421
query23_1	15350	15505	15494	15494
query24	7448	1598	1214	1214
query24_1	1223	1226	1243	1226
query25	579	482	441	441
query26	1257	272	164	164
query27	2756	458	301	301
query28	4513	2203	2193	2193
query29	820	572	477	477
query30	322	245	216	216
query31	798	640	564	564
query32	79	71	67	67
query33	555	359	303	303
query34	895	892	547	547
query35	740	838	688	688
query36	874	830	816	816
query37	131	95	81	81
query38	2699	2738	2668	2668
query39	773	754	728	728
query39_1	719	707	733	707
query40	211	129	112	112
query41	64	62	66	62
query42	107	101	104	101
query43	479	443	445	443
query44	1361	764	759	759
query45	184	183	174	174
query46	881	963	626	626
query47	1396	1454	1366	1366
query48	316	325	263	263
query49	589	413	319	319
query50	655	290	205	205
query51	3737	3732	3721	3721
query52	98	108	98	98
query53	298	325	280	280
query54	290	264	248	248
query55	88	74	72	72
query56	285	288	286	286
query57	998	1038	934	934
query58	270	289	252	252
query59	1991	2121	2108	2108
query60	312	312	292	292
query61	167	154	159	154
query62	396	383	308	308
query63	295	264	273	264
query64	4952	1308	1029	1029
query65	3813	3751	3716	3716
query66	1434	434	318	318
query67	14874	14896	15309	14896
query68	8172	998	722	722
query69	495	347	313	313
query70	1067	916	872	872
query71	375	298	276	276
query72	5919	4770	4729	4729
query73	703	615	312	312
query74	8900	8804	8602	8602
query75	2917	2863	2517	2517
query76	3797	1074	689	689
query77	507	361	271	271
query78	9683	9806	9204	9204
query79	1246	957	634	634
query80	647	583	506	506
query81	506	260	230	230
query82	237	146	112	112
query83	267	262	248	248
query84	257	133	103	103
query85	900	512	463	463
query86	413	318	317	317
query87	2808	2863	2741	2741
query88	3289	2284	2289	2284
query89	403	354	344	344
query90	2171	165	155	155
query91	174	172	145	145
query92	82	67	63	63
query93	1207	916	565	565
query94	568	326	302	302
query95	587	384	311	311
query96	596	480	209	209
query97	2334	2406	2326	2326
query98	242	208	194	194
query99	557	602	511	511
Total cold run time: 253766 ms
Total hot run time: 175094 ms

@doris-robot

ClickBench: Total hot run time: 27.01 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit dd2070513296588d1b88b9fd7d0356f9799a6271, data reload: false

query1	0.05	0.05	0.05
query2	0.10	0.05	0.05
query3	0.25	0.08	0.08
query4	1.61	0.11	0.12
query5	0.26	0.25	0.25
query6	1.14	0.66	0.66
query7	0.03	0.02	0.02
query8	0.06	0.04	0.04
query9	0.57	0.51	0.52
query10	0.56	0.55	0.56
query11	0.15	0.10	0.11
query12	0.15	0.12	0.14
query13	0.60	0.59	0.61
query14	0.99	0.99	0.97
query15	0.79	0.78	0.80
query16	0.40	0.40	0.39
query17	1.02	1.06	1.05
query18	0.23	0.22	0.21
query19	1.83	1.87	1.87
query20	0.02	0.02	0.01
query21	15.46	0.29	0.14
query22	4.77	0.05	0.04
query23	15.92	0.29	0.10
query24	1.40	0.35	0.49
query25	0.11	0.10	0.06
query26	0.14	0.13	0.14
query27	0.06	0.08	0.05
query28	4.35	1.05	0.88
query29	12.60	3.96	3.15
query30	0.28	0.14	0.13
query31	2.84	0.61	0.37
query32	3.23	0.56	0.46
query33	3.02	3.03	3.11
query34	16.60	5.10	4.43
query35	4.45	4.49	4.43
query36	0.67	0.50	0.50
query37	0.10	0.06	0.07
query38	0.08	0.04	0.03
query39	0.04	0.03	0.03
query40	0.16	0.15	0.14
query41	0.08	0.03	0.03
query42	0.04	0.03	0.03
query43	0.04	0.04	0.03
Total cold run time: 97.25 s
Total hot run time: 27.01 s
