[feature](cloud) Add table-level event-driven warm up #63832

Open
bobhan1 wants to merge 5 commits into apache:master from bobhan1:pick-table-level-warmup

@bobhan1
Contributor

@bobhan1 bobhan1 commented May 28, 2026

What problem does this PR solve?

Issue Number: None

Problem Summary:

This PR adds table-level event-driven cloud warm-up support and improves active incremental warm-up progress observability.

Before this change, event-driven warm-up was only controlled at compute-group granularity. Once a load-event warm-up job was enabled for a source and target compute group pair, all source-side table writes could trigger warm-up to the target compute group. That is inefficient for workloads where only selected core tables, high-frequency query tables, or selected async materialized views need to stay warm.

This PR lets users define the warm-up scope with ON TABLES when creating an event-driven load warm-up job. FE persists the normalized table filter in the warm-up job, resolves matched table ids dynamically, sends the table ids to BE, and lets BE filter warm-up rowsets by table id.
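
The last step of that flow, BE filtering rowsets by table id, can be pictured with a small Python sketch. It is illustrative only: the `Rowset` record, the `select_rowsets_for_warm_up` function, and the convention that `None` stands for a cluster-level job without a table filter are assumptions made for this example, not names or behavior taken from the Doris code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Rowset:
    """Hypothetical stand-in for a newly written rowset."""
    rowset_id: str
    table_id: int


def select_rowsets_for_warm_up(
    rowsets: Iterable[Rowset],
    matched_table_ids: Optional[Set[int]],
) -> List[Rowset]:
    """Keep only rowsets whose table id is in the job's matched id set.

    Assumption: None models a cluster-level job with no table filter,
    so every rowset is kept.
    """
    if matched_table_ids is None:
        return list(rowsets)
    return [r for r in rowsets if r.table_id in matched_table_ids]


written = [Rowset("rs-1", 1001), Rowset("rs-2", 1002), Rowset("rs-3", 1001)]
print([r.rowset_id for r in select_rowsets_for_warm_up(written, {1001})])
# prints ['rs-1', 'rs-3']
```

Filtering on ids rather than names means a rename on the FE side only changes which ids are sent, not how the BE-side check works.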

User-visible behavior:

  • WARM UP ... ON TABLES supports table-level event-driven warm-up.
  • Table filters support INCLUDE and EXCLUDE rules.
  • Rules support * and ? wildcards, for example db.table, db.*, *.orders_*, and log_db.log_?.
  • INCLUDE defines the candidate warm-up scope, and EXCLUDE removes tables from that included scope.
  • Rules are canonicalized before duplicate checks, so semantically equivalent filters do not create duplicate jobs just because rule order differs.
  • Matching covers both regular OLAP tables and async materialized views.
  • Matched table ids are refreshed as tables or async materialized views are created, dropped, or renamed.
  • The same source compute group can create independent table-level warm-up jobs to different target compute groups with different table filters.
  • SHOW WARM UP JOB exposes the table-level job type, table filter, matched tables, and SyncStats.
  • SHOW WARM UP JOB list output keeps compact SyncStats, while single-job lookup keeps detailed windowed SyncStats.
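
To illustrate why canonicalization makes the duplicate check independent of rule order, here is a minimal Python sketch. The concrete normalization steps (trimming whitespace, upper-casing the `INCLUDE`/`EXCLUDE` keyword, dropping repeated rules, sorting) are assumptions chosen for the example; the description above states only that rules are canonicalized before duplicate checks. Pattern case is left untouched because identifier case sensitivity is not specified here.

```python
from typing import Iterable, Tuple

Rule = Tuple[str, str]  # (action, pattern), e.g. ("INCLUDE", "db.*")


def canonicalize(rules: Iterable[Rule]) -> Tuple[Rule, ...]:
    """Return an order-independent, duplicate-free form of a rule list."""
    cleaned = {(action.strip().upper(), pattern.strip()) for action, pattern in rules}
    return tuple(sorted(cleaned))


a = [("INCLUDE", "db.*"), ("EXCLUDE", "db.tmp_*")]
b = [("exclude", "db.tmp_*"), ("INCLUDE", "db.*"), ("INCLUDE", " db.* ")]
print(canonicalize(a) == canonicalize(b))  # prints True
```

Comparing the canonical tuples is what lets two filters written in a different order be recognized as the same job.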

Example:

WARM UP COMPUTE GROUP query_cg WITH COMPUTE GROUP write_cg
ON TABLES (
    INCLUDE 'core_db.config',
    INCLUDE 'report_db.monthly_*',
    INCLUDE '*.sales_*',
    EXCLUDE '*.*_archive'
)
PROPERTIES (
    "sync_mode" = "event_driven",
    "sync_event" = "load"
);
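
How the example filter could select tables can be sketched in Python as follows. This is a hedged illustration, not the FE matcher: `wildcard_to_regex` and `is_selected` are hypothetical helpers, and matching each pattern against the whole `db.table` string (so that `*` may also span the dot) is an assumption of the sketch.

```python
import re
from typing import List, Tuple


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate * (any run of characters) and ? (exactly one) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def is_selected(qualified_name: str, rules: List[Tuple[str, str]]) -> bool:
    """True if some INCLUDE rule matches and no EXCLUDE rule matches."""
    def matches(action: str) -> bool:
        return any(
            wildcard_to_regex(p).fullmatch(qualified_name) is not None
            for a, p in rules
            if a == action
        )
    return matches("INCLUDE") and not matches("EXCLUDE")


rules = [
    ("INCLUDE", "core_db.config"),
    ("INCLUDE", "report_db.monthly_*"),
    ("INCLUDE", "*.sales_*"),
    ("EXCLUDE", "*.*_archive"),
]
for name in ["core_db.config", "shop.sales_eu", "shop.sales_archive", "core_db.users"]:
    print(name, is_selected(name, rules))
```

Under these assumptions `shop.sales_archive` is included by `*.sales_*` but then removed by `*.*_archive`, while `core_db.users` is never a candidate because no INCLUDE rule matches it.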

Conflict and virtual compute group behavior:

  • Table-level load-event warm-up and cluster-level load-event warm-up are mutually exclusive for the same source and target compute group pair.
  • If a conflicting job already exists, creation returns an error that includes the conflicting job id; table-level conflicts also include the table filter.
  • Duplicate checks within the same job type still follow the existing duplicate-check logic.
  • VCG-managed cluster-level load-event warm-up creation does not fail on conflict. Because VCG jobs are created by the MS HTTP API path, FE cancels existing table-level load-event warm-up jobs with the same source and target first, then recreates the VCG-managed cluster-level job.
  • Manually creating a table-level load-event warm-up job is rejected only when both source and target compute groups are owned by the same VCG.
  • SQL still cannot use a virtual compute group directly as the source or target compute group.
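
The mutual-exclusion rule for manually created jobs can be sketched in Python as below. Everything here is hypothetical: the `LoadEventJob` record, the `find_conflict` helper, and the error wording are invented for illustration, and the sketch covers neither the existing same-type duplicate check nor the VCG cancel-and-recreate path.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LoadEventJob:
    job_id: int
    source: str
    target: str
    table_filter: Optional[str] = None  # None models a cluster-level job

    @property
    def is_table_level(self) -> bool:
        return self.table_filter is not None


def find_conflict(new_job: LoadEventJob, existing: List[LoadEventJob]) -> Optional[str]:
    """Return an error message if a job of the other level exists for the same pair."""
    for job in existing:
        same_pair = (job.source, job.target) == (new_job.source, new_job.target)
        if same_pair and job.is_table_level != new_job.is_table_level:
            msg = f"conflicts with existing warm-up job {job.job_id}"
            if job.is_table_level:
                msg += f" (table filter: {job.table_filter})"
            return msg
    return None


existing = [LoadEventJob(7, "write_cg", "query_cg", "INCLUDE 'core_db.*'")]
print(find_conflict(LoadEventJob(8, "write_cg", "query_cg"), existing))
print(find_conflict(LoadEventJob(9, "write_cg", "other_cg"), existing))  # prints None
```

The second call returns no conflict because the target differs, which matches the statement that one source compute group can feed independent jobs to different targets.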

Warm-up progress observation:

  • BE records per-job windowed requested, finished, and failed warm-up statistics.
  • BE exposes per-job warm-up statistics through /api/warmup_event_driven_stats.
  • FE aggregates BE statistics and caches the aggregated result in the warm-up job.
  • SyncStats includes source-side and target-side warm-up size/count progress across windows.
  • SyncStats includes trigger-time progress, so users can observe whether the target compute group is behind the latest source-side warm-up trigger.
  • FE /metrics exposes per-job active warm-up metadata, synchronized size, and trigger gap metrics for cloud event-driven warm-up jobs.
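
As a rough mental model of windowed statistics and the trigger gap, consider the Python sketch below. It is not the BE's `MBvarWindowedAdder` and makes no claim about how bvar implements windows: `WindowedAdder` keeps one bucket per second purely for illustration, it assumes timestamps never go backwards, and defining the gap as "latest source-side trigger time minus latest trigger time already synchronized on the target" is an assumption of this example.

```python
from collections import deque
from typing import Deque, Tuple


class WindowedAdder:
    """Sum of the values added during the last `window_s` seconds."""

    def __init__(self, window_s: int) -> None:
        self.window_s = window_s
        self._samples: Deque[Tuple[int, int]] = deque()  # (second, sum in that second)

    def add(self, now_s: int, value: int) -> None:
        # Merge into the newest bucket when it covers the same second.
        if self._samples and self._samples[-1][0] == now_s:
            sec, total = self._samples.pop()
            self._samples.append((sec, total + value))
        else:
            self._samples.append((now_s, value))
        self._evict(now_s)

    def value(self, now_s: int) -> int:
        self._evict(now_s)
        return sum(v for _, v in self._samples)

    def _evict(self, now_s: int) -> None:
        # Drop buckets that fell out of the window (now - window, now].
        while self._samples and self._samples[0][0] <= now_s - self.window_s:
            self._samples.popleft()


def trigger_gap_ms(latest_source_trigger_ms: int, latest_target_synced_trigger_ms: int) -> int:
    """How far the target lags behind the newest source-side trigger (never negative)."""
    return max(0, latest_source_trigger_ms - latest_target_synced_trigger_ms)


requested = WindowedAdder(1800)  # a 30-minute window
requested.add(0, 5)
requested.add(0, 3)
requested.add(1000, 2)
print(requested.value(1000), requested.value(1800))  # prints 10 2
print(trigger_gap_ms(5000, 4200))  # prints 800
```

In this model memory grows with the number of non-empty seconds inside the window, per job and per metric, which is the kind of cost worth checking when many jobs are active.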

Release note

Support table-level event-driven cloud warm-up with ON TABLES filters and per-job warm-up sync statistics.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes. WARM UP supports table-level ON TABLES filters for event-driven load warm-up, and warm-up job output/metrics expose table filter, matched tables, SyncStats, and trigger-gap information.
  • Does this need documentation?

Check List (For Reviewer who merges this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@bobhan1 bobhan1 requested a review from gavinchou as a code owner May 28, 2026 09:28
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@bobhan1 bobhan1 force-pushed the pick-table-level-warmup branch from 65920e0 to b67c9f7 Compare May 28, 2026 10:37
@bobhan1
Contributor Author

bobhan1 commented May 28, 2026

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 31875 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b67c9f73c11b7a8e7fa7f2ba1eab5feb84fbd9ed, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17737	4084	4152	4084
q2	q3	10995	1468	826	826
q4	4827	487	349	349
q5	10674	2321	2093	2093
q6	389	189	140	140
q7	983	789	655	655
q8	9591	1796	1599	1599
q9	7101	5033	5078	5033
q10	6505	2244	1897	1897
q11	438	289	251	251
q12	655	436	313	313
q13	18214	3481	2843	2843
q14	270	263	244	244
q15	q16	826	780	714	714
q17	1005	888	1001	888
q18	7050	5670	6279	5670
q19	1246	1303	1071	1071
q20	515	427	278	278
q21	5995	2809	2622	2622
q22	460	370	305	305
Total cold run time: 105476 ms
Total hot run time: 31875 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5051	4825	4795	4795
q2	q3	5000	5258	4703	4703
q4	2193	2254	1423	1423
q5	4804	4653	4842	4653
q6	232	187	132	132
q7	1909	1643	1408	1408
q8	2243	1983	1963	1963
q9	7483	7489	7484	7484
q10	4779	4702	4225	4225
q11	540	383	353	353
q12	733	747	538	538
q13	3027	3347	2835	2835
q14	282	287	259	259
q15	q16	696	700	613	613
q17	1302	1265	1255	1255
q18	7405	7002	6860	6860
q19	1121	1130	1134	1130
q20	2246	2252	1949	1949
q21	5354	4648	4514	4514
q22	541	471	435	435
Total cold run time: 56941 ms
Total hot run time: 51527 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 172324 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b67c9f73c11b7a8e7fa7f2ba1eab5feb84fbd9ed, data reload: false

query5	4306	703	524	524
query6	337	230	210	210
query7	4230	600	307	307
query8	320	238	224	224
query9	8814	4119	4132	4119
query10	459	349	310	310
query11	5793	2413	2228	2228
query12	181	137	126	126
query13	1283	590	459	459
query14	6132	5521	5179	5179
query14_1	4562	4546	4570	4546
query15	241	209	188	188
query16	1008	450	349	349
query17	1133	722	588	588
query18	2535	495	346	346
query19	210	199	159	159
query20	138	129	132	129
query21	217	143	115	115
query22	13648	13522	13336	13336
query23	17276	16547	16272	16272
query23_1	16401	16410	16421	16410
query24	7384	1802	1343	1343
query24_1	1345	1352	1338	1338
query25	554	487	422	422
query26	1300	325	174	174
query27	2743	546	338	338
query28	4465	2005	1988	1988
query29	1002	623	511	511
query30	314	245	201	201
query31	1140	1095	944	944
query32	88	76	74	74
query33	542	349	286	286
query34	1184	1181	666	666
query35	787	813	702	702
query36	1409	1413	1235	1235
query37	158	109	97	97
query38	3210	3183	3076	3076
query39	951	935	922	922
query39_1	883	910	886	886
query40	232	154	131	131
query41	72	69	69	69
query42	119	110	113	110
query43	345	346	305	305
query44	
query45	218	213	199	199
query46	1107	1245	772	772
query47	2385	2412	2262	2262
query48	399	422	320	320
query49	644	511	396	396
query50	1031	361	257	257
query51	4368	4296	4307	4296
query52	110	109	97	97
query53	271	287	211	211
query54	334	294	286	286
query55	98	93	86	86
query56	323	350	323	323
query57	1426	1411	1302	1302
query58	303	284	268	268
query59	1624	1709	1472	1472
query60	339	345	321	321
query61	179	176	181	176
query62	700	665	604	604
query63	252	213	213	213
query64	2482	868	695	695
query65	
query66	1705	479	362	362
query67	30155	29757	29615	29615
query68	
query69	474	342	307	307
query70	1033	1034	976	976
query71	318	273	267	267
query72	2981	2690	2346	2346
query73	891	816	427	427
query74	5122	4968	4821	4821
query75	2702	2620	2271	2271
query76	2298	1200	808	808
query77	418	419	345	345
query78	12459	12560	11900	11900
query79	1470	1068	740	740
query80	1178	548	461	461
query81	506	284	238	238
query82	1346	163	125	125
query83	350	283	252	252
query84	260	148	149	148
query85	951	539	474	474
query86	457	329	318	318
query87	3457	3395	3258	3258
query88	3597	2746	2720	2720
query89	465	385	346	346
query90	1811	193	176	176
query91	189	180	140	140
query92	82	81	72	72
query93	1521	1519	870	870
query94	641	358	339	339
query95	693	495	351	351
query96	1078	791	363	363
query97	2737	2762	2594	2594
query98	242	231	247	231
query99	1192	1173	1017	1017
Total cold run time: 255869 ms
Total hot run time: 172324 ms

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 4.47% (40/894) 🎉
Increment coverage report
Complete coverage report

bobhan1 added a commit to bobhan1/doris that referenced this pull request May 29, 2026
### What problem does this PR solve?

Issue Number: None

Related PR: apache#63832

Problem Summary: The table-level warm-up change adds a table_id argument before sync_wait_timeout_ms in CloudWarmUpManager::warm_up_rowset. After rebasing onto the latest master, the existing CloudWarmUpManagerTest calls still used the old two-argument form, so the positive-timeout test passed 1000 as table_id and left sync_wait_timeout_ms at its default -1. That made the test take the async non-positive-timeout branch, so the before-wait sync point was never reached and the spurious notify assertion failed. Update the test calls to pass table_id and sync_wait_timeout_ms explicitly.
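
The positional-argument slip described above can be reproduced with a toy Python model. The function below only mimics the parameter order named in the summary (`table_id` inserted before `sync_wait_timeout_ms`, which defaults to -1); its body and return values are invented for illustration and are not the BE implementation.

```python
def warm_up_rowset(rowset, table_id, sync_wait_timeout_ms=-1):
    """Toy model: report which branch a call would take."""
    return "sync" if sync_wait_timeout_ms > 0 else "async"


# Old call shape, written before table_id existed: 1000 was meant as the timeout,
# but it now binds to table_id and the timeout silently keeps its default of -1.
print(warm_up_rowset("rs", 1000))  # prints async

# Fixed call shape: both values are passed explicitly.
print(warm_up_rowset("rs", 42, 1000))  # prints sync
```

Because the old call still type-checks, nothing fails at build time; the only symptom is that the test ends up on the wrong branch.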

### Release note

None

### Check List (For Author)

- Test:
    - Unit Test: ./run-be-ut.sh --run --filter=CloudWarmUpManagerTest.* -j100
- Behavior changed: No.
- Does this need documentation: No.
@bobhan1
Contributor Author

bobhan1 commented May 29, 2026

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 31958 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ebec831e0da33fb6cc3c0b2899c553d7d928afde, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17731	4070	4024	4024
q2	q3	10846	1417	785	785
q4	4697	483	346	346
q5	7631	2285	2157	2157
q6	234	174	139	139
q7	983	797	648	648
q8	9354	1749	1810	1749
q9	5146	4985	4980	4980
q10	6409	2229	1883	1883
q11	431	266	242	242
q12	633	429	295	295
q13	18077	3440	2783	2783
q14	272	262	248	248
q15	q16	825	781	711	711
q17	1017	941	1058	941
q18	6954	5813	5632	5632
q19	1174	1139	1078	1078
q20	571	432	320	320
q21	5857	2901	2690	2690
q22	591	364	307	307
Total cold run time: 99433 ms
Total hot run time: 31958 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4755	4892	4706	4706
q2	q3	4999	5319	4662	4662
q4	2113	2209	1396	1396
q5	5006	4794	4799	4794
q6	229	178	132	132
q7	1885	1782	1559	1559
q8	2424	2191	2116	2116
q9	7958	7481	7409	7409
q10	4750	4705	4252	4252
q11	533	382	355	355
q12	722	738	515	515
q13	3026	3328	2849	2849
q14	270	279	249	249
q15	q16	668	698	616	616
q17	1286	1256	1255	1255
q18	7289	6851	6944	6851
q19	1110	1095	1097	1095
q20	2251	2298	1942	1942
q21	5293	4548	4462	4462
q22	537	449	398	398
Total cold run time: 57104 ms
Total hot run time: 51613 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 172417 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ebec831e0da33fb6cc3c0b2899c553d7d928afde, data reload: false

query5	4320	672	517	517
query6	345	229	210	210
query7	4257	564	338	338
query8	328	244	225	225
query9	8808	4018	4055	4018
query10	446	361	302	302
query11	5789	2595	2239	2239
query12	180	131	126	126
query13	1291	623	467	467
query14	6094	5485	5181	5181
query14_1	4492	4504	4453	4453
query15	215	208	187	187
query16	997	503	463	463
query17	1155	765	618	618
query18	2538	513	376	376
query19	238	234	174	174
query20	141	136	134	134
query21	226	142	124	124
query22	13630	13634	13328	13328
query23	17399	16582	16309	16309
query23_1	16336	16311	16482	16311
query24	7418	1769	1304	1304
query24_1	1304	1320	1336	1320
query25	546	466	421	421
query26	1320	322	175	175
query27	2695	543	346	346
query28	4413	2001	2006	2001
query29	978	628	502	502
query30	311	231	203	203
query31	1116	1081	960	960
query32	96	75	77	75
query33	574	355	292	292
query34	1224	1133	648	648
query35	763	796	731	731
query36	1440	1448	1272	1272
query37	150	103	88	88
query38	3220	3162	3107	3107
query39	931	923	893	893
query39_1	886	870	886	870
query40	223	149	128	128
query41	66	63	62	62
query42	109	110	109	109
query43	329	332	290	290
query44	
query45	209	204	201	201
query46	1087	1204	723	723
query47	2406	2414	2315	2315
query48	404	407	299	299
query49	631	510	413	413
query50	940	351	249	249
query51	4395	4308	4309	4308
query52	105	105	95	95
query53	254	280	202	202
query54	327	275	259	259
query55	93	100	85	85
query56	303	316	304	304
query57	1443	1448	1360	1360
query58	296	271	272	271
query59	1585	1657	1441	1441
query60	322	336	315	315
query61	164	155	161	155
query62	697	654	592	592
query63	244	206	209	206
query64	2416	821	629	629
query65	
query66	1744	469	356	356
query67	29847	29813	29548	29548
query68	
query69	460	351	311	311
query70	1005	980	974	974
query71	304	279	266	266
query72	3058	2712	2444	2444
query73	898	764	422	422
query74	5102	4978	4818	4818
query75	2705	2632	2237	2237
query76	2266	1141	762	762
query77	406	416	341	341
query78	12547	12473	11857	11857
query79	1481	999	783	783
query80	636	544	464	464
query81	454	275	244	244
query82	1369	158	126	126
query83	356	280	252	252
query84	262	146	141	141
query85	957	546	462	462
query86	408	349	352	349
query87	3439	3373	3226	3226
query88	3626	2716	2706	2706
query89	441	389	349	349
query90	2059	192	179	179
query91	185	174	142	142
query92	78	79	78	78
query93	1527	1479	862	862
query94	559	376	306	306
query95	694	471	348	348
query96	1051	814	347	347
query97	2767	2762	2655	2655
query98	235	235	242	235
query99	1211	1167	1044	1044
Total cold run time: 254755 ms
Total hot run time: 172417 ms

bobhan1 added a commit to bobhan1/doris that referenced this pull request May 29, 2026
### What problem does this PR solve?

Issue Number: None

Related PR: apache#63832

Problem Summary: The table-level warm-up table filter performance tests used tight wall-clock thresholds for the 200K and 500K wildcard match-all cases. CI machines can run these scale tests slightly slower than local runs even though the matching implementation remains efficient. Relax the 200K threshold from 1s to 1.5s and the 500K threshold from 2s to 3s while keeping the existing functional assertions and smaller or more selective performance checks.

### Release note

None

### Check List (For Author)

- Test:
    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.cloud.CacheHotspotManagerTableFilterTest
- Behavior changed: No.
- Does this need documentation: No.
@bobhan1
Contributor Author

bobhan1 commented May 29, 2026

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.06% (1854/2375)
Line Coverage 64.54% (33342/51663)
Region Coverage 65.24% (16533/25343)
Branch Coverage 55.77% (8841/15854)

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 77.03% (664/862) 🎉
Increment coverage report
Complete coverage report

bobhan1 added a commit to bobhan1/doris that referenced this pull request May 29, 2026
### What problem does this PR solve?

Issue Number: None

Related PR: apache#63832

Problem Summary: The table-level warm-up table filter performance test for 200K tables with 15 include/exclude rules still used a tight 2s wall-clock threshold. CI can exceed that threshold under load while the matcher remains functionally correct. Relax the threshold to 3s and keep the matched-table assertion unchanged.

### Release note

None

### Check List (For Author)

- Test:
    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.cloud.CacheHotspotManagerTableFilterTest
- Behavior changed: No.
- Does this need documentation: No.
@bobhan1
Contributor Author

bobhan1 commented May 29, 2026

run buildall

gavinchou previously approved these changes May 29, 2026
bobhan1 added 5 commits May 29, 2026 14:42
Issue Number: None

Related PR: None

Problem Summary: Add table-level event-driven warm-up support for cloud warm-up jobs. The change extends WARM UP ... ON TABLES parsing and validation, persists normalized include and exclude table filters, resolves matching table ids dynamically, prevents conflicting cluster-level and table-level load-event jobs, propagates table ids through BE warm-up requests, records per-job source and target warm-up progress metrics, and exposes compact and detailed SyncStats through SHOW WARM UP JOB and FE metrics. Virtual compute group rebuilds cancel existing table-level load-event jobs before recreating managed cluster-level jobs.

Support table-level event-driven cloud warm-up with ON TABLES filters and warm-up sync statistics.

- Test:
    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.cloud.OnTablesFilterTest,org.apache.doris.cloud.CloudWarmUpJobTableFilterTest,org.apache.doris.cloud.CacheHotspotManagerTableFilterTest,org.apache.doris.cloud.WarmUpStatsTest,org.apache.doris.cloud.WarmUpClusterOnTablesParseTest,org.apache.doris.cloud.catalog.CloudInstanceStatusCheckerTest,org.apache.doris.metric.MetricsTest#testCloudWarmUpSyncJobMetricsReadStatsDirectlyFromJob+testEventDrivenCloudWarmUpSyncJobTriggerGapMetric
    - Unit Test: ./run-be-ut.sh --run --filter=CloudWarmUpManagerFilterTest.*:MBvarWindowedAdderTest.* -j100
    - Manual test: build-support/check-format.sh
    - Manual test: ./build.sh --be --fe --cloud -j100
    - Manual test: docker build -f docker/runtime/doris-compose/Dockerfile -t bh-cluster-2 .
    - Manual test: ./run-regression-test.sh --clean --compile
    - Regression test: env -u HTTP_PROXY -u HTTPS_PROXY -u http_proxy -u https_proxy -u ALL_PROXY -u all_proxy ./run-regression-test.sh --run -d regression-test/suites/cloud_p0/cache/multi_cluster/warm_up/on_tables -runMode=cloud -image bh-cluster-2 -dockerSuiteParallel 1 (18/19 passed; test_warm_up_event_on_tables_overlap_and_mv failed because of a duplicate MV column name in the test SQL, before the test was fixed)
    - Regression test: env -u HTTP_PROXY -u HTTPS_PROXY -u http_proxy -u https_proxy -u ALL_PROXY -u all_proxy ./run-regression-test.sh --run -d regression-test/suites/cloud_p0/cache/multi_cluster/warm_up/on_tables -s test_warm_up_event_on_tables_overlap_and_mv -runMode=cloud -image bh-cluster-2 -dockerSuiteParallel 1
- Behavior changed: Yes. WARM UP supports ON TABLES filters for event-driven load warm-up and SHOW WARM UP JOB exposes table filter, matched tables, and sync stats.
- Does this need documentation: Yes. Documentation for the new ON TABLES syntax and metrics should be added separately.
static constexpr int WINDOW_30M = 1800;
static constexpr int WINDOW_1H = 3600;

MBvarWindowedAdder g_warmup_ed_finish_segment_num("warmup_ed_finish_segment_num", {"job_id"},

Are there any memory issues if there are many jobs?
How does bvar implement "windows"? Does it record every sample of the adder every second?

what does "ed" mean?

failure_msg.append(failures[i].reason);
}

return Status::Error(code,

log table_id too

@hello-stephen
Contributor

TPC-H: Total hot run time: 31398 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6c32a2f1e75c81c0cea00fbeaee02321c690459b, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17659	4016	3981	3981
q2	q3	10914	1484	831	831
q4	4766	487	353	353
q5	10288	2293	2166	2166
q6	390	177	137	137
q7	996	781	653	653
q8	9660	1690	1626	1626
q9	7003	4990	5022	4990
q10	6448	2248	1905	1905
q11	449	276	252	252
q12	643	431	299	299
q13	18119	3522	2807	2807
q14	271	262	247	247
q15	q16	831	795	712	712
q17	970	860	980	860
q18	6986	5859	5582	5582
q19	1191	1264	1089	1089
q20	525	407	258	258
q21	5537	2635	2337	2337
q22	439	356	313	313
Total cold run time: 104085 ms
Total hot run time: 31398 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4395	4306	4328	4306
q2	q3	4561	4958	4340	4340
q4	2080	2198	1396	1396
q5	4437	4537	5284	4537
q6	256	197	145	145
q7	1954	1833	1710	1710
q8	2539	2273	2349	2273
q9	8152	8054	7878	7878
q10	4808	4785	4302	4302
q11	589	436	383	383
q12	732	756	574	574
q13	3254	3676	2927	2927
q14	295	300	273	273
q15	q16	693	733	662	662
q17	1443	1377	1325	1325
q18	7839	7326	7402	7326
q19	1124	1119	1128	1119
q20	2231	2228	1939	1939
q21	5349	4605	4401	4401
q22	522	454	417	417
Total cold run time: 57253 ms
Total hot run time: 52233 ms

@hello-stephen
Contributor

TPC-H: Total hot run time: 31974 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a67fe9761c9259e6df78005ce6f432eda5dfba74, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17668	4062	3998	3998
q2	q3	10656	1448	860	860
q4	4696	475	349	349
q5	7708	2333	2132	2132
q6	253	177	140	140
q7	950	791	650	650
q8	9452	1724	1770	1724
q9	5479	4986	4946	4946
q10	6447	2207	1884	1884
q11	452	279	251	251
q12	676	430	295	295
q13	18166	3383	2742	2742
q14	272	257	242	242
q15	q16	848	778	721	721
q17	935	995	918	918
q18	7076	5696	5615	5615
q19	1176	1280	1205	1205
q20	568	437	293	293
q21	5890	2889	2691	2691
q22	449	365	318	318
Total cold run time: 99817 ms
Total hot run time: 31974 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4806	4745	4745	4745
q2	q3	4986	5333	4604	4604
q4	2183	2208	1380	1380
q5	4910	4658	4671	4658
q6	243	178	128	128
q7	1896	1716	1543	1543
q8	2439	2131	2126	2126
q9	7712	7325	7369	7325
q10	4759	4698	4235	4235
q11	524	388	355	355
q12	728	749	526	526
q13	2983	3411	2839	2839
q14	271	274	251	251
q15	q16	677	695	607	607
q17	1298	1257	1253	1253
q18	7295	6965	7102	6965
q19	1142	1075	1094	1075
q20	2210	2213	1937	1937
q21	5276	4555	4365	4365
q22	514	472	409	409
Total cold run time: 56852 ms
Total hot run time: 51326 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 172895 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6c32a2f1e75c81c0cea00fbeaee02321c690459b, data reload: false

query5	4323	665	548	548
query6	344	223	200	200
query7	4225	567	316	316
query8	323	237	233	233
query9	8821	4154	4074	4074
query10	445	346	307	307
query11	5834	2461	2267	2267
query12	187	132	125	125
query13	1286	598	437	437
query14	6079	5489	5197	5197
query14_1	4501	4514	4473	4473
query15	217	207	192	192
query16	1050	491	476	476
query17	1171	741	631	631
query18	2610	513	367	367
query19	216	223	171	171
query20	145	137	132	132
query21	223	142	130	130
query22	13717	13654	13443	13443
query23	17323	16700	16238	16238
query23_1	16517	16416	16307	16307
query24	7530	1776	1331	1331
query24_1	1314	1346	1340	1340
query25	569	484	425	425
query26	1314	326	180	180
query27	2704	523	337	337
query28	4422	2033	2020	2020
query29	962	607	504	504
query30	309	236	204	204
query31	1126	1087	964	964
query32	87	80	74	74
query33	532	355	299	299
query34	1176	1144	653	653
query35	776	813	677	677
query36	1394	1398	1257	1257
query37	152	102	88	88
query38	3247	3199	3076	3076
query39	938	920	935	920
query39_1	897	870	892	870
query40	228	146	125	125
query41	71	65	62	62
query42	111	113	110	110
query43	335	344	308	308
query44	
query45	213	209	200	200
query46	1092	1178	756	756
query47	2351	2426	2206	2206
query48	393	438	313	313
query49	641	530	384	384
query50	964	365	249	249
query51	4364	4356	4291	4291
query52	105	108	95	95
query53	256	287	208	208
query54	308	267	251	251
query55	97	94	91	91
query56	314	314	303	303
query57	1445	1433	1341	1341
query58	309	271	281	271
query59	1568	1713	1494	1494
query60	325	324	317	317
query61	160	153	156	153
query62	714	652	576	576
query63	259	202	210	202
query64	2384	822	650	650
query65	
query66	1677	488	393	393
query67	29871	29750	29657	29657
query68	
query69	466	348	318	318
query70	1056	1023	996	996
query71	310	281	285	281
query72	3035	2688	2416	2416
query73	881	760	437	437
query74	5132	4994	4802	4802
query75	2687	2612	2285	2285
query76	2300	1148	803	803
query77	406	419	344	344
query78	12537	12457	11919	11919
query79	1421	1044	801	801
query80	656	561	461	461
query81	450	293	250	250
query82	1381	154	122	122
query83	364	282	256	256
query84	258	137	112	112
query85	911	614	554	554
query86	400	358	357	357
query87	3410	3372	3250	3250
query88	3651	2796	2803	2796
query89	455	395	351	351
query90	1940	188	205	188
query91	199	189	186	186
query92	84	82	79	79
query93	1540	1563	923	923
query94	572	383	333	333
query95	721	499	369	369
query96	1080	797	360	360
query97	2733	2744	2594	2594
query98	277	232	235	232
query99	1151	1156	1024	1024
Total cold run time: 254903 ms
Total hot run time: 172895 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 171939 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a67fe9761c9259e6df78005ce6f432eda5dfba74, data reload: false

query5	4325	653	509	509
query6	334	224	199	199
query7	4229	572	321	321
query8	322	227	218	218
query9	8786	4066	4052	4052
query10	457	356	296	296
query11	5791	2397	2207	2207
query12	176	131	127	127
query13	1271	614	456	456
query14	6127	5498	5157	5157
query14_1	4460	4467	4487	4467
query15	218	206	187	187
query16	1023	451	449	449
query17	1172	765	617	617
query18	2729	498	375	375
query19	227	206	168	168
query20	135	139	136	136
query21	216	137	120	120
query22	13706	13607	13308	13308
query23	17338	16574	16294	16294
query23_1	16488	16320	16430	16320
query24	7414	1789	1315	1315
query24_1	1352	1313	1325	1313
query25	566	478	414	414
query26	1318	319	185	185
query27	2654	533	355	355
query28	4317	1990	2001	1990
query29	981	632	487	487
query30	305	228	196	196
query31	1126	1079	952	952
query32	87	76	71	71
query33	538	345	297	297
query34	1184	1121	664	664
query35	771	800	703	703
query36	1387	1370	1239	1239
query37	158	105	95	95
query38	3195	3188	3092	3092
query39	925	918	894	894
query39_1	903	884	907	884
query40	227	151	122	122
query41	65	64	63	63
query42	109	112	112	112
query43	325	337	288	288
query44	
query45	214	199	196	196
query46	1100	1248	764	764
query47	2370	2410	2306	2306
query48	402	407	299	299
query49	641	496	398	398
query50	963	353	250	250
query51	4378	4332	4326	4326
query52	103	104	93	93
query53	259	281	202	202
query54	308	277	265	265
query55	91	89	85	85
query56	299	311	317	311
query57	1450	1425	1319	1319
query58	301	263	273	263
query59	1561	1650	1414	1414
query60	318	326	331	326
query61	157	154	156	154
query62	702	646	592	592
query63	238	201	201	201
query64	2380	785	640	640
query65	
query66	1667	494	354	354
query67	29691	29758	29418	29418
query68	
query69	471	340	303	303
query70	960	989	1003	989
query71	305	274	264	264
query72	3133	2910	2424	2424
query73	816	759	457	457
query74	5119	4979	4821	4821
query75	2708	2607	2289	2289
query76	2251	1161	802	802
query77	425	409	338	338
query78	12423	12511	11954	11954
query79	1467	1000	712	712
query80	631	551	473	473
query81	452	280	242	242
query82	1371	155	122	122
query83	360	276	245	245
query84	271	142	108	108
query85	899	538	467	467
query86	383	352	323	323
query87	3425	3354	3234	3234
query88	3653	2743	2759	2743
query89	438	400	346	346
query90	1860	175	178	175
query91	178	169	147	147
query92	84	79	69	69
query93	1533	1479	901	901
query94	553	343	266	266
query95	696	474	339	339
query96	1020	770	331	331
query97	2741	2728	2625	2625
query98	233	228	225	225
query99	1193	1152	1026	1026
Total cold run time: 253967 ms
Total hot run time: 171939 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 39.27% (161/410) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.96% (21011/38939)
Line Coverage 37.50% (199213/531219)
Region Coverage 33.74% (156056/462466)
Branch Coverage 34.77% (67952/195417)
