[fix](inverted index) Add missing memory usage calculation for BKD index #47297

Open
wants to merge 2 commits into
base: master

Member

@airborne12 commented Jan 21, 2025

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:
This pull request changes the be/src and be/test directories to add the missing memory usage calculation for the BKD index in the inverted index searcher. The main changes are updating the subproject commit, adding a new field to track reader size, and introducing tests for the inverted index searcher.

Enhancements to functionality:

  • Added a new field to track reader size.

Testing improvements:

  • Introduced tests for the inverted index searcher.

Codebase updates:

  • be/src/clucene: Updated the subproject commit to the latest version.
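
As a rough illustration of what "a new field to track reader size" could look like, the sketch below stores each reader's byte size in its cache entry so that the cache can report its total memory usage. All names here (`CachedSearcher`, `SearcherCache`, `reader_size`, `searcher_size`) are hypothetical and are not taken from the Doris or CLucene code; the actual change in this PR may differ.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical cache entry, used only for illustration (not a Doris type).
struct CachedSearcher {
    std::size_t searcher_size = 0; // bytes held by the searcher object itself
    std::size_t reader_size = 0;   // bytes held by the underlying index reader

    // Total bytes charged to the cache for this entry.
    std::size_t memory_usage() const { return searcher_size + reader_size; }
};

// Hypothetical cache that keeps a running total of the bytes it holds.
class SearcherCache {
public:
    // Insert or overwrite an entry, keeping the running total consistent.
    void insert(const std::string& key, const CachedSearcher& entry) {
        auto it = _entries.find(key);
        if (it != _entries.end()) {
            _total_bytes -= it->second.memory_usage();
            it->second = entry;
        } else {
            _entries.emplace(key, entry);
        }
        _total_bytes += entry.memory_usage();
    }

    // Remove an entry if present and release its bytes from the total.
    void erase(const std::string& key) {
        auto it = _entries.find(key);
        if (it == _entries.end()) {
            return;
        }
        _total_bytes -= it->second.memory_usage();
        _entries.erase(it);
    }

    std::size_t total_bytes() const { return _total_bytes; }

private:
    std::unordered_map<std::string, CachedSearcher> _entries;
    std::size_t _total_bytes = 0;
};
```

If `reader_size` were left at zero for one kind of reader, the cache in this sketch would undercount its footprint by exactly that reader's bytes, which is the kind of gap the PR title describes.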

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@airborne12
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32293 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 84c8f02444b95c60e4c7e7fd0d6ff5dc05e5fe5b, data reload: false

------ Round 1 ----------------------------------
q1	17597	5520	5455	5455
q2	2051	308	191	191
q3	10398	1286	722	722
q4	10216	976	523	523
q5	7458	2444	2156	2156
q6	193	169	133	133
q7	912	765	607	607
q8	9254	1373	1165	1165
q9	5209	4899	4922	4899
q10	6830	2354	1891	1891
q11	472	284	256	256
q12	338	363	214	214
q13	17759	3713	3103	3103
q14	233	229	208	208
q15	504	476	466	466
q16	625	624	578	578
q17	566	880	331	331
q18	7225	6564	6405	6405
q19	1466	955	525	525
q20	304	314	206	206
q21	2826	2131	1955	1955
q22	373	340	304	304
Total cold run time: 102809 ms
Total hot run time: 32293 ms
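
The Round 1 totals above can be reproduced from the rows if the four numeric columns are read as one cold run, two hot runs, and the faster of the two hot runs. That reading is inferred from the numbers and is not stated by the bot output. A minimal sketch, with a hypothetical `Row` type and `totals` helper:

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical row layout: one cold run and two hot runs, in milliseconds.
struct Row {
    std::int64_t cold;
    std::int64_t hot1;
    std::int64_t hot2;
};

// Returns {total cold time, total of the best hot time per query}.
std::pair<std::int64_t, std::int64_t> totals(const std::vector<Row>& rows) {
    std::int64_t cold = 0;
    std::int64_t hot = 0;
    for (const Row& r : rows) {
        cold += r.cold;
        hot += std::min(r.hot1, r.hot2); // best (smallest) hot run for this query
    }
    return {cold, hot};
}
```

For q1 to q3 above (`{17597, 5520, 5455}`, `{2051, 308, 191}`, `{10398, 1286, 722}`) this gives 30046 ms cold and 6368 ms hot. Summing all 22 rows the same way gives the reported 102809 ms and 32293 ms.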

----- Round 2, with runtime_filter_mode=off -----
q1	5549	5490	5470	5470
q2	245	337	236	236
q3	2331	2617	2319	2319
q4	1430	1821	1423	1423
q5	4329	4808	4702	4702
q6	166	156	125	125
q7	2045	1998	1839	1839
q8	2668	2840	2677	2677
q9	7311	7224	7213	7213
q10	2976	3274	2789	2789
q11	575	501	495	495
q12	679	742	618	618
q13	3612	3869	3276	3276
q14	278	320	277	277
q15	522	475	474	474
q16	656	694	639	639
q17	1238	1744	1252	1252
q18	7753	7529	7277	7277
q19	801	1170	1070	1070
q20	1992	2044	1876	1876
q21	5610	5204	4954	4954
q22	594	579	534	534
Total cold run time: 53360 ms
Total hot run time: 51535 ms

@doris-robot

TPC-DS: Total hot run time: 186958 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 84c8f02444b95c60e4c7e7fd0d6ff5dc05e5fe5b, data reload: false

query1	985	386	360	360
query2	6522	2065	2059	2059
query3	6800	220	213	213
query4	33197	23674	22967	22967
query5	4369	614	452	452
query6	281	183	184	183
query7	4607	485	309	309
query8	286	234	214	214
query9	9417	2621	2620	2620
query10	476	320	246	246
query11	18129	15218	15153	15153
query12	152	108	104	104
query13	1632	504	383	383
query14	9423	7059	6265	6265
query15	220	201	193	193
query16	7805	622	500	500
query17	1595	711	555	555
query18	1979	391	293	293
query19	204	175	178	175
query20	115	114	110	110
query21	207	123	106	106
query22	4044	4225	4212	4212
query23	33669	32897	32943	32897
query24	6544	2329	2214	2214
query25	481	450	391	391
query26	1211	267	154	154
query27	2369	456	334	334
query28	5401	2441	2445	2441
query29	701	559	412	412
query30	235	191	162	162
query31	934	878	797	797
query32	73	63	61	61
query33	534	365	309	309
query34	730	833	511	511
query35	796	819	730	730
query36	1006	1060	932	932
query37	116	95	80	80
query38	4238	4182	4018	4018
query39	1448	1400	1420	1400
query40	207	114	106	106
query41	53	52	52	52
query42	122	102	102	102
query43	517	521	491	491
query44	1354	804	790	790
query45	177	168	160	160
query46	854	1031	644	644
query47	1827	1828	1738	1738
query48	370	391	321	321
query49	780	485	406	406
query50	616	648	400	400
query51	6802	6959	6710	6710
query52	100	101	95	95
query53	220	247	183	183
query54	480	493	419	419
query55	88	77	80	77
query56	256	265	240	240
query57	1192	1163	1086	1086
query58	253	232	237	232
query59	3027	3142	3029	3029
query60	279	257	256	256
query61	155	154	164	154
query62	793	706	664	664
query63	233	206	202	202
query64	4344	1099	755	755
query65	3325	3159	3154	3154
query66	1095	414	325	325
query67	15949	15628	15454	15454
query68	5118	812	537	537
query69	488	284	250	250
query70	1179	1093	1102	1093
query71	372	291	246	246
query72	5754	3811	3811	3811
query73	648	745	352	352
query74	10361	8872	8836	8836
query75	3146	3165	2658	2658
query76	3142	1172	762	762
query77	462	350	282	282
query78	9961	10058	9445	9445
query79	2889	793	599	599
query80	1159	523	452	452
query81	555	288	239	239
query82	352	147	119	119
query83	183	176	155	155
query84	243	94	72	72
query85	741	354	375	354
query86	434	311	309	309
query87	4363	4444	4324	4324
query88	4650	2131	2110	2110
query89	408	334	339	334
query90	1891	189	192	189
query91	135	145	110	110
query92	71	55	50	50
query93	2757	898	534	534
query94	735	397	278	278
query95	329	263	256	256
query96	493	612	283	283
query97	2804	2897	2708	2708
query98	229	205	197	197
query99	1269	1376	1254	1254
Total cold run time: 285720 ms
Total hot run time: 186958 ms

@doris-robot

ClickBench: Total hot run time: 31.3 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 84c8f02444b95c60e4c7e7fd0d6ff5dc05e5fe5b, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.03	0.04
query3	0.24	0.07	0.07
query4	1.61	0.10	0.10
query5	0.41	0.42	0.40
query6	1.16	0.66	0.65
query7	0.02	0.01	0.01
query8	0.05	0.04	0.03
query9	0.59	0.52	0.50
query10	0.56	0.56	0.56
query11	0.15	0.10	0.10
query12	0.14	0.11	0.11
query13	0.61	0.61	0.60
query14	2.86	2.84	2.74
query15	0.91	0.82	0.82
query16	0.38	0.38	0.37
query17	1.05	1.00	1.09
query18	0.24	0.20	0.21
query19	1.98	1.88	2.04
query20	0.02	0.01	0.02
query21	15.36	0.93	0.57
query22	0.77	0.74	0.71
query23	15.29	1.47	0.52
query24	3.07	1.72	1.59
query25	0.13	0.25	0.12
query26	0.19	0.15	0.14
query27	0.05	0.04	0.04
query28	14.40	0.96	0.43
query29	12.57	3.89	3.30
query30	0.25	0.09	0.06
query31	2.83	0.59	0.39
query32	3.23	0.55	0.46
query33	2.95	2.96	2.98
query34	16.71	5.18	4.48
query35	4.57	4.53	4.55
query36	0.63	0.49	0.47
query37	0.09	0.06	0.06
query38	0.06	0.04	0.04
query39	0.04	0.02	0.03
query40	0.17	0.13	0.14
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 106.6 s
Total hot run time: 31.3 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 41.56% (10835/26068)
Line Coverage: 31.96% (91571/286519)
Region Coverage: 31.10% (46911/150838)
Branch Coverage: 27.19% (23756/87386)
Coverage Report: http://coverage.selectdb-in.cc/coverage/84c8f02444b95c60e4c7e7fd0d6ff5dc05e5fe5b_84c8f02444b95c60e4c7e7fd0d6ff5dc05e5fe5b/report/index.html

qidaye previously approved these changes Jan 22, 2025
Contributor

@qidaye left a comment

LGTM

Contributor

PR approved by at least one committer and no changes requested.

@github-actions bot added the approved and reviewed labels Jan 22, 2025
Contributor

PR approved by anyone and no changes requested.

@airborne12
Member Author

run buildall

@github-actions bot removed the approved label Jan 22, 2025
Contributor

@qidaye left a comment

LGTM

Contributor

PR approved by at least one committer and no changes requested.

@github-actions bot added the approved label Jan 22, 2025
Contributor

@zzzxl1993 left a comment

LGTM

@doris-robot

TPC-H: Total hot run time: 32398 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2a15d39a8174602a2fc5cd3e2ff728842d603039, data reload: false

------ Round 1 ----------------------------------
q1	17574	5705	5375	5375
q2	2054	305	177	177
q3	10479	1230	761	761
q4	10232	970	536	536
q5	7971	2379	2210	2210
q6	192	166	134	134
q7	932	758	618	618
q8	9239	1385	1170	1170
q9	5349	4926	4893	4893
q10	6842	2318	1882	1882
q11	478	274	259	259
q12	340	361	225	225
q13	17771	3725	3073	3073
q14	233	239	211	211
q15	526	483	478	478
q16	640	631	586	586
q17	573	868	324	324
q18	7056	6496	6414	6414
q19	4744	968	547	547
q20	293	318	188	188
q21	2765	2220	2022	2022
q22	374	348	315	315
Total cold run time: 106657 ms
Total hot run time: 32398 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6100	5515	5465	5465
q2	242	334	245	245
q3	2282	2716	2329	2329
q4	1417	1799	1398	1398
q5	4385	4754	4872	4754
q6	167	160	127	127
q7	2117	1941	1850	1850
q8	2672	2882	2748	2748
q9	7370	7280	7275	7275
q10	3069	3227	2717	2717
q11	594	520	494	494
q12	748	807	640	640
q13	3565	3873	3265	3265
q14	309	294	294	294
q15	507	464	466	464
q16	644	699	640	640
q17	1266	1754	1267	1267
q18	7888	7573	7391	7391
q19	814	1155	1075	1075
q20	1984	2027	1925	1925
q21	5624	5163	5033	5033
q22	603	610	593	593
Total cold run time: 54367 ms
Total hot run time: 51989 ms

@doris-robot

TPC-DS: Total hot run time: 195813 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2a15d39a8174602a2fc5cd3e2ff728842d603039, data reload: false

query1	1328	982	919	919
query2	6211	2111	2138	2111
query3	11102	4667	4572	4572
query4	32559	23883	23630	23630
query5	3466	598	484	484
query6	292	200	180	180
query7	3990	506	309	309
query8	292	253	234	234
query9	9256	2696	2685	2685
query10	459	309	267	267
query11	17759	15166	15155	15155
query12	163	109	105	105
query13	1573	507	400	400
query14	9060	6350	7266	6350
query15	249	210	194	194
query16	7767	706	483	483
query17	1580	732	577	577
query18	1915	408	307	307
query19	207	203	171	171
query20	125	116	113	113
query21	213	127	112	112
query22	4478	4549	4361	4361
query23	34703	33466	33536	33466
query24	6538	2456	2447	2447
query25	485	481	405	405
query26	809	288	166	166
query27	2336	463	347	347
query28	5668	2519	2481	2481
query29	619	566	440	440
query30	219	183	154	154
query31	954	930	854	854
query32	78	59	59	59
query33	478	359	319	319
query34	803	880	520	520
query35	823	838	765	765
query36	996	1062	944	944
query37	120	96	77	77
query38	4272	4325	4259	4259
query39	1489	1437	1446	1437
query40	194	117	103	103
query41	51	52	51	51
query42	123	111	101	101
query43	538	541	497	497
query44	1323	840	839	839
query45	192	208	169	169
query46	896	1073	662	662
query47	1911	1966	1870	1870
query48	385	409	319	319
query49	722	503	437	437
query50	664	668	409	409
query51	7019	7035	6990	6990
query52	99	102	96	96
query53	225	250	187	187
query54	502	499	444	444
query55	96	79	78	78
query56	262	263	243	243
query57	1234	1200	1175	1175
query58	255	236	254	236
query59	3333	3401	3272	3272
query60	302	272	289	272
query61	124	120	120	120
query62	788	721	667	667
query63	219	191	188	188
query64	3591	1014	662	662
query65	3218	3134	3158	3134
query66	770	396	297	297
query67	15872	15815	15529	15529
query68	2059	824	552	552
query69	437	300	270	270
query70	1214	1145	1185	1145
query71	323	287	257	257
query72	6206	4035	4037	4035
query73	643	788	365	365
query74	9698	9018	9074	9018
query75	3160	3195	2831	2831
query76	2354	1170	766	766
query77	484	353	281	281
query78	10057	10198	9291	9291
query79	3546	806	593	593
query80	1731	539	456	456
query81	564	273	234	234
query82	386	153	127	127
query83	261	171	155	155
query84	241	88	70	70
query85	884	367	306	306
query86	465	314	311	311
query87	4452	4600	4478	4478
query88	5364	2201	2173	2173
query89	403	322	297	297
query90	1776	196	191	191
query91	132	145	109	109
query92	61	56	50	50
query93	2807	870	540	540
query94	701	405	294	294
query95	335	269	269	269
query96	491	613	277	277
query97	2803	2854	2723	2723
query98	237	194	198	194
query99	1296	1393	1249	1249
Total cold run time: 285913 ms
Total hot run time: 195813 ms

@doris-robot

ClickBench: Total hot run time: 31.33 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 2a15d39a8174602a2fc5cd3e2ff728842d603039, data reload: false

query1	0.03	0.03	0.03
query2	0.07	0.03	0.04
query3	0.24	0.07	0.06
query4	1.62	0.11	0.10
query5	0.41	0.41	0.41
query6	1.15	0.67	0.65
query7	0.02	0.01	0.02
query8	0.04	0.04	0.02
query9	0.58	0.50	0.51
query10	0.56	0.56	0.55
query11	0.15	0.11	0.11
query12	0.13	0.12	0.11
query13	0.60	0.61	0.60
query14	2.70	2.84	2.73
query15	0.89	0.83	0.83
query16	0.35	0.38	0.39
query17	0.97	1.03	0.98
query18	0.23	0.20	0.20
query19	1.98	1.83	1.98
query20	0.01	0.02	0.01
query21	15.36	0.99	0.59
query22	0.75	0.75	0.60
query23	15.38	1.42	0.57
query24	2.94	1.54	2.05
query25	0.12	0.10	0.19
query26	0.27	0.13	0.14
query27	0.06	0.04	0.05
query28	14.11	0.99	0.44
query29	12.55	4.05	3.30
query30	0.25	0.09	0.07
query31	2.82	0.61	0.36
query32	3.23	0.55	0.47
query33	2.98	3.06	3.07
query34	16.64	5.17	4.56
query35	4.59	4.57	4.56
query36	0.68	0.50	0.48
query37	0.09	0.07	0.06
query38	0.05	0.03	0.04
query39	0.04	0.02	0.03
query40	0.16	0.14	0.13
query41	0.07	0.03	0.03
query42	0.04	0.03	0.02
query43	0.03	0.03	0.03
Total cold run time: 105.94 s
Total hot run time: 31.33 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 41.55% (10832/26068)
Line Coverage: 31.96% (91583/286536)
Region Coverage: 31.09% (46893/150845)
Branch Coverage: 27.18% (23751/87388)
Coverage Report: http://coverage.selectdb-in.cc/coverage/2a15d39a8174602a2fc5cd3e2ff728842d603039_2a15d39a8174602a2fc5cd3e2ff728842d603039/report/index.html

Labels
approved, reviewed
5 participants