
[feat](inverted index) Adding Storage Format V3 for Inverted Index #46124

Closed


zzzxl1993 (Contributor) commented Dec 28, 2024

@zzzxl1993
Contributor Author

run buildall

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

zzzxl1993 force-pushed the branch-3.0.202412231725 branch from ba44be6 to 18ff3de on December 28, 2024 09:32
@zzzxl1993
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 40759 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 18ff3deea4bb58f4e8fbd607976866ad3754b853, data reload: false

------ Round 1 ----------------------------------
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	17939	7678	7253	7253
q2	2075	196	167	167
q3	10632	1064	1088	1064
q4	10456	767	830	767
q5	7748	2900	2779	2779
q6	231	144	143	143
q7	956	610	593	593
q8	9354	1947	2029	1947
q9	6547	6400	6441	6400
q10	7037	2294	2326	2294
q11	459	264	257	257
q12	407	210	207	207
q13	17772	2969	2967	2967
q14	245	212	205	205
q15	567	537	516	516
q16	700	604	613	604
q17	978	555	554	554
q18	7374	6655	6773	6655
q19	1379	1033	1023	1023
q20	470	208	202	202
q21	3985	3221	3146	3146
q22	1100	1023	1016	1016
Total cold run time: 108411 ms
Total hot run time: 40759 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	7247	7199	7228	7199
q2	328	228	239	228
q3	2952	2966	2927	2927
q4	2017	1828	1821	1821
q5	5680	5712	5760	5712
q6	223	142	137	137
q7	2233	1822	1836	1822
q8	3332	3562	3451	3451
q9	8766	8855	8853	8853
q10	3556	3552	3530	3530
q11	616	503	497	497
q12	834	605	612	605
q13	9278	3112	3208	3112
q14	319	290	275	275
q15	578	548	525	525
q16	722	666	678	666
q17	1836	1626	1624	1624
q18	8219	7797	7651	7651
q19	1649	1529	1544	1529
q20	2139	1880	1863	1863
q21	5646	5363	5322	5322
q22	1142	1060	1048	1048
Total cold run time: 69312 ms
Total hot run time: 60397 ms
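From the figures above, each row appears to list the cold run, two hot runs, and the best (minimum) hot run, and the reported totals match the sums below, where $c_i$ is the cold run of query $i$ and $h_{i,1}, h_{i,2}$ are its two hot runs. This reading is inferred from the numbers (Round 1: 108411 ms cold, 40759 ms hot; Round 2: 69312 ms cold, 60397 ms hot), not stated by the robot.

$$
T_{\text{cold}} = \sum_{i} c_i, \qquad T_{\text{hot}} = \sum_{i} \min\left(h_{i,1},\, h_{i,2}\right)
$$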

@doris-robot

TPC-DS: Total hot run time: 197670 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 18ff3deea4bb58f4e8fbd607976866ad3754b853, data reload: false

query	cold run	hot run 1	hot run 2	best hot run (ms)
query1	1304	922	912	912
query2	6209	2070	2037	2037
query3	10802	4373	4299	4299
query4	66242	28130	23400	23400
query5	4935	457	444	444
query6	399	164	172	164
query7	5507	313	314	313
query8	309	236	229	229
query9	8431	2685	2689	2685
query10	439	273	269	269
query11	17072	15135	15675	15135
query12	159	111	100	100
query13	1485	444	427	427
query14	10881	7599	7605	7599
query15	211	176	188	176
query16	7182	481	483	481
query17	1090	577	587	577
query18	1971	316	321	316
query19	201	161	161	161
query20	118	113	111	111
query21	56	45	45	45
query22	4733	4727	4544	4544
query23	35087	34125	34663	34125
query24	6140	2908	2873	2873
query25	520	424	428	424
query26	652	162	165	162
query27	1812	301	305	301
query28	4472	2487	2488	2487
query29	727	472	417	417
query30	238	170	166	166
query31	998	832	853	832
query32	65	58	57	57
query33	472	275	285	275
query34	916	531	495	495
query35	836	727	731	727
query36	1069	967	967	967
query37	125	81	71	71
query38	4139	4040	4152	4040
query39	1518	1485	1437	1437
query40	139	81	82	81
query41	50	46	46	46
query42	120	103	102	102
query43	534	476	490	476
query44	1204	836	823	823
query45	187	173	171	171
query46	1138	712	737	712
query47	2008	1937	1929	1929
query48	466	375	369	369
query49	724	399	394	394
query50	819	428	423	423
query51	7419	7232	7255	7232
query52	96	85	87	85
query53	250	182	185	182
query54	550	448	444	444
query55	78	74	76	74
query56	257	249	242	242
query57	1211	1140	1080	1080
query58	208	206	207	206
query59	3092	2901	2810	2810
query60	290	261	246	246
query61	108	116	109	109
query62	786	647	694	647
query63	213	187	199	187
query64	1435	696	651	651
query65	3275	3185	3235	3185
query66	710	299	299	299
query67	16041	15701	15636	15636
query68	3869	571	569	569
query69	442	272	262	262
query70	1177	1153	1083	1083
query71	351	259	263	259
query72	6270	3982	4018	3982
query73	753	341	347	341
query74	10055	9129	9109	9109
query75	3349	2601	2628	2601
query76	1873	1006	1144	1006
query77	487	271	285	271
query78	10675	9629	9601	9601
query79	1509	592	608	592
query80	869	445	421	421
query81	532	241	236	236
query82	1212	121	110	110
query83	165	140	144	140
query84	287	83	85	83
query85	867	295	287	287
query86	326	296	304	296
query87	4449	4221	4287	4221
query88	3690	2374	2339	2339
query89	406	283	292	283
query90	2017	186	184	184
query91	178	153	149	149
query92	62	50	48	48
query93	1837	541	537	537
query94	797	297	298	297
query95	353	255	258	255
query96	606	280	278	278
query97	3367	3240	3241	3240
query98	218	206	189	189
query99	1618	1282	1280	1280
Total cold run time: 317596 ms
Total hot run time: 197670 ms

@doris-robot

ClickBench: Total hot run time: 33.63 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 18ff3deea4bb58f4e8fbd607976866ad3754b853, data reload: false

query	cold run	hot run 1	hot run 2 (s)
query1	0.04	0.03	0.02
query2	0.09	0.05	0.05
query3	0.23	0.06	0.05
query4	1.64	0.08	0.08
query5	0.52	0.51	0.52
query6	1.13	0.74	0.74
query7	0.02	0.02	0.02
query8	0.06	0.05	0.05
query9	0.55	0.51	0.50
query10	0.56	0.55	0.54
query11	0.17	0.12	0.12
query12	0.15	0.12	0.13
query13	0.62	0.59	0.60
query14	3.09	2.99	2.92
query15	0.93	0.84	0.82
query16	0.39	0.36	0.39
query17	1.07	1.02	1.05
query18	0.19	0.20	0.18
query19	1.99	1.93	1.99
query20	0.01	0.02	0.01
query21	15.36	0.69	0.66
query22	4.78	7.18	1.88
query23	18.23	1.36	1.31
query24	2.18	0.22	0.23
query25	0.14	0.08	0.08
query26	0.27	0.18	0.19
query27	0.08	0.09	0.07
query28	13.26	1.18	1.15
query29	12.64	3.41	3.35
query30	0.24	0.05	0.06
query31	2.84	0.40	0.40
query32	3.24	0.49	0.50
query33	2.95	3.03	3.04
query34	16.74	4.49	4.53
query35	4.63	4.55	4.57
query36	0.66	0.48	0.49
query37	0.21	0.16	0.15
query38	0.17	0.15	0.16
query39	0.06	0.05	0.04
query40	0.16	0.14	0.13
query41	0.11	0.05	0.05
query42	0.06	0.06	0.05
query43	0.05	0.04	0.05
Total cold run time: 112.51 s
Total hot run time: 33.63 s

…pache#44414)

Problem Summary:

1. Adds compression of inverted index position information and
dictionary information.

2. Position information compression is enabled by setting
inverted_index_storage_format to V3 when creating the table.

e.g.
```
    CREATE TABLE tbl (
          ...
    ) ENGINE=OLAP
    DUPLICATE KEY(`x`)
    COMMENT "OLAP"
    DISTRIBUTED BY RANDOM BUCKETS 1
    PROPERTIES (
    "inverted_index_storage_format" = "V3"
    );
```
3. Dictionary compression requires setting
inverted_index_storage_format to V3 and setting dict_compression to
true in the inverted index properties.

e.g.
```
INDEX x_idx (`x`) USING INVERTED PROPERTIES("dict_compression" = "true") COMMENT ''
```
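For reference, the two settings above can be combined in one table definition. This is a sketch, not taken from the PR: the table `example_tbl` and the columns `id` and `msg` are hypothetical, while `inverted_index_storage_format` and `dict_compression` are the properties described above. `msg` is a string column because dictionary compression is limited to string-type fields.

```sql
-- Hypothetical table: V3 storage format plus dictionary compression
-- on the inverted index of a string column.
CREATE TABLE example_tbl (
    `id`  BIGINT NOT NULL,
    `msg` STRING NULL,
    INDEX msg_idx (`msg`) USING INVERTED PROPERTIES("dict_compression" = "true") COMMENT ''
) ENGINE=OLAP
DUPLICATE KEY(`id`)
COMMENT "OLAP"
DISTRIBUTED BY RANDOM BUCKETS 1
PROPERTIES (
    "inverted_index_storage_format" = "V3"
);
```

Whether V3 and `dict_compression` are accepted depends on the Doris version in use.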
…CT_COMPRESS (apache#45738)

Related PR: apache#44414

Problem Summary:
In inverted index version 3 mode, using dictionary compression may
lead to incorrect results after a seek operation.
…#45805)

Related PR: apache#44414

Problem Summary:
1. dict_compression is only supported in v3 mode.
2. dict_compression is only supported for string-type fields.
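The two rules can be illustrated with statements that, per the rules above, are expected to be rejected. The table and column names are hypothetical, and the PR does not give the exact error messages, so none are shown.

```sql
-- Rule 1: dict_compression on a table whose storage format is not V3.
CREATE TABLE tbl_v2 (
    `id`  BIGINT NOT NULL,
    `msg` STRING NULL,
    INDEX msg_idx (`msg`) USING INVERTED PROPERTIES("dict_compression" = "true") COMMENT ''
) ENGINE=OLAP
DUPLICATE KEY(`id`)
DISTRIBUTED BY RANDOM BUCKETS 1
PROPERTIES ("inverted_index_storage_format" = "V2");

-- Rule 2: dict_compression on a non-string (BIGINT) column, even with V3.
CREATE TABLE tbl_int (
    `id`  BIGINT NOT NULL,
    `val` BIGINT NULL,
    INDEX val_idx (`val`) USING INVERTED PROPERTIES("dict_compression" = "true") COMMENT ''
) ENGINE=OLAP
DUPLICATE KEY(`id`)
DISTRIBUTED BY RANDOM BUCKETS 1
PROPERTIES ("inverted_index_storage_format" = "V3");
```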
zzzxl1993 force-pushed the branch-3.0.202412231725 branch from 18ff3de to 002192f on December 31, 2024 10:06
@zzzxl1993
Contributor Author

run buildall

zzzxl1993 closed this on Jan 2, 2025

3 participants