[fix](restore) Make the DirMoveTask idempotent. #47313

Open
wants to merge 1 commit into
base: master

w41ter
Contributor

@w41ter w41ter commented Jan 22, 2025

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Because DirMoveTask runs asynchronously, the FE may send the task again to make sure it eventually completes. However, any rowsets committed between the two DirMoveTask executions are dropped when the snapshot is loaded a second time, which causes data loss.

This PR adds a LOADED tag file that marks a snapshot as already loaded into a tablet, so it is not loaded again.
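
The tag-file idea can be illustrated with a minimal C++17 sketch. This is not the code in this PR: `load_snapshot_once`, `kLoadedTag`, and the `load_fn` callback are hypothetical names chosen for illustration, and the only detail taken from the description above is that a `LOADED` file marks a snapshot that has already been loaded.

```cpp
#include <filesystem>
#include <fstream>
#include <functional>

namespace fs = std::filesystem;

// Assumed marker file name; the PR description only says "a LOADED tag file".
constexpr const char* kLoadedTag = "LOADED";

// Hypothetical helper: loads the snapshot at most once per directory.
// Returns true if this call ran load_fn, false if the tag file already
// existed and the load was skipped.
bool load_snapshot_once(const fs::path& snapshot_dir,
                        const std::function<void(const fs::path&)>& load_fn) {
    const fs::path tag = snapshot_dir / kLoadedTag;
    if (fs::exists(tag)) {
        // A resent task finds the tag and becomes a no-op, so rowsets
        // committed after the first load are left untouched.
        return false;
    }
    // If load_fn throws, the tag is not written and a retry loads again.
    load_fn(snapshot_dir);
    // Create the empty tag file only after the load has succeeded.
    std::ofstream marker(tag);
    return true;
}
```

Calling `load_snapshot_once` twice on the same directory runs `load_fn` only the first time. The sketch ignores concurrency and a crash between the load and the tag write; a real implementation would have to decide how to handle both.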

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally with the specific error message) and how it was fixed.
  2. Which behaviors were modified: the previous behavior, the new behavior, the reason for the change, and its possible impact.
  3. Which features were added, and why.
  4. Which code was refactored, and why.
  5. Which functions were optimized, and how they differ before and after the optimization.

@w41ter
Contributor Author

w41ter commented Jan 22, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 32385 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5232a81c8661647d7b342160e2a62213cd4286ec, data reload: false

------ Round 1 ----------------------------------
query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
q1	17594	5599	5441	5441
q2	2070	306	169	169
q3	10518	1248	774	774
q4	10246	973	531	531
q5	8164	2436	2181	2181
q6	201	169	135	135
q7	904	778	617	617
q8	9239	1362	1194	1194
q9	5255	4866	4902	4866
q10	6845	2323	1883	1883
q11	475	277	248	248
q12	343	360	219	219
q13	17757	3698	3084	3084
q14	228	224	209	209
q15	522	480	472	472
q16	627	606	590	590
q17	577	880	341	341
q18	7140	6631	6429	6429
q19	1561	947	539	539
q20	305	319	193	193
q21	2859	2194	1957	1957
q22	359	342	313	313
Total cold run time: 103789 ms
Total hot run time: 32385 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
q1	5576	5502	5533	5502
q2	238	333	238	238
q3	2265	2658	2296	2296
q4	1402	1807	1363	1363
q5	4328	4719	4703	4703
q6	165	159	131	131
q7	2107	1925	1882	1882
q8	2649	2878	2726	2726
q9	7377	7304	7316	7304
q10	3029	3280	2783	2783
q11	568	503	481	481
q12	643	753	557	557
q13	3628	3967	3351	3351
q14	290	305	282	282
q15	524	474	466	466
q16	651	702	658	658
q17	1244	1762	1274	1274
q18	7746	7569	7314	7314
q19	824	1199	1098	1098
q20	2008	2038	1897	1897
q21	5768	5261	5132	5132
q22	629	618	565	565
Total cold run time: 53659 ms
Total hot run time: 52003 ms

@doris-robot

TPC-DS: Total hot run time: 195178 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5232a81c8661647d7b342160e2a62213cd4286ec, data reload: false

query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
query1	1313	949	931	931
query2	6296	2127	1998	1998
query3	10955	4486	4483	4483
query4	60981	28557	23031	23031
query5	5605	582	447	447
query6	443	201	195	195
query7	5571	498	291	291
query8	324	240	233	233
query9	8523	2709	2702	2702
query10	442	329	250	250
query11	17407	14999	15411	14999
query12	170	113	108	108
query13	1475	541	462	462
query14	11207	7413	7540	7413
query15	213	193	195	193
query16	7309	628	498	498
query17	1102	724	581	581
query18	1929	404	321	321
query19	204	185	151	151
query20	115	117	112	112
query21	215	129	103	103
query22	4454	4543	4550	4543
query23	34281	33562	33439	33439
query24	5637	2376	2240	2240
query25	490	465	414	414
query26	635	273	154	154
query27	1809	489	322	322
query28	4124	2540	2489	2489
query29	519	561	440	440
query30	211	194	159	159
query31	918	888	840	840
query32	75	57	55	55
query33	443	363	307	307
query34	947	884	517	517
query35	803	844	763	763
query36	1021	1060	961	961
query37	121	96	73	73
query38	4416	4328	4292	4292
query39	1500	1461	1428	1428
query40	220	122	104	104
query41	51	71	57	57
query42	118	100	99	99
query43	511	543	490	490
query44	1354	820	820	820
query45	180	173	174	173
query46	876	1076	654	654
query47	1901	1945	1856	1856
query48	395	410	335	335
query49	714	498	439	439
query50	660	704	397	397
query51	6966	6898	6988	6898
query52	103	99	93	93
query53	232	270	197	197
query54	504	505	419	419
query55	87	76	80	76
query56	254	268	271	268
query57	1215	1222	1170	1170
query58	241	243	237	237
query59	3220	3380	3153	3153
query60	277	293	262	262
query61	118	117	120	117
query62	742	728	650	650
query63	217	180	182	180
query64	1310	1013	662	662
query65	3250	3165	3175	3165
query66	723	398	306	306
query67	16104	15749	15368	15368
query68	4572	833	526	526
query69	473	296	279	279
query70	1191	1127	1100	1100
query71	395	289	251	251
query72	6061	3944	3808	3808
query73	772	754	365	365
query74	9854	9105	9086	9086
query75	3223	3146	2668	2668
query76	3528	1177	759	759
query77	491	369	342	342
query78	10125	10096	9409	9409
query79	2754	801	601	601
query80	1699	613	459	459
query81	553	278	245	245
query82	349	150	115	115
query83	282	179	155	155
query84	283	92	71	71
query85	767	357	296	296
query86	416	299	303	299
query87	4534	4478	4363	4363
query88	3533	2219	2190	2190
query89	395	319	288	288
query90	1572	188	190	188
query91	134	141	109	109
query92	65	59	53	53
query93	2100	868	529	529
query94	709	406	285	285
query95	329	269	253	253
query96	503	610	284	284
query97	2797	2850	2764	2764
query98	238	202	196	196
query99	1286	1366	1259	1259
Total cold run time: 312729 ms
Total hot run time: 195178 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 41.64% (10855/26070)
Line Coverage: 32.04% (91821/286552)
Region Coverage: 31.21% (47079/150854)
Branch Coverage: 27.27% (23828/87390)
Coverage Report: http://coverage.selectdb-in.cc/coverage/5232a81c8661647d7b342160e2a62213cd4286ec_5232a81c8661647d7b342160e2a62213cd4286ec/report/index.html

@doris-robot

ClickBench: Total hot run time: 30.94 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 5232a81c8661647d7b342160e2a62213cd4286ec, data reload: false

query	cold_s	hot1_s	hot2_s
query1	0.04	0.03	0.03
query2	0.07	0.03	0.03
query3	0.24	0.07	0.06
query4	1.62	0.11	0.10
query5	0.42	0.41	0.41
query6	1.19	0.66	0.64
query7	0.02	0.02	0.02
query8	0.04	0.04	0.03
query9	0.60	0.52	0.50
query10	0.56	0.56	0.54
query11	0.15	0.10	0.11
query12	0.13	0.10	0.11
query13	0.61	0.59	0.60
query14	2.67	2.73	2.86
query15	0.89	0.83	0.82
query16	0.38	0.39	0.38
query17	1.07	1.03	1.02
query18	0.22	0.21	0.20
query19	1.94	1.79	1.93
query20	0.01	0.02	0.01
query21	15.35	0.91	0.59
query22	0.74	1.06	0.83
query23	14.88	1.43	0.60
query24	2.60	1.08	1.06
query25	0.15	0.13	0.15
query26	0.42	0.15	0.14
query27	0.06	0.06	0.07
query28	13.31	1.08	0.43
query29	12.56	3.91	3.30
query30	0.24	0.09	0.06
query31	2.83	0.61	0.38
query32	3.23	0.54	0.46
query33	2.98	3.03	3.01
query34	16.41	5.11	4.50
query35	4.55	4.53	4.50
query36	0.65	0.49	0.48
query37	0.09	0.07	0.06
query38	0.05	0.03	0.03
query39	0.04	0.02	0.02
query40	0.17	0.13	0.12
query41	0.08	0.03	0.03
query42	0.04	0.03	0.02
query43	0.04	0.03	0.03
Total cold run time: 104.34 s
Total hot run time: 30.94 s

3 participants