[fix](nereids) Preserve column metadata in Alias.toSlot() for wrapped expressions#60839

Open
airborne12 wants to merge 1 commit into apache:master from airborne12:fix-alias-toSlot-preserve-column-info

@airborne12
Member

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

When Alias.toSlot() wraps a non-SlotReference expression (e.g., Cast(ElementAt(SlotRef, Literal)) for variant subcolumn access, or Cast(SlotRef) for explicit type conversion), it loses column metadata (originalTable, originalColumn, oneLevelTable, oneLevelColumn, subPath).

This causes ExpressionTranslator.visitMatch() to crash with:

"SlotReference in Match failed to get Column"

The bug manifests when MATCH is inside an OR predicate, because:

  • AND-only: MATCH conjuncts are pushed through the project independently, and slots get correctly replaced
  • OR: The entire OR stays above the project, so MATCH references the alias output slot which lacks metadata
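
The AND/OR difference above can be illustrated with a minimal sketch. It assumes that "pushed independently" means the predicate is first split on its top-level ANDs; the `Pred`, `Leaf`, `And`, and `Or` classes and `splitConjuncts` are hypothetical stand-ins, not the Nereids classes.

```java
import java.util.ArrayList;
import java.util.List;

class ConjunctSplitSketch {
    // Hypothetical toy predicate tree; not the Nereids Expression hierarchy.
    static abstract class Pred {}

    static final class Leaf extends Pred {
        final String text;
        Leaf(String text) { this.text = text; }
    }

    static final class And extends Pred {
        final Pred left;
        final Pred right;
        And(Pred left, Pred right) { this.left = left; this.right = right; }
    }

    static final class Or extends Pred {
        final Pred left;
        final Pred right;
        Or(Pred left, Pred right) { this.left = left; this.right = right; }
    }

    /** Splits on top-level ANDs only; an OR is returned as one indivisible conjunct. */
    static List<Pred> splitConjuncts(Pred p) {
        List<Pred> out = new ArrayList<>();
        collect(p, out);
        return out;
    }

    private static void collect(Pred p, List<Pred> out) {
        if (p instanceof And) {
            collect(((And) p).left, out);
            collect(((And) p).right, out);
        } else {
            // Leaf or Or: kept whole, so a MATCH nested in an Or is never isolated.
            out.add(p);
        }
    }
}
```

In this model, `a MATCH 'x' AND id > 1` yields two conjuncts that can each be moved below the project, while `a MATCH 'x' OR id > 1` yields a single conjunct. The MATCH inside that single conjunct therefore keeps referencing the alias output slot.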

Fix: Added extractSlotReference() that uses getInputSlots() to find the unique underlying SlotReference through any expression wrapper depth (Cast, ElementAt, etc.). Only returns a SlotReference when there is exactly one input slot, ensuring unambiguous column origin. Returns null for multi-slot expressions like CONCAT(col1, col2).
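
A rough sketch of the single-input-slot rule described above, not the code in this PR. `Expr` and `SlotRef` are hypothetical stand-ins for the Nereids `Expression` and `SlotReference`, and the toy `getInputSlots()` only mimics what the real method is described as doing.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ExtractSlotSketch {
    /** Hypothetical stand-in for an expression node (Cast, ElementAt, literal, ...). */
    static class Expr {
        final String name;
        final List<Expr> children;

        Expr(String name, Expr... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }

        /** Collects every SlotRef reachable from this node, at any depth. */
        Set<SlotRef> getInputSlots() {
            Set<SlotRef> slots = new LinkedHashSet<>();
            collect(this, slots);
            return slots;
        }

        private static void collect(Expr e, Set<SlotRef> out) {
            if (e instanceof SlotRef) {
                out.add((SlotRef) e);
                return;
            }
            for (Expr child : e.children) {
                collect(child, out);
            }
        }
    }

    /** Hypothetical stand-in for a slot that carries column metadata. */
    static final class SlotRef extends Expr {
        SlotRef(String column) { super(column); }
    }

    /** Returns the underlying slot only when it is unique; otherwise null. */
    static SlotRef extractSlotReference(Expr expr) {
        if (expr instanceof SlotRef) {
            return (SlotRef) expr;
        }
        Set<SlotRef> inputs = expr.getInputSlots();
        // Zero slots (pure literals) or several slots (e.g. CONCAT) have no single origin.
        return inputs.size() == 1 ? inputs.iterator().next() : null;
    }
}
```

With these toy types, `Cast(ElementAt(v, 'k'))` resolves to `v`, while `Concat(col1, col2)` resolves to `null`, so no metadata is copied for a multi-column expression.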

Also fixed a pre-existing bug where the oneLevelColumn parameter was incorrectly passing getOriginalColumn() instead of getOneLevelColumn().
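
The argument mix-up can be pictured with a toy metadata holder. `SlotMeta`, `copyBuggy`, and `copyFixed` are hypothetical names used only for illustration; the real constructor takes more parameters than shown here.

```java
class OneLevelColumnSketch {
    /** Hypothetical stand-in for two of the metadata fields carried by a slot. */
    static final class SlotMeta {
        final String originalColumn;
        final String oneLevelColumn;

        SlotMeta(String originalColumn, String oneLevelColumn) {
            this.originalColumn = originalColumn;
            this.oneLevelColumn = oneLevelColumn;
        }
    }

    // Shape of the bug as described: the oneLevelColumn argument is fed from originalColumn.
    static SlotMeta copyBuggy(SlotMeta src) {
        return new SlotMeta(src.originalColumn, src.originalColumn);
    }

    // Shape of the fix: each argument is read from its own field.
    static SlotMeta copyFixed(SlotMeta src) {
        return new SlotMeta(src.originalColumn, src.oneLevelColumn);
    }
}
```

The two copies differ only when the source slot's `originalColumn` and `oneLevelColumn` differ, which may be why a mix-up of this kind can go unnoticed.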

Release note

Fix MATCH expressions crashing with "SlotReference in Match failed to get Column" when used on alias columns (from CTE/subquery) combined with OR predicates.

… expressions

When Alias wraps non-SlotReference expressions like Cast(ElementAt(SlotRef, Literal))
(variant subcolumn access), toSlot() was losing originalTable, originalColumn,
oneLevelTable, oneLevelColumn, and subPath metadata. This caused
ExpressionTranslator.visitMatch() to crash with "SlotReference in Match failed to
get Column" when MATCH was inside OR predicates (where pushdown through project
doesn't happen).

Fix: Use getInputSlots() to find the unique underlying SlotReference through any
expression wrapper depth. Also fixed pre-existing bug where oneLevelColumn parameter
was incorrectly using getOriginalColumn() instead of getOneLevelColumn().
@Thearas
Contributor

Thearas commented Feb 26, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@airborne12
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 28777 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5f66f480bbeb203372904e514ebb7c5145208fa5, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17630	4538	4325	4325
q2	q3	10653	756	518	518
q4	4681	361	264	264
q5	7551	1185	1013	1013
q6	172	171	146	146
q7	767	858	651	651
q8	9291	1426	1334	1334
q9	4837	4736	4683	4683
q10	6863	1865	1634	1634
q11	476	267	247	247
q12	764	564	466	466
q13	17794	4205	3416	3416
q14	231	230	210	210
q15	942	792	790	790
q16	753	709	685	685
q17	744	828	437	437
q18	5956	5319	5213	5213
q19	1306	978	598	598
q20	489	491	397	397
q21	4567	1994	1493	1493
q22	394	310	257	257
Total cold run time: 96861 ms
Total hot run time: 28777 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4716	4529	4596	4529
q2	q3	1915	2227	1808	1808
q4	864	1204	792	792
q5	4076	4434	4396	4396
q6	189	176	145	145
q7	1842	1662	1509	1509
q8	2460	2702	2539	2539
q9	7451	7392	7267	7267
q10	2604	2785	2408	2408
q11	512	425	418	418
q12	499	580	451	451
q13	4150	4484	3575	3575
q14	273	288	265	265
q15	857	813	789	789
q16	681	795	737	737
q17	1170	1606	1314	1314
q18	7100	6761	6590	6590
q19	913	875	900	875
q20	2128	2134	2015	2015
q21	3944	3483	3350	3350
q22	429	434	381	381
Total cold run time: 48773 ms
Total hot run time: 46153 ms

@doris-robot

TPC-DS: Total hot run time: 183883 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5f66f480bbeb203372904e514ebb7c5145208fa5, data reload: false

query5	5159	646	511	511
query6	325	221	204	204
query7	4214	492	271	271
query8	325	240	234	234
query9	8696	2748	2758	2748
query10	557	402	344	344
query11	16958	17677	17260	17260
query12	213	146	124	124
query13	1279	461	333	333
query14	7321	3236	3104	3104
query14_1	2884	2933	2946	2933
query15	205	217	180	180
query16	1044	507	507	507
query17	1514	749	609	609
query18	2783	462	345	345
query19	214	206	185	185
query20	141	134	130	130
query21	213	138	128	128
query22	5727	5070	4866	4866
query23	17172	16729	16516	16516
query23_1	16678	16776	16687	16687
query24	6903	1611	1210	1210
query24_1	1231	1275	1231	1231
query25	538	452	395	395
query26	1244	266	164	164
query27	2745	472	289	289
query28	4493	1860	1868	1860
query29	786	555	460	460
query30	301	243	208	208
query31	861	719	642	642
query32	82	69	70	69
query33	525	337	282	282
query34	902	911	577	577
query35	644	680	585	585
query36	1077	1087	987	987
query37	139	103	84	84
query38	2968	2941	2859	2859
query39	893	860	853	853
query39_1	854	815	837	815
query40	238	155	136	136
query41	66	64	63	63
query42	109	105	104	104
query43	379	386	355	355
query44	
query45	203	193	187	187
query46	880	987	608	608
query47	2117	2120	2052	2052
query48	319	328	234	234
query49	650	486	398	398
query50	707	281	213	213
query51	4083	4056	4080	4056
query52	117	113	103	103
query53	301	343	291	291
query54	317	299	288	288
query55	94	89	86	86
query56	337	330	327	327
query57	1360	1352	1260	1260
query58	303	281	281	281
query59	2573	2707	2549	2549
query60	351	353	339	339
query61	176	170	170	170
query62	625	587	551	551
query63	340	281	288	281
query64	4926	1373	1084	1084
query65	
query66	1401	474	372	372
query67	16369	16428	16245	16245
query68	
query69	402	331	301	301
query70	993	959	916	916
query71	341	310	301	301
query72	2841	2672	2355	2355
query73	537	559	322	322
query74	9938	9905	9713	9713
query75	2868	2736	2462	2462
query76	2305	1027	696	696
query77	364	410	312	312
query78	11135	11325	10711	10711
query79	3089	820	602	602
query80	1790	624	519	519
query81	597	272	259	259
query82	1025	148	114	114
query83	337	264	239	239
query84	261	118	100	100
query85	887	463	418	418
query86	494	310	291	291
query87	3141	3115	3060	3060
query88	3569	2658	2635	2635
query89	420	372	353	353
query90	2159	178	179	178
query91	160	164	161	161
query92	92	77	72	72
query93	2089	817	492	492
query94	654	329	288	288
query95	565	391	306	306
query96	635	515	232	232
query97	2443	2474	2409	2409
query98	232	213	216	213
query99	995	968	907	907
Total cold run time: 258970 ms
Total hot run time: 183883 ms

* unambiguous column origin. Handles Cast(slot), Cast(ElementAt(slot, literal)), etc.
* Returns null for multi-slot expressions like CONCAT(col1, col2).
*/
private static SlotReference extractSlotReference(Expression expr) {

You cannot search the child nodes for a SlotReference in this way, because the name-related fields in Slot were originally introduced for compatibility with the metadata in the MySQL protocol. Searching like this would violate the constraints of the MySQL protocol.
