HIVE-29166: Fix the partition column update logic in ConvertJoinMapJoin#convertJoinBucketMapJoin. #6048

ngsg · 2025-08-29T10:24:39Z

What changes were proposed in this pull request?

Modify the partition column update logic in ConvertJoinMapJoin#convertJoinBucketMapJoin so that the small table's partitionCols are updated whenever they do not match the big table's bucketCols.

Previously, the partition columns were updated only when the number of partition columns was not equal to the number of bucket columns. However, this could cause incorrect bucket routing when the counts are equal but the column orders differ. For example, if the bucket columns are {id, part} and the partition columns are {part, id}, then the partition columns should be updated to {id, part}, but the current implementation does not update them.

The new logic updates the partition columns if the counts are not equal or the column orders differ.
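The difference between the two conditions can be sketched in a few lines of Java. This is a minimal illustration, not the code in ConvertJoinMapJoin: the class and method names are hypothetical, and columns are modelled as plain strings rather than Hive expression nodes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names are illustrative and do not come from Hive.
public class PartitionColsCheck {

    // Old condition as described above: update only when the counts differ.
    static boolean needsUpdateByCount(List<String> partitionCols, List<String> bucketCols) {
        return partitionCols.size() != bucketCols.size();
    }

    // New condition as described above: update when the counts differ
    // or the columns appear in a different order.
    // List.equals compares the size and then the elements position by position.
    static boolean needsUpdate(List<String> partitionCols, List<String> bucketCols) {
        return !partitionCols.equals(bucketCols);
    }

    public static void main(String[] args) {
        List<String> bucketCols = Arrays.asList("id", "part");
        List<String> partitionCols = Arrays.asList("part", "id");

        System.out.println(needsUpdateByCount(partitionCols, bucketCols)); // false
        System.out.println(needsUpdate(partitionCols, bucketCols));        // true
    }
}
```

With the {id, part} / {part, id} example, the count-only check reports that nothing needs to change, while the order-aware check flags the mismatch.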

Why are the changes needed?

As described above, the current BucketMapJoin conversion logic may not update the partition columns properly, which leads to incorrect results, as reported in the JIRA.
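To see why column order matters for routing, consider a sketch in which a row's bucket is derived from an order-sensitive hash of its key values. The hash combination below is an assumption chosen for illustration and is not Hive's actual bucketing function; the class name, method name, and sample values are likewise hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of order-sensitive bucket routing; not Hive's hash function.
public class BucketRoutingSketch {

    // Combine the key values in the given order, then map the hash to a bucket.
    static int bucketFor(List<String> keyValues, int numBuckets) {
        int hash = 0;
        for (String value : keyValues) {
            hash = 31 * hash + value.hashCode();
        }
        return Math.floorMod(hash, numBuckets);
    }

    public static void main(String[] args) {
        int numBuckets = 4;
        String id = "1";
        String part = "x";

        // Big table row, bucketed by (id, part).
        int bigTableBucket = bucketFor(Arrays.asList(id, part), numBuckets);
        // Matching small table row, routed by (part, id) because the order was not updated.
        int smallTableBucket = bucketFor(Arrays.asList(part, id), numBuckets);

        System.out.println(bigTableBucket);   // 3
        System.out.println(smallTableBucket); // 1
    }
}
```

In this sketch the two rows carry the same key values but land in different buckets, so a join that only compares rows within the same bucket would miss the match.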

Does this PR introduce any user-facing change?

No

How was this patch tested?

I added two qfiles for which Hive currently returns incorrect results. I also checked that the patch fixes the originally reported duplicate-generation issue.

sonarqubecloud · 2025-08-29T17:43:41Z

Quality Gate passed

Issues
100 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

okumin

At first glance, this appears to be valid. I will take another look over the weekend because the updated part is complicated.

okumin · 2025-09-01T06:08:35Z

ql/src/test/results/clientpositive/llap/bucketmapjoin14.q.out

+                        key expressions: _col1 (type: string), _col2 (type: string)
+                        null sort order: zz
+                        sort order: ++
+                        Map-reduce partition columns: _col2 (type: string), _col1 (type: string)


I verified that on the master branch this list is in the opposite order, and the result would be empty:

Map-reduce partition columns: _col1 (type: string), _col2 (type: string)

okumin · 2025-09-01T06:10:32Z

iceberg/iceberg-handler/src/test/results/positive/bucket_map_join_9.q.out

+          BucketMapJoin:true,Conds:SEL_51._col1, _col2=RS_49._col1, _col2(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5"]
+        <-Map 2 [CUSTOM_EDGE] vectorized
+          MULTICAST [RS_49]
+            PartitionCols:_col2, _col1


I verified that on the master branch this list is in the opposite order, and the result would be empty:

PartitionCols:_col1, _col2

difin

LGTM

aturoczy · 2025-09-04T14:44:51Z

Thank you for the review. If it is OK, can we merge it to master?

ngsg added 2 commits August 29, 2025 19:03

HIVE-29166: Fix partition update logic in convertJoinBucketMapJoin

9a34ce7

Remove unused configs from the qfiles

153edcf

asf-ci-hive added tests pending tests unstable and removed tests pending labels Aug 29, 2025

Empty commit to regrigger CI test

b88a15a

asf-ci-hive added tests pending and removed tests unstable labels Aug 29, 2025

asf-ci-hive added tests passed and removed tests pending labels Aug 29, 2025

okumin reviewed Sep 1, 2025

difin approved these changes Sep 3, 2025

kasakrisz approved these changes Sep 4, 2025

kasakrisz merged commit 6f53c7f into apache:master Sep 4, 2025
6 checks passed
