Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making MERGE more robust in handling edge-cases #222

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

bony2023
Copy link

@bony2023 bony2023 commented Sep 15, 2024

Closes: goccy/bigquery-emulator#163

The MERGE statement currently works fine but on a few edge cases:

  1. When the INSERT action inside MERGE uses ROW instead of specifying a column list, it raises the error. We can simply add the feature FeatureV13OmitInsertColumnList to fix that.
/* ❌ Doesn't work currently because there's no column values given in INSERT */
MERGE target T USING source S ON T.id = S.id
WHEN NOT MATCHED THEN INSERT ROW;
  1. When the source table is evaluated using a SELECT expression rather than just a name, it raises an error. This is fixed by adding additional parenthesis around the table joins.
/* ❌ Doesn't work currently as source table is evaluated using a SELECT */
MERGE target T USING (SELECT * FROM source) S ON T.id = S.id
WHEN NOT MATCHED THEN INSERT (id, name) VALUES (id, name);
  1. When the target table name is a substring of source table name, it doesn't properly merge as it switches the target and source tables, because of this check.
/* ❌ Doesn't work currently as target table name is a substring of source table name */
MERGE target T USING target_tmp S ON T.id = S.id
WHEN NOT MATCHED THEN INSERT (id, name) VALUES (id, name);

This PR aims to fix all these issues while adding some test cases for basic merge statements.

@bony2023 bony2023 changed the title Make MERGE more robust on handling edge-cases Make MERGE more robust in handling edge-cases Sep 15, 2024
@codecov-commenter
Copy link

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 48.85%. Comparing base (b6dd6a5) to head (eccefa8).
Report is 3 commits behind head on main.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #222      +/-   ##
==========================================
+ Coverage   48.54%   48.85%   +0.31%     
==========================================
  Files          47       47              
  Lines       17906    17920      +14     
==========================================
+ Hits         8692     8755      +63     
+ Misses       7763     7714      -49     
  Partials     1451     1451              

@bony2023
Copy link
Author

bony2023 commented Sep 15, 2024

Hi @ohaibbq @goccy,

How does this change look to you? I am not able to add reviewers so directly commented tagging you both.

@n10l
Copy link

n10l commented Sep 16, 2024

@bony2023 Still doesn't work for me until I specify individual column name in the MERGE statement. E.g. doesn't work when statement ends with .... WHEN NOT MATCHED THEN INSERT ROW like in raised issue

@bony2023
Copy link
Author

@n10l Can you paste the error, you are seeing? Just FYI, there are multiple issues with the MERGE currently. I also created this PR to fix sub-actions on merge: #223

Can you try using both the PRs to test?

@n10l
Copy link

n10l commented Sep 16, 2024

Hi @bony2023 I am seeing Exception: ApiError: failed to analyze: INVALID_ARGUMENT: Missing insert column list [at 1:180].
My query is MERGE <table 1> t1 USING <table 2> t2 ON t1.created_on = t2.created_on WHEN NOT MATCHED THEN INSERT ROW
It may work when I specify individual column names in INSERT instead of generic ROW keyword just like in original issue.

Can you please try to support WHEN NOT MATCHED THEN INSERT ROW in your test case instead of WHEN NOT MATCHED THEN INSERT (id, name) VALUES (id, name) just like in error log of goccy/bigquery-emulator#163

Thanks for the good work anyway !!

Update: I tested by building your PR locally and its still same issue for me.

@bony2023
Copy link
Author

@n10l It was missing a feature flag FEATURE_V_1_3_OMIT_INSERT_COLUMN_LIST. Added another commit which should make statements work which have column list omitted. Check these tests for additional info.

@n10l
Copy link

n10l commented Sep 16, 2024

Thanks @bony2023 I will let you know shortly.

@n10l
Copy link

n10l commented Sep 16, 2024

@bony2023 Seems like that error is gone. Thank you for fixing the original issue. But, in my case I am running query using nodejs bigquery client with query MERGE <project>.<dataset>.<table 1> t1 USING <project>.<dataset>.<table 2> t2 ON t1.created_on = t2.created_on WHEN NOT MATCHED THEN INSERT ROW and the MERGE is translating into just table name without considering project and dataset, so I am getting error as table not found with emulator. Code works with real BQ.

Fully qualified name for table should be like project_dataset_tablename instead of just tablename which works for all other commands except MERGE.

@bony2023
Copy link
Author

@n10l Thanks for verifying it works on your end. For your second issue, this is what the other PR should be fixing.

@n10l
Copy link

n10l commented Sep 16, 2024

Very well @bony2023 testing that as well.

@bony2023 bony2023 changed the title Make MERGE more robust in handling edge-cases Making MERGE more robust in handling edge-cases Sep 16, 2024
@bony2023
Copy link
Author

bony2023 commented Oct 16, 2024

@goccy 👋 Pinging again to check if this is something we can merge along with #223

@nealrushforthlbg
Copy link

nealrushforthlbg commented Dec 4, 2024

@bony2023 This doesn't work when you have the
USING as a CTE e.g.

merge into project.dataset.stg-data as
DBT_INTERNAL_DEST
using (
with staging as ( select
Id as ID, IsoCode
from project.dataset.raw-data raw
WHERE raw.IsoCode IS NOT NULL
)
select * from staging
) as DBT_INTERNAL_SOURCE
on (
DBT_INTERNAL_SOURCE.IsoCode =
DBT_INTERNAL_DEST.IsoCode
)
when matched then update set
Id = DBT_INTERNAL_SOURCE.Id,IsoCode =
DBT_INTERNAL_SOURCE.IsoCode
when not matched then insert
(Id, IsoCode)
Values
(Id, IsoCode)

Any chance you can look into this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

INSERT ROW part of MERGE statement dont seem to be supported
4 participants