Flink Merge On Read Behavior? Equality & Positional Deletes #11535
Comments
The discussion here might help you: #10935 (comment)
Hi @pvary, it is still not clear to me in which scenarios Flink decides to write a positional delete versus an equality delete when committing a snapshot. I have also seen snapshot commits that contain both. An Alibaba blog post mentions that this is done to avoid inconsistencies, but it does not explain why, and I could not find it documented anywhere: https://www.alibabacloud.com/blog/how-to-analyze-cdc-data-in-iceberg-data-lake-using-flink_597838
Equality delete:
Positional delete:
Why do we need to use both? Is there an example scenario we can go over? Thanks in advance for answering :)
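Imagine a scenario where a specific ID is updated twice within the same commit. An equality delete alone is not enough in this case to remove the outdated first record and keep the second one: an equality delete only applies to data files committed earlier than the delete itself, so it cannot remove a row that was written in the same commit. A positional delete, which points at the exact file and row position, can.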
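To make the scenario concrete, here is a toy model in Python. It is only a sketch, not Iceberg's or Flink's actual API: `Row`, `live_rows`, the file names, and the values are all made up for illustration. The one rule it borrows from the Iceberg table spec is that an equality delete applies only to data files whose sequence number is strictly lower than the delete's. Positional deletes are simplified here to "remove exactly this (file, position)".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    file: str   # data file the row lives in (hypothetical name)
    pos: int    # row position inside that file
    seq: int    # sequence number of the commit that added the file
    id: int
    value: str


def live_rows(rows, eq_deletes, pos_deletes):
    """Return the rows a reader would see after applying deletes (toy model).

    eq_deletes:  list of (id, delete_seq). Removes rows with a matching id
                 whose sequence number is strictly less than delete_seq.
    pos_deletes: set of (file, pos). Removes exactly that row.
    """
    out = []
    for r in rows:
        if (r.file, r.pos) in pos_deletes:
            continue
        if any(r.id == d_id and r.seq < d_seq for d_id, d_seq in eq_deletes):
            continue
        out.append(r)
    return out


rows = [
    Row("file-1", 0, seq=1, id=1, value="a"),  # committed in an earlier snapshot
    Row("file-2", 0, seq=2, id=1, value="b"),  # first update in the current commit
    Row("file-2", 1, seq=2, id=1, value="c"),  # second update in the same commit
]

# The writer does not know where the old "a" row is, so it deletes it by key.
eq_deletes = [(1, 2)]

# Equality delete only: "a" is removed, but "b" survives because it has the
# same sequence number as the delete. The reader sees two rows for id=1.
eq_only = live_rows(rows, eq_deletes, pos_deletes=set())

# The writer does know where it just wrote "b", so it can delete it by position.
both = live_rows(rows, eq_deletes, pos_deletes={("file-2", 0)})

print([r.value for r in eq_only])
print([r.value for r in both])
```

In this model the equality delete handles rows from earlier snapshots, whose location the streaming writer does not know, while the positional delete handles rows written earlier in the same commit, whose location it does know. That would explain why a single snapshot can contain both kinds of delete files.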
Query engine
Apache Flink
Question
Can somebody explain how delete files are implemented with Apache Flink? Spark only makes use of positional deletes, but Flink seems to use both. Why would Flink need both?