Flink Merge On Read Behavior? Equality & Positional Deletes #11535

FranMorilloAWS · 2024-11-13T11:40:49Z

Query engine

Apache Flink

Question

Can somebody explain how delete files are implemented in Apache Flink? Spark only uses positional deletes, but Flink seems to use both positional and equality deletes. Why would Flink need both?

pvary · 2024-11-13T11:59:42Z

The discussion here might help you: #10935 (comment)

FranMorilloAWS · 2024-11-14T13:02:39Z

Hi @pvary, it is still not clear in which scenarios Flink decides to write a positional delete versus an equality delete when committing a snapshot. I have also seen snapshot commits that contain both. There is an Alibaba blog post that says this is done to avoid inconsistencies, but again it is not clear and not documented anywhere: https://www.alibabacloud.com/blog/how-to-analyze-cdc-data-in-iceberg-data-lake-using-flink_597838

pvary · 2024-11-14T14:22:46Z

Equality delete:

Written if the ID is first deleted during a checkpoint

Positional delete:

A record is inserted with a given ID, and then it is deleted during the same checkpoint
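
To make the rule above concrete, here is a minimal sketch of how a single writer could pick the delete type between two checkpoints. It is a toy model and not Iceberg's or Flink's actual API: the `CheckpointWriter` class, its fields, and the file name are all hypothetical, and the only logic it encodes is the rule stated above (key inserted in the current checkpoint: positional delete; otherwise: equality delete).

```python
from dataclasses import dataclass, field


@dataclass
class CheckpointWriter:
    """Toy model of one writer between two checkpoints (hypothetical, not Iceberg's API)."""
    data_file: str
    rows: list = field(default_factory=list)                # rows appended to the data file
    inserted_pos: dict = field(default_factory=dict)        # id -> position of the live row from this checkpoint
    equality_deletes: list = field(default_factory=list)    # ids to delete by value
    positional_deletes: list = field(default_factory=list)  # (data_file, position) pairs

    def insert(self, row_id, value):
        # Remember where the row landed so a later delete can target it by position.
        self.inserted_pos[row_id] = len(self.rows)
        self.rows.append((row_id, value))

    def delete(self, row_id):
        pos = self.inserted_pos.pop(row_id, None)
        if pos is not None:
            # The row was written in this same checkpoint: its file and position are known.
            self.positional_deletes.append((self.data_file, pos))
        else:
            # The row lives in some older file we have not located: delete by key instead.
            self.equality_deletes.append(row_id)


w = CheckpointWriter("data-1.parquet")
w.delete(7)        # ID 7 was not inserted in this checkpoint
w.insert(8, "x")
w.delete(8)        # ID 8 was inserted in this checkpoint, at position 0
print(w.equality_deletes)    # [7]
print(w.positional_deletes)  # [('data-1.parquet', 0)]
```

A snapshot that contains both delete types is then simply a checkpoint in which both branches of `delete` were taken.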

FranMorilloAWS · 2024-11-14T15:11:45Z

Why do we need to use both? Is there an example scenario we can go over? Thanks in advance for answering :)

pvary · 2024-11-14T16:20:44Z

Imagine a scenario where a specific ID is updated twice. An equality delete alone is not enough in this case to remove the outdated first record while keeping the second one.
A positional delete is not enough on its own either, since we would need to find the data file and the specific row number to delete the record. In edge cases this would require a full table scan for every record...
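
The scenario can be sketched from the reader's side. The example below is a simplified model, not Iceberg code: the `read` function and the dictionary layout are hypothetical. It assumes the rule from the Iceberg table spec that an equality delete only applies to data files with a strictly lower sequence number, so it cannot touch rows written in the same commit. ID 1 has an old value "a" from an earlier commit and is updated twice ("b", then "c") within one checkpoint.

```python
def read(data_files, eq_deletes, pos_deletes):
    """Return the rows that survive after applying both kinds of deletes (toy model)."""
    live = []
    for f in data_files:
        for pos, (row_id, value) in enumerate(f["rows"]):
            # Assumption from the Iceberg spec: an equality delete only reaches
            # data files with a strictly lower sequence number.
            eq_hit = any(d["id"] == row_id and f["seq"] < d["seq"] for d in eq_deletes)
            # A positional delete names the exact file and row position.
            pos_hit = any(d["file"] == f["name"] and d["pos"] == pos for d in pos_deletes)
            if not (eq_hit or pos_hit):
                live.append((row_id, value))
    return live


old = {"name": "old.parquet", "seq": 1, "rows": [(1, "a")]}            # earlier commit
new = {"name": "new.parquet", "seq": 2, "rows": [(1, "b"), (1, "c")]}  # two updates, one checkpoint
eq = [{"id": 1, "seq": 2}]                   # removes the old "a" without locating it
pos = [{"file": "new.parquet", "pos": 0}]    # removes the superseded "b"

both = read([old, new], eq, pos)
eq_only = read([old, new], eq, [])
pos_only = read([old, new], [], pos)
print(both)      # [(1, 'c')]
print(eq_only)   # [(1, 'b'), (1, 'c')]
print(pos_only)  # [(1, 'a'), (1, 'c')]
```

With only the equality delete, the intermediate "b" survives next to "c". With only the positional delete, the stale "a" survives unless the writer first scans the table to find its file and position, which is the full-scan cost mentioned above.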

FranMorilloAWS added the "question" label (Further information is requested) on Nov 13, 2024
