Processing the same records multiple times is not ideal, since each pass may keep replacing statements that were already produced.
When we are consuming from S3, we only transform each file once, and when consuming from a Kinesis stream, we keep track of our stream pointer, so this doesn't happen much in practice. However, when switching from bulk files over to the Kinesis stream, there is a danger of roughly 48 hours of records being processed more than once.
To fix this, it would make sense to keep track of the records transformed in the previous 48 hours, so these can be safely skipped.
- When a record has been transformed, store the etag of the processed PSC record for some length of time longer than the maximum stream retention (e.g. store for 48 hours).
- When transforming a PSC record, first check whether it has been transformed in the last 48 hours, and skip it if so.
This will ensure that the same records don’t get processed multiple times in cases of duplicates or during the changeover.
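A minimal sketch of the check-then-mark flow described above, using an in-memory TTL store; a real deployment would more likely back this with Redis or DynamoDB TTLs. The field name `etag` and the helper names are hypothetical, not taken from the actual codebase.

```python
import time

# Keep etags slightly longer than the maximum stream retention window (48 hours).
DEDUP_TTL_SECONDS = 48 * 60 * 60


class EtagDedupStore:
    """In-memory sketch of the etag cache; production would use a shared
    store (e.g. Redis or DynamoDB) with a native TTL instead."""

    def __init__(self, ttl=DEDUP_TTL_SECONDS, clock=time.time):
        self._ttl = ttl
        self._clock = clock
        self._seen = {}  # etag -> expiry timestamp

    def seen_recently(self, etag):
        """Return True if this etag was marked within the TTL window."""
        expiry = self._seen.get(etag)
        if expiry is None:
            return False
        if expiry < self._clock():
            del self._seen[etag]  # entry has expired; forget it
            return False
        return True

    def mark_transformed(self, etag):
        self._seen[etag] = self._clock() + self._ttl


def transform_record(record, store, do_transform):
    """Transform a PSC record unless its etag was seen in the last 48 hours."""
    etag = record["etag"]  # hypothetical field name for the record's etag
    if store.seen_recently(etag):
        return None  # duplicate within the window; safe to skip
    result = do_transform(record)
    store.mark_transformed(etag)
    return result
```

With this in place, replaying the overlap period during the S3-to-Kinesis changeover only costs a cache lookup per duplicate record rather than a full re-transform.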
Estimate: 6 hours