fix: Update last row key in write function of user stream to avoid data loss and data duplication #1459
Open: danieljbruce wants to merge 16 commits into googleapis:main from danieljbruce:data-duplication-long-term-fix
This allows us to override the write method
product-auto-label bot added the size: m and api: bigtable labels on Jul 26, 2024
danieljbruce added the kokoro:force-run label on Jul 29, 2024
yoshi-kokoro removed the kokoro:force-run label on Jul 29, 2024
danieljbruce added the owlbot:run label on Jul 29, 2024
gcf-owl-bot bot removed the owlbot:run label on Jul 29, 2024
danieljbruce changed the title from "fix: TODO" to "fix: Update last row key in write function of user stream to avoid data loss and data duplication" on Jul 29, 2024
Labels: api: bigtable, size: m
Summary:
This PR offers a long-term fix that ensures that, when retryable errors occur, the user experiences neither data loss nor duplicate data. It replaces the patch currently in place to prevent data duplication.
Background
An earlier PR applied a fix to prevent data loss in the client for readRows calls. That change also mostly addressed data duplication by adjusting watermarks, except for a special case that occurred from time to time in Node v14. A follow-up patch then ensured the user didn't receive duplicate data by discarding duplicates just before they reached the user. Discarding duplicates solves the problem, but it isn't the best permanent solution because it takes two changes to address one underlying problem. The current PR replaces that patch with a simpler change that solves both the data duplication and data loss issues.
Root cause analysis
While investigating the issue, we found that duplicate data was delivered when, at the moment the retry request was being formed, a row had reached the write function of the user stream but not yet its transform function. The row was held in the stream, waiting to be processed by transform. Because the last row key was updated in the transform function, and that had not yet happened for this row, the row was re-requested. Moving the row key update back to the write function ensures the row doesn't get re-requested.
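The timing gap described above can be reproduced in isolation. The following is a minimal sketch, not the client's actual code: the `Row` shape, the `UserStreamSketch` class, and the `snapshotAtRetry` helper are hypothetical names chosen for illustration. It holds the transform callback open to simulate a row that is still being processed, so that a second row sits in the stream's buffer when the "retry" snapshot is taken.

```typescript
import {Transform, TransformCallback} from 'stream';

// Hypothetical row shape; real rows from the client carry more fields.
interface Row {
  id: string;
}

// Hypothetical stand-in for the user stream. It tracks the last row key in
// two places so the two approaches can be compared side by side.
class UserStreamSketch extends Transform {
  lastKeyFromWrite = '';
  lastKeyFromTransform = '';
  private pending: TransformCallback | null = null;

  constructor() {
    super({objectMode: true});
  }

  // Overriding write records the key as soon as the row enters the stream,
  // even if the row then waits in the internal buffer.
  write(chunk: any, encodingOrCb?: any, cb?: any): boolean {
    this.lastKeyFromWrite = (chunk as Row).id;
    return super.write(chunk, encodingOrCb, cb);
  }

  _transform(row: Row, _enc: BufferEncoding, callback: TransformCallback): void {
    this.lastKeyFromTransform = row.id;
    this.push(row);
    // Hold the callback to simulate a row that is still being processed,
    // which forces later writes to queue up in the buffer.
    this.pending = callback;
  }

  // Lets the held row finish so buffered rows can proceed.
  release(): void {
    const cb = this.pending;
    this.pending = null;
    if (cb) cb();
  }
}

// Writes two rows and reports what each tracker would say if a retry
// request were formed right now.
function snapshotAtRetry(): {fromWrite: string; fromTransform: string} {
  const stream = new UserStreamSketch();
  stream.write({id: 'a'}); // reaches _transform immediately
  stream.write({id: 'b'}); // buffered: _transform for 'a' has not finished
  return {
    fromWrite: stream.lastKeyFromWrite,
    fromTransform: stream.lastKeyFromTransform,
  };
}
```

In this sketch, a retry built from the transform-side key would resume after row `a` and request row `b` again, even though `b` is already buffered and will be delivered later. The write-side key already points at `b`, so the retry would resume after it.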
Changes
Alternatives considered
Instead of overriding write, we could pass a write function to the constructor. We don't want to do that because it would override _write, which would stop _transform from being called when the Transform is ready for reading. Other options in the Transform constructor don't allow us to achieve our goal of updating the last row key earlier. Therefore, overriding write is our best option.
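The difference between the two approaches can be checked with a small experiment. This is a sketch under the assumption of a plain Node.js `Transform` in object mode; the helper names `countWithWriteOption` and `countWithWriteOverride` are hypothetical. Each helper writes one row and returns how many times the transform function ran.

```typescript
import {Transform} from 'stream';

// Passing `write` in the constructor options replaces the stream's _write,
// so the transform function is never reached.
function countWithWriteOption(): number {
  let transformCalls = 0;
  const stream = new Transform({
    objectMode: true,
    transform(chunk, _enc, cb) {
      transformCalls++;
      cb(null, chunk);
    },
    write(_chunk, _enc, cb) {
      // A place where the last row key could be updated, but at the cost of
      // bypassing transform entirely.
      cb();
    },
  });
  stream.write({id: 'a'});
  return transformCalls;
}

// Overriding the public write method and delegating to the original keeps
// the normal _write -> _transform path intact.
function countWithWriteOverride(): number {
  let transformCalls = 0;
  const stream = new Transform({
    objectMode: true,
    transform(chunk, _enc, cb) {
      transformCalls++;
      cb(null, chunk);
    },
  });
  const originalWrite = stream.write.bind(stream);
  stream.write = ((chunk: any, encodingOrCb?: any, cb?: any): boolean => {
    // The last row key could be updated here before delegating.
    return originalWrite(chunk, encodingOrCb, cb);
  }) as typeof stream.write;
  stream.write({id: 'a'});
  return transformCalls;
}
```

With the constructor option, the transform function is not invoked for the written row; with the override, it is invoked once, which matches the reasoning for choosing the override.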