fix(datastore): Add syncExpression field to LastSyncMetadata #2936

Draft
wants to merge 6 commits into
base: main
edisooon
Member

  • PR title and description conform to Pull Request guidelines.

Issue #, if available:

Description of changes:
This is the first of two PRs addressing an issue that customers have reported in our DataStore category.

The issue is that changing a sync expression at runtime does not work as described in our documentation.

For example, suppose we initialize the DataStorePlugin with a sync expression for the Student model that syncs down only students who are at least 20 years old. If we change this sync expression at runtime to sync down students who are at least 17 years old, and then call Amplify.DataStore.stop({},{}) followed by Amplify.DataStore.start({}, {}), the new sync expression is applied only to newly created or updated students. Existing Student instances in the remote database whose ages are at least 17 but under 20 are not synced, which is not the expected behavior.

The bug originates in the hydrate() method in SyncProcessor.java, which is called by the startApiSync() method in Orchestrator.java, a component responsible for "Synchronizing changed data between the LocalStorageAdapter and AppSync".
To build a sync request (the syncModel(..) method in SyncProcessor.java), we need to pass in:

  • a model schema of the Model we want to sync
  • last_sync_timestamp of this model
  • predicate / sync expression

To achieve this, our implementation persists each model's last_sync_timestamp in the LastSyncMetadata table whenever the code initiates a sync request, and uses the persisted last_sync_timestamp from LastSyncMetadata to initiate the next sync (see createHydrationTasks(..) in SyncProcessor.java).

This allows us to initiate either a delta sync or a base sync depending on the lastSyncTime parameter we pass in, which is how AppSync's Sync API distinguishes the two.
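As a minimal sketch of that choice, assuming a last sync time of 0 is what requests a base sync and any other value requests a delta sync. `SyncTypeSketch` and `syncTypeFor` are hypothetical names used for illustration only and are not taken from SyncProcessor.java:

```java
/** Illustrative only: hypothetical names, not the DataStore implementation. */
final class SyncTypeSketch {
    enum SyncType { BASE, DELTA }

    /**
     * Picks the kind of sync implied by the persisted last sync time.
     * Assumption: 0 means "never synced", so a base sync is requested.
     */
    static SyncType syncTypeFor(long lastSyncTime) {
        return lastSyncTime == 0L ? SyncType.BASE : SyncType.DELTA;
    }
}
```

With this sketch, `syncTypeFor(0L)` yields `BASE` and `syncTypeFor(3L)` yields `DELTA`.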

To further explain the cause of this bug, let's continue with the previous example:
After we initiate the first sync with the sync expression (age>=20), a new row is added to LastSyncMetadata, which might look like [Student(model name), 3(last sync timestamp)], assuming the current timestamp is 3.
After we change the sync expression at runtime (age>=17), the implementation would:

  • retrieve the Student model's last sync time (3) from the database
  • use 3 as last sync time and the updated sync expression (age>=17) to build a new sync request for Student model
  • update the row in LastSyncMetadata table, which might look like [Student(model name), 4(last sync timestamp)]

This leads to the buggy behavior described above, because:
AppSync uses BOTH the updated sync expression (age>=17) and last_sync_timestamp (3) to perform a delta sync (a base sync is performed only when last_sync_timestamp is 0). For a delta sync, under the hood, last_sync_timestamp is compared against the _lastChangedAt metadata of each row of the Student table in the remote database.

So existing students who haven't been updated since last_sync_timestamp (3), i.e. _lastChangedAt<3, won't get synced down even though they satisfy the updated sync expression, e.g. a row in the Student table like [student1(name), 18(age), 2(_lastChangedAt)].

But students who are added/updated later will get synced, e.g., a row in Student table like [student2(name), 17(age), 5(_lastChangedAt)].
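The scenario above can be reproduced with a toy model. This is only a sketch: `DeltaSyncSketch`, `Student`, `sync` and `exampleRemoteTable` are hypothetical names, the list stands in for the remote Student table, and the filtering rule (rows with `_lastChangedAt` below the last sync time are skipped unless the last sync time is 0) is taken from the description above rather than from AppSync itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Toy model of the remote Student table; not the real AppSync or Amplify API. */
final class DeltaSyncSketch {
    /** One remote row: name, age, and the _lastChangedAt metadata value. */
    static final class Student {
        final String name;
        final int age;
        final long lastChangedAt;

        Student(String name, int age, long lastChangedAt) {
            this.name = name;
            this.age = age;
            this.lastChangedAt = lastChangedAt;
        }
    }

    /**
     * Returns the names of the rows a sync would bring down.
     * Assumption: lastSyncTime == 0 is a base sync (every row is considered);
     * otherwise only rows with _lastChangedAt >= lastSyncTime are considered.
     */
    static List<String> sync(List<Student> remote, IntPredicate syncExpression, long lastSyncTime) {
        List<String> synced = new ArrayList<>();
        for (Student s : remote) {
            boolean considered = lastSyncTime == 0L || s.lastChangedAt >= lastSyncTime;
            if (considered && syncExpression.test(s.age)) {
                synced.add(s.name);
            }
        }
        return synced;
    }

    /** The two example rows from the description. */
    static List<Student> exampleRemoteTable() {
        List<Student> rows = new ArrayList<>();
        rows.add(new Student("student1", 18, 2L)); // matches age>=17, last changed before timestamp 3
        rows.add(new Student("student2", 17, 5L)); // matches age>=17, changed after timestamp 3
        return rows;
    }
}
```

Calling `sync(exampleRemoteTable(), age -> age >= 17, 3L)` returns only `student2`, which is the reported bug; the same call with a last sync time of `0L` returns both students.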

The essence of the problem is that last_sync_timestamp is associated only with the model, whereas it should be associated with both the model and the sync expression that was used for that sync.

To address this, we need to:

  • add a new column in LastSyncMetadata to store the last_sync_expression being used
  • store the last_sync_expression after we request a sync, and use it to adjust the retrieved last_sync_timestamp when we build a new sync request. Specifically, if current_sync_expression != last_sync_expression, we return 0 as the adjusted last_sync_timestamp to initiate a full (base) sync.
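A minimal sketch of the adjustment described in the two points above. Everything here is hypothetical: `LastSyncMetadataSketch` is a stand-in for a row of the LastSyncMetadata table, the sync expression is assumed to be stored as a serialized string, and treating a missing row as "never synced" is an assumption of this example rather than something this PR specifies.

```java
import java.util.Objects;

/** Hypothetical stand-in for one row of the LastSyncMetadata table. */
final class LastSyncMetadataSketch {
    final String modelName;
    final long lastSyncTime;
    final String syncExpression; // assumed serialized form; null if none was stored

    LastSyncMetadataSketch(String modelName, long lastSyncTime, String syncExpression) {
        this.modelName = modelName;
        this.lastSyncTime = lastSyncTime;
        this.syncExpression = syncExpression;
    }

    /**
     * Returns the last sync time to use for the next sync request.
     * If the stored expression differs from the current one, returns 0 so
     * that a base sync is performed instead of a delta sync.
     */
    static long adjustedLastSyncTime(LastSyncMetadataSketch stored, String currentSyncExpression) {
        if (stored == null) {
            return 0L; // assumption: no row yet means the model was never synced
        }
        boolean sameExpression = Objects.equals(stored.syncExpression, currentSyncExpression);
        return sameExpression ? stored.lastSyncTime : 0L;
    }
}
```

For a stored row `["Student", 3, "age >= 20"]`, passing `"age >= 20"` keeps the last sync time at 3, while passing `"age >= 17"` yields 0. Comparing serialized strings is only one possible way to detect a change; how expressions are actually compared is left to the second PR.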

The solution has been broken down into two steps, which will be implemented in two PRs:

  1. DB migration for LastSyncMetadata table
  2. logic changes to make use of the new field

How did you test these changes?
(Please add a line here how the changes were tested)
[TODO]

Documentation update required?

  • No
  • Yes (Please include a PR link for the documentation update)

General Checklist

  • Added Unit Tests
  • Added Integration Tests
  • Security oriented best practices and standards are followed (e.g. using input sanitization, principle of least privilege, etc)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@edisooon edisooon requested a review from a team as a code owner October 10, 2024 17:36
@edisooon edisooon marked this pull request as draft October 10, 2024 18:11