Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(datastore): base sync when sync expression changes #2937

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

edisooon
Copy link
Member

@edisooon edisooon commented Oct 10, 2024

  • PR title and description conform to Pull Request guidelines.

Issue #, if available:

Description of changes:
This is in an effort to address a current issue customers had reported in our Datastore category.

The issue is that the "changing sync expression in Runtime" doesn't work as expected as described in our doc.

For example, when we initialize the DataStorePlugin with a sync expression for Student model to only sync down all the students who are >= 20-year-old, even if we change this sync expression to sync down students who are >= 17-year-old in run-time followed by Amplify.DataStore.stop({},{}) and Amplify.DataStore.start({}, {}): the new synced expression will only be applied to the newly-created/updated students, i.e. pre-existing instances of Students whose ages are in between 17 and 20 in remote database won't get synced, which is not as expected.

The bug is originated from the hydrate() method in SyncProcessor.java, which is being called by the startApiSync() method in Orchestrator.java, a component responsible for "Synchronizing changed data between the LocalStorageAdapter and AppSync".
To build a Sync Request (syncModel(..) method in SyncProcessor.java), we need to pass in

  • a model schema of the Model we want to sync
  • last_sync_timestamp of this model
  • predicate / sync expression

In order to achieve this, our implementation persists the last_sync_timestamp of models in LastSyncMetadata table whenever the code initiate a sync request, and uses the persisted last_sync_timestamp from LastSyncMetadata to initiate the next sync. (createHydrationTasks(..) in SyncProcessor.java)

This would allow us to initiate either a Delta sync or Base sync based on the lastSyncTime parameter we pass in, which is defined by the nature of AppSync's Sync API.

To further explain the cause of this bug, let's keep using the previous example:
After we initiate the first sync with sync expression (age>=20), we would add a new row in LastSyncMetadata, which might look like [Student(model name), 3(last sync timestamp)], assuming the current timestamp is 3.
After we change the sync expression in runtime (age>=17), the implementation would:

  • retrieve the Student model's last sync time (3) from the database
  • use 3 as last sync time and the updated sync expression (age>=17) to build a new sync request for Student model
  • update the row in LastSyncMetadata table, which might look like [Student(model name), 4(last sync timestamp)]

This will lead to the bug behavior described above, because:
AppSync will use BOTH updated_sync_expression(age>=17) and last_sync_timestamp(3) to initiate a delta sync (base sync will be performed only when last_sync_timestamp is 0). And for delta sync, under the hood, this last_sync_timestamp will be used to compare with the metadata _lastChangedAt in each rows of Student table in remote database.

So previous students who haven't been updated since last_sync_timestamp(3), i.e. _lastChangedAt<3, won't get synced down, even though they meet the updated sync expression, e.g. a row in remote Student table like [student1(name), 18(age), 2(_lastChangedAt)].

But students who are added/updated later will get synced, e.g., a row in remote Student table like [student2(name), 17(age), 5(_lastChangedAt)].

The essence of this problem is that: the last_sync_timestamp was only associated with the model, but should be associated with the model and the last_sync_expression being used.

To address this, we need to:

  • add a new column in LastSyncMetadata to store the last_sync_expression being used
  • store the last_sync_expression after we request a sync, and use last_sync_expression to adjust the retrieved last_sync_timestamp when we try to build a new sync request. To be specific, if the current_sync_expression != last_sync_expression, we return 0 as adjusted last_sync_timestamp to initiate a full sync.

The solution has been broke down into three steps:

  1. DB migration for LastSyncMetadata table
  2. add new QueryPredicate field in LastSyncMetadata model [There are some changes related to (de)serialization of QueryPredicate in GsonPredicateAdapater]
  3. logic changes in SyncProcessor & SyncTimeRegistry to make use of the new field

How did you test these changes?
(Please add a line here how the changes were tested)
manual testing, new and existing test cases in SyncProcessorTest

Documentation update required?

  • No
  • Yes (Please include a PR link for the documentation update)

General Checklist

  • Added Unit Tests
  • Added Integration Tests
  • Security oriented best practices and standards are followed (e.g. using input sanitization, principle of least privilege, etc)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@edisooon edisooon requested a review from a team as a code owner October 10, 2024 17:42
@edisooon edisooon marked this pull request as draft October 10, 2024 18:11
@edisooon edisooon changed the base branch from edisooon/db-migration-for-syncexpression-field-in-LastSyncMetadata to main October 15, 2024 23:49
@edisooon edisooon changed the title fix(datastore): Store and use syncExpression field in LastSyncMetadata table to retrieve adjusted lastSyncTime fix(datastore): base sync when sync expression changes Oct 16, 2024
@edisooon edisooon marked this pull request as ready for review October 16, 2024 22:45
/**
* Add SyncExpression (TEXT) column to LastSyncMetadata table.
*/
public final class AddSyncExpressionToLastSyncMetadata implements ModelMigration {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not be public

/**
* Add SyncExpression (TEXT) column to LastSyncMetadata table.
*/
public final class AddSyncExpressionToLastSyncMetadata implements ModelMigration {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's write any added files in Kotlin please

* @return Current builder
*/
@NonNull
public Builder syncExpressions(@NonNull Map<String, DataStoreSyncExpression> syncExpressions) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API change hasn't been reviewed, please remove

return recordNum == 0; // needs to be upgraded if there's no column named ${newSyncExpColumnName}
}
}
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this default be true?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants