Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect, report, and isolate invalid PCD states #1341

Closed
artwyman opened this issue Dec 7, 2023 · 0 comments · Fixed by #1348
Closed

Detect, report, and isolate invalid PCD states #1341

artwyman opened this issue Dec 7, 2023 · 0 comments · Fixed by #1348
Assignees

Comments

@artwyman
Copy link
Member

artwyman commented Dec 7, 2023

This issue is split out of #1271.

Issue: We've seen cases where a user's local state gets corrupted such that data is lost. Examples:

  • A user's self or identity don't match their PCDs. We believe this has happened as a result of multiple tabs logged in to different accounts on the same machine.
  • A user's PCDCollection is cleared. This happened to me before Devconnect, likely as a result of Chrome restarting to update, but I was never able to fully diagnose or fix it.

Importantly, we don't know exactly what can still cause these situations (or we would've fixed it), but expect such bugs to still be possible. When these bugs occur, the result can become "sticky" when the bad data is uploaded to E2EE storage, resulting in multiple devices being in the same bad state. Given our lack of backups/revision-history this also makes recovery difficult.

Proposal: Since we haven't diagnosed all root causes, our goal should be to minimize impact of bugs which do happen, and aid in diagnosing them when they do. Specifically we should:

  • Check a user's data at key points, to see whether appropriate invariants are maintained.
  • When an invariant is broken, report an event to the server with useful diagnostic info.
  • Stop progress of operations which will propagate invalid data into more places than it already appears.
  • Inform the user that something went wrong, and ideally give them options to address it.

The rest of the text below is a capturing of implementation options and notes, still subject to change:

Validation points: These are points where state flows in/out of the system, where it can be validated and blocked from propagating.

  • Initial load from local storage on page load
  • finishAccountCreation()
  • loadAfterLogin()
  • Also any other loading from local storage (which are worth minimizing)
  • Saving updated state to local storage
  • Upload to E2EE storage
  • Download from E2EE storage

It's worth considering combining multiple local storage fields into one in order to simplify these validation points, and reduce the risk of fields being out of sync with each other.

Reporting to the user: Options here:

  • Do nothing, it's just an error handled as-is.
  • Pop up a modal to inform the user and give them remediation actions.

Remediation actions: What can the user do about it?

  • Nothing but the manual options they already have: reload page, clear their profile
  • Button to reload
  • Button to log off (which clears local storage)
  • Button to export PCDs for later import/recovery
  • Button to import a previous backup

What are the invariants?:

For a logged-in user:

  • self, identity, encryption_key, and pcd_collection
    • Maybe also check sync_status and subscriptions, though those aren't tied to invariants
  • PCDCollection is nonempty
  • Identity PCD is present (and findable in the standard way)
  • self, identity, and identity PCD all have the same commitment

Otherwise the user is logged out, and all of the above fields should be absent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment