Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADR about desktop behaviour on sync failure #220

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

leplatrem
Copy link
Contributor

Use this ADR to discuss the solution chosen to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1712108

@leplatrem leplatrem force-pushed the document-desktop-sync-error-adr branch from a659639 to 95bdfcc Compare May 3, 2022 13:28
@leplatrem
Copy link
Contributor Author

leplatrem commented May 6, 2022

I'm slowly leaning to option 3. Which could be a valid segway until something option 5 is implemented.

@grahamalama what's your gut feeling?

@grahamalama
Copy link
Contributor

grahamalama commented May 6, 2022

@leplatrem I agree with option 3 as a segway, though we'll probably want to emit warnings first before we switch over fully to raising exceptions, yeah? Or somehow roll out the fact that "hey users of RS client, this change is coming -- plan accordingly."

@mostlygeek
Copy link
Contributor

mostlygeek commented May 6, 2022

I like option 3 as well. Here's my understanding of it:

  1. we're changing the default behaviour of .get() to always return the local data. Signature valid or not
  2. for collections where tamper detection (signature validation) is required, it can be made explicit in code

What's not clear to me is, what do we do in Option 3, to resync the data? Does that behaviour stay the same as it is right now?

Also, I'm assuming that remote-settings expects that eventually the local collection will be fixed and .get() will stop throwing the exception.

@leplatrem
Copy link
Contributor Author

though we'll probably want to emit warnings first before we switch over fully to raising exceptions, yeah?

Of course! I was thinking a using a new flag, disabled by default, something like throwIfInvalidSignature, or throwIfSyncFailed, or ...?

we're changing the default behaviour of .get() to always return the local data. Signature valid or not

Not really, I would suggest that we avoid changing the default behavior for now.
We already have a .get() option to enable signature verification on read, but this new flag would be slightly different. I realize with your comment, that naming has to be finely chosen.

.get({verifySignature: true}) will verify the signature of local data on each read, and will throw if invalid.

The new one proposed in Option 3 won't verify the signature on each read, but will throw if the last synchronization failed because of signatures (status=sign_retry_error).

The objective is really to remove the current confusion for consumers between an empty list and a failed synchronization because of signatures.

for collections where tamper detection (signature validation) is required, it can be made explicit in code

Yes. One possibility is to:

  1. ship this option in the dark, documented but not enabled
  2. enable it consumer per consumer, Normandy and Nimbus as a start
  3. identify remaining use-cases (consumers using .get() without packaged dump) and get in touch with authors
  4. enable by default some day :)

What's not clear to me is, what do we do in Option 3, to resync the data? Does that behaviour stay the same as it is right now?

Same as right now. Each time we poll for changes we retry to synchronize the collections that are behind.

Also, I'm assuming that remote-settings expects that eventually the local collection will be fixed and .get() will stop throwing the exception.

Yes

I will eventually update the docs with these clarifications

@mostlygeek
Copy link
Contributor

mostlygeek commented May 10, 2022

Not really, I would suggest that we avoid changing the default behavior for now.

A lot of this came from "an invalid cert empties remote-settings globally". This is too much risk tied to a single piece being 100% perfect, 100% of the time. Also, that remote-settings is so fast, we have very limited ability to mitigate or contain the impact.

Option 4 seems to be more attractive. If I understand correctly:

  • we'll attempt a sync. We verify signatures on the new state before writing to indexdb in a transaction.
    • If any part of the sync fails, we leave the local data as it. It may be stale but at least it's whole.
    • It may also be tampered with, we don't check that.
  • for collections that explicitly require tamper protection, we already have .get({verifySignature: true}).
    • a benefit of this is we don't have to add a new client side API option, keeping the default behaviour as is
    • customers who require and use this should expect that .get(...) can return a [] and have logic to address that empty collection

@leplatrem
Copy link
Contributor Author

Option 4 is definitely the easiest to implement, and would even simplify our code base.

@leplatrem
Copy link
Contributor Author

A lot of this came from "an invalid cert empties remote-settings globally". This is too much risk tied to a single piece being 100% perfect, 100% of the time.

This is not entirely true. All collections which package JSON data sets will be restored to the defaults.

@mostlygeek
Copy link
Contributor

mostlygeek commented May 16, 2022

Making sure I'm understanding where we would wind up after Option 4:

  • fetch, signature verification, etc all happen in memory. When everything is as expected, it is written to indexdb as an atomic transaction
  • there is a risk of data on disk being tampered with. However, for collections that care about that we have the verifySignature:true option in .get().
  • .get() will only return [] due to failed verifySignature:true, or an actual empty collection on disk

If we implement option4, what we change is:

  • .get() returns whatever we have on disk
  • .get({verifySignature:true}) returns only verified data, [] otherwise

All synchronization details are remain hidden from customers. Which I think is a good thing.

@grahamalama
Copy link
Contributor

grahamalama commented May 17, 2022

Something I thought of last night: veryifySignature should be opt-out, and we should name the setting something scary like dangerouslyIgnoreSignatureVerification. I'd say we want to scare users into thinking "are you sure your collection data can't be leveraged for anything nefarious?".

@leplatrem
Copy link
Contributor Author

.get() will only return [] due to failed verifySignature:true, or an actual empty collection on disk

.get() will throw an error if verifySignature flag is set

If we implement option4, what we change is:

I realize that it changes the security model. It becomes equivalent to option 2). Updates can be disabled by serving bad signatures, and tempered data will remain.

veryifySignature should be opt-out, and we should name the setting something scary like dangerouslyIgnoreSignatureVerification.

I agree, but that would be an impactful change, calling .get() would now become more expensive, and potentially calling network if the collection has no metadata in the local DB.


I now came back to thinking that the most reasonable option is 3), since it only requires changing a couple of critical use-cases first. Will iterate with security-team!

@mostlygeek
Copy link
Contributor

Option 3 does sound like a very pragmatic approach after we've had this discussion.

@grahamalama
Copy link
Contributor

grahamalama commented May 18, 2022

I do think, even with option 3, that signature verification should be opt out rather that opt in. I wonder what the security team thinks about this 🤔.

Maybe we make the setting ignoreSignatureFailure and initially we set true as the default, but then flip it to false down the line, emitting lots of warnings before we do so.

@leplatrem
Copy link
Contributor Author

leplatrem commented May 19, 2022

I do think, even with option 3, that signature verification should be opt out rather that opt in

I agree that it would be ideal, but unfortunately we cannot make such an impactful change. The alternative we found is to verify signatures when data is up-to-date. See https://bugzilla.mozilla.org/show_bug.cgi?id=1640126

@mostlygeek
Copy link
Contributor

I do think, even with option 3, that signature verification should be opt out rather that opt in. I wonder what the security team thinks about this 🤔.

When Remote Settings was younger, the default was definitely opt-out. Now that we have dozens more use cases since then, I think opt-in and the trade offs it comes with, is a more pragmatic choice.

@leplatrem
Copy link
Contributor Author

When Remote Settings was younger, the default was definitely opt-out. Now that we have dozens more use cases since then, I think opt-in and the trade offs it comes with, is a more pragmatic choice.

I'm not sure we're talking about the same thing.

Remote Settings always had signature verification during synchronization. I had the impression that Graham was referring to the signature verification on read (when calling .get())

@mostlygeek
Copy link
Contributor

I'm not sure we're talking about the same thing.

Maybe. Which came first, main bucket or content signatures? I don't remember. 😓

@leplatrem
Copy link
Contributor Author

Maybe. Which came first, main bucket or content signatures? I don't remember. 😓

The content signatures came first. But before the main bucket, we were creating a new bucket and signer for each new use-case

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants