This repository has been archived by the owner on Feb 5, 2025. It is now read-only.
Fetch leaves recursively by hash, verifying chaining #726
Merged
This changes leaf fetching to require already having the _next_ leaf, which tells us what the hash of the leaf being fetched should be. This addresses two unrelated issues:

1. Leaf fetching is trusted: we were not previously verifying that the leaf returned by a peer is valid in any way. Now, we can verify the fetched leaf against the expected hash (and similarly for the fetched QC). We exploit the chaining property of HotShot leaves to avoid having to run any kind of consensus light client to verify fetched leaves.
2. Fetching leaves by hash instead of (or in addition to) by height allows us to implement more providers. For example, we can now implement a provider that pulls leaves from undecided consensus storage, by hash, which allows us to fetch a leaf from our own storage even if we missed the corresponding decide event.

Knock-on changes:

* Proactive scanning now runs backwards, since leaf fetching is now inherently backwards. This was a change we wanted anyway (closes #620).
* To facilitate that, we have added reverse streams for all the resource types, which may come in handy for the explorer.
* Receiving a leaf (either by fetching or by decide event) now triggers us to fetch the parent if it is missing. This supplements the proactive fetcher and can often help us obtain missing data very quickly (e.g. if we just missed one decide).
* `NoStorage` is gone. The purpose of `NoStorage` was to stress test fetching, which it did in the beginning, but it has been ages since we found a real bug this way; we now have plenty of other adversarial fetching tests, and `NoStorage` is becoming more trouble than it's worth to maintain. In particular, effective fetching now depends on having somewhat reliable storage (a reasonable assumption!) because we assume that if you fetch a leaf, you can look it up shortly thereafter and thus fetch its parent. With this change, the `NoStorage` tests were therefore failing or very slow because of many failed requests.
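The hash check from point 1 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Leaf` struct, `commit`, `VerifyError`, and `verify_fetched_parent` are hypothetical names, and `DefaultHasher` is a non-cryptographic stand-in for the real leaf commitment, used only to keep the sketch free of dependencies.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for a leaf commitment (assumption: the real type is a
/// cryptographic hash, not a u64).
type Commitment = u64;

/// Hypothetical, simplified leaf: each leaf records the commitment of its parent.
#[derive(Clone, Debug, Hash, PartialEq, Eq)]
struct Leaf {
    height: u64,
    parent_commitment: Commitment,
    payload: String,
}

impl Leaf {
    /// Commitment to this leaf. `DefaultHasher` is NOT collision resistant;
    /// it is used here for illustration only.
    fn commit(&self) -> Commitment {
        let mut hasher = DefaultHasher::new();
        self.hash(&mut hasher);
        hasher.finish()
    }
}

#[derive(Debug, PartialEq, Eq)]
enum VerifyError {
    HashMismatch { expected: Commitment, actual: Commitment },
}

/// Accept a leaf returned by an untrusted peer only if it hashes to the
/// parent commitment recorded in the next leaf, which we already have.
fn verify_fetched_parent(next: &Leaf, fetched: Leaf) -> Result<Leaf, VerifyError> {
    let expected = next.parent_commitment;
    let actual = fetched.commit();
    if actual == expected {
        Ok(fetched)
    } else {
        Err(VerifyError::HashMismatch { expected, actual })
    }
}
```

Because the next leaf commits to its parent, a peer cannot substitute a different leaf without changing the hash, so no separate consensus check is needed for the fetched leaf in this model.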
If we do not have a leaf and also do not have the next leaf, so that we cannot directly fetch the required leaf, at least we can find the first leaf that we do have and trigger a fetch, which in turn will trigger a fetch of its parent, and so on. This brings the behavior of leaf fetching closer to before, where if a leaf is requested and we don't have it, we will immediately start working to get it.
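As a rough model of this fallback, the sketch below finds the first stored leaf above the requested height and walks parent links down to the target, checking each fetched leaf against the expected hash. It is a synchronous simplification under stated assumptions: the real fetcher is event driven, and `Leaf`, `commit`, `backfill`, the `BTreeMap` storage, and the `HashMap` peer are all hypothetical stand-ins.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, HashMap};
use std::hash::{Hash, Hasher};
use std::ops::Bound;

/// Stand-in for a leaf commitment (assumption: the real one is cryptographic).
type Commitment = u64;

/// Hypothetical, simplified leaf.
#[derive(Clone, Debug, Hash, PartialEq, Eq)]
struct Leaf {
    height: u64,
    parent_commitment: Commitment,
}

/// Illustrative, non-cryptographic commitment.
fn commit(leaf: &Leaf) -> Commitment {
    let mut hasher = DefaultHasher::new();
    leaf.hash(&mut hasher);
    hasher.finish()
}

/// Fetch every missing leaf from the first stored leaf above `target` down to
/// `target` itself. Returns the number of leaves fetched, or `None` if there
/// is no stored leaf above `target`, the peer lacks a leaf, or a fetched leaf
/// fails verification.
fn backfill(
    storage: &mut BTreeMap<u64, Leaf>,
    peer: &HashMap<Commitment, Leaf>,
    target: u64,
) -> Option<usize> {
    if storage.contains_key(&target) {
        return Some(0);
    }
    // The first leaf we do have above the target is the starting point.
    let mut child = storage
        .range((Bound::Excluded(target), Bound::Unbounded))
        .next()
        .map(|(_, leaf)| leaf.clone())?;
    let mut fetched = 0;
    while child.height > target {
        // Fetch the parent by hash and verify it before storing it.
        let parent = peer.get(&child.parent_commitment)?.clone();
        if commit(&parent) != child.parent_commitment || parent.height != child.height - 1 {
            return None;
        }
        storage.insert(parent.height, parent.clone());
        fetched += 1;
        child = parent;
    }
    Some(fetched)
}
```

In this model, a request for height 1 when only height 4 is stored fetches heights 3, 2, and 1 in that order, each one verified against the leaf above it.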
Before switching to nextest, we only ran 2 tests at a time. This prevented us from seeing any weirdness due to conflicting ports and the like. Now, I am seeing some flakiness in tests that use Postgres, which I believe is due to the increased concurrency allowed by nextest. This change puts all the tests using postgres in a group, and restricts that group to 2 tests at a time. Other tests can run with unrestricted concurrency.
* Ensure proper return code from PayloadMetadata when missing from DB
* No more retries for upsert; they accomplish nothing as these errors are almost always unrecoverable without restarting the whole transaction, but they cost a ton of time
* Avoid inserting stale data when pruner is racing with fetcher
* Avoid long delay in `test_archive_recovery` due to competing write transactions, by tweaking backoff parameters
* In aggregates test, wait for aggregator task to update statistics before checking
* Don't fetch parent leaves that have already been pruned
imabdulbasit approved these changes on Dec 5, 2024
Looks very good! I like how we are using `parent_commitment()` to verify the fetched leaf.
Key places to review:

* `active_fetch` in `src/data_source/fetching/leaf.rs`
* `trigger_fetch_for_parent` in `src/data_source/fetching/leaf.rs`
* `src/data_source/fetching.rs`