Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(state-snapshots): handle multiple incoming requests correctly #12912

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

marcelo-gonzalez
Copy link
Contributor

The state snapshot actor's logic does not work if several requests come in quick succession. There are a couple problems that this PR fixes.

One is that we first handle a DeleteAndMaybeCreateSnapshotRequest message and delete the state snapshot, and then send a CreateSnapshotRequest to Self that will be handled later. But this means that if several requests come in a period of time shorter than the amount of time it takes to create the state snapshot, several deletions might all run in a row before the corresponding creations run, and the extra state snapshots will not get cleaned up

The second more important problem is that we call snapshot_taken() to allow flat storage updates after the snapshot has been created, but this is done even if there have been other snapshot requests in the meantime. So we end up allowing flat head updates when we shouldn't have, and the second snapshot request will take a snapshot of flat storage that's advanced too far in the chain

This was seen recently because it was triggered by the extra snapshot request we generate before #12907, but it has always been possible in theory in the case of a fork. So even with that PR, this is still worth fixing

With the current code, we handle a DeleteAndMaybeCreateSnapshotRequest message
and delete the state snapshot, and then send a CreateSnapshotRequest that will
be handled later. But this means that if several requests come in a period of time
shorter than the amount of time it takes to create the state snapshot, several deletions
might all run in a row before the corresponding creations run, and the extra state snapshots
will not get cleaned up (as warned about in the log line "Requested a state snapshot but that is already available with a different hash".

There's no real need to have this extra message passing to Self since along with this bug,
it adds unnecessary complication to the code. So just have two message types, one for deletions and one for creations, and call
self.tries.delete_state_snapshot() in the handler for snapshot creations.
@marcelo-gonzalez marcelo-gonzalez requested a review from a team as a code owner February 12, 2025 03:26
@marcelo-gonzalez
Copy link
Contributor Author

I didnt add any testloop tests for this case, because it's a bit tricky to get the timing to trigger it. But I tested it w the hacks in this branch and also ran 2.5.0 with this PR but not #12907 cherry picked, so we can see the fix here get triggered

Copy link

codecov bot commented Feb 12, 2025

Codecov Report

Attention: Patch coverage is 70.96774% with 9 lines in your changes missing coverage. Please review.

Project coverage is 70.52%. Comparing base (b411263) to head (d333938).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
chain/chain/src/state_snapshot_actor.rs 56.25% 7 Missing ⚠️
core/store/src/flat/manager.rs 85.71% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12912      +/-   ##
==========================================
- Coverage   70.54%   70.52%   -0.02%     
==========================================
  Files         851      851              
  Lines      174906   174917      +11     
  Branches   174906   174917      +11     
==========================================
- Hits       123382   123355      -27     
- Misses      46399    46432      +33     
- Partials     5125     5130       +5     
Flag Coverage Δ
backward-compatibility 0.36% <0.00%> (+<0.01%) ⬆️
db-migration 0.36% <0.00%> (+<0.01%) ⬆️
genesis-check 1.42% <0.00%> (+<0.01%) ⬆️
linux 70.38% <67.74%> (-0.02%) ⬇️
linux-nightly 70.17% <70.96%> (-0.02%) ⬇️
pytests 1.73% <0.00%> (+<0.01%) ⬆️
sanity-checks 1.54% <0.00%> (+<0.01%) ⬆️
unittests 70.35% <70.96%> (-0.02%) ⬇️
upgradability 0.36% <0.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant