-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change how snapshot handles flushing of write buffer #392
Comments
dcoutts
added a commit
that referenced
this issue
Sep 17, 2024
Only one fix in the code is needed, for the NoThunks test. The state machine test found two non-trivial differences between the spec and the model in relation to blobs: 1. blob retrieval after snapshot fails 2. blob retrieval after failed-snapshot (due to duplicate name) fails The reason for both of these is that the implementation currently works by modifying flushing the write buffer to disk, and modifing the table handle to use the updated set of runs and now-empty write buffer. This invalidates all blob references from the write buffer itself. For the moment I have "fixed" this difference by changing the model to behave like the implementation. For the TODO task to fix this, see #392
dcoutts
added a commit
that referenced
this issue
Sep 17, 2024
dcoutts
added a commit
that referenced
this issue
Sep 17, 2024
Only one fix in the code is needed, for the NoThunks test. The state machine test found two non-trivial differences between the spec and the model in relation to blobs: 1. blob retrieval after snapshot fails 2. blob retrieval after failed-snapshot (due to duplicate name) fails The reason for both of these is that the implementation currently works by modifying flushing the write buffer to disk, and modifing the table handle to use the updated set of runs and now-empty write buffer. This invalidates all blob references from the write buffer itself. For the moment I have "fixed" this difference by changing the model to behave like the implementation. For the TODO task to fix this, see #392
dcoutts
added a commit
that referenced
this issue
Sep 17, 2024
dcoutts
added a commit
that referenced
this issue
Sep 18, 2024
Only one fix in the code is needed, for the NoThunks test. The state machine test found two non-trivial differences between the spec and the model in relation to blobs: 1. blob retrieval after snapshot fails 2. blob retrieval after failed-snapshot (due to duplicate name) fails The reason for both of these is that the implementation currently works by modifying flushing the write buffer to disk, and modifing the table handle to use the updated set of runs and now-empty write buffer. This invalidates all blob references from the write buffer itself. For the moment I have "fixed" this difference by changing the model to behave like the implementation. For the TODO task to fix this, see #392
dcoutts
added a commit
that referenced
this issue
Sep 18, 2024
dcoutts
added a commit
that referenced
this issue
Sep 18, 2024
Only one fix in the code is needed, for the NoThunks test. The state machine test found two non-trivial differences between the spec and the model in relation to blobs: 1. blob retrieval after snapshot fails 2. blob retrieval after failed-snapshot (due to duplicate name) fails The reason for both of these is that the implementation currently works by modifying flushing the write buffer to disk, and modifing the table handle to use the updated set of runs and now-empty write buffer. This invalidates all blob references from the write buffer itself. For the moment I have "fixed" this difference by changing the model to behave like the implementation. For the TODO task to fix this, see #392
dcoutts
added a commit
that referenced
this issue
Sep 18, 2024
dcoutts
added a commit
that referenced
this issue
Sep 18, 2024
Only one fix in the code is needed, for the NoThunks test. The state machine test found two non-trivial differences between the spec and the model in relation to blobs: 1. blob retrieval after snapshot fails 2. blob retrieval after failed-snapshot (due to duplicate name) fails The reason for both of these is that the implementation currently works by modifying flushing the write buffer to disk, and modifing the table handle to use the updated set of runs and now-empty write buffer. This invalidates all blob references from the write buffer itself. For the moment I have "fixed" this difference by changing the model to behave like the implementation. For the TODO task to fix this, see #392
dcoutts
added a commit
that referenced
this issue
Sep 18, 2024
jorisdral
changed the title
TODO: change how snapshot handles flushing of write buffer
Change how snapshot handles flushing of write buffer
Sep 23, 2024
This part is already done |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Currently,
snapshot
will mutate the given handle by flushing the write buffer, before trying to take a snapshot of all the runs.This has the effect of invalidating all blob references, which is surprising and does not correspond to the spec/model.
A plausible solution would be to do it like this: flush the write buffer to a new run, but do not modify the table handle to use that new run. Leave the table handle as-is (thus not invalidating blob refs). Then save all the runs (including the one flushed from the write buffer) into the snapshot.
Difficulties: flushing one more run may necessitate kicking off more merges, otherwise it would leave an over-full level 1. Perhaps this is ok to do for a snapshot, and the merging can be performed upon restoring the snapshot. In other words don't enforce (or explicitly relax) the levels shape invariant (which checks the number of runs in a level) for a snapshot, but enforce it upon restoration by starting the appropriate merges.
In addition, the current snapshot code modifies the table handle state before it checks if the snapshot name is already taken or not, so even if snapshot fails then there are state changes that invalidate the blob references.
The code should be adjusted to open the snapshot file first (thus doing the duplicate name check) and then do the rest of the work.
When this is done, the state machine model can be simplified. See
Database.LSMTree.Model.Normal.Session
which references this issue.The text was updated successfully, but these errors were encountered: