Change how snapshot handles flushing of write buffer #392

dcoutts · 2024-09-13T11:01:13Z

Currently, snapshot will mutate the given handle by flushing the write buffer, before trying to take a snapshot of all the runs.

This has the effect of invalidating all blob references, which is surprising and does not correspond to the spec/model.

A plausible solution would be to do it like this: flush the write buffer to a new run, but do not modify the table handle to use that new run. Leave the table handle as-is (thus not invalidating blob refs). Then save all the runs (including the one flushed from the write buffer) into the snapshot.
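The difference between the current and the proposed behaviour can be sketched with a small pure model. This is only an illustration under stated assumptions: `Table`, `Run`, `snapshotMutating` and `snapshotPreserving` are hypothetical names invented here and are not the lsm-tree types or API. In this model, "not modifying the table handle" is expressed by returning only the list of runs to save.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical, simplified stand-ins; these are NOT the lsm-tree types.
type Key   = String
type Value = String

-- A run is modelled as an immutable map of key/value pairs.
newtype Run = Run (Map.Map Key Value)
  deriving (Eq, Show)

data Table = Table
  { writeBuffer :: Map.Map Key Value  -- in-memory entries, not yet flushed
  , runs        :: [Run]              -- newest run first
  } deriving (Eq, Show)

-- The run produced by flushing the write buffer (none if the buffer is empty).
flushedRun :: Table -> [Run]
flushedRun t = [Run (writeBuffer t) | not (Map.null (writeBuffer t))]

-- Behaviour described in the issue: the table is changed to use the
-- flushed run and an empty write buffer, then all runs are saved.
snapshotMutating :: Table -> (Table, [Run])
snapshotMutating t = (t', runs t')
  where
    t' = Table { writeBuffer = Map.empty, runs = flushedRun t ++ runs t }

-- Proposed behaviour: the same runs are saved, but the table is left as-is.
snapshotPreserving :: Table -> [Run]
snapshotPreserving t = flushedRun t ++ runs t
```

Both functions save the same set of runs; only the first one changes the table, which is the step that invalidates blob references into the write buffer.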

Difficulties: flushing one more run may necessitate kicking off more merges, otherwise it would leave an over-full level 1. Perhaps this is ok to do for a snapshot, and the merging can be performed upon restoring the snapshot. In other words don't enforce (or explicitly relax) the levels shape invariant (which checks the number of runs in a level) for a snapshot, but enforce it upon restoration by starting the appropriate merges.
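A minimal sketch of the relaxed invariant, assuming a deliberately simplified model in which every level has the same fixed run limit. The names `maxRunsPerLevel`, `levelOk`, `snapshotLevelOk` and `mergesToStart`, and the limit of 4, are assumptions made for illustration; the real levels shape invariant is more involved.

```haskell
-- Hypothetical limit on the number of runs per level (illustration only).
maxRunsPerLevel :: Int
maxRunsPerLevel = 4

-- A level is modelled as a list of run names.
type Level = [String]

-- The normal shape invariant: a level may not be over-full.
levelOk :: Level -> Bool
levelOk l = length l <= maxRunsPerLevel

-- Relaxed invariant for a snapshot: level 1 may hold one extra run,
-- namely the run flushed from the write buffer.
snapshotLevelOk :: Int -> Level -> Bool
snapshotLevelOk levelNo l
  | levelNo == 1 = length l <= maxRunsPerLevel + 1
  | otherwise    = levelOk l

-- On restore: the (1-based) levels that violate the normal invariant
-- and therefore need a merge to be started.
mergesToStart :: [Level] -> [Int]
mergesToStart ls = [n | (n, l) <- zip [1 ..] ls, not (levelOk l)]
```

With this split, a snapshot containing an over-full level 1 is accepted when it is written, and restoring it reports level 1 as needing a merge.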

In addition, the current snapshot code modifies the table handle state before it checks if the snapshot name is already taken or not, so even if snapshot fails then there are state changes that invalidate the blob references.

The code should be adjusted to open the snapshot file first (thus doing the duplicate name check) and then do the rest of the work.
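The ordering can be illustrated as follows. This is a sketch under assumptions, not the library's code: the set of taken names is held in an `IORef` as a stand-in for the snapshot directory, and `snapshotWith` and `SnapshotExists` are hypothetical names.

```haskell
import Data.IORef (IORef, atomicModifyIORef')
import qualified Data.Set as Set

type SnapshotName = String

-- Hypothetical error type for a duplicate snapshot name.
newtype SnapshotError = SnapshotExists SnapshotName
  deriving (Eq, Show)

-- Reserve the snapshot name first. The state-changing work only runs
-- if the reservation succeeded, so a duplicate name changes nothing.
snapshotWith
  :: IORef (Set.Set SnapshotName)  -- names already taken (stand-in for files on disk)
  -> SnapshotName
  -> IO ()                         -- the rest of the snapshot work
  -> IO (Either SnapshotError ())
snapshotWith taken name work = do
  fresh <- atomicModifyIORef' taken $ \s ->
    if Set.member name s
      then (s, False)
      else (Set.insert name s, True)
  if fresh
    then work >> pure (Right ())
    else pure (Left (SnapshotExists name))
```

A second call with the same name returns `Left (SnapshotExists name)` without running `work`, so a failed snapshot leaves the table state untouched.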

When this is done, the state machine model can be simplified. See Database.LSMTree.Model.Normal.Session which references this issue.

Only one fix in the code is needed, for the NoThunks test.

The state machine test found two non-trivial differences between the spec and the model in relation to blobs:

1. blob retrieval after snapshot fails
2. blob retrieval after failed-snapshot (due to duplicate name) fails

The reason for both of these is that the implementation currently works by flushing the write buffer to disk, and modifying the table handle to use the updated set of runs and now-empty write buffer. This invalidates all blob references from the write buffer itself.

For the moment I have "fixed" this difference by changing the model to behave like the implementation. For the TODO task to fix this, see #392

See #392

jorisdral · 2024-11-04T14:41:28Z

...

In addition, the current snapshot code modifies the table handle state before it checks if the snapshot name is already taken or not, so even if snapshot fails then there are state changes that invalidate the blob references.

The code should be adjusted to open the snapshot file first (thus doing the duplicate name check) and then do the rest of the work.

...

This part is already done

dcoutts added a commit that referenced this issue Sep 17, 2024

Update docs for BlobRef to note snapshot bug / misbehaviour

bf9eec4

See #392

dcoutts mentioned this issue Sep 17, 2024

Enable returning BlobRefs in the API, enable tests, and minor fixes #394

Merged

dcoutts added a commit that referenced this issue Sep 17, 2024

Update docs for BlobRef to note snapshot bug / misbehaviour

f96833c

See #392

dcoutts added a commit that referenced this issue Sep 18, 2024

Update docs for BlobRef to note snapshot bug / misbehaviour

5f29ee6

See #392

dcoutts added a commit that referenced this issue Sep 18, 2024

Update docs for BlobRef to note snapshot bug / misbehaviour

1d24e1f

See #392

dcoutts added a commit that referenced this issue Sep 18, 2024

Update docs for BlobRef to note snapshot bug / misbehaviour

1c39402

See #392

jorisdral changed the title ~~TODO: change how snapshot handles flushing of write buffer~~ Change how snapshot handles flushing of write buffer Sep 23, 2024

jorisdral mentioned this issue Oct 16, 2024

Base implementation of scheduled merges #426

Merged

jorisdral assigned wenkokke Nov 4, 2024
