Introduce SnapshotRepository find_latest and wire up partition restore #2353
base: feat/snapshot-upload
Conversation
With this change, Partition Processor startup now checks the snapshot repository for a partition snapshot before creating a blank store database. If a recent snapshot is available, we will restore that instead of replaying the log from the beginning.
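In outline, the new startup decision looks roughly like the sketch below, reconstructed from the diff hunks quoted in this conversation; the find_latest signature and the fallback open_partition_store call are assumptions, not verbatim from the PR.

// Sketch of the new Partition Processor startup path (names and
// signatures partly assumed; see the actual diff for the real code).
let partition_store = if !partition_store_manager
    .has_partition_store(pp_builder.partition_id)
    .await
{
    match snapshot_repository.find_latest(pp_builder.partition_id).await? {
        Some(snapshot) => {
            // A snapshot exists: import it instead of replaying the log
            // from the beginning.
            partition_store_manager
                .open_partition_store_from_snapshot(pp_builder.partition_id, snapshot)
                .await?
        }
        None => {
            // No snapshot available: fall back to a blank store
            // (`open_partition_store` is a hypothetical stand-in here).
            partition_store_manager
                .open_partition_store(pp_builder.partition_id)
                .await?
        }
    }
} else {
    // The store already exists locally; open it as before.
    partition_store_manager
        .open_partition_store(pp_builder.partition_id)
        .await?
};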
Thanks for creating this PR @pcholakov. The changes look really nice! I had a few minor questions. It would be great to add the streaming write before merging. Once this is resolved, +1 for merging :-)
));
let file_path = snapshot_dir.path().join(filename);
let file_data = self.object_store.get(&key).await?;
tokio::fs::write(&file_path, file_data.bytes().await?).await?;
Yes, it would indeed be great to write the file to disk in a streaming fashion, especially once our SSTs grow.
Maybe something like

use futures::StreamExt;        // for `Stream::next`
use tokio::io::AsyncWriteExt;  // for `write_all` / `flush`

// Stream the object to disk chunk by chunk instead of buffering it in memory.
let mut file_data = self.object_store.get(&key).await?.into_stream();
let mut snapshot_file = tokio::fs::File::create_new(&file_path).await?;
while let Some(data) = file_data.next().await {
    snapshot_file.write_all(&data?).await?;
}
snapshot_file.flush().await?;

can already be enough. Do you know how large the chunks of the stream returned by self.object_store.get(&key).await?.into_stream() will be?
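One way to find out empirically (a hypothetical probe, not part of this PR) would be to log the sizes the stream actually yields:

use futures::StreamExt;

// Hypothetical probe: log each chunk's size to see what the object_store
// client yields for a given backend and object size.
let mut stream = self.object_store.get(&key).await?.into_stream();
while let Some(chunk) = stream.next().await {
    tracing::debug!(len = chunk?.len(), "received chunk from object store");
}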
let partition_store = if !partition_store_manager
    .has_partition_store(pp_builder.partition_id)
    .await
Out of scope of this PR: what is the plan for handling a PP that has some data but is lagging too far behind, such that starting the PP would result in a trim gap? Would we then drop the respective column family and restart it?
/// Discover and download the latest snapshot available. Dropping the returned
/// `LocalPartitionSnapshot` will delete the local snapshot data files.
Is it because the files are stored in a temp directory? On LocalPartitionSnapshot itself I couldn't find how the files are deleted when dropping it.
Is the temp dir also the mechanism to clean things up if downloading it failed?
It seems that TempDir::with_prefix_in takes care of it, since it deletes the files when it gets dropped. This is a nice solution!
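For illustration, a minimal sketch (not from the PR) of the drop-based cleanup, assuming the tempfile crate:

use tempfile::TempDir;

fn main() -> std::io::Result<()> {
    // Create a temp dir, as the snapshot download path does via
    // TempDir::with_prefix_in.
    let dir = TempDir::with_prefix_in("snapshot-", ".")?;
    let file_path = dir.path().join("partial.sst");
    std::fs::write(&file_path, b"partially downloaded data")?;
    assert!(file_path.exists());

    // Dropping the TempDir deletes the directory and its contents, which
    // also covers cleanup when a download fails and returns early.
    drop(dir);
    assert!(!file_path.exists());
    Ok(())
}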
"Found snapshot to bootstrap partition, restoring it", | ||
); | ||
partition_store_manager | ||
.open_partition_store_from_snapshot( |
In restate/crates/rocksdb/src/rock_access.rs, line 156 at 531b987:

fn import_cf(
Closes: #2000
Open tasks:
Future work:
Testing
Created a snapshot by running restatectl create-snapshot -p 0, then dropped the partition CF with rocksdb_ldb drop_column_family --db=./restate-data/.../db data-0. Running restate-server correctly restores the most recent available snapshot.