-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Native support for incremental restore #13239
base: main
Are you sure you want to change the base?
Conversation
@mszeszko-meta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
b9cc02d
to
f9d0de2
Compare
@mszeszko-meta has updated the pull request. You must reimport the pull request before landing. |
@mszeszko-meta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
f9d0de2
to
aab8ef8
Compare
@mszeszko-meta has updated the pull request. You must reimport the pull request before landing. |
@mszeszko-meta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
aab8ef8
to
6d5c8e5
Compare
@mszeszko-meta has updated the pull request. You must reimport the pull request before landing. |
1 similar comment
@mszeszko-meta has updated the pull request. You must reimport the pull request before landing. |
b665268
to
7f5348f
Compare
@mszeszko-meta has updated the pull request. You must reimport the pull request before landing. |
1 similar comment
@mszeszko-meta has updated the pull request. You must reimport the pull request before landing. |
bf5dcf5
to
7b3f446
Compare
@mszeszko-meta has updated the pull request. You must reimport the pull request before landing. |
@mszeszko-meta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@mszeszko-meta has updated the pull request. You must reimport the pull request before landing. |
@mszeszko-meta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, except for compatibility with the obscure "excluded files" feature. I have a bit more to review, but sending this feedback ASAP.
options_.disable_auto_compactions = true; | ||
options_.level0_file_num_compaction_trigger = 1000; | ||
|
||
std::vector<std::string> always_copyable_files = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explicit list of files like this is fragile to changes in DB operations that might change the order or when we allocate new file numbers. You might have taken inspiration from existing testings in this same test file, but those are subtly either (a) constructing a dummy DB from a set of file names and looking at how they are handled, or (b) injecting extra files into the backup dir to be cleaned up. Do you think we can avoid this?
c41b139
to
0fb2333
Compare
@mszeszko-meta has updated the pull request. You must reimport the pull request before landing. |
@mszeszko-meta has updated the pull request. You must reimport the pull request before landing. |
Summary: As follow-up to facebook#13239, this change is primarily motivated by simplifying the calling conventions of LogAndApply. Since it must be called while holding the DB mutex, it can read safely read cfd->GetLatestMutableCFOptions(), until it releases the mutex. Before it releases the mutex, it makes a copy of the mutable options in a new, unpublished Version object, which can be used when not holding the DB mutex. This eliminates the need for callers of LogAndApply to copy mutable options for its sake, or even specify mutable options at all. And it eliminates the need for *another* copy saved in ManifestWriter. Other functions that don't need the mutable options parameter: * ColumnFamilyData::CreateNewMemtable() * CompactionJob::Install() / InstallCompactionResults() * MemTableList::*InstallMemtable*() * Version::PrepareAppend() Test Plan: existing tests, CI with sanitizers
…files obsure feature
5cfe5e6
to
69e7299
Compare
@mszeszko-meta has updated the pull request. You must reimport the pull request before landing. |
@mszeszko-meta has updated the pull request. You must reimport the pull request before landing. |
Summary
With this change we are adding native library support for incremental restores. When designing the solution we decided to follow 'tiered' approach where users can pick one of the three predefined, and for now, mutually exclusive restore modes (
kKeepLatestDbSessionIdFiles
,kVerifyChecksum
andkPurgeAllFiles
[default]) - trading write IO / CPU for the degree of certainty that the existing destination db files match selected backup files contents. New mode option is exposed via existingRestoreOptions
configuration, which by this time has been already well-baked into our APIs. Restore engine will consume this configuration and infer which of the existing destination db files are 'in policy' to be retained during restore.Motivation
This work is motivated by internal customer who is running write-heavy, 1M+ QPS service and is using RocksDB restore functionality to scale up their fleet. Given already high QPS on their end, additional write IO from restores as-is today is contributing to prolonged spikes which lead the service to hit BLOB storage write quotas, which finally results in slowing down the pace of their scaling. See T206217267 for more.
Impact
Enable faster service scaling by reducing write IO footprint on BLOB storage (coming from restore) to the absolute minimum.
Key technical nuances
kKeepLatestDbSessionIdFiles
mode. To find more about the risks / tradeoffs for using this mode, please check the related comment inbackup_engine.cc
. This mode is only supported for SSTs where we persist thedb_session_id
information in the metadata footer.kVerifyChecksum
mode requires a full blob / SST file scan (assuming backup file has its'checksum_hex
metadata set appropriately, if not additional file scan for backup file). While it saves us on write IOs (if checksums match), it's still fairly complex and potentially CPU intensive operation.WorkItemType
enum introduced in Generalize work item definition in BackupEngineImpl #13228 to accommodate a new simple request toComputeChecksum
, which will enable us to run 2) in parallel. This will become increasingly more important as we're moving towards disaggregated storage and holding up the sequence of checksum evaluations on a single lagging remote file scan would not be acceptable.Test plan
./backup_engine_test --gtest_filter=*IncrementalRestore*
covering the following scenarios: ✅kVerifyChecksum
andkKeepLatestDbSessionIdFiles
modes../backup_engine_test --gtest_filter=*ExcludedFiles*
kKeepLatestDbSessionIdFiles
mode, unable to restore in every other mode. 👷