
Native support for incremental restore #13239

Open · wants to merge 8 commits into base: main

Conversation


@mszeszko-meta mszeszko-meta commented Dec 20, 2024

Summary

With this change we are adding native library support for incremental restores. The solution follows a 'tiered' approach where users pick one of three predefined, and for now mutually exclusive, restore modes (kKeepLatestDbSessionIdFiles, kVerifyChecksum, and kPurgeAllFiles [default]), trading write IO / CPU for the degree of certainty that existing destination db files match the selected backup files' contents. The new mode option is exposed via the existing RestoreOptions configuration, which by now is well established in our APIs. The restore engine consumes this configuration and infers which of the existing destination db files are 'in policy' to be retained during restore.

Motivation

This work is motivated by an internal customer running a write-heavy, 1M+ QPS service who uses RocksDB restore functionality to scale up their fleet. Given the already high QPS on their end, the additional write IO from restores as they work today contributes to prolonged spikes that cause the service to hit BLOB storage write quotas, ultimately slowing the pace of their scaling. See T206217267 for more.

Impact

Enable faster service scaling by reducing the write IO footprint on BLOB storage (coming from restores) to the absolute minimum.

Key technical nuances

  1. According to prior investigations, the risk of collisions on [file #, db session id, file size] metadata triplets is low enough that we can confidently use the triplet to uniquely describe a file and its perceived contents, which is the rationale behind the kKeepLatestDbSessionIdFiles mode. To learn more about the risks / tradeoffs of this mode, please check the related comment in backup_engine.cc. This mode is only supported for SSTs, where we persist the db_session_id information in the metadata footer.
  2. kVerifyChecksum mode requires a full blob / SST file scan (assuming the backup file has its checksum_hex metadata set appropriately; if not, an additional scan of the backup file is needed). While it saves on write IOs (if checksums match), it is still a fairly complex and potentially CPU-intensive operation.
  3. We're extending the WorkItemType enum introduced in Generalize work item definition in BackupEngineImpl #13228 to accommodate a new simple ComputeChecksum request, which enables us to run 2) in parallel. This will become increasingly important as we move towards disaggregated storage, where holding up the sequence of checksum evaluations on a single lagging remote file scan would not be acceptable.
  4. Note that it is necessary to compute the checksum of the restored file if the corresponding backup file and existing destination db file checksums did not match.

Test plan

  1. Manual testing using debugger: ✅
  2. Automated tests:
  • ./backup_engine_test --gtest_filter=*IncrementalRestore* covering the following scenarios: ✅
    • Full clean restore
    • User workflow simulation: happy path with a mix of newly added files and deleted original backup files
    • Corruption of existing db files and the difference in handling between kVerifyChecksum and kKeepLatestDbSessionIdFiles modes
  • ./backup_engine_test --gtest_filter=*ExcludedFiles*
    • Integrate existing test collateral with newly introduced restore modes ✅
    • Test edge-case scenario with an excluded file missing across all supplied backups but present and up to date in the db file system. Expectation: able to restore in kKeepLatestDbSessionIdFiles mode, unable to restore in every other mode. 👷

@facebook-github-bot

@mszeszko-meta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot

@mszeszko-meta has updated the pull request. You must reimport the pull request before landing.



@pdillinger pdillinger left a comment


Looking good, except for compatibility with the obscure "excluded files" feature. I have a bit more to review, but sending this feedback ASAP.

Review comments left on include/rocksdb/utilities/backup_engine.h and utilities/backup/backup_engine.cc (most marked resolved, several outdated).
options_.disable_auto_compactions = true;
options_.level0_file_num_compaction_trigger = 1000;

std::vector<std::string> always_copyable_files = {

An explicit list of files like this is fragile to changes in DB operations that might change the order of operations or when we allocate new file numbers. You might have taken inspiration from existing tests in this same test file, but those are subtly different: they either (a) construct a dummy DB from a set of file names and look at how they are handled, or (b) inject extra files into the backup dir to be cleaned up. Do you think we can avoid this?


pdillinger added a commit to pdillinger/rocksdb that referenced this pull request Jan 15, 2025
Summary: As follow-up to facebook#13239, this change is primarily motivated by
simplifying the calling conventions of LogAndApply. Since it must be
called while holding the DB mutex, it can safely read
cfd->GetLatestMutableCFOptions(), until it releases the mutex. Before it
releases the mutex, it makes a copy of the mutable options in a new,
unpublished Version object, which can be used when not holding the DB
mutex. This eliminates the need for callers of LogAndApply to copy
mutable options for its sake, or even specify mutable options at all.
And it eliminates the need for *another* copy saved in ManifestWriter.

Other functions that don't need the mutable options parameter:
* ColumnFamilyData::CreateNewMemtable()
* CompactionJob::Install() / InstallCompactionResults()
* MemTableList::*InstallMemtable*()
* Version::PrepareAppend()

Test Plan: existing tests, CI with sanitizers
