Fix missing cases of corruption retries #13122

anand1976 · 2024-11-07T01:21:12Z

This PR fixes a few cases where RocksDB did not retry file reads that failed with a checksum mismatch or corruption using the verify_and_reconstruct_read IO option. With these cases fixed, we can almost always open the DB and execute reads successfully even when transient corruptions occur, provided the FileSystem supports the verify_and_reconstruct_read option. The specific cases fixed in this PR are:

- CURRENT file
- IDENTITY file
- OPTIONS file
- SST footer
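The retry pattern applied to these reads can be sketched roughly as follows. This is a minimal, self-contained model, not RocksDB code: `SketchIOOptions`, `ToyChecksum`, `ReadFn`, and `ReadWithCorruptionRetry` are hypothetical names, and a toy FNV-1a checksum stands in for however corruption is actually detected for each file type. The only idea taken from the PR is to retry a corrupt read once with `verify_and_reconstruct_read` set, and only when the FileSystem supports that option.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical status codes; NOT rocksdb::Status.
enum class Code { kOk, kCorruption };

// Hypothetical stand-in for rocksdb::IOOptions. The flag models the idea of
// asking the FileSystem to verify the data and reconstruct it if needed.
struct SketchIOOptions {
  bool verify_and_reconstruct_read = false;
};

// Toy FNV-1a checksum, used here only to detect that bytes changed.
uint32_t ToyChecksum(const std::string& data) {
  uint32_t h = 2166136261u;
  for (unsigned char c : data) {
    h ^= c;
    h *= 16777619u;
  }
  return h;
}

// A "file read" that honors the options it is given (hypothetical).
using ReadFn = std::function<std::string(const SketchIOOptions&)>;

// Reads via `read`, verifies against `expected_checksum`, and retries once
// with verify_and_reconstruct_read = true if the first result looks corrupt.
// `fs_supports_reconstruct` models the FileSystem capability check.
Code ReadWithCorruptionRetry(const ReadFn& read, uint32_t expected_checksum,
                             bool fs_supports_reconstruct, std::string* out) {
  SketchIOOptions opts;
  for (int attempt = 0; attempt < 2; ++attempt) {
    *out = read(opts);
    if (ToyChecksum(*out) == expected_checksum) {
      return Code::kOk;
    }
    if (opts.verify_and_reconstruct_read || !fs_supports_reconstruct) {
      break;  // already retried, or the FileSystem cannot do better
    }
    opts.verify_and_reconstruct_read = true;  // second attempt: reconstruct
  }
  return Code::kCorruption;
}
```

For example, a reader that returns a damaged copy of the CURRENT contents on the plain read but the correct bytes when asked to reconstruct succeeds on the second attempt; with `fs_supports_reconstruct = false`, the same reader fails after a single attempt.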

Test plan:
Unit test in db_io_failure_test.cc that injects corruption at various stages of DB open and reads
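One way to picture this test plan is a sweep over injection points: corrupt exactly the Nth plain read during a simulated open and check that the open still succeeds. The sketch below is hypothetical and is not the structure of db_io_failure_test.cc. It assumes a `TransientCorruptionFs` that always returns good bytes when verify-and-reconstruct is requested, and it compares against the expected contents as a stand-in for a real checksum or parse check.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical FileSystem model: corrupts exactly one plain read (the
// `corrupt_read_index`-th one) and never corrupts a verify-and-reconstruct read.
class TransientCorruptionFs {
 public:
  explicit TransientCorruptionFs(int corrupt_read_index)
      : corrupt_read_index_(corrupt_read_index) {}

  std::string Read(const std::string& good, bool verify_and_reconstruct) {
    if (verify_and_reconstruct) return good;
    const bool corrupt = (plain_reads_++ == corrupt_read_index_);
    if (!corrupt) return good;
    std::string bad = good;
    bad[0] ^= 0x1;  // flip one bit to simulate a transient corruption
    return bad;
  }

 private:
  int corrupt_read_index_;
  int plain_reads_ = 0;
};

// Reads one file, retrying once with verify-and-reconstruct on a mismatch.
// Comparing with `good` stands in for a checksum or parse check.
bool ReadVerified(TransientCorruptionFs& fs, const std::string& good) {
  if (fs.Read(good, /*verify_and_reconstruct=*/false) == good) return true;
  return fs.Read(good, /*verify_and_reconstruct=*/true) == good;
}

// Simulated DB open touching the four cases listed in the PR description.
// The strings are placeholder file contents, not real file formats.
bool SimulatedOpen(TransientCorruptionFs& fs) {
  const std::vector<std::string> kFiles = {"CURRENT", "IDENTITY", "OPTIONS",
                                           "SST footer"};
  for (const auto& contents : kFiles) {
    if (!ReadVerified(fs, contents)) return false;
  }
  return true;
}
```

Sweeping `corrupt_read_index` from 0 to 3 places the corruption on each of the four reads in turn, so every retry path is exercised once.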

jaykorean · 2024-11-07T18:11:09Z

db/db_io_failure_test.cc

+  // is allocated with max_open_files - 10 as capacity. So override
+  // max_open_files to 11 so table cache capacity will become 1. This will
+  // prevent file open during DB open and force the file to be opened
+  // during MultiGet


This is a nice trick!
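The arithmetic behind the trick, assuming the `max_open_files - 10` capacity rule quoted in the test comment, is simply the following. Both helpers are hypothetical and only restate that rule; they are not RocksDB functions.

```cpp
#include <cassert>

// Restates the rule from the test comment: the table cache is allocated
// with max_open_files - 10 as capacity. Hypothetical helper.
int TableCacheCapacityFor(int max_open_files) { return max_open_files - 10; }

// Inverse of the rule: the max_open_files needed for a given capacity.
int MaxOpenFilesForCapacity(int capacity) { return capacity + 10; }
```

With `max_open_files = 11`, `TableCacheCapacityFor(11)` is 1, which is the capacity the comment relies on.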

jaykorean · 2024-11-07T18:16:23Z

options/options_parser.cc

+    }
+    if (s.ok()) {
+      s = ValidityCheck();
+    }


nit: Adding if (s.ok()) { return s; } after this validity check would be more readable to me than setting retry = false in the else branch at line 349. Alternatively, you could return explicitly at line 349.
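A rough sketch of the control flow the reviewer suggests, with hypothetical callbacks in place of the real parser and `ValidityCheck()`. The retry condition follows the `(s.IsCorruption() || s.IsInvalidArgument()) && !retry && ...` snippet quoted in the next review comment; because the rest of that condition is cut off in the excerpt, `fs_supports_reconstruct` is an assumption standing in for whatever it actually checks.

```cpp
#include <cassert>
#include <functional>

// Hypothetical status codes; NOT rocksdb::Status.
enum class Code { kOk, kCorruption, kInvalidArgument };

// Parse, validate, and return early on success, so `retry` never needs to
// be reset in an else branch. `parse` receives whether this pass should ask
// the FileSystem to verify and reconstruct (hypothetical callback).
Code ParseWithRetry(const std::function<Code(bool)>& parse,
                    const std::function<Code()>& validity_check,
                    bool fs_supports_reconstruct) {
  bool retry = false;
  while (true) {
    Code s = parse(retry);
    if (s == Code::kOk) {
      s = validity_check();
    }
    if (s == Code::kOk) {
      return s;  // early return on success
    }
    const bool retryable =
        s == Code::kCorruption || s == Code::kInvalidArgument;
    if (!retryable || retry || !fs_supports_reconstruct) {
      return s;  // not corruption-like, already retried, or unsupported
    }
    retry = true;  // second pass requests verify-and-reconstruct
  }
}
```

The success path exits in one place, and the loop runs at most twice because `retry` is only ever set once.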

jaykorean · 2024-11-07T18:19:37Z

options/options_parser.cc

+      s = ValidityCheck();
+    }
+    if (!s.ok()) {
+      if ((s.IsCorruption() || s.IsInvalidArgument()) && !retry &&


Question for my own learning: in what scenario would we get s.IsInvalidArgument() here, and would retrying with kVerifyAndReconstructRead then succeed?

Any syntax error during parsing is reported as Status::InvalidArgument(). The syntax error could be caused by corruption, in which case retrying can potentially correct it.
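To make the answer concrete: a single flipped bit can turn a well-formed line into a syntax error, so a parser reports InvalidArgument rather than Corruption. The toy parser below is hypothetical and far simpler than the real OPTIONS format (which has sections and more syntax); it only shows how corruption can surface as a parse error.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical status codes; NOT rocksdb::Status.
enum class Code { kOk, kInvalidArgument };

// Toy stand-in for an options parser: every non-empty line must look like
// "name=value". A malformed line is reported as kInvalidArgument.
Code ParseToyOptions(const std::string& text) {
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty()) continue;
    const std::size_t eq = line.find('=');
    if (eq == std::string::npos || eq == 0) {
      return Code::kInvalidArgument;  // syntax error
    }
  }
  return Code::kOk;
}

// Flips the lowest bit of the byte at `pos`, simulating a read corruption.
std::string FlipBit(std::string data, std::size_t pos) {
  data[pos] = static_cast<char>(data[pos] ^ 0x01);
  return data;
}
```

Flipping the low bit of `=` (0x3D) yields `<` (0x3C), so a line such as `max_open_files=11` no longer parses, while the same bytes read back correctly on a retry would.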

jaykorean · 2024-11-07T18:23:15Z

db/db_io_failure_test.cc

+          return s;
+        }
+
+        // This means the next read after injecting corruption was not


Wondering if the comment was cut off.

jaykorean · 2024-11-07T18:25:44Z

db/db_io_failure_test.cc

+      ss << std::setw(3) << 100 * sst + key;
+      ASSERT_OK(Put("key" + ss.str(), "val" + ss.str()));
+    }
+    Flush();


ASSERT_OK(Flush());

jaykorean · 2024-11-07T18:25:54Z

db/db_io_failure_test.cc

+    }
+    Flush();
+  }
+  Close();


ASSERT_OK(Close());

Looks like DBTestBase::Close() does not return a Status.

facebook-github-bot · 2024-11-07T19:55:36Z

@anand1976 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

jaykorean

👍

Fix missing cases of corruption retries

eafc882

anand1976 requested a review from jaykorean November 7, 2024 01:21

facebook-github-bot added the CLA Signed label Nov 7, 2024

jaykorean reviewed Nov 7, 2024

Address comments and fix tests

3ae93ac

jaykorean approved these changes Nov 8, 2024
