[#21625,#21627] Docdb: Clear stale meta-cache entries at the end of clone

Summary:
As part of the clone workflow, we repartition all the tables of the target database that was created by executing the dump script. This means removing the old tablets and creating new ones during the import snapshot phase. However, we saw some cases where the old tablets remain cached in the meta-cache of the tserver that executed the schema creation script. Other tservers can also have these stale meta-cache entries. For example, as part of executing `CREATE INDEX`, we send `BACKFILL INDEX` queries to the tservers that host the leaders of the base table's tablets, which populates their caches with the old tablets. The stale meta-cache entries are later used to execute queries that arrive at those tservers. However, the stale tablets have been deleted in the import snapshot phase, which leads to the following error:
```
d3=# select count(*) from t2 where age<18;
ERROR:  LookupByIdRpc(tablet: 89b4445772d2415aa1702a77031b7d74, num_attempts: 2) failed: Tablet deleted: Not serving tablet deleted upon request at 2024-08-01 15:39:31 UTC
```
Note that this issue affects only the first query executed on a tserver with a stale meta-cache. Retrying the same query works fine, because the meta-cache has by then invalidated the stale entry. We saw this issue only in colocated databases that have an index. This is because, as part of executing the `CREATE INDEX` command, we ask for the TableLocations of the parent colocated tablet.
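
The "first query fails, retry succeeds" behavior can be illustrated with a minimal, single-threaded sketch. `FakeMaster` and `FakeMetaCache` are hypothetical stand-ins invented for this example, not the YugabyteDB classes; the sketch only assumes that a lookup which hits a deleted tablet drops the cached entry so that the next lookup refetches it.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the master's authoritative view.
struct FakeMaster {
  std::unordered_map<std::string, std::string> tablet_for_table;
  std::set<std::string> live_tablets;
};

// Hypothetical stand-in for a tserver's meta-cache.
class FakeMetaCache {
 public:
  explicit FakeMetaCache(const FakeMaster* master) : master_(master) {
    assert(master_ != nullptr);
  }

  // Seeds the cache with an entry, as an earlier query would have done.
  void Put(const std::string& table, const std::string& tablet) {
    cache_[table] = tablet;
  }

  // Returns the tablet to use, or nullopt if the cached tablet no longer exists.
  // A failed lookup erases the entry, so the next call refetches from the master.
  std::optional<std::string> Lookup(const std::string& table) {
    auto it = cache_.find(table);
    if (it == cache_.end()) {
      it = cache_.emplace(table, master_->tablet_for_table.at(table)).first;
    }
    if (master_->live_tablets.count(it->second) == 0) {
      cache_.erase(it);  // Invalidate the stale entry.
      return std::nullopt;
    }
    return it->second;
  }

 private:
  const FakeMaster* master_;
  std::unordered_map<std::string, std::string> cache_;
};
```

With a cache seeded with a deleted tablet, the first `Lookup` returns no value and the second returns the tablet the master currently reports, which mirrors the retry behavior described above.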

The diff fixes the problem by introducing a new tserver RPC, `ClearMetacache`, which clears all the meta-cache entries (tables and tablets) related to the clone database. This RPC is sent to all tservers as part of the clone workflow. More specifically, clearing the meta-cache is the final step of the clone: it happens after successfully restoring the snapshot on the clone database but before enabling user connections to it. User connections to the clone database are enabled only after the stale meta-cache entries have been cleared on all tservers.
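
The ordering constraint (enable connections only after every tserver has cleared its cache) can be sketched as a countdown. The diff tracks this with a `CountDownLatch`; the class below, `CloneFinalizer`, is a hypothetical single-threaded simplification with a plain counter, and it assumes at least one tserver.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical helper: runs enable_connections once all tservers have reported
// that their meta-cache entries for the clone database were cleared.
class CloneFinalizer {
 public:
  CloneFinalizer(std::size_t num_tservers, std::function<void()> enable_connections)
      : pending_(num_tservers), enable_connections_(std::move(enable_connections)) {
    assert(pending_ > 0);  // Assumption of this sketch: at least one tserver.
  }

  // Called once per tserver when its clear request succeeds.
  void OnTserverCleared() {
    assert(pending_ > 0);
    if (--pending_ == 0) {
      enable_connections_();  // Last tserver done: safe to accept user queries.
    }
  }

  std::size_t pending() const { return pending_; }

 private:
  std::size_t pending_;
  std::function<void()> enable_connections_;
};
```

A real implementation also has to handle failed or timed-out requests and concurrent callbacks, which this sketch leaves out.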

**Upgrade/Rollback safety**
The diff adds a new RPC, `ClearMetacache`, that is currently used only in the instant database cloning workflow. The clone feature is protected by the preview flag `enable_db_clone`.

Jira: DB-10520, DB-10522

Test Plan:
./yb_build.sh fastdebug --cxx-test integration-tests_minicluster-snapshot-test --gtest_filter Colocation/PgCloneTestWithColocatedDBParam.CloneAfterDropIndex/1

Also tested manually that `ClearMetacache` clears only the entries that belong to one specific database, using the endpoint `:9000/api/v1/meta-cache`, which shows the set of tablets in the meta-cache. I checked that the tablet `0000000000` is not cleared after executing the RPC, as intended.
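
The property checked manually above (only one database's entries are removed) can be expressed as a small sketch. `SketchMetaCache`, `CachedTable`, and `ClearNamespace` are hypothetical names for illustration; unlike the real `MetaCache::ClearCacheEntries`, which derives the namespace from the YSQL table id, this sketch assumes each cached table stores its namespace id explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical cached table record; the namespace id is stored explicitly here.
struct CachedTable {
  std::string namespace_id;
  std::vector<std::string> tablet_ids;
};

// Hypothetical, heavily simplified meta-cache.
struct SketchMetaCache {
  std::map<std::string, CachedTable> tables;
  std::set<std::string> tablets_by_id;
};

// Erases every table of namespace_id together with its tablets.
// Returns the number of tables erased.
std::size_t ClearNamespace(SketchMetaCache& cache, const std::string& namespace_id) {
  // First pass: collect matches so the map is not modified while iterating it.
  std::vector<std::string> doomed_tables;
  for (const auto& [table_id, table] : cache.tables) {
    if (table.namespace_id == namespace_id) {
      doomed_tables.push_back(table_id);
    }
  }
  // Second pass: erase the collected tables and their tablets.
  for (const auto& table_id : doomed_tables) {
    for (const auto& tablet_id : cache.tables.at(table_id).tablet_ids) {
      cache.tablets_by_id.erase(tablet_id);
    }
    cache.tables.erase(table_id);
  }
  return doomed_tables.size();
}
```

Collecting ids first and erasing afterwards follows the same two-pass shape as the function in the diff.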

Reviewers: asrivastava, mlillibridge

Reviewed By: asrivastava

Subscribers: yguan, ybase, slingam

Differential Revision: https://phorge.dev.yugabyte.com/D37353
yamen-haddad committed Sep 17, 2024
1 parent 0c41023 commit 388e045
Showing 23 changed files with 280 additions and 7 deletions.
4 changes: 4 additions & 0 deletions src/yb/client/client.cc
@@ -2993,6 +2993,10 @@ void YBClient::ClearAllMetaCachesOnServer() {
  data_->meta_cache_->ClearAll();
}

Status YBClient::ClearMetacache(const std::string& namespace_id) {
  return data_->meta_cache_->ClearCacheEntries(namespace_id);
}

bool YBClient::RefreshTabletInfoWithConsensusInfo(
    const tserver::TabletConsensusInfoPB& newly_received_info) {
  auto status = data_->meta_cache_->RefreshTabletInfoWithConsensusInfo(newly_received_info);
2 changes: 2 additions & 0 deletions src/yb/client/client.h
@@ -1024,6 +1024,8 @@ class YBClient {

  void ClearAllMetaCachesOnServer();

  Status ClearMetacache(const std::string& namespace_id);

  // Uses the TabletConsensusInfo piggybacked from a response to
  // refresh a RemoteTablet in metacache. Returns true if the
  // RemoteTablet was indeed refreshed, false otherwise.
42 changes: 41 additions & 1 deletion src/yb/client/meta_cache.cc
@@ -53,14 +53,17 @@
#include "yb/client/table.h"
#include "yb/client/yb_table_name.h"

#include "yb/common/colocated_util.h"
#include "yb/common/common_consensus_util.h"
#include "yb/common/wire_protocol.h"
#include "yb/common/ysql_utils.h"

#include "yb/gutil/map-util.h"
#include "yb/gutil/ref_counted.h"
#include "yb/gutil/strings/substitute.h"

#include "yb/master/master_client.proxy.h"
#include "yb/master/sys_catalog_constants.h"

#include "yb/rpc/rpc_fwd.h"

@@ -2440,13 +2443,50 @@ std::future<Result<internal::RemoteTabletPtr>> MetaCache::LookupTabletByKeyFutur

void MetaCache::ClearAll() {
  std::lock_guard lock(mutex_);
  ts_cache_.clear();
  tables_.clear();
  tablets_by_id_.clear();
  tablet_lookups_by_id_.clear();
  deleted_tablets_.clear();
}

Status MetaCache::ClearCacheEntries(const std::string& namespace_id) {
  std::lock_guard lock(mutex_);
  LOG(INFO) << Format("Clearing MetaCache entries for namespace: $0", namespace_id);
  // Stores the tables and tablets that belong to the namespace namespace_id
  std::set<TableId> db_tables_ids;
  std::set<TabletId> db_tablets_ids;
  for (const auto& [table_id, table_data] : tables_) {
    // Escape sys catalog and parent table ids as they don't conform to a typical ysql table id
    if (table_id == master::kSysCatalogTableId) {
      continue;
    } else if (IsColocationParentTableId(table_id)) {
      db_tables_ids.insert(table_id);
      continue;
    } else if (VERIFY_RESULT(GetNamespaceIdFromYsqlTableId(table_id)) == namespace_id) {
      VLOG(5) << Format(
          "Marking table: $0 for clearing from metacache as it is part of namespace $1: ", table_id,
          namespace_id);
      for (const auto& [_, remote_tablet] : table_data.tablets_by_partition) {
        // Do not clear the sys.catalog tablet
        if (remote_tablet->tablet_id() != master::kSysCatalogTabletId) {
          db_tablets_ids.insert(remote_tablet->tablet_id());
        }
      }
      db_tables_ids.insert(table_id);
    }
  }
  for (const auto& table_id : db_tables_ids) {
    VLOG(4) << Format("Erasing table: $0 from metacache", table_id);
    tables_.erase(table_id);
  }
  for (const auto& tablet_id : db_tablets_ids) {
    VLOG(4) << Format("Erasing tablet: $0 from metacache", tablet_id);
    tablets_by_id_.erase(tablet_id);
    tablet_lookups_by_id_.erase(tablet_id);
  }
  return Status::OK();
}

LookupDataGroup::~LookupDataGroup() {
  std::vector<LookupData*> leftovers;
  while (auto* d = lookups.Pop()) {
2 changes: 2 additions & 0 deletions src/yb/client/meta_cache.h
@@ -627,6 +627,8 @@ class MetaCache : public RefCountedThreadSafe<MetaCache> {

  void ClearAll();

  Status ClearCacheEntries(const std::string& namespace_id);

  // TabletConsensusInfo is piggybacked from the response of a TServer.
  // Returns Status::OK() if and only if the meta-cache was updated.
  Status RefreshTabletInfoWithConsensusInfo(
42 changes: 42 additions & 0 deletions src/yb/integration-tests/minicluster-snapshot-test.cc
@@ -660,6 +660,7 @@ class PgCloneTest : public PostgresMiniClusterTest {
  std::unique_ptr<pgwrapper::PGConn> source_conn_;

  const std::string kSourceNamespaceName = "testdb";
  const std::string kSourceTableName = "t1";
  const std::string kTargetNamespaceName1 = "testdb_clone1";
  const std::string kTargetNamespaceName2 = "testdb_clone2";
  const MonoDelta kTimeout = MonoDelta::FromSeconds(30);
@@ -788,6 +789,47 @@ TEST_P(PgCloneTestWithColocatedDBParam, YB_DISABLE_TEST_IN_SANITIZERS(CloneAfter
  ASSERT_EQ(row, kRows[0]);
}

// The test is disabled in Sanitizers as ysql_dump fails in ASAN builds due to memory leaks
// inherited from pg_dump.
TEST_P(PgCloneTestWithColocatedDBParam, YB_DISABLE_TEST_IN_SANITIZERS(CloneAfterDropIndex)) {
  // Clone to a time before a drop index and check that the index exists with correct data.
  // 1. Create a table and load some data.
  // 2. Create an index on the table.
  // 3. Mark time t.
  // 4. Drop index.
  // 5. Clone the database as of time t.
  // 6. Check the index exists in the clone with the correct data.
  const std::vector<std::tuple<int32_t, int32_t>> kRows = {{1, 10}};
  const std::string kIndexName = "t1_v_idx";

  ASSERT_OK(source_conn_->ExecuteFormat(
      "INSERT INTO t1 VALUES ($0, $1)", std::get<0>(kRows[0]), std::get<1>(kRows[0])));

  ASSERT_OK(source_conn_->ExecuteFormat("CREATE INDEX $0 ON t1(value)", kIndexName));

  // Scans should use the index now.
  auto is_index_scan = ASSERT_RESULT(
      source_conn_->HasIndexScan(Format("SELECT * FROM t1 where value=$0", std::get<1>(kRows[0]))));
  LOG(INFO) << "Scans uses index scan " << is_index_scan;
  ASSERT_TRUE(is_index_scan);

  auto clone_to_time = ASSERT_RESULT(GetCurrentTime()).ToInt64();
  ASSERT_OK(source_conn_->ExecuteFormat("DROP INDEX $0", kIndexName));

  ASSERT_OK(source_conn_->ExecuteFormat(
      "CREATE DATABASE $0 TEMPLATE $1 AS OF $2", kTargetNamespaceName1, kSourceNamespaceName,
      clone_to_time));

  // Verify table t1 exists in the clone database and that the index is used to fetch the data.
  auto target_conn = ASSERT_RESULT(ConnectToDB(kTargetNamespaceName1));
  is_index_scan = ASSERT_RESULT(
      target_conn.HasIndexScan(Format("SELECT * FROM t1 WHERE value=$0", std::get<1>(kRows[0]))));
  ASSERT_TRUE(is_index_scan);
  auto row = ASSERT_RESULT((target_conn.FetchRow<int32_t, int32_t>(
      Format("SELECT * FROM t1 WHERE value=$0", std::get<1>(kRows[0])))));
  ASSERT_EQ(row, kRows[0]);
}

TEST_F(PgCloneTest, YB_DISABLE_TEST_IN_SANITIZERS(TabletSplitting)) {
  const int kNumRows = 1000;

35 changes: 35 additions & 0 deletions src/yb/master/async_rpc_tasks.cc
@@ -2033,6 +2033,41 @@ bool AsyncClonePgSchema::SendRequest(int attempt) {

MonoTime AsyncClonePgSchema::ComputeDeadline() { return deadline_; }

// ============================================================================
// Class AsyncClearMetacache.
// ============================================================================
AsyncClearMetacache::AsyncClearMetacache(
    Master* master, ThreadPool* callback_pool, const std::string& permanent_uuid,
    const std::string& namespace_id, ClearMetacacheCallbackType callback)
    : RetrySpecificTSRpcTask(
          master, callback_pool, permanent_uuid, /* async_task_throttler */ nullptr),
      namespace_id(namespace_id),
      callback_(callback) {}

std::string AsyncClearMetacache::description() const { return "Async ClearMetacache RPC"; }

void AsyncClearMetacache::HandleResponse(int attempt) {
  Status resp_status = Status::OK();
  if (resp_.has_error()) {
    resp_status = StatusFromPB(resp_.error().status());
    LOG(WARNING) << "Clear Metacache entries for namespace " << namespace_id
                 << " failed: " << resp_status;
    TransitionToFailedState(state(), resp_status);
  } else {
    TransitionToCompleteState();
  }
  WARN_NOT_OK(callback_(), "Failed to execute the callback of AsyncClearMetacache");
}

bool AsyncClearMetacache::SendRequest(int attempt) {
  tserver::ClearMetacacheRequestPB req;
  req.set_namespace_id(namespace_id);
  ts_proxy_->ClearMetacacheAsync(req, &resp_, &rpc_, BindRpcCallback());
  VLOG_WITH_PREFIX(1) << Format(
      "Sent clear metacache entries request of namespace: $0 to $1", namespace_id, tablet_id());
  return true;
}

// ============================================================================
// Class AsyncEnableDbConns.
// ============================================================================
27 changes: 27 additions & 0 deletions src/yb/master/async_rpc_tasks.h
@@ -1096,6 +1096,33 @@ class AsyncClonePgSchema : public RetrySpecificTSRpcTask {
  ClonePgSchemaCallbackType callback_;
};

class AsyncClearMetacache : public RetrySpecificTSRpcTask {
 public:
  using ClearMetacacheCallbackType = std::function<Status()>;
  AsyncClearMetacache(
      Master* master, ThreadPool* callback_pool, const std::string& permanent_uuid,
      const std::string& namespace_id, ClearMetacacheCallbackType callback);

  server::MonitoredTaskType type() const override {
    return server::MonitoredTaskType::kClearMetaCache;
  }

  std::string type_name() const override { return "Clear all meta-caches of a tserver"; }

  std::string description() const override;

 protected:
  void HandleResponse(int attempt) override;
  bool SendRequest(int attempt) override;
  // Not associated with a tablet.
  TabletId tablet_id() const override { return TabletId(); }

 private:
  std::string namespace_id;
  tserver::ClearMetacacheResponsePB resp_;
  ClearMetacacheCallbackType callback_;
};

class AsyncEnableDbConns : public RetrySpecificTSRpcTask {
 public:
  using EnableDbConnsCallbackType = std::function<Status(Status)>;
8 changes: 8 additions & 0 deletions src/yb/master/clone/clone_state_entity.cc
@@ -88,4 +88,12 @@ void CloneStateInfo::SetRestorationId(const TxnSnapshotRestorationId& restoratio
  restoration_id_ = restoration_id;
}

std::shared_ptr<CountDownLatch> CloneStateInfo::NumTserversWithStaleMetacache() {
  return num_tservers_with_stale_metacache;
}

void CloneStateInfo::SetNumTserversWithStaleMetacache(uint64_t count) {
  num_tservers_with_stale_metacache = std::make_shared<CountDownLatch>(count);
}

} // namespace yb::master
9 changes: 9 additions & 0 deletions src/yb/master/clone/clone_state_entity.h
@@ -19,6 +19,8 @@
#include "yb/master/catalog_entity_info.pb.h"
#include "yb/master/sys_catalog.h"

#include "yb/util/countdown_latch.h"

namespace yb::master {

struct PersistentCloneStateInfo : public Persistent<SysCloneStatePB> {};
@@ -69,6 +71,9 @@ class CloneStateInfo : public MetadataCowWrapper<PersistentCloneStateInfo> {
  const TxnSnapshotRestorationId& RestorationId();
  void SetRestorationId(const TxnSnapshotRestorationId& restoration_id);

  std::shared_ptr<CountDownLatch> NumTserversWithStaleMetacache();
  void SetNumTserversWithStaleMetacache(uint64_t count);

 private:
  // The ID field is used in the sys_catalog table.
  const std::string clone_request_id_;
@@ -84,6 +89,10 @@ class CloneStateInfo : public MetadataCowWrapper<PersistentCloneStateInfo> {
  // This is set before the clone state is set to RESTORING.
  TxnSnapshotRestorationId restoration_id_ GUARDED_BY(mutex_) = TxnSnapshotRestorationId::Nil();

  // The number of tservers that a Clear Metacache rpc has been sent to but didn't respond with
  // success. Only enable connections to target DB after all tservers cleared their metacache.
  std::shared_ptr<CountDownLatch> num_tservers_with_stale_metacache;

  std::mutex mutex_;

  DISALLOW_COPY_AND_ASSIGN(CloneStateInfo);
6 changes: 6 additions & 0 deletions src/yb/master/clone/clone_state_manager-test.cc
@@ -111,6 +111,11 @@ class CloneStateManagerTest : public YBTest {
        const std::string& target_db_name, const std::string& source_owner,
        const std::string& target_owner, HybridTime restore_ht,
        AsyncClonePgSchema::ClonePgSchemaCallbackType callback, MonoTime deadline), (override));
    MOCK_METHOD(
        Status, ScheduleClearMetaCacheTasks,
        (const TSDescriptorVector& tservers, const std::string& namespace_id,
        AsyncClearMetacache::ClearMetacacheCallbackType callback),
        (override));
    MOCK_METHOD(
        Status, ScheduleEnableDbConnectionsTask,
        (const std::string& permanent_uuid, const std::string& target_db_name,
@@ -146,6 +151,7 @@ class CloneStateManagerTest : public YBTest {
        CoarseTimePoint deadline), (override));

    MOCK_METHOD(Result<TSDescriptorPtr>, PickTserver, (), (override));
    MOCK_METHOD(TSDescriptorVector, GetTservers, (), (override));
  };

 private: