PD: supports multiple level meta data space #87

Open
wants to merge 2 commits into base: master
Conversation

zhangjinpeng87 (Member) commented:
Signed-off-by: zhangjinpeng1987 [email protected]

PD supports a multiple-level metadata space.

text/0083-multi-level-meta-data-space.md

1. Multiple TiKV clusters share the same PD cluster. The minimal deployment of a TiKV cluster is 3 TiKV nodes and 3 PD nodes, but it is not cost-effective if every small cluster has 3 dedicated metadata nodes.
2. There are multiple tenants in the same TiKV cluster; each tenant has its own metadata, and each tenant's key range can contain any key in the range [min-key, max-key].
@nolouch (Contributor) commented on Jan 13, 2022:
Does the keyspace in API v2 match this?

@zhangjinpeng87 (Member, Author) replied:
No, the v2 API cannot satisfy multiple TiDB tenants.

@zhangjinpeng87 (Member, Author) added on Jan 14, 2022:
When there are multiple TiDB tenants, each TiDB should have its own ddl-owner, gc-safepoint, and other metadata, and these metadata should be stored separately in PD. This RFC is more about how PD stores multiple users' metadata.
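As an illustration of keeping each tenant's metadata under its own namespace in PD, here is a minimal sketch; the `/tenants/{id}/...` path layout and the 32-bit tenant id are assumptions for the example, not part of the RFC:

```rust
/// Hypothetical helper that namespaces PD metadata keys by tenant.
/// The "/tenants/{id}/..." layout and the 32-bit tenant id are
/// illustrative assumptions.
fn tenant_meta_key(tenant_id: u32, item: &str) -> String {
    format!("/tenants/{:08x}/{}", tenant_id, item)
}

fn main() {
    // Each tenant keeps its own ddl-owner and gc-safepoint entries under
    // a separate prefix instead of one shared, cluster-wide key.
    assert_eq!(tenant_meta_key(1, "gc_safepoint"), "/tenants/00000001/gc_safepoint");
    assert_eq!(tenant_meta_key(1, "ddl_owner"), "/tenants/00000001/ddl_owner");
}
```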

## Alternatives

In the multi-tenant scenario, a tenant can add a {tenant-id} prefix to each data key, but the tenant-id is essentially metadata; giving each data key a tenant-id prefix may cost more disk space and memory.
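For reference, a minimal sketch of the prefixed-key encoding this alternative describes; the 2-byte big-endian tenant-id width is an assumption for the example:

```rust
/// Illustrative key encoding for the prefix alternative: every data key
/// is stored as {tenant-id}{user-key}. The 2-byte big-endian tenant-id
/// width is an assumption for this example.
fn prefix_key(tenant_id: u16, user_key: &[u8]) -> Vec<u8> {
    let mut key = Vec::with_capacity(2 + user_key.len());
    key.extend_from_slice(&tenant_id.to_be_bytes());
    key.extend_from_slice(user_key);
    key
}

fn main() {
    let key = prefix_key(1, b"t_row_0001");
    // The prefix is paid on every key stored in data blocks, indexes, the
    // raft log and the WAL, which is where the extra disk and memory cost comes from.
    assert_eq!(&key[..2], &[0x00, 0x01]);
    assert_eq!(key.len(), 2 + b"t_row_0001".len());
}
```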
A Member commented:
Any perf stats to show the cost?

@zhangjinpeng87 (Member, Author) replied:
The insert QPS with the prefix shows a 4% regression compared with no prefix.

@zhangjinpeng87 (Member, Author) added on Jan 16, 2022:
The bigger key size will consume more raft log and WAL space, and more CPU for key comparisons.

A Member replied:
What prefix was used for testing? Note that a two-byte prefix can already support 32768 tenants.

1. Multiple TiKV clusters share the same PD cluster. The minimal deployment of a TiKV cluster is 3 TiKV nodes and 3 PD nodes, but it is not cost-effective if every small cluster has 3 dedicated metadata nodes.
2. There are multiple tenants in the same TiKV cluster; each tenant has its own metadata, and each tenant's key range can contain any key in the range [min-key, max-key].
A Member commented:
To make this practical, every API needs to accept a user prefix. And each user's data obviously can't be stored in the same RocksDB. This also requires PD to know about the underlying storage engine and to avoid scheduling replicas from different users to the same storage engine. TiKV also needs to split all in-memory metadata by user; for example, the range index becomes HashMap<UserKey, BTreeMap<Vec<u8>, u64>> (see the sketch after this comment).

In my opinion, using a prefix is more straightforward and simpler.
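A minimal sketch of the per-user range index mentioned above, assuming `UserKey` is simply an alias for the tenant's key bytes (the alias and the method names are illustrative):

```rust
use std::collections::{BTreeMap, HashMap};

/// Illustrative alias for a user/tenant identifier.
type UserKey = Vec<u8>;

/// Per-user range index as described above: each user gets its own
/// ordered map from region start key to region id.
struct RangeIndex {
    by_user: HashMap<UserKey, BTreeMap<Vec<u8>, u64>>,
}

impl RangeIndex {
    fn new() -> Self {
        RangeIndex { by_user: HashMap::new() }
    }

    /// Record that `region_id` starts at `start_key` for `user`.
    fn insert(&mut self, user: UserKey, start_key: Vec<u8>, region_id: u64) {
        self.by_user.entry(user).or_default().insert(start_key, region_id);
    }

    /// Find the region covering `key` for `user`: the entry with the
    /// greatest start key that is <= `key`.
    fn lookup(&self, user: &UserKey, key: &[u8]) -> Option<u64> {
        self.by_user
            .get(user)?
            .range(..=key.to_vec())
            .next_back()
            .map(|(_, &id)| id)
    }
}

fn main() {
    let mut idx = RangeIndex::new();
    idx.insert(b"u0001".to_vec(), b"a".to_vec(), 1);
    idx.insert(b"u0001".to_vec(), b"m".to_vec(), 2);
    assert_eq!(idx.lookup(&b"u0001".to_vec(), b"k"), Some(1));
    assert_eq!(idx.lookup(&b"u0002".to_vec(), b"k"), None);
}
```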

@zhangjinpeng87 (Member, Author) replied on Jan 14, 2022:
> And each user's data obviously can't be stored in the same RocksDB.

This is what I expected. After TiKV implements the multiple-RocksDB feature, data from different tenants should be stored in different RocksDB instances. The tenant is metadata; including the tenant id in every row of data is redundant. We can store the tenant id in the RocksDB instance's directory name, like u0001_rangeid. Even more, the table id is essentially also metadata, so it can be stored in the directory name like u0001_rangeid_tableid, and the data key in RocksDB becomes just the row_id. In this way, we can satisfy the compatibility requirement with old clusters' data.
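A minimal sketch of the directory-name encoding described in this comment; the exact field widths and separators in `u0001_rangeid_tableid` are assumptions for the example:

```rust
/// Build the per-tenant RocksDB directory name sketched in the comment:
/// the tenant id, range id and table id live in the directory name, so
/// keys inside the instance only need to carry the row id.
/// Field widths and separators are illustrative assumptions.
fn rocksdb_dir_name(tenant_id: u32, range_id: u64, table_id: i64) -> String {
    format!("u{:04}_{}_{}", tenant_id, range_id, table_id)
}

fn main() {
    // For tenant 1, range 42, table 7 the instance directory would be:
    let dir = rocksdb_dir_name(1, 42, 7);
    assert_eq!(dir, "u0001_42_7");
    // A key stored inside this instance can then be just the encoded
    // row id, with no tenant or table prefix on every key.
}
```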

A Member replied:
Using a prefix can also achieve the same improvement. The difference between using a prefix and using separate explicit metadata is that PD/TiKV/TiDB must take good care of the metadata in the latter case.

Signed-off-by: zhangjinpeng1987 <[email protected]>
@zhangjinpeng87 (Member, Author) commented:
Another usage scenario: multiple TiDB clusters share the same PD cluster to reduce the overhead of PD in TiDB Cloud.
