FAQ
TerarkDB implements its own SSTable using Terark's own algorithms and data structures, which yields better random read performance and a better compression ratio.
Yes, TerarkDB makes a few small modifications to RocksDB:
- Added support for two-pass scanning when building a TerarkDB SSTable, to reduce disk usage (this does not affect other existing SSTables).
- Added environment-variable configuration for TerarkDB (so that after replacing the official librocksdb.so with ours, users can enable and tune TerarkDB simply by setting environment variables).
TerarkDB does not modify any user-level RocksDB API; from the user's perspective, TerarkDB is 100% compatible with RocksDB, and applications do not even need to be recompiled after switching to TerarkDB.
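To make the compatibility claim concrete, here is a minimal sketch of an ordinary RocksDB program that should keep working unchanged after the shared library is swapped for TerarkDB's build. The `TerarkZipTable_localTempDir` environment variable is used only to illustrate the env-based configuration mentioned above; it is an assumption and may not match the exact variable names in your TerarkDB version.

```cpp
// Sketch only: a plain RocksDB program that should run unchanged whether it
// is linked against the official librocksdb.so or TerarkDB's replacement.
#include <cassert>
#include <cstdlib>
#include <string>

#include <rocksdb/db.h>

int main() {
  // Illustrative TerarkDB environment variable (assumed name); the official
  // RocksDB build simply ignores environment variables it does not know.
  setenv("TerarkZipTable_localTempDir", "/tmp/terark-tmp", /*overwrite=*/1);

  rocksdb::Options options;
  options.create_if_missing = true;

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/testdb", &db);
  assert(s.ok());

  // Standard, unmodified user-level API calls.
  s = db->Put(rocksdb::WriteOptions(), "key1", "value1");
  assert(s.ok());

  std::string value;
  s = db->Get(rocksdb::ReadOptions(), "key1", &value);
  assert(s.ok() && value == "value1");

  delete db;
  return 0;
}
```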
TerarkDB supports MongoDB and MySQL through MongoRocks and MyRocks.
5. Since the core algorithm of TerarkDB is proprietary, how is it compliant with MongoDB's AGPL license?
- All Terark code related to MongoDB is open source (MongoDB itself and MongoRocks); none of it uses any TerarkDB code or API.
- The core of TerarkDB is a plug-in for RocksDB, so it is not built directly into MongoDB -- it is loaded by librocksdb.so as a dynamic library, and TerarkDB is compliant with RocksDB's license.
- For keys, the peak memory during compression is about total_key_length * 1.3.
- For values, the memory usage during compression is about input_size * 18%, and never exceeds 12 GB.
To give an example (the sketch after this list reproduces the arithmetic), compressing 2 TB of data in which the keys total 30 GB:
- TerarkDB would need ~39 GB of RAM to compress the keys (into the index).
- TerarkDB would need ~12 GB of RAM to compress the values.
- Key compression and value compression can run in parallel if memory is sufficient, and serially if it is not.
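A minimal sketch of that arithmetic, assuming the two rules above are the whole story (the helper below is not part of any TerarkDB API):

```cpp
// Sketch of the stated memory-estimation rules: keys need ~1.3x their total
// length, values need ~18% of the input size, capped at 12 GB.
#include <algorithm>
#include <cstdio>

struct CompressionMemEstimate {
  double key_gb;    // peak memory for key (index) compression
  double value_gb;  // peak memory for value compression
};

CompressionMemEstimate EstimateMem(double total_key_gb, double total_input_gb) {
  CompressionMemEstimate e;
  e.key_gb = total_key_gb * 1.3;                       // total_key_length * 1.3
  e.value_gb = std::min(total_input_gb * 0.18, 12.0);  // input_size * 18%, <= 12 GB
  return e;
}

int main() {
  // The 2 TB / 30 GB example from above.
  CompressionMemEstimate e = EstimateMem(/*total_key_gb=*/30.0,
                                         /*total_input_gb=*/2048.0);
  std::printf("keys: ~%.0f GB, values: ~%.0f GB\n", e.key_gb, e.value_gb);
  // Prints roughly: keys: ~39 GB, values: ~12 GB
  return 0;
}
```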
Level compaction has larger write amplification, resulting in slower compression and a shorter SSD lifetime.
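For reference, this is how universal compaction (the lower-write-amplification alternative) is selected through plain RocksDB options; this is generic RocksDB configuration, not a TerarkDB-specific API:

```cpp
// Generic RocksDB configuration sketch: choose universal compaction, which
// trades some extra space usage for lower write amplification than level
// compaction.
#include <rocksdb/options.h>

rocksdb::Options MakeUniversalCompactionOptions() {
  rocksdb::Options options;
  options.create_if_missing = true;
  options.compaction_style = rocksdb::kCompactionStyleUniversal;
  return options;
}
```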
8. TerarkDB is faster at random reads and produces smaller compressed data, but what is the trade-off?
Compression is slower and uses more memory. This does not impact real-time writes. SINGLE-thread compression speed is about 15~50 MB/s, so multiple threads are used for compression.
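A sketch of how compression parallelism can be raised through ordinary RocksDB options (generic RocksDB tuning, assumed to apply to TerarkDB's SSTable builds as well):

```cpp
// Generic RocksDB tuning sketch: with single-thread compression at roughly
// 15~50 MB/s, raising background parallelism lets several SSTables be built
// at the same time.
#include <rocksdb/options.h>

rocksdb::Options MakeParallelCompactionOptions(int total_threads) {
  rocksdb::Options options;
  // Sizes the low- and high-priority background thread pools.
  options.IncreaseParallelism(total_threads);
  // Allow several compactions (and thus several SSTable builds) to run at once.
  options.max_background_compactions = total_threads;
  return options;
}
```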
TerarkDB does not touch locking at all: an SSTable in RocksDB is read-only after compaction and write-only during compaction, so we made no changes related to locking.
RocksDB's SSTables will not grow that large.
TerarkDB supports data sets of that size, but each individual SSTable does not need to be that large; a single SSTable is at most a few hundred GB.
It depends. Usually the maximum number of keys in a single SSTable cannot exceed 1.5 billion, while the total length of all keys can be much larger (in practice, 30 GB should be safe).
For values, the total length of the compressed values cannot exceed 128 PB (in practice this limit will never be hit); the number of values equals the number of keys.
Users are unlikely to build a single SSTable large enough to hit these limits; if one grows that large, TerarkDB finishes it and starts a new SSTable.
It depends on the data set and memory limit.
On TPC-H lineitem data with unlimited memory (row length 615 bytes, key length 23 bytes, text field length 512 bytes), for a SINGLE thread:
- For keys, ~10 MB/s on a Xeon 2630 v3, up to ~80 MB/s on other data sets.
- For values, ~500 MB/s on a Xeon 2630 v3, up to ~7 GB/s on other data sets.
As a general rule, the tables below show the relative orders of magnitude of read speed:
- When memory is slightly limited (all data fits in memory with TerarkDB, but not with other DBs)

DB | Read mode | Speed | Description
---|---|---|---
TerarkDB | sequential | 3000 | all data exactly in memory, no page cache misses; CPU cache & TLB misses are moderate
TerarkDB | random | 2000 | CPU cache & TLB misses are heavy
Other DB | sequential | 10000 | hot data always fits in cache
Other DB | random | 10 | no hot data, heavy cache misses
- When memory is extremely limited

DB | Read mode | Speed | Description
---|---|---|---
TerarkDB | sequential | 2000 | hot data always fits in cache
TerarkDB | random | 10 | no hot data, heavy cache misses
Other DB | sequential | 10000 | hot data always fits in cache
Other DB | random | 1 | no hot data, very heavy cache misses
It depends on the data set.
On TPC-H lineitem data (row length 615 bytes, key length 23 bytes, text field length 512 bytes), for a SINGLE thread:
- For keys, ~9 MB/s on a Xeon 2630 v3, up to ~90 MB/s on other data sets.
- For values, ~40 MB/s on a Xeon 2630 v3, up to ~120 MB/s on other data sets.
It depends on the data set.
On Wikipedia data, ~109 GB of English text is compressed into ~23 GB.
On TPC-H lineitem data of ~550 GB, keys total ~22 GB and values total ~528 GB. The average row length is 615 bytes, of which the key is 23 bytes and the value is 592 bytes (including a configurable text field of 512 bytes):
- All keys are compressed into ~5.3 GB.
- All values are compressed into ~24 GB (so the SSTable size is ~29.3 GB).
- For keys, lengths under 80 bytes fit best; keys longer than 80 bytes still work, of course.
- For values, lengths between 50 bytes and 30 KB fit best; other lengths work as well.