-
Notifications
You must be signed in to change notification settings - Fork 797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add badger index cache #6258
base: master
Are you sure you want to change the base?
Add badger index cache #6258
Conversation
be32122
to
3ed5938
Compare
|
||
# [Experimental] How long to cache an index for a block. | ||
# CLI flag: -blocks-storage.bucket-store.index-cache.badger.index-ttl | ||
[index_ttl: <duration> | default = 24h] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this the default value 24h for now? Let's not expose it if other index caches don't expose this parameter
|
||
# [Experimental] Keys Data directory in which to cache index data. | ||
# CLI flag: -blocks-storage.bucket-store.index-cache.badger.keys-data-dir | ||
[key_data_dir: <string> | default = "./index-cache/keys"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep only one parameter for the badger DB directory and use <dir>/keys
and <dir>/values
by default
// valueLogSpaceAvailable stores the amount of space left on the value log mount point in bytes | ||
valueLogSpaceAvailable prometheus.Gauge | ||
// keyLogSpaceAvailable stores the amount of space left on the key log mount point in bytes | ||
keyLogSpaceAvailable prometheus.Gauge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does it calculate the amount of space left?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is calculated by # of available block * block size at the mount point.
) | ||
|
||
const ( | ||
defaultDataDir = "./index-cache" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use badger-index-cache
instead?
c.keyLogSpaceAvailable = promauto.With(reg).NewGauge(prometheus.GaugeOpts{ | ||
Name: "badger_key_log_space_available_bytes", | ||
Help: "Amount of space left on the key log mount point in bytes.", | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a good way to get used bytes instead of available bytes left?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a method called Size
.
ticker := time.NewTicker(interval) | ||
defer ticker.Stop() | ||
|
||
for range ticker.C { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend calling Size
to get the size of the LSM key and values file and check if we need to run GC based on if it exceeds a threshold.
|
||
// err is raised when nothing to clean | ||
for err == nil { | ||
err = c.cache.RunValueLogGC(0.5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does RunValueLogGC
clean up the associated key files the same time when cleaning up the values?
Want to make sure we don't keep the values forever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By quickly checking some of the docs I think it doesn't clean up keys. It only discard those stale values after compaction. Not like how cache works to evict the key and value.
So the key and value still there and wait until TTL to be removed. I don't think it is the right behavior tbh. We should probably delete the key ourselves.
@yeya24 |
5c0d04f
to
15b0dd0
Compare
15b0dd0
to
55747f9
Compare
This is a very interesting PR.. at some point i was thinking on doing something similar. @SungJin1212 were you able to do some load test on this to see how it performs compared with the other caching solutions? |
@alanprot |
7ec38fb
to
1002047
Compare
1002047
to
449909e
Compare
Signed-off-by: SungJin1212 <[email protected]>
449909e
to
34b98f6
Compare
Introduce a badgerDB (https://github.com/dgraph-io/badger) as a disk index cache. It could be a middle layer cache between in-memory and remote cache (in-memory -> (badger) -> Memcached or Redis).
There are two loop in the badger index cache,
retentionLoop
anddiskStatUpdateLoop
.retentionLoop
: periodically run badgerDB GC to sink disk space.diskStatUpdateLoop
: exports metricbadger_value_log_space_available_bytes
andbadger_value_log_space_available_bytes
track amount of disk space left on the value and key log used in Badger at mount point in bytes.Which issue(s) this PR fixes:
Fixes #6241
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]