Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add badger index cache #6258

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

SungJin1212
Copy link
Contributor

@SungJin1212 SungJin1212 commented Oct 10, 2024

Introduce a badgerDB (https://github.com/dgraph-io/badger) as a disk index cache. It could be a middle layer cache between in-memory and remote cache (in-memory -> (badger) -> Memcached or Redis).

There are two loop in the badger index cache, retentionLoop and diskStatUpdateLoop.
retentionLoop: periodically run badgerDB GC to sink disk space.
diskStatUpdateLoop: exports metric badger_value_log_space_available_bytes and badger_value_log_space_available_bytes track amount of disk space left on the value and key log used in Badger at mount point in bytes.

Which issue(s) this PR fixes:
Fixes #6241

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@dosubot dosubot bot added go Pull requests that update Go code type/feature labels Oct 10, 2024
@SungJin1212 SungJin1212 marked this pull request as draft October 10, 2024 11:55
@SungJin1212 SungJin1212 force-pushed the Add-badger-index-cache branch 13 times, most recently from be32122 to 3ed5938 Compare October 11, 2024 06:29

# [Experimental] How long to cache an index for a block.
# CLI flag: -blocks-storage.bucket-store.index-cache.badger.index-ttl
[index_ttl: <duration> | default = 24h]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this the default value 24h for now? Let's not expose it if other index caches don't expose this parameter


# [Experimental] Keys Data directory in which to cache index data.
# CLI flag: -blocks-storage.bucket-store.index-cache.badger.keys-data-dir
[key_data_dir: <string> | default = "./index-cache/keys"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep only one parameter for the badger DB directory and use <dir>/keys and <dir>/values by default

// valueLogSpaceAvailable stores the amount of space left on the value log mount point in bytes
valueLogSpaceAvailable prometheus.Gauge
// keyLogSpaceAvailable stores the amount of space left on the key log mount point in bytes
keyLogSpaceAvailable prometheus.Gauge
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it calculate the amount of space left?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is calculated by # of available block * block size at the mount point.

)

const (
defaultDataDir = "./index-cache"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use badger-index-cache instead?

c.keyLogSpaceAvailable = promauto.With(reg).NewGauge(prometheus.GaugeOpts{
Name: "badger_key_log_space_available_bytes",
Help: "Amount of space left on the key log mount point in bytes.",
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good way to get used bytes instead of available bytes left?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a method called Size.

ticker := time.NewTicker(interval)
defer ticker.Stop()

for range ticker.C {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend calling Size to get the size of the LSM key and values file and check if we need to run GC based on if it exceeds a threshold.


// err is raised when nothing to clean
for err == nil {
err = c.cache.RunValueLogGC(0.5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does RunValueLogGC clean up the associated key files the same time when cleaning up the values?
Want to make sure we don't keep the values forever.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By quickly checking some of the docs I think it doesn't clean up keys. It only discard those stale values after compaction. Not like how cache works to evict the key and value.
So the key and value still there and wait until TTL to be removed. I don't think it is the right behavior tbh. We should probably delete the key ourselves.

@SungJin1212
Copy link
Contributor Author

SungJin1212 commented Oct 14, 2024

@yeya24
Using Size, we can track bytes usage of vlog, lsm. But, I should do more digging to delete keys when some threshold reached.

@SungJin1212 SungJin1212 force-pushed the Add-badger-index-cache branch 8 times, most recently from 5c0d04f to 15b0dd0 Compare October 14, 2024 10:55
@alanprot
Copy link
Member

This is a very interesting PR.. at some point i was thinking on doing something similar.

@SungJin1212 were you able to do some load test on this to see how it performs compared with the other caching solutions?

@SungJin1212
Copy link
Contributor Author

@alanprot
Thank you for your interest in my work.
I haven't done the load test yet, but I was thinking that the load tests are necessary.
Which performance aspects are essential in the load test do you think?

@SungJin1212 SungJin1212 force-pushed the Add-badger-index-cache branch 2 times, most recently from 7ec38fb to 1002047 Compare October 25, 2024 07:46
Signed-off-by: SungJin1212 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go Pull requests that update Go code size/XXL type/feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Store Gateway: index cache with local disk based backend
3 participants