
Bug: Resources are not properly isolated in Standalone mode #19245

Open
Li0k opened this issue Nov 4, 2024 · 3 comments
Labels: type/bug (Something isn't working)
Milestone: release-2.2

Comments

Li0k (Contributor) commented Nov 4, 2024

When RisingWave starts in the default (distributed) mode, each component is deployed in a separate pod and uses its own memory.

For example:

  • The compute node (CN) calculates the memory available to Hummock with the function `storage_memory_config`.
  • The compactor by default uses all the memory the system provides and divides it between the worker and the cache.

However, this can lead to OOM in Standalone mode: Standalone mode does not isolate CPU and memory resources between components. Competition for CPU is acceptable, but competition for memory is not.
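As a rough illustration of the problem (not RisingWave's actual code), a standalone deployment would need to split a single memory budget up front instead of letting each embedded component size itself against the full system memory. The function name and the 70/30 ratio below are purely hypothetical:

```rust
/// Hypothetical sketch only: split one standalone memory budget between the
/// embedded compute node (what `storage_memory_config` would be given) and the
/// embedded compactor, rather than letting both see the full system memory.
fn split_standalone_memory(total_bytes: u64) -> (u64, u64) {
    let compute_share = total_bytes / 10 * 7; // assumed 70% to compute (storage + streaming)
    let compactor_share = total_bytes - compute_share; // remainder to the compactor
    (compute_share, compactor_share)
}

fn main() {
    let total = 16u64 << 30; // e.g. a 16 GiB standalone deployment
    let (compute, compactor) = split_standalone_memory(total);
    println!("compute node budget: {} MiB", compute >> 20);
    println!("compactor budget:    {} MiB", compactor >> 20);
}
```

Today both sides each size themselves against the full system memory, which is what lets the combined usage exceed the machine.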

Li0k added the type/feature and type/bug (Something isn't working) labels on Nov 4, 2024
github-actions bot added this to the release-2.2 milestone on Nov 4, 2024
Li0k removed the type/feature label on Nov 4, 2024
Li0k (Contributor, Author) commented Nov 4, 2024

cc @kwannoel @hzxa21

lmatz (Contributor) commented Nov 5, 2024

I wonder:

  1. Whether there is a minimum memory requirement for the compactor.
  2. Whether there is any relationship between the compactor's live CPU usage and its memory usage, e.g. if the compactor is using 2 CPUs at the moment, is there a cap on its memory usage?

I am also wondering whether we should limit the compactor's CPU usage a bit to ensure stability, at the cost of potentially leaving idle resources on the table, e.g. with 8 CPUs in total, the compactor could take at most half = 4 CPUs (a sketch of this idea follows below).
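A minimal sketch of the "half the cores" idea, assuming the compactor runs on its own Tokio runtime. The builder calls are standard Tokio; nothing here reflects RisingWave's actual configuration surface:

```rust
use tokio::runtime::{Builder, Runtime};

/// Sketch only: build a dedicated compactor runtime capped at half of the
/// available cores (e.g. 8 cores -> at most 4 worker threads), so compaction
/// cannot starve the other embedded components in standalone mode.
fn build_compactor_runtime() -> std::io::Result<Runtime> {
    let total_cores = std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    let compactor_threads = (total_cores / 2).max(1);
    Builder::new_multi_thread()
        .worker_threads(compactor_threads)
        .thread_name("compactor")
        .enable_all()
        .build()
}
```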

Li0k (Contributor, Author) commented Nov 5, 2024

> I wonder:
>
>   1. Whether there is a minimum memory requirement for the compactor.
>   2. Whether there is any relationship between the compactor's live CPU usage and its memory usage, e.g. if the compactor is using 2 CPUs at the moment, is there a cap on its memory usage?
>
> I am also wondering whether we should limit the compactor's CPU usage a bit to ensure stability, at the cost of potentially leaving idle resources on the table, e.g. with 8 CPUs in total, the compactor could take at most half = 4 CPUs.

  1. It depends on the configuration of `sstable_size` and `block_size` (see the worked example after this list):

         let min_compactor_memory_limit_bytes = (storage_opts.sstable_size_mb * (1 << 20)
             + storage_opts.block_size_kb * (1 << 10)) as u64;

  2. The number of concurrent compaction tasks is limited by the number of available CPU cores, and the more tasks run, the more memory is used. How much memory each task consumes depends on the content of the task, and there is no general formula for calculating it. (CPU cores are not isolated.)
  3. I have no strong preference on CPU limits; relying on Tokio scheduling, CPU competition is fair and doesn't cause fatal problems.
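A worked example of the minimum-memory formula above, assuming `sstable_size_mb = 256` and `block_size_kb = 64` (defaults assumed here for illustration; the real values come from your `storage_opts`):

```rust
fn main() {
    // Assumed defaults for illustration; the real values come from storage_opts.
    let sstable_size_mb: u64 = 256;
    let block_size_kb: u64 = 64;
    // One SSTable plus one block is the minimum the compactor needs.
    let min_compactor_memory_limit_bytes =
        sstable_size_mb * (1 << 20) + block_size_kb * (1 << 10);
    // 268_435_456 + 65_536 = 268_500_992 bytes, i.e. roughly 256 MiB.
    println!("{min_compactor_memory_limit_bytes}");
}
```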

Also, if we plan to deploy standalone on a high-spec machine (e.g. 64 cores, 256 GB), then, as you said, the lack of CPU isolation may cause stability issues, as it did in the affinity test.
