Description
The `cluster-autoscaler` can consume a large amount of memory when there are many nodes to keep track of and/or many nodes that need to be added. The `cluster-autoscaler` has been observed to use up to 1.4GiB of memory (our current limit is set to 1GiB).

If the autoscaler gets evicted because it's using too much memory, there is no way to scale up the operator node group for the autoscaler's subsequent pending pod, since there is no autoscaler left to do the job. The suggestion is to run a second cluster autoscaler with a minimal resource footprint that is only responsible for scaling the operator node group. Because this autoscaler would only watch a single node group (which can grow to at most 25 nodes) and because its addition rate limit would be set to a small value (i.e. 1-2 nodes / min), its resource utilization would stay low. This autoscaler would scale up the operator node group whenever the main cluster autoscaler gets evicted. A sketch of such a deployment is shown below.
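Here is a minimal sketch of what such a backup autoscaler deployment could look like, assuming an AWS cluster. The namespace, service account, image tag, ASG name, and resource numbers are assumptions and would need to be adapted to the actual setup; the addition rate limit mentioned above is not shown because the exact flag depends on the cluster-autoscaler version in use.

```yaml
# Sketch: a backup cluster-autoscaler that only watches the operator node group.
# All names and numbers below are placeholders, not the actual Cortex config.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: operator-cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: operator-cluster-autoscaler
  template:
    metadata:
      labels:
        app: operator-cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler          # assumes the existing RBAC/service account is reused
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.3  # assumed image tag
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            # Watch only the operator node group, capped at 25 nodes, so the
            # memory footprint stays small and predictable.
            - --nodes=1:25:cortex-operator-asg        # hypothetical ASG name
            - --scale-down-enabled=false              # scale-up only; the main autoscaler handles scale-down
            - --v=2
          resources:
            requests:
              cpu: 20m
              memory: 100Mi
            limits:
              memory: 300Mi
```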
This cluster autoscaler deployment should also have a higher priority than anything else on the operator node group, so that it is always scheduled and every other Cortex pod can eventually receive a node:
https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
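A dedicated PriorityClass, referenced from the deployment's pod spec, could give the backup autoscaler precedence on the operator node group. The class name and value below are assumptions; the value only needs to exceed the priority of every other pod scheduled there.

```yaml
# Sketch: a PriorityClass for the backup autoscaler (name and value are placeholders).
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: operator-autoscaler-critical
value: 1000000
globalDefault: false
description: "Schedules the backup cluster-autoscaler ahead of other pods on the operator node group."
---
# Referenced from the Deployment's pod template:
# spec:
#   template:
#     spec:
#       priorityClassName: operator-autoscaler-critical
```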