
Remove scheduler plugin "buffer" resources #840

Open
sharnoff opened this issue Mar 1, 2024 · 4 comments
Labels
a/reliability Area: relates to reliability of the service c/autoscaling/scheduler Component: autoscaling: k8s scheduler

Comments

@sharnoff
Member

sharnoff commented Mar 1, 2024

Problem description / Motivation

To prevent accidental overcommitting, the scheduler plugin starts up with a measure of "uncertainty" about each VM's usage, which is resolved only when the autoscaler-agent makes a request to the scheduler plugin to inform it of its intentions.

This has two issues:

  1. Forced to choose between "unavailable" and "inaccurate", we have chosen "unavailable". In practice, unavailability is much worse / higher-risk than inaccuracy.
  2. There's a short (~5s) period of unavailability immediately after startup. When we add replicas & leader election for the scheduler, this unavailability could become frequent enough to cause liveness issues for the scheduler (i.e. it may be unable to schedule for extended periods of time).
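
For illustration, a minimal sketch (in Go, with hypothetical names; these are not the plugin's real types) of the kind of accounting described above: the scheduler reserves a VM's maximum until the agent's first request resolves the uncertainty.

```go
// A minimal sketch of per-VM accounting with a "buffer" term. All names and
// fields here are hypothetical, for illustration only.
package main

import "fmt"

type vmUsage struct {
	Reserved uint64 // resources the scheduler counts as in use by the VM
	Buffer   uint64 // extra headroom held while the VM's real intent is unknown
}

type nodeState struct {
	Total uint64
	VMs   map[string]*vmUsage
}

// On startup, a VM's usage is uncertain, so the scheduler reserves up to its
// maximum, holding the difference as "buffer".
func (n *nodeState) addUnknownVM(name string, current, max uint64) {
	n.VMs[name] = &vmUsage{Reserved: max, Buffer: max - current}
}

// When the autoscaler-agent's request arrives, the uncertainty is resolved
// and the buffer drops to zero.
func (n *nodeState) handleAgentRequest(name string, requested uint64) {
	if vm, ok := n.VMs[name]; ok {
		vm.Reserved = requested
		vm.Buffer = 0
	}
}

func (n *nodeState) reservedTotal() uint64 {
	var sum uint64
	for _, vm := range n.VMs {
		sum += vm.Reserved
	}
	return sum
}

func main() {
	n := &nodeState{Total: 64, VMs: map[string]*vmUsage{}}
	n.addUnknownVM("vm-a", 4, 16)
	fmt.Println("reserved before agent request:", n.reservedTotal()) // 16
	n.handleAgentRequest("vm-a", 4)
	fmt.Println("reserved after agent request:", n.reservedTotal()) // 4
}
```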

Feature idea(s) / DoD

Scheduler plugin scheduling uncertainty should not cause unavailability

Implementation ideas

Instead of keeping "buffer" resources, we should remove them entirely and be willing to make inaccurate scheduling decisions. The worst-case scenario is that we accidentally overcommit by a little bit; in practice, real resource usage in our clusters is much lower than reserved resources, so we have wiggle room.
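
Roughly, the change amounts to dropping the buffer term from startup accounting (a sketch reusing the hypothetical types from the snippet above, not the actual implementation):

```go
// Sketch of startup accounting without a "buffer" term: reserve the VM's
// last-known usage directly and accept that the agent may have scaled it up
// in the meantime, i.e. a small, temporary overcommit.
func (n *nodeState) addVMWithoutBuffer(name string, lastKnown uint64) {
	n.VMs[name] = &vmUsage{Reserved: lastKnown, Buffer: 0}
}
```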

@sharnoff sharnoff added a/reliability Area: relates to reliability of the service c/autoscaling/scheduler Component: autoscaling: k8s scheduler labels Mar 1, 2024
@Omrigan
Contributor

Omrigan commented Apr 19, 2024

Do we have an estimate of how long state rebuilding takes? E.g. if it takes 20s to rebuild 80% of the state, maybe we should delay scheduling until we have reports from 80% of nodes, and then accept the remaining "20% of inaccuracy"?
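
A gate like that might look something like this (hypothetical sketch, assuming the plugin tracks how many nodes have reported):

```go
package main

import "fmt"

// Hypothetical readiness gate: only allow scheduling once enough nodes have
// had their state rebuilt from autoscaler-agent reports.
func readyToSchedule(reportedNodes, totalNodes int, threshold float64) bool {
	if totalNodes == 0 {
		return true
	}
	return float64(reportedNodes)/float64(totalNodes) >= threshold
}

func main() {
	// e.g. wait for 80% of nodes, then accept the remaining "20% of inaccuracy".
	fmt.Println(readyToSchedule(8, 10, 0.8)) // true
	fmt.Println(readyToSchedule(5, 10, 0.8)) // false
}
```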

@sharnoff
Member Author

It's an interesting idea... I think in theory we could do this, but in practice:

  • Delaying scheduling via the plugin is risky, because Pods may take a long time to be rescheduled if we fail them (and in general, we'd rather make sub-optimal scheduling decisions than completely block scheduling).
  • Delaying scheduling by waiting to return from plugin creation (like in plugin: Replace readClusterState with existing watch events #904) would not work, I think — IIRC the scheduler is only marked "ready" once everything's running, and the autoscaler-agents only communicate with it once it's "ready", so the plugin initialization would be waiting for requests that never come
    • That said, we could make modifications to the conditions for communicating, or the readiness checks, and maybe this would work

For this issue, I'm currently thinking the easiest way forward is to move the autoscaler-agent <-> scheduler plugin protocol into annotations on the VMs, which would:

  1. Guarantee that all information is available just by reading the cluster state (so: no waiting for comms — basically, if we do it right, "buffer" would always be zero because everything is known)
  2. Make replicas / leader election simple, because the autoscaler-agents wouldn't be communicating with any particular scheduler instance (see Scheduler leader election #841)
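
A rough sketch of what that could look like (the annotation key and payload format here are hypothetical, not an agreed-upon protocol):

```go
// Hypothetical sketch: the autoscaler-agent writes its requested resources
// into an annotation on the VM object, and the scheduler plugin rebuilds its
// state purely from watch events, with no direct agent <-> plugin requests.
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical annotation key; not a real key used by the project.
const agentRequestedAnnotation = "autoscaling.example.com/agent-requested"

type resources struct {
	CPU uint `json:"cpu"`
	Mem uint `json:"mem"`
}

// requestedFromAnnotations reads the agent's requested resources from a VM's
// annotations, returning false if the agent hasn't written anything yet.
func requestedFromAnnotations(annotations map[string]string) (resources, bool) {
	raw, ok := annotations[agentRequestedAnnotation]
	if !ok {
		return resources{}, false // fall back to the VM spec
	}
	var r resources
	if err := json.Unmarshal([]byte(raw), &r); err != nil {
		return resources{}, false
	}
	return r, true
}

func main() {
	ann := map[string]string{agentRequestedAnnotation: `{"cpu":2,"mem":8}`}
	if r, ok := requestedFromAnnotations(ann); ok {
		fmt.Printf("agent requested: cpu=%d, mem=%d\n", r.CPU, r.Mem)
	}
}
```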

What do you think?

@Omrigan
Contributor

Omrigan commented Apr 22, 2024

> For this issue, I'm currently thinking the easiest way forward is to move the autoscaler-agent <-> scheduler plugin protocol into annotations on the VMs, which would:
>
> 1. Guarantee that all information is available just by reading the cluster state (so: no waiting for comms — basically, if we do it right, "buffer" would always be zero because everything is known)
> 2. Make replicas / leader election simple, because the autoscaler-agents wouldn't be communicating with any particular scheduler instance (see [Scheduler leader election #841](https://github.com/neondatabase/autoscaling/issues/841))
>
> What do you think?

Sounds good!
