Remove scheduler plugin "buffer" resources #840
Comments
Do we have an estimate of how long state rebuilding takes? E.g., if it takes 20s to build 80% of the state, maybe we should delay scheduling until we have reports from 80% of nodes, and then accept the remaining 20% of inaccuracy?
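As a rough illustration of the threshold idea in this comment, here is a minimal sketch. The function name `ready_to_schedule`, its parameters, and the default of 0.8 are hypothetical and are not taken from the scheduler plugin's code.

```python
def ready_to_schedule(reported_nodes: int, total_nodes: int, threshold: float = 0.8) -> bool:
    """Return True once at least `threshold` of the nodes have reported their state.

    Hypothetical helper: the 0.8 default mirrors the "80% of nodes" figure
    from the comment, not any real configuration value.
    """
    if total_nodes == 0:
        # Nothing to wait for in an empty cluster.
        return True
    return reported_nodes / total_nodes >= threshold
```

With this gate, scheduling would start while up to 20% of nodes are still unreported, which is the inaccuracy the comment proposes to accept.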
It's an interesting idea... I think in theory we could do this, but in practice:
For this issue, I'm currently thinking the easiest way forward is to move the autoscaler-agent <-> scheduler plugin protocol into annotations on the VMs, which would:
What do you think?
Sounds good!
RFC for the stuff above: https://www.notion.so/neondatabase/e9d3253836124f9f98e681f89c146a19
Problem description / Motivation
In order to prevent accidental overcommitting, on startup the scheduler plugin has a measure of "uncertainty" for each VM's usage that is resolved only when the autoscaler-agent makes a request to the scheduler plugin to inform it of its intentions.
This has two issues:
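To make the startup "uncertainty" described above concrete, the following sketch models it as a per-VM buffer that is dropped once the autoscaler-agent has reported. This is an assumption about how such a buffer could be modeled, not the plugin's actual data structures; `VMState`, `max_possible`, and `node_reserved` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class VMState:
    reserved: int            # resource units last known to be in use
    max_possible: int        # assumed upper bound the VM could be using
    confirmed: bool = False  # True once the autoscaler-agent has reported

    @property
    def buffer(self) -> int:
        # Until the agent reports, assume the VM may be using up to its maximum.
        return 0 if self.confirmed else self.max_possible - self.reserved


def node_reserved(vms: list[VMState]) -> int:
    """Total the scheduler would count as reserved on a node, buffers included."""
    return sum(vm.reserved + vm.buffer for vm in vms)
```

In this model, a node hosting two unconfirmed VMs with `(reserved, max_possible)` of `(2, 8)` and `(1, 4)` counts 12 units as reserved even though only 3 are known to be in use.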
Feature idea(s) / DoD
Scheduler plugin scheduling uncertainty should not cause unavailability
Implementation ideas
Instead of keeping "buffer" resources, we should remove them entirely and be willing to make inaccurate scheduling decisions. The worst-case scenario is that we accidentally overcommit by a little bit. In practice, real resource usage in our clusters is much lower than reserved resources, so we have wiggle room.
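A small sketch of what scheduling without the buffer could look like, along with the worst-case overcommit it risks. Both function names and the idea of a per-VM `max_possible` bound are assumptions made for illustration only.

```python
def fits_without_buffer(capacity: int, known_reserved: list[int], request: int) -> bool:
    """Admit a request based only on known reservations, ignoring any buffer."""
    return sum(known_reserved) + request <= capacity


def worst_case_overcommit(capacity: int, max_possible: list[int], request: int) -> int:
    """Upper bound on overcommit if every VM is actually at its assumed maximum."""
    return max(0, sum(max_possible) + request - capacity)
```

For a node with capacity 16, known reservations `[2, 1]`, and a request of 4, the request is admitted; if both VMs were in fact at an assumed maximum of 8, the node would be overcommitted by 4 units.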