Bug: scheduler has negative "buffer" value #852
sharnoff added the t/bug (Issue Type: Bug) and c/autoscaling/scheduler (Component: autoscaling: k8s scheduler) labels on Mar 9, 2024.
Can this be solved by #840?
sharnoff added a commit referencing this issue on Apr 15, 2024, with the message:

> In short, readClusterState is super complicated, separately reimplements the reserveResources() logic, and may be the source of several startup-related bugs (probably #671 and #852). So, given that we *already* have a pathway for updating our internal state from changes in the cluster (i.e. the watch events), we should just use that instead.
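The fix described in the commit message can be sketched as follows. This is a hypothetical illustration (the `event` and `state` types are assumptions, not the project's actual code): rather than a separate `readClusterState()` that re-implements reservation logic, startup replays the current cluster contents through the same event handler used at runtime, so everything goes through one shared `reserveResources()` path.

```go
package main

import "fmt"

// event is a simplified stand-in for a cluster watch event.
type event struct {
	kind string // assumed event kinds: "Add", "Delete"
	pod  string
	cpu  int64
}

// state is a simplified stand-in for the scheduler's internal state.
type state struct {
	reservedCPU int64
}

// reserveResources is the single shared reservation path; both runtime
// watch events and startup go through it.
func (s *state) reserveResources(cpu int64) { s.reservedCPU += cpu }

func (s *state) handleEvent(ev event) {
	switch ev.kind {
	case "Add":
		s.reserveResources(ev.cpu)
	case "Delete":
		s.reserveResources(-ev.cpu)
	}
}

func main() {
	s := &state{}
	// Startup: replay the existing pods as "Add" events instead of running
	// a bespoke readClusterState() implementation.
	for _, ev := range []event{{"Add", "pod-a", 2}, {"Add", "pod-b", 3}} {
		s.handleEvent(ev)
	}
	fmt.Println(s.reservedCPU) // 5
}
```

Because startup and steady-state share one code path, the reservation logic cannot silently diverge between the two, which is the class of bug this issue describes.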
sharnoff added further commits referencing this issue, each with the same message, on Apr 16, May 10, May 21, and May 22 (twice), 2024.
Environment
Prod (occurred twice recently)
Steps to reproduce
Not yet clear. Here's an example:
I think it's entirely caused by faulty logic in `(*AutoscaleEnforcer).readClusterState()`, but I haven't looked into it thoroughly. And tbh, it's a little weird that `readClusterState` has its own implementation of the reserve logic, rather than using the shared version that was added in #666.

Expected result
Any buffer value from adding a VM should be non-negative.
Actual result
The memory "buffer" value was negative (see: `-1Gi buffer`), and the value for CPU underflowed.

Other logs, links
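The underflowed CPU value is consistent with subtracting a reservation from an unsigned counter. A minimal Go sketch (hypothetical code, not the scheduler's actual implementation) of how that wraps, alongside a guarded version that clamps at zero:

```go
package main

import "fmt"

// subtractBuffer illustrates the failure mode: with an unsigned type,
// subtracting a reservation larger than the remaining buffer wraps around
// to a huge value instead of going negative.
func subtractBuffer(buffer, reserved uint64) uint64 {
	return buffer - reserved // underflows when reserved > buffer
}

// subtractBufferClamped clamps the result at zero instead of wrapping.
func subtractBufferClamped(buffer, reserved uint64) uint64 {
	if reserved > buffer {
		return 0
	}
	return buffer - reserved
}

func main() {
	fmt.Println(subtractBuffer(2, 3))        // 18446744073709551615 (wrapped)
	fmt.Println(subtractBufferClamped(2, 3)) // 0
}
```

Go's unsigned integer arithmetic wraps silently rather than panicking, so a missing bounds check like this produces exactly the kind of huge/underflowed values reported here, while a signed representation would instead show up as a negative buffer (as with the `-1Gi` memory value).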