plugin: Replace readClusterState with existing watch events #904
Conversation
Do you mean these watch events? (autoscaling/pkg/plugin/plugin.go, lines 150 to 160 at 76ceab2)
I am trying to understand how this relates to #873. For #873, my understanding was to handle those callbacks from the pod watcher and get rid of the VM watcher entirely.
Broadly, watch events like those - but it only relies on the Pod watch events, I think. (and specifically, this PR waits for only the initial Pod "add" events to be completely processed)
Right, yeah, fair. This PR should be ok - we only care about the Pod watch events. (unless I missed something - please feel free to double-check me on that)
2b32115 → edc5942 (compare)
Have some questions
// The initial state building is complete when the counter reaches zero, at which point we close
// the channel that this function will wait on.
var initEventsCount atomic.Int32
Can you use a semaphore here instead?
- Acquire(nil, n) when events are added
- Release(1) when an event is processed
- Acquire(ctxWithTimeout, n) when waiting for events to be processed
(See the sketch after this list.)
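To make the suggestion concrete, here is a minimal sketch of what the semaphore-based version could look like, using golang.org/x/sync/semaphore. The initEvents type, the maxInitEvents bound, and the method names are assumptions for illustration, not the PR's actual code (and not necessarily what caba29f did); the key property is that acquiring the full weight only succeeds once every per-event acquisition has been released.

```go
package plugin // hypothetical placement; illustrative only

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/sync/semaphore"
)

const maxInitEvents = 1 << 20 // assumed upper bound on pending initial events

type initEvents struct {
	sem *semaphore.Weighted
}

func newInitEvents() *initEvents {
	return &initEvents{sem: semaphore.NewWeighted(maxInitEvents)}
}

// added records one pending initial event. This cannot block as long as fewer
// than maxInitEvents are outstanding and wait() has not been called yet.
func (ie *initEvents) added() {
	_ = ie.sem.Acquire(context.Background(), 1)
}

// processed records that one pending initial event has been handled.
func (ie *initEvents) processed() {
	ie.sem.Release(1)
}

// wait blocks until every added() has a matching processed(), or the timeout
// expires. Acquiring the full weight only succeeds once all units are free.
func (ie *initEvents) wait(timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	if err := ie.sem.Acquire(ctx, maxInitEvents); err != nil {
		return fmt.Errorf("timed out waiting for initial events: %w", err)
	}
	ie.sem.Release(maxInitEvents)
	return nil
}
```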
Seems simpler. Implemented it in caba29f but tbh I'm not sure if it's better. WDYT?
If incEventCount is left as a semaphore wrapper anyway, maybe instead revert to your old implementation, but wrap it all in a struct-based synchronization primitive?
The reason I suggested a semaphore is that I am wary of using raw channels and atomics in the code where it feels like there should be a synchronization primitive.
Made it a separate type in 521597e - LMK what you think.
I imagined the implementation with initEventsCount hidden inside the new helper.
Also, you don't need incEventCount now.
> I imagined the implementation with initEventsCount hidden inside the new helper.
Hm, it's hard to make that kind of abstraction safe to use (because the counter wouldn't be monotonic) -- maybe it'd be fine here, but I'd rather keep the logic inlined than make an unsafe abstraction for just this purpose.
So in its current state, incEventCount or equivalent is still needed (we can just directly add to the atomic; don't know whether that's better)
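For contrast, here is a minimal sketch of the inlined counter-plus-channel pattern being discussed (illustrative only; waitForInitEvents, register, and markProcessed are assumed names, not the PR's code). It also shows why the "not monotonic" concern matters: the counter has to start at its final total, because if increments could still arrive after decrements begin, the channel could be closed too early.

```go
package plugin // hypothetical placement; illustrative only

import (
	"fmt"
	"sync/atomic"
	"time"
)

// waitForInitEvents sketches the inlined pattern: an atomic counter of
// outstanding initial events plus a channel closed when it drains to zero.
// totalEvents must be known up front (e.g. the number of initial Pod "add"
// events); register wires markProcessed into the event handlers.
func waitForInitEvents(totalEvents int32, register func(markProcessed func()), timeout time.Duration) error {
	var initEventsCount atomic.Int32
	initEventsCount.Store(totalEvents)
	done := make(chan struct{})
	if totalEvents == 0 {
		close(done)
	}

	register(func() {
		// Safe only because the counter starts at its final total and is never
		// incremented afterwards, i.e. it decreases monotonically.
		if initEventsCount.Add(-1) == 0 {
			close(done)
		}
	})

	select {
	case <-done:
		return nil
	case <-time.After(timeout):
		return fmt.Errorf("timed out with %d initial events unprocessed", initEventsCount.Load())
	}
}
```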
521597e → b1e53de (compare)
b1e53de → 6555a77 (compare)
allowDeny: false,
// this may be a preexisting VM. If so, we should include it in "buffer" as long as it's
// supposed to be handled by us (otherwise, the "buffer" will never be resolved)
includeBuffer: pod.Spec.SchedulerName == e.state.conf.SchedulerName,
Why do we even have SchedulerName as part of the spec? Why would we ever want it to be different?
Result of discussing this: it's part of the VM spec because neonvm is a logically separate dependency of "autoscaling", so the ability to run with or without the rest of autoscaling requires the ability to set the pod's scheduler name.
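For illustration, the includeBuffer condition from the diff above amounts to something like the following standalone check (a hypothetical helper; the PR inlines the comparison): a preexisting Pod only counts toward "buffer" if its spec names our scheduler, i.e. it was meant to be scheduled by this plugin at all.

```go
package plugin // hypothetical placement; illustrative only

import corev1 "k8s.io/api/core/v1"

// podManagedByUs mirrors the includeBuffer condition from the diff above.
func podManagedByUs(pod *corev1.Pod, ourSchedulerName string) bool {
	return pod.Spec.SchedulerName == ourSchedulerName
}
```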
Are you going to do a rebase merge? If so, the tmp commit needs to be squashed somewhere, maybe into the middle commit.
After the changes from #865 that switched to fetching VmInfo from Pod objects, the VM store is no longer needed long-term, just during initial cluster state reading (which can *also* be swapped out) and for event generation. This commit changes the VM store from being stored in the AutoscaleEnforcer struct to being passed in to readClusterState as an argument.
In short, readClusterState is super complicated, separately reimplements the reserveResources() logic, and may be the source of several startup-related bugs (probably #671 and #852). So, given that we *already* have a pathway for updating our internal state from changes in the cluster (i.e. the watch events), we should just use that instead.
c8e250e → ceceb07 (compare)
After the changes from #865 that switched to fetching VmInfo from Pod objects, the VM store is no longer needed long-term, just during initial cluster state reading (which can *also* be swapped out) and for event generation. This commit changes the VM store from being stored in the AutoscaleEnforcer struct to being passed in to readClusterState as an argument.
... and a few other commits tossed in for fun.
Probably best to review commit-by-commit.
The thing I'm most worried about is that this may change buffer handling.
Might be a good idea to go for #840 first, but I suspect that might also have its own issues.
This is a precursor to #517, to minimize the number of places we need to change resource handling.