-
I like this idea, but I think this will not be possible since we rely on HPA for this. But I'm sure @zroubalik will be able to tell us.
-
The current scaling method is hard-coded as a linear scaling of ceiling(observed_metric / target_metric).
Example:
observed_metric = 100 backlog/instance
target_metric = 10 backlog/instance
ceiling(100 / 10) = ceiling(10) = 10 = new scale target (capped by the configured min/max instance count)

An interesting long-term direction would be to allow this scaling algorithm to be configured with alternate implementations (not necessarily user-provided implementations, but at least a selection of provided ones). This would allow for something like the hill-climbing algorithm discussed here: kedacore/http-add-on#142 (reply in thread)
This would allow for experimentation with different algorithms that can have better behaviors than the current linear scaling method.
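As a minimal sketch of the linear method described above (the function name and parameters are illustrative, not KEDA's actual API):

```python
import math

def linear_scale(observed_metric: float, target_metric: float,
                 min_replicas: int, max_replicas: int) -> int:
    """Linear scaling: ceiling(observed / target), clamped to the
    configured min/max replica counts."""
    desired = math.ceil(observed_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# Example from above: 100 backlog with a target of 10 per instance
print(linear_scale(100, 10, min_replicas=1, max_replicas=20))  # -> 10
```

Note that the clamp is what produces the "capped by min/max" behavior: with `max_replicas=5` the same inputs would yield 5.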
For example, consider a message-processing workload that is scaled on queue backlog and processes messages by persisting them to a backend datastore. As the backlog grows, more workload instances are provisioned, on the expectation that more instances means higher throughput. However, if the backlog is actually caused by slow processing because the backend datastore cannot keep up with demand (disk failure, low resources, networking, anything), then adding more workload instances may put even more pressure on the datastore and lower overall throughput, making the situation worse.
A better option would be a scaling implementation like a hill-climbing algorithm that monitors how changes in the number of workload instances impact a target metric (this would likely be an overall metric like total throughput, not a per-instance metric as currently implemented) and probes whether increases actually help meet that target or make it worse.
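One step of such a hill-climbing scaler could be sketched as follows. This is a simplified illustration under assumed inputs (the function name, the single-step probe, and the throughput-comparison rule are all hypothetical, not an existing KEDA implementation):

```python
def hill_climb_step(current_replicas: int,
                    throughput_now: float, throughput_prev: float,
                    last_direction: int,
                    min_replicas: int, max_replicas: int) -> tuple[int, int]:
    """One probing step toward higher total throughput.

    If the last replica change improved total throughput, keep probing
    in the same direction; otherwise reverse. Returns the new desired
    replica count and the direction taken (+1 or -1).
    """
    if throughput_now >= throughput_prev:
        direction = last_direction
    else:
        direction = -last_direction
    desired = current_replicas + direction
    desired = max(min_replicas, min(max_replicas, desired))
    return desired, direction
```

Unlike the linear formula, this keyed-on-throughput probe would stop scaling out when added instances stop helping, e.g. when the backend datastore is the bottleneck.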
This would prevent the current linear scaling implementation's pathological case where (assuming the above message-processing workload) the backend datastore is unavailable for a sustained period, causing a backlog large enough to scale workers up to the configured maximum number of instances (costing $$$) while making zero progress on processing any of the messages.
The example hill-climbing algorithm may not be the best fit for all use cases, but a configurable scaling algorithm would allow both selection among provided implementations and relatively low-risk experimentation with new ones that address scenarios where the existing scaling implementation performs poorly.