Calculate cpu and mem savings in tortoise #427
Open
randytqwjp wants to merge 8 commits into main from effectiveness-metrics-fix-v3
Changes from 6 commits
Commits
cf5bc34 calculate cpu and mem changes from replica changes (randytqwjp)
e3547a6 minor changes (randytqwjp)
c076cc4 minor changes (randytqwjp)
8802f3b fix (randytqwjp)
d634c79 fix (randytqwjp)
3982122 add alpha description to new metrics (randytqwjp)
ad261c7 remove net hpa metrics (randytqwjp)
220d434 fix net change in request (randytqwjp)
@@ -46,25 +46,25 @@ var (
		Help: "memory request (byte) that tortoises actually applys",
	}, []string{"tortoise_name", "namespace", "container_name", "controller_name", "controller_kind"})

	DecreaseApplyCounter = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "decrease_apply_counter",
		Help: "counter for number of resource decreases applied by tortoise",
	}, []string{"tortoise_name", "namespace"})

	IncreaseApplyCounter = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "increase_apply_counter",
		Help: "counter for number of resource increases applied by tortoise",
	}, []string{"tortoise_name", "namespace"})

	NetHPAMinReplicas = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "net_hpa_minreplicas",
		Help: "net hpa minReplicas that tortoises actually applys to hpa",
	}, []string{"tortoise_name", "namespace", "hpa_name", "kube_deployment"})

	NetHPAMaxReplicas = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "net_hpa_maxreplicas",
		Help: "net hpa maxReplicas that tortoises actually applys to hpa",
	}, []string{"tortoise_name", "namespace", "hpa_name", "kube_deployment"})
	NetHPAMinReplicasCPUCores = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "net_hpa_minreplicas_cpu_cores",
		Help: "net cpu cores changed by minReplicas that tortoises actually applys to hpa",
	}, []string{"tortoise_name", "namespace", "hpa_name"})

	NetHPAMinReplicasMemory = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "net_hpa_minreplicas_memory",
		Help: "net memory changed by minReplicas that tortoises actually applys to hpa",
	}, []string{"tortoise_name", "namespace", "hpa_name"})

	NetHPAMaxReplicasCPUCores = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "net_hpa_maxreplicas_cpu_cores",
		Help: "net cpu cores changed by maxReplicas that tortoises actually applys to hpa",
	}, []string{"tortoise_name", "namespace", "hpa_name"})

	NetHPAMaxReplicasMemory = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "net_hpa_maxreplicas_memory",
		Help: "net memory changed by maxReplicas that tortoises actually applys to hpa",
	}, []string{"tortoise_name", "namespace", "hpa_name"})

	NetCPURequest = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "net_cpu_request",

Comment (on the line Name: "net_hpa_minreplicas_cpu_cores"):
What does "net" mean?
Reply:
overall change i.e. from 5 cores to 3 cores the change is -2 cores
@@ -117,10 +117,10 @@ func init() {
		AppliedHPAMinReplicas,
		AppliedCPURequest,
		AppliedMemoryRequest,
		IncreaseApplyCounter,
		DecreaseApplyCounter,
		NetHPAMaxReplicas,
		NetHPAMinReplicas,
		NetHPAMinReplicasCPUCores,
		NetHPAMinReplicasMemory,
		NetHPAMaxReplicasCPUCores,
		NetHPAMaxReplicasMemory,
		NetCPURequest,
		NetMemoryRequest,
		ProposedHPATargetUtilization,
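As a rough illustration of what the "net" gauges in the diff above are meant to carry, here is a minimal Go sketch. It assumes "net" means new value minus old value, which matches the "5 cores to 3 cores is -2 cores" explanation in the thread. The function names and the per-replica CPU request parameter are hypothetical and are not taken from the tortoise code base.

```go
package main

import "fmt"

// netChange returns the signed difference between the newly applied value
// and the previously applied one (new - old). Negative means a reduction.
func netChange(oldValue, newValue float64) float64 {
	return newValue - oldValue
}

// netCoresFromReplicas converts a change in a replica bound (minReplicas or
// maxReplicas) into CPU cores, assuming every replica requests
// cpuPerReplica cores. Hypothetical helper for illustration only.
func netCoresFromReplicas(oldReplicas, newReplicas int32, cpuPerReplica float64) float64 {
	return float64(newReplicas-oldReplicas) * cpuPerReplica
}

func main() {
	// 5 cores -> 3 cores is a net change of -2 cores.
	fmt.Println(netChange(5, 3)) // -2

	// minReplicas 10 -> 6 with 0.5 cores per replica is also -2 cores.
	fmt.Println(netCoresFromReplicas(10, 6, 0.5)) // -2
}
```

In a real exporter the result would presumably be passed to the gauge for the matching label set; that wiring is left out here so the sketch stays self-contained.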
Comment:
Can you elaborate on the use case a bit more? How would these metrics benefit you? For example, `net_hpa_minreplicas_cpu_cores` shows the difference in CPU allocation under the assumption that HPA always keeps the replicas at minReplicas, which happens only occasionally. Similarly, `net_hpa_maxreplicas_cpu_cores` shows the difference in CPU allocation under the assumption that HPA always keeps the replicas at maxReplicas, which should never happen. I'm not sure how worthwhile those values would be.
In the first place, though, why are you trying to measure them within tortoise? If you just want to see the cost reduction, why not directly check the allocated CPU or memory on each service that adopts tortoise?
Reply:
Yes, I acknowledge the first point about the assumptions; it is something I brought up as well. However, it is difficult to calculate accurate cost savings from tortoise compared to a service purely using HPA. Directly checking allocated CPU or memory does not tell us whether tortoise is actually helping to reduce costs. We therefore need to show cost savings based on decisions made by tortoise, such as changing max/min replicas. I agree it's not an accurate amount, especially on the max replica side. But min replicas should only decrease when there is significant underutilization, in which case I think it can be used for calculating cost savings by tortoise. Do you think we should remove the net max replica changes?
Reply:
I mean, how would `net_hpa_minreplicas_cpu_cores` help then? Again, it's a very rare situation that HPA always keeps the replicas at minReplicas. So if you see -2 cores in `net_hpa_minreplicas_cpu_cores`, in most cases that doesn't mean tortoise achieved a cost saving of 2 cores. I'm not sure how this "2 cores" helps you understand the cost saving.
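To make the objection concrete, here is a small Go sketch of why a -2 core change in a minReplicas-based metric need not equal 2 cores saved. It assumes a simplified HPA model in which the running replica count is the load-driven desired count clamped between minReplicas and maxReplicas. All names and numbers are hypothetical.

```go
package main

import "fmt"

// effectiveReplicas models a simplified HPA: the load-driven desired replica
// count, clamped to the [minReplicas, maxReplicas] range.
func effectiveReplicas(desired, minReplicas, maxReplicas int32) int32 {
	r := desired
	if r < minReplicas {
		r = minReplicas
	}
	if r > maxReplicas {
		r = maxReplicas
	}
	return r
}

// realizedCoreChange returns the change in allocated cores that actually
// results from moving minReplicas from oldMin to newMin at a given load.
func realizedCoreChange(desired, oldMin, newMin, maxReplicas int32, cpuPerReplica float64) float64 {
	before := effectiveReplicas(desired, oldMin, maxReplicas)
	after := effectiveReplicas(desired, newMin, maxReplicas)
	return float64(after-before) * cpuPerReplica
}

func main() {
	// minReplicas goes 5 -> 3 at 1 core per replica, so a "net" metric
	// based on minReplicas alone would report -2 cores.
	fmt.Println(realizedCoreChange(8, 5, 3, 20, 1.0)) // load needs 8 replicas: 0
	fmt.Println(realizedCoreChange(4, 5, 3, 20, 1.0)) // load needs 4 replicas: -1
	fmt.Println(realizedCoreChange(2, 5, 3, 20, 1.0)) // load needs 2 replicas: -2
}
```

Only in the last case, where load alone would keep the service at or below the new minReplicas, does the full -2 cores materialize.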
Reply:
Also, this "net" value is simply `old value - new value`, right? Then, except for the very first reconciliation, the old value is also a value calculated by tortoise. It's not the value that the service owner used in their HPA before adopting Tortoise.
Another thing: tortoise dynamically changes minReplicas based on the time.
https://github.com/mercari/tortoise/blob/main/docs/horizontal.md#minreplicas
So the graph of `net_hpa_minreplicas_cpu_cores` would just show increasing values in the hours leading up to the peak time and decreasing values in the hours leading to the off-peak time. Tortoise's principle is not to set minReplicas as low as possible, but to set it to a value that acts as a safeguard, as that section describes.
How would those values benefit you for the cost saving calculation?
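The time-of-day effect can be sketched as follows. The schedule of minReplicas values is invented for illustration, and the sign convention is new minus old, as in the "5 cores to 3 cores is -2" example earlier in the thread. Each reconciliation compares against the previous tortoise-computed value, so the series rises toward the peak, falls afterwards, and sums to zero over the day.

```go
package main

import "fmt"

// successiveNet returns new - old for each consecutive pair of values.
func successiveNet(values []int32) []int32 {
	if len(values) < 2 {
		return nil
	}
	out := make([]int32, 0, len(values)-1)
	for i := 1; i < len(values); i++ {
		out = append(out, values[i]-values[i-1])
	}
	return out
}

// sum adds up all elements of a slice.
func sum(values []int32) int32 {
	var total int32
	for _, v := range values {
		total += v
	}
	return total
}

func main() {
	// Hypothetical minReplicas over one day: off-peak -> peak -> off-peak.
	schedule := []int32{3, 5, 8, 10, 8, 5, 3}
	net := successiveNet(schedule)
	fmt.Println(net)      // [2 3 2 -2 -3 -2]
	fmt.Println(sum(net)) // 0
}
```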
Reply:
Oh, my bad. I thought the recommendation algorithm also changed the HPA min/max replicas based on utilization, but looking at the code it seems it does not. Then it doesn't make sense to have cost saving metrics for min/max replicas, since they seem to be more about reliability. I will revert the changes on min/max replicas.
Reply:
I was somehow under the impression that if utilization was low and min replicas was at, say, 10, tortoise would decrease min replicas, but that's not the case.
Reply:
You could be right depending on the scenario.
If the replica count at the same time on the same day last week was 20 and the utilization is low today, tortoise keeps minReplicas at 10 (1/2 * last week), because most likely something unusual is happening now (e.g., an upstream service like the gateway is down) and it might be dangerous to reduce minReplicas.
But suppose that next week the replica count is again 10 with low utilization. That probably means this is a new trend and the service now gets less traffic (e.g., an upstream service stopped calling this service, or Mercari somehow got unpopular).
Tortoise checks last week's value, which is 10, so it changes minReplicas to 5 (again 1/2 * last week).
That's how tortoise deals with a scenario where a too-high minReplicas leaves the service underutilized.
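The halving behaviour described above could be sketched like this. It only reproduces the "1/2 * last week" rule from the comment. The rounding up and the floor of 1 replica are assumptions added for the sketch, and `nextMinReplicas` is a hypothetical name rather than a function from tortoise.

```go
package main

import (
	"fmt"
	"math"
)

// nextMinReplicas returns half of the replica count observed at the same
// time last week. Rounding up and never going below 1 are assumptions.
func nextMinReplicas(lastWeekReplicas int32) int32 {
	half := int32(math.Ceil(float64(lastWeekReplicas) / 2))
	if half < 1 {
		return 1
	}
	return half
}

func main() {
	// Week 1: last week's replica count was 20, so minReplicas stays at 10
	// even though utilization is low today.
	week1 := nextMinReplicas(20)
	fmt.Println(week1) // 10

	// Week 2: the service ran at 10 replicas last week, so minReplicas
	// drops to 5.
	week2 := nextMinReplicas(week1)
	fmt.Println(week2) // 5
}
```

Under this rule a persistently oversized minReplicas shrinks week over week instead of in a single step, which matches the safeguard intent described in the thread.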