-
Notifications
You must be signed in to change notification settings - Fork 423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-36863][autoscaler] Use the maximum parallelism in the past scale-down.interval window when scaling down #922
base: main
Are you sure you want to change the base?
Conversation
345af37
to
db578c8
Compare
while (!recommendedParallelisms.isEmpty() | ||
&& recommendedParallelisms.peekLast().getParallelism() <= parallelism) { | ||
recommendedParallelisms.pollLast(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good optimization 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gyfora for the comment in advance, I'm still adding more tests and comments for this PR.
I have some other works to do this week, so I expect this PR will be ready next week.
ec694b0
to
4166c97
Compare
7da57c6
to
20d8552
Compare
…le-down.interval window when scaling down
20d8552
to
49478e4
Compare
List.of( | ||
790, 200, 200, 200, 200, 200, 790, 200, 200, 200, 200, 200, 790, 200, 200, | ||
200, 200, 200, 790, 200, 200, 200, 200, 200, 790, 200, 200, 200, 200, 200, | ||
790), | ||
List.of(790, 200, 790, 200, 790, 200, 790, 200, 790, 200, 790, 200, 790, 200, 790)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the RecommendedParallelism
class does not include outsideUtilizationBound
, this test will be failed for the last 2 inputs.
The scale down interval is 60 minutes, and metric window is 10 minutes, so the recommended parallelism will be 790 for the last 2 inputs. When we check outsideUtilizationBound
, we need to check the outsideUtilizationBound
corresponding to the parallelism of 790.
That's why RecommendedParallelism
class includes outsideUtilizationBound
. If it doesn't include outsideUtilizationBound
, the latest outsideUtilizationBound
may come from parallelism 200
if (parallelismChange.isOutsideUtilizationBound()) { | ||
anyVertexOutsideBound.set(true); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We check the OutsideUtilizationBound
of the final recommended parallelism instead of the OutsideUtilizationBound
of latest recommended parallelism.
If the Utilization of all tasks is within range, we can skip scaling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy new year @gyfora @mxm , this PR is ready for review. Please help review in your free time, thank you in advance~
I introduced the DelayedScaleDownEndToEndTest
in the last commit to test delayed scale down in detail, so the code changes are more than 1 K lines, but about 80% of code changes are tests.
What is the purpose of the change
[FLINK-36863][autoscaler] Use the maximum parallelism in the past scale-down.interval window when scaling down instead of in the entire past
See more details from FLINK-36863.
Brief change log
VertexDelayedScaleDownInfo maintain all recommended parallelisms at each time within the past scale-down.interval window period.
Note1 : It is a scenario that calculates the max value within a sliding window.
Note2: Check the
OutsideUtilizationBound
of the final recommended parallelism instead of theOutsideUtilizationBound
of latest recommended parallelismSee more details from these 2 comments: #922 (comment) and #922 (comment)
Verifying this change
DelayedScaleDownTest
ScalingExecutorTest#testUtilizationBoundaries
based on the latest logicDelayedScaleDownEndToEndTest
to test delayed scale down in detailDoes this pull request potentially affect one of the following parts:
CustomResourceDescriptors
: noDocumentation