[Core][GPU fraction][1/n] Unify node feasibility and availability checking #59278
Conversation
Signed-off-by: yicheng <[email protected]>
```diff
- // Check if the node has required labels before scoring on the resources.
- const auto &label_selector = required_resources.GetLabelSelector();
- if (!node_resources.HasRequiredLabels(label_selector)) {
+ if (!node_resources.IsAvailable(required_resources)) {
```
IsAvailable already checks the required labels.
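For context, a minimal sketch of why the separate label check became redundant, assuming `IsAvailable` is structured roughly like this (the method and field names are taken from snippets in this thread; the real body also handles `normal_task_resources`, as shown later):

```cpp
// Rough sketch only, not the actual Ray implementation.
bool NodeResources::IsAvailable(const ResourceRequest &resource_request) const {
  // The label selector is checked inside IsAvailable, so callers no longer
  // need a separate HasRequiredLabels() call before scoring.
  if (!HasRequiredLabels(resource_request.GetLabelSelector())) {
    return false;
  }
  // Compare the request against the node's currently free resources.
  return this->available >= resource_request.GetResourceSet();
}
```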
Code Review
This pull request aims to unify node feasibility and availability checking in the scheduler. The changes in scorer.cc centralize the availability check by using node_resources.IsAvailable() and removing redundant checks within the LeastResourceScorer. This is a good refactoring that simplifies the code. My feedback includes a suggestion to add an assertion to enforce a critical precondition, which will make the code more robust.
```cpp
double LeastResourceScorer::Calculate(const FixedPoint &requested,
                                      const FixedPoint &available) {
  RAY_CHECK(available >= 0) << "Available resource " << available.Double()
                            << " should be nonnegative.";
```
The removal of the requested > available check relies on node_resources.IsAvailable() being correct. To make the code more robust against potential bugs in the availability check, it's safer to add a RAY_CHECK(requested <= available). This enforces the assumption that Calculate is only called for feasible requests and will help catch issues early.
| << " should be nonnegative."; | |
| << " should be nonnegative."; | |
| RAY_CHECK(requested <= available) | |
| << "Requested resource " << requested.Double() | |
| << " should not be greater than available resource " << available.Double() | |
| << ". This indicates a bug in the feasibility check."; |
No need. From a design perspective, the logic for determining whether a node is schedulable and the logic for choosing the best node should be completely separate and follow a strict order.
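To illustrate the strict ordering described above, here is a toy sketch (made-up types and a toy score, not Ray's real scorer): schedulability filtering runs first, so scoring never sees a node that cannot hold the request.

```cpp
#include <optional>
#include <vector>

// Toy stand-ins for illustration; Ray's real types and scoring differ.
struct NodeInfo {
  int id;
  double available;  // simplified single-resource view of a node
};

std::optional<int> PickBestNode(const std::vector<NodeInfo> &nodes,
                                double requested) {
  std::optional<int> best;
  double best_score = -1.0;
  for (const auto &node : nodes) {
    // Step 1: schedulability. Unschedulable nodes never reach scoring,
    // which is why the scorer may assume requested <= available.
    if (node.available < requested) {
      continue;
    }
    // Step 2: scoring among the surviving candidates (toy metric).
    const double score = node.available - requested;
    if (score > best_score) {
      best_score = score;
      best = node.id;
    }
  }
  return best;
}
```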
@ZacAttack @Sparks0219 PTAL too
```cpp
  return node_score;
}

// This function assumes the resource request has already passed the availability check
```
nit: Can you also add this to the comment of the function in the .h file?
Added, thank you!
```diff
       return -1.;
     }
-    node_score += score;
+    node_score += Calculate(request_resource, node_available_resource);
```
From my understanding of the code:
- Previously: we first check the labels, then potentially update the available resources, then check whether there are available resources in the `Calculate` function.
- Now: we first check both the labels and the available resources, then potentially update the available resources, then calculate the score assuming the node has enough resources for the request.

I might be missing something, but it looks like with the new implementation there could be a case where the node has available resources before the resource update but not after, which would make the new implementation inconsistent with the previous one. Is that right?
Sorry about that — I should have mentioned this!
IsAvailable already handles the normal_task_resources subtraction internally:
ray/src/ray/common/scheduling/cluster_resource_data.cc
Lines 98 to 103 in de7ac7d
```cpp
if (!this->normal_task_resources.IsEmpty()) {
  auto available_resources = this->available;
  available_resources -= this->normal_task_resources;
  return available_resources >= resource_request.GetResourceSet();
}
return this->available >= resource_request.GetResourceSet();
```
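Concretely: with `available = {GPU: 0.5}` and `normal_task_resources = {GPU: 0.1}`, a request for 0.5 GPU fails the check (0.5 - 0.1 = 0.4 < 0.5), even though `available` alone would satisfy it.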
Got it. I missed that. Thanks for the context!
The current implementation is a bit confusing; as part of the broader effort to improve resource tracking, we should make it cleaner.
Sure, I will look deeper into it and draft a follow-up PR if needed!
Signed-off-by: yicheng <[email protected]>
MengjinYan
left a comment
Thanks!
Description
Problem Statement
Currently, the scheduler's node feasibility and availability checks are inconsistent with the actual resource allocation logic. The scheduler reasons only about aggregated GPU capacity per node, while the local allocator enforces constraints based on the per-GPU topology.
For example, consider a node with two GPUs, each with 0.2 GPU remaining. The scheduler observes 0.4 GPU available in total and concludes that an actor requesting 0.4 GPU can be placed on this node. However, the local allocator rejects the request because no single GPU has 0.4 GPU available.
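To make the mismatch concrete, a toy sketch (not Ray code) of the two views of the same node:

```cpp
#include <numeric>
#include <vector>

// Each element is the fraction of one physical GPU that is still free.
bool SchedulerSaysFeasible(const std::vector<double> &per_gpu_free,
                           double requested) {
  // The scheduler only sees the aggregated capacity: 0.2 + 0.2 = 0.4.
  const double total =
      std::accumulate(per_gpu_free.begin(), per_gpu_free.end(), 0.0);
  return total >= requested;
}

bool AllocatorCanPlace(const std::vector<double> &per_gpu_free,
                       double requested) {
  // The local allocator must fit the whole fraction on a single GPU.
  for (double free : per_gpu_free) {
    if (free >= requested) return true;
  }
  return false;
}

// With per_gpu_free = {0.2, 0.2} and requested = 0.4:
//   SchedulerSaysFeasible(...) returns true, AllocatorCanPlace(...) false.
```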
What this PR does
The high-level goal of this PR is to make node feasibility and availability checks consistent between the scheduler and the resource allocator.
Although the detailed design is still a work in progress and will require a significant refactor, the first step is to make the scheduler's own node feasibility and availability checks consistent and centralized.
Right now, Ray has three scheduling paths:
Tasks and actors essentially share the same scheduling path and use the same node feasibility and availability check function. Placement group scheduling, however, implements its own logic in certain paths, even though it is conceptually the same check.
Since we may override or extend the node feasibility and availability checks in later PRs, it is better to first ensure that all scheduling paths use a single, shared implementation of this logic.
This PR addresses that problem.
Related issues
Related to #52133 #54729
Additional information
Here I list all the call sites to make sure we rely on the same node feasibility and availability checking function; later we can focus on changing just that function and the underlying data structure (see the sketch after this list):
Normal task/actor scheduling:
HybridSchedulingPolicy:
ray/src/ray/raylet/scheduling/policy/hybrid_scheduling_policy.cc
Line 41 in 555fab3
ray/src/ray/raylet/scheduling/policy/hybrid_scheduling_policy.cc
Line 137 in 555fab3
SpreadSchedulingPolicy:
ray/src/ray/raylet/scheduling/policy/spread_scheduling_policy.cc
Line 49 in 555fab3
ray/src/ray/raylet/scheduling/policy/spread_scheduling_policy.cc
Line 54 in 555fab3
RandomSchedulingPolicy
ray/src/ray/raylet/scheduling/policy/random_scheduling_policy.cc
Lines 47 to 48 in 456d190
NodeAffinitySchedulingPolicy
ray/src/ray/raylet/scheduling/policy/node_affinity_scheduling_policy.cc
Lines 26 to 30 in 1180868
NodeLabelSchedulingPolicy
ray/src/ray/raylet/scheduling/policy/node_label_scheduling_policy.cc
Line 171 in 1180868
ray/src/ray/raylet/scheduling/policy/node_label_scheduling_policy.cc
Line 186 in 1180868
Placement Group reservation(scheduling bundle):
ray/src/ray/raylet/scheduling/policy/scorer.cc
Line 58 in 1180868
ray/src/ray/raylet/scheduling/policy/bundle_scheduling_policy.cc
Line 185 in 1180868
Task/Actor with Placement Group:
ray/src/ray/raylet/scheduling/policy/affinity_with_bundle_scheduling_policy.cc
Lines 25 to 26 in 1180868
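For reference, the shape of the single shared entry point these call sites would funnel through. The wrapper name `IsSchedulableOnNode` is made up for this sketch; `NodeResources` and `IsAvailable` are the identifiers used in this PR, and the feasibility counterpart is assumed to exist alongside it:

```cpp
// Hypothetical sketch: one shared check that every scheduling policy calls,
// so later PRs (e.g. per-GPU fractional checks) only need to change one place.
inline bool IsSchedulableOnNode(const NodeResources &node_resources,
                                const ResourceRequest &resource_request,
                                bool require_available) {
  // Feasible: the node's total capacity could ever satisfy the request.
  // Available: the node's currently free capacity can satisfy it right now.
  return require_available ? node_resources.IsAvailable(resource_request)
                           : node_resources.IsFeasible(resource_request);
}
```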