feat: Scanning GPU allocation map #2273
lgtm
lgtm
Co-authored-by: octodog <[email protected]>
Resolves #3327. (https://github.com/lablup/giftbox/issues/638) (BA-428) (GF-67).
Implements an API that allows administrators to check how fGPUs are allocated among agents through a GPU alloc map (the GPU allocation state per GPU device).
How it works
The GPU allocation is calculated by reading the resource.txt file in each kernel's scratch directory and summing up the allocation information in KernelResourceSpec.

Usage example
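Before walking through the example, the scan-and-sum step described above can be sketched as follows. This is a minimal illustration only: the simplified line format used for resource.txt and the function name are assumptions for this sketch, not the actual KernelResourceSpec serialization used by Backend.AI.

```python
from collections import defaultdict
from decimal import Decimal
from pathlib import Path


def scan_gpu_alloc_map(scratch_root: str) -> dict[str, Decimal]:
    """Sum per-device GPU allocations across all kernels' resource.txt files.

    Assumes a simplified line format ``<slot>:<device_id>:<amount>``,
    e.g. ``cuda.shares:gpu0:0.2`` -- the real KernelResourceSpec format
    differs; this only illustrates the scan-and-sum idea.
    """
    alloc_map: dict[str, Decimal] = defaultdict(Decimal)
    # Each kernel has its own scratch directory containing a resource.txt.
    for resource_file in Path(scratch_root).glob("*/resource.txt"):
        for line in resource_file.read_text().splitlines():
            parts = line.strip().split(":")
            # Skip anything that is not a fractional-share slot entry.
            if len(parts) != 3 or not parts[0].endswith(".shares"):
                continue
            _slot, device_id, amount = parts
            # Decimal keeps fractional fGPU sums exact (0.2 + 0.6 == 0.8).
            alloc_map[device_id] += Decimal(amount)
    return dict(alloc_map)
```

With the allocations from the example below (0.2 on the second GPU, then 0.6 on each), this sketch would aggregate to 0.6 fGPU on the first device and 0.8 fGPU on the second.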
Note
This describes the same issue addressed in issue #638.
Tested using mock-accelerator.
Here is a simple example with which we can test this PR.
When I specify the two mock GPU devices below in mock-accelerator.toml, I have 2 fGPUs in total. After creating a session with the following command, I can query the gpu_alloc_map in JSON format using the query statement below. We can see that the two mock GPU devices have been allocated 0.6 and 0.8 fGPU, respectively.
In the first request, 0.2 fGPU was allocated to the second GPU. In the second request, no single device had room for the requested 1.2 fGPU, so it was split evenly, allocating 0.6 fGPU to each of the two GPU devices.
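The even-split arithmetic above can be checked with a toy allocator. This sketch is only meant to reproduce the numbers in this example; it is not Backend.AI's actual fractional-GPU placement algorithm (for instance, the real scheduler placed the first 0.2 fGPU request on the second GPU, which a naive first-fit like this would not do).

```python
from decimal import Decimal


def allocate(free: dict[str, Decimal], request: Decimal) -> dict[str, Decimal]:
    """Toy allocator: place the request on one device if it fits,
    otherwise split it evenly across all devices.

    Illustrative only -- the real allocator's placement policy differs.
    """
    # Prefer a single device that can hold the whole request.
    for dev_id, avail in free.items():
        if avail >= request:
            return {dev_id: request}
    # No single device fits: spread the request evenly.
    share = request / len(free)
    return {dev_id: share for dev_id in free}
```

Applied to the example: after the first request the second GPU holds 0.2 fGPU, leaving 1.0 and 0.8 fGPU free. The second request of 1.2 fGPU fits neither device alone, so 0.6 fGPU lands on each, yielding the final alloc map of 0.6 and 0.8 fGPU.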
Checklist: (if applicable)