feat: support non-cuda devices for text and vision models #233
base: main
Conversation
Requires: meta-llama/llama-models#233 Signed-off-by: Dmitry Rogozhkin <[email protected]>
Updated the PR with the changes needed to make the vision models work (see the 2nd commit). For testing, I ran the following by commenting out the skip conditions: llama-models/models/llama3/tests/api/test_generation.py Lines 79 to 81 in 804a64f
Thanks @dvrogozh -- could you add a test plan here which shows that the example scripts we have work correctly for both CUDA and non-CUDA environments?
This commit adds support for non-CUDA PyTorch backend devices to the text models. It extends the existing test to run on an externally specified device (cuda is the default). Verified on the Llama3.2-3B-Instruct model for:
* "cuda" device type on an NVIDIA A10 GPU
* "cpu" device type
* "xpu" device type on an Intel Data Center GPU Max Series (PVC)
Co-authored-by: anordin95 <[email protected]> Signed-off-by: Dmitry Rogozhkin <[email protected]>
This commit adds support for non-CUDA PyTorch backend devices to the vision models. Verified on the Llama3.2-11B-Vision-Instruct model for:
* "cuda" device type on an NVIDIA A10 GPU
* "cpu" device type
* "xpu" device type on an Intel Data Center GPU Max Series (PVC)
Note that this commit requires a fix on the PyTorch side for the gloo torch.distributed backend to restore TLS on gloo worker threads. Requires: pytorch/pytorch#142184 Signed-off-by: Dmitry Rogozhkin <[email protected]>
This change modifies the reference-inference test so that CPU inference is always tested, and on-device inference is tested if an accelerator is available (currently checking for cuda and xpu, in that order) or if the user explicitly specifies a device to test via the DEVICE environment variable. Signed-off-by: Dmitry Rogozhkin <[email protected]>
@ashwinb: I have modified the existing tests so that they cover both CUDA and non-CUDA environments. In the last commit I further extend the test to run 2 cases: 1) inference on the CPU, and 2) inference on an accelerator device. The device is detected and used automatically (cuda or xpu, in that order). Additionally, it is possible to specify the device explicitly via the DEVICE environment variable.
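The device-selection logic described above (CPU always tested, plus the first available accelerator, with an environment-variable override) can be sketched roughly as follows. `devices_to_test` is a hypothetical helper name, not the PR's actual code, and in the real test the availability flags would come from `torch.cuda.is_available()` / `torch.xpu.is_available()`:

```python
import os

def devices_to_test(available_accelerators):
    """Return the device types the test should cover.

    CPU inference is always tested. On top of that, the first available
    accelerator is tested, checking "cuda" then "xpu", in that order.
    An explicit DEVICE environment variable overrides auto-detection.
    (Hypothetical sketch; not the actual test code from this PR.)
    """
    explicit = os.environ.get("DEVICE")
    if explicit:
        return ["cpu", explicit]
    devices = ["cpu"]
    for dev in ("cuda", "xpu"):
        if dev in available_accelerators:
            devices.append(dev)
            break
    return devices
```

Passing the availability set as a parameter keeps the sketch backend-agnostic and easy to exercise without any accelerator installed.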
Note that I covered both text and vision models. However, at the moment the vision model tests are being skipped: llama-models/models/llama3/tests/api/test_generation.py Lines 76 to 77 in b333524
That's in effect starting from commit 6ad6fd6. If the skips are to be removed, I suggest doing that in a separate PR.
@ashwinb, could you please help review and comment?
This commit adds support for non-CUDA PyTorch backend devices to the text and vision models. It extends the existing tests to run on an externally specified device (cuda is the default). Verified on the Llama3.2-3B-Instruct and Llama3.2-11B-Vision-Instruct models for:
* "cuda" device type on an NVIDIA A10 GPU (for no regressions)
* "cpu" device type
* "xpu" device type on an Intel Data Center GPU Max Series (PVC)
Note that this commit requires a fix on the PyTorch side for the gloo torch.distributed backend to restore TLS on gloo worker threads. That change was merged on the PyTorch side and should make it into PyTorch 2.6.
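The gloo dependency mentioned above arises because the nccl backend only covers CUDA devices, so non-CUDA runs must fall back to another torch.distributed backend. A minimal, hypothetical sketch of that per-device backend choice (the function name is illustrative, not the PR's actual code):

```python
def dist_backend_for(device_type):
    """Pick a torch.distributed backend name for a given device type.

    nccl is CUDA-only; other device types (cpu, xpu, ...) fall back to
    gloo here, which is why this PR depends on the upstream gloo TLS
    fix (pytorch/pytorch#142184). Hypothetical sketch, not PR code.
    """
    return "nccl" if device_type == "cuda" else "gloo"
```

A string return value is used so the result can be passed directly to `torch.distributed.init_process_group(backend=...)`.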
This PR supersedes #165 from @anordin95.
Requires: pytorch/pytorch#142184