feat: support non-cuda devices for text and vision models #233
base: main
Conversation
Requires: meta-llama/llama-models#233 Signed-off-by: Dmitry Rogozhkin <[email protected]>
Updated the PR with the changes needed to make the vision models work (see the 2nd commit). For testing, I ran the following by commenting out the skip conditions: llama-models/models/llama3/tests/api/test_generation.py Lines 79 to 81 in 804a64f
Thanks @dvrogozh -- could you add a test plan here which shows that the example scripts we have work correctly for both CUDA and non-CUDA environments?
This commit adds support for non-CUDA PyTorch backend devices to the text models. It extends the existing test to run on an externally specified device (cuda is the default). Verified on the Llama3.2-3B-Instruct model for:
* "cuda" device type on an NVIDIA A10 GPU
* "cpu" device type
* "xpu" device type on an Intel Data Center GPU Max Series (PVC)
Co-authored-by: anordin95 <[email protected]> Signed-off-by: Dmitry Rogozhkin <[email protected]>
This commit adds support for non-CUDA PyTorch backend devices to the vision models. Verified on the Llama3.2-11B-Vision-Instruct model for:
* "cuda" device type on an NVIDIA A10 GPU
* "cpu" device type
* "xpu" device type on an Intel Data Center GPU Max Series (PVC)
Note that this commit requires a fix on the PyTorch side for the gloo torch.distributed backend to restore TLS on gloo worker threads. Requires: pytorch/pytorch#142184 Signed-off-by: Dmitry Rogozhkin <[email protected]>
This change modifies the reference-inference test so that CPU inference is always tested, and on-device inference is tested if an accelerator is available (currently checking for cuda and xpu, in that order) or if the user explicitly specifies a device to test via the DEVICE environment variable. Signed-off-by: Dmitry Rogozhkin <[email protected]>
@ashwinb: I have modified the existing tests so that they cover both CUDA and non-CUDA environments. In the last commit I further extend the test to run 2 cases: 1) inference on the CPU, and 2) inference on an accelerator device. The device is detected and used automatically (cuda or xpu, in that order). Additionally, it is possible to specify the device explicitly via the DEVICE environment variable.
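The device-selection logic described above (CPU always tested, plus the first available accelerator, with an environment-variable override) can be sketched roughly as follows. `devices_to_test` is a hypothetical helper name, not the PR's actual code, and in the real test the availability flags would come from `torch.cuda.is_available()` / `torch.xpu.is_available()`:

```python
import os

def devices_to_test(available_accelerators):
    """Return the device types the test should cover.

    CPU inference is always tested. On top of that, the first available
    accelerator is tested, checking "cuda" then "xpu", in that order.
    An explicit DEVICE environment variable overrides auto-detection.
    (Hypothetical sketch; not the actual test code from this PR.)
    """
    explicit = os.environ.get("DEVICE")
    if explicit:
        return ["cpu", explicit]
    devices = ["cpu"]
    for dev in ("cuda", "xpu"):
        if dev in available_accelerators:
            devices.append(dev)
            break
    return devices
```

Passing the availability set as a parameter keeps the sketch backend-agnostic and easy to exercise without any accelerator installed.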
Note that I covered both text and vision models. However, at the moment the vision model tests are being skipped: llama-models/models/llama3/tests/api/test_generation.py Lines 76 to 77 in b333524
That's in effect starting from commit 6ad6fd6. If the skips are to be removed, I suggest doing that in a separate PR.
@ashwinb, could you please help review and comment?
This commit adds support for non-CUDA PyTorch backend devices to the text and vision models. It extends the existing tests to run on an externally specified device (cuda is the default). Verified on the Llama3.2-3B-Instruct and Llama3.2-11B-Vision-Instruct models for:
* "cuda" device type on an NVIDIA A10 GPU (for no regressions)
* "cpu" device type
* "xpu" device type on an Intel Data Center GPU Max Series (PVC)
Note that this commit requires a fix on the PyTorch side for the gloo torch.distributed backend to restore TLS on gloo worker threads. That change was merged on the PyTorch side and should make it into PyTorch 2.6.
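The gloo dependency mentioned above arises because the nccl backend only covers CUDA devices, so non-CUDA runs must fall back to another torch.distributed backend. A minimal, hypothetical sketch of that per-device backend choice (the function name is illustrative, not the PR's actual code):

```python
def dist_backend_for(device_type):
    """Pick a torch.distributed backend name for a given device type.

    nccl is CUDA-only; other device types (cpu, xpu, ...) fall back to
    gloo here, which is why this PR depends on the upstream gloo TLS
    fix (pytorch/pytorch#142184). Hypothetical sketch, not PR code.
    """
    return "nccl" if device_type == "cuda" else "gloo"
```

A string return value is used so the result can be passed directly to `torch.distributed.init_process_group(backend=...)`.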
This PR supersedes #165 from @anordin95.
Requires: pytorch/pytorch#142184