-
Does SGLang support being combined with Ray for distributed inference?
Replies: 2 comments 5 replies
-
Currently, SGLang supports multi-GPU inference on a single machine as well as across multiple machines. What additional benefit would integrating Ray bring? There are currently no plans for it.
-
In the example you provided, the weights of Qwen 2 7B are small enough that a single machine with a single GPU is sufficient, so there is no need for multiple machines or GPUs. As I understand it, multiple machines are typically used only for models such as Llama 3.1 405B, which a single machine with multiple GPUs cannot accommodate.
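To make the sizing argument concrete, here is a rough back-of-the-envelope sketch. It is not taken from SGLang; the helper names are hypothetical, the parameter counts are the nominal ones from the model names, and it assumes 16-bit weights (2 bytes per parameter) and 80 GB GPUs. It counts weights only and ignores the KV cache, activations, and runtime overhead, so real requirements are higher.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumptions (not from the thread): 16-bit weights, 80 GB per GPU,
# nominal parameter counts, weights only (no KV cache or activations).

def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def weights_fit(num_params: float, num_gpus: int,
                gpu_memory_gb: float = 80.0, bytes_per_param: int = 2) -> bool:
    """True if the weights alone fit in the combined memory of num_gpus GPUs."""
    return weight_memory_gb(num_params, bytes_per_param) <= num_gpus * gpu_memory_gb


QWEN2_7B_PARAMS = 7e9        # nominal
LLAMA31_405B_PARAMS = 405e9  # nominal

print(weight_memory_gb(QWEN2_7B_PARAMS))      # 14.0 GB, fits on one GPU
print(weights_fit(QWEN2_7B_PARAMS, 1))        # True
print(weight_memory_gb(LLAMA31_405B_PARAMS))  # 810.0 GB
print(weights_fit(LLAMA31_405B_PARAMS, 8))    # False: 8 x 80 GB = 640 GB
```

Under these assumptions, the 7B model needs about 14 GB for its weights, while the 405B model needs about 810 GB, which is more than a single 8-GPU node with 80 GB cards provides.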