Feature request
It would be great if OpenLLM supported pre-Ampere-architecture CUDA devices. In my case, I'm looking at the Volta architecture.
The README currently states that an Ampere-architecture or newer GPU is required to use the vLLM backend; otherwise you're limited to the PyTorch backend.
As far as I can tell, this is not a vLLM-specific constraint: vLLM itself does not require an Ampere-architecture device.
Motivation
I am trying to run OpenLLM on my NVIDIA Tesla V100 (32 GB) devices, but I cannot use the vLLM backend, because OpenLLM's vLLM backend does not support the Volta architecture.
Other
I would love to help however I can, but I can't find any documentation of where this constraint comes from other than the README. I've gone through vLLM's docs, and they do not indicate that this is a vLLM-side constraint.
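
For reference, here is a minimal, purely illustrative sketch of the kind of compute-capability gate that could produce the behavior described in the README. This is my assumption, not OpenLLM's actual code; the point is that a gate requiring Ampere (compute capability 8.x) excludes Volta (7.0, e.g. V100), while vLLM's installation docs only ask for compute capability 7.0 or higher.

```python
# Illustrative only: NOT OpenLLM's actual code, just a sketch of the kind of
# compute-capability check that would explain the README's Ampere requirement.
import torch

def supports_backend(min_major: int) -> bool:
    """Return True if the current CUDA device meets a minimum compute capability major version."""
    if not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability()
    return major >= min_major

# A gate like supports_backend(8) would exclude Volta (sm_70, e.g. V100),
# while a floor of 7.0 (per vLLM's installation docs) would allow it.
print("Ampere-or-newer gate:", supports_backend(8))
print("Compute capability 7.0 gate:", supports_backend(7))
```

If the constraint is enforced somewhere along these lines, relaxing the threshold from 8.x to 7.0 for the vLLM backend would be enough to cover Volta.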