Could not import SGMV kernel from Punica, falling back to loop. #2465
Comments
Hi @ksajan 👋 Thanks for filing the issue. I think the problem is that you're running on a CPU, and falcon-7b in TGI is only supported with kernels that require a GPU. If you want to run TGI locally on a CPU for testing, I'd recommend choosing a smaller model that doesn't rely on special kernels. Or if your requirements are to use something like
Let me know if I can help in any other way 🙌
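One way to confirm whether the machine exposes a GPU at all before launching is sketched below. It is a minimal check, assuming an NVIDIA setup where the driver's `nvidia-smi` tool would be on the `PATH`; the helper name `visible_nvidia_gpus` is hypothetical and not part of TGI. An empty result would be consistent with the CPU-only situation described above, though it does not by itself prove that this is the cause of the warning.

```python
import shutil
import subprocess


def visible_nvidia_gpus() -> list:
    """Return one line per GPU reported by `nvidia-smi -L`, or [] if none are visible.

    Hypothetical helper for illustration only; it is not part of TGI.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No NVIDIA driver tooling on PATH: treat as "no GPU visible".
        return []
    try:
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    # `nvidia-smi -L` prints one line per device, each starting with "GPU <index>:".
    return [
        line.strip()
        for line in result.stdout.splitlines()
        if line.strip().startswith("GPU ")
    ]


if __name__ == "__main__":
    gpus = visible_nvidia_gpus()
    if gpus:
        print("Visible GPUs:")
        for gpu in gpus:
            print("  " + gpu)
    else:
        print("No NVIDIA GPU visible; GPU-only kernels will likely be unavailable.")
```

If this reports no GPU, a smaller model that does not depend on GPU-only kernels is probably the easier route for local testing.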
@ErikKaum I tried running this
Yeah, so the llama.cpp version probably uses different kernels that don't require GPUs. When you built this for a GPU, did you use:
I'd nonetheless recommend using the Docker image to avoid building from source; it's usually a lot more hassle-free 👍
System Info
text-generation-launcher --env:
No GPU; using the CPU version.
Information
Tasks
Reproduction
Could not import SGMV kernel from Punica, falling back to loop.
Expected behavior
It should download the model and serve it without any error