Status: Open
Labels: enhancement (New feature or request)
Description
Are there any plans to support vLLM as an additional inference engine alongside llama.cpp?

llama.cpp is a great option for tinkering and playing around with language models, but vLLM is a much better fit for production workloads, and it performs significantly better on multi-GPU setups thanks to its support for tensor parallelism. It would be great to see it included as an additional backend in model-runner.
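For context, here is a minimal sketch of how tensor parallelism is enabled with vLLM's offline `LLM` API; the model name and GPU count below are illustrative placeholders, not a proposal for model-runner's interface:

```python
from vllm import LLM, SamplingParams

# Shard the model's weight matrices across 2 GPUs via tensor parallelism.
# Model name and tensor_parallel_size are placeholders for this example.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    tensor_parallel_size=2,
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Why does tensor parallelism help multi-GPU serving?"], params)
print(outputs[0].outputs[0].text)
```

The same option is exposed on vLLM's OpenAI-compatible server (`vllm serve <model> --tensor-parallel-size 2`), which is the shape of deployment a model-runner backend would presumably wrap.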