Status: Open
Labels: enhancement (New feature or request)
Description
Are there any plans to support vLLM as an additional inference engine alongside llama.cpp?

llama.cpp is a great option for tinkering and playing around with language models, but vLLM is a much better fit for production workloads, and it performs significantly better on multi-GPU setups thanks to its support for tensor parallelism. It would be great to see it included as an additional backend in model-runner.
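For context, here is a minimal sketch of how tensor parallelism is enabled with vLLM's offline `LLM` API; the model name and GPU count below are illustrative placeholders, not a proposal for model-runner's interface:

```python
from vllm import LLM, SamplingParams

# Shard the model's weight matrices across 2 GPUs via tensor parallelism.
# Model name and tensor_parallel_size are placeholders for this example.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    tensor_parallel_size=2,
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Why does tensor parallelism help multi-GPU serving?"], params)
print(outputs[0].outputs[0].text)
```

The same option is exposed on vLLM's OpenAI-compatible server (`vllm serve <model> --tensor-parallel-size 2`), which is the shape of deployment a model-runner backend would presumably wrap.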