Feature Request: vLLM Backend #130

@dxvsh

Description

Are there any plans to support vLLM as an additional inference engine alongside llama.cpp?

llama.cpp is a great option for tinkering and experimenting with language models, but vLLM is a much better fit for production workloads, and it performs significantly better on multi-GPU setups thanks to its support for Tensor Parallelism. It would be great to see it included as an additional backend in model-runner.
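
For context, tensor parallelism in vLLM is a single-parameter option. A minimal sketch of what multi-GPU serving looks like through its Python API (the model name and GPU count below are just illustrative placeholders):

```python
from vllm import LLM, SamplingParams

# Shard the model's weights across multiple GPUs via tensor parallelism.
# Model name and GPU count are illustrative placeholders.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    tensor_parallel_size=4,  # split each layer across 4 GPUs
)

sampling = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["Why use tensor parallelism?"], sampling)
print(outputs[0].outputs[0].text)
```

The same thing is available from the CLI via `vllm serve <model> --tensor-parallel-size 4`, so a model-runner backend could plausibly pass the GPU count straight through as a configuration option.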

Metadata

    Labels

    enhancement (New feature or request)
