Please review this doc. vLLM's server is OpenAI-compatible, meaning you can just use the openai Python library and point base_url at whatever inference server you're running.

from pydantic import BaseModel
import openai

class Testing(BaseModel):
    """
    A class representing a testing schema.
    """
    name: str
    age: int

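# Point the client at any local OpenAI-compatible server (vLLM, LM Studio, etc.)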
openai_client = openai.OpenAI(
    base_url="http://0.0.0.0:1234/v1",
    api_key="dopeness"
)

# Make a request to the local LM Studio server
response = openai_client.beta.chat.completions.parse(
    model="hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF",
    messages=[
        {"role": "system", "content": "You are like so good at whatever you do."},
        {"role": "user", "content": "My name is Cameron and …

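The snippet above is cut off by the page preview. For reference, a minimal complete sketch of the same pattern might look like the following; the user message, port, and api_key are placeholders, and it assumes an OpenAI-compatible server (vLLM, LM Studio, etc.) is running at that base_url with a model that supports structured outputs:

from pydantic import BaseModel
import openai


class Testing(BaseModel):
    """A class representing a testing schema."""
    name: str
    age: int


# Any OpenAI-compatible server works here; the port and api_key are
# whatever your local server expects.
openai_client = openai.OpenAI(
    base_url="http://0.0.0.0:1234/v1",
    api_key="dopeness",
)

# parse() sends the Pydantic model as a JSON schema in response_format
# and validates the model's reply against it.
response = openai_client.beta.chat.completions.parse(
    model="hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF",
    messages=[
        {"role": "system", "content": "You are like so good at whatever you do."},
        # Placeholder user message -- the original post is truncated here.
        {"role": "user", "content": "My name is Cameron and I am 28 years old."},
    ],
    response_format=Testing,
)

# The validated Pydantic object is available on .parsed
testing = response.choices[0].message.parsed
print(testing.name, testing.age)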
Answer selected by cpfiffer