
Improve docs on local inference endpoint and easier to find model name parameter #121

Open
GregorBiswanger opened this issue Nov 30, 2024 · 5 comments
Labels: feature request (The issue is a feature request), needs attention (The issue needs contributor's attention)

Comments

@GregorBiswanger

First off, a huge thank you for creating such an amazing extension! Your work makes developing with AI models so much easier.

I have a feature request:
Would it be possible to trigger loaded models via HTTP?
This would be incredibly useful for local development, as it would allow seamless integration with custom scripts or applications without having to adjust the entire workflow.

Thanks so much for considering this, and keep up the great work!

a1exwang added the feature request label on Dec 4, 2024
a1exwang self-assigned this on Dec 10, 2024
@a1exwang
Collaborator

Hi @GregorBiswanger, we have a local HTTP endpoint that follows the OpenAI contract; please refer to this documentation: https://github.com/microsoft/vscode-ai-toolkit/tree/main/archive#-use-the-rest-api-in-your-application

a1exwang added the needs more info label on Dec 10, 2024
@GregorBiswanger
Author

Thank you very much for the info!

This needs to be made very clear in the extension's UI. It's a nightmare trying to find out the actual model name for the API: you have to hover the mouse pointer over a locally downloaded model and then use a screenshot tool to extract the text from the tooltip. The developer experience is really not good here. Please change that.
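One way to sidestep the hover-and-screenshot dance, assuming the local server also implements the OpenAI-style GET /v1/models route (an assumption — the archived docs only show the chat-completions route, so this may not exist): query it and read the exact model identifiers from the response. A minimal sketch in Python:

```python
import json
import urllib.request

# Assumption: AI Toolkit's local server exposes the OpenAI-style
# model-listing route on the same port as chat completions. This is
# NOT confirmed by the archived docs, which only document
# /v1/chat/completions.
MODELS_URL = "http://127.0.0.1:5272/v1/models"

def extract_model_names(payload):
    """Pull model IDs out of an OpenAI-style model-list response."""
    # OpenAI-compatible servers wrap the list as {"data": [{"id": ...}, ...]}.
    return [entry["id"] for entry in payload.get("data", [])]

def list_model_names(url=MODELS_URL):
    """Fetch and return the model IDs the local endpoint reports."""
    with urllib.request.urlopen(url) as resp:
        return extract_model_names(json.load(resp))
```

If the route exists, the `id` values returned are exactly the strings to pass as the `model` parameter in chat-completion requests.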

The microsoft-github-policy-service bot added the needs attention label and removed the needs more info label on Dec 10, 2024
a1exwang changed the title from "Feature Request: HTTP-API for Loaded Models" to "Improve docs on local inference endpoint and easier to find model name parameter" on Dec 11, 2024
a1exwang removed the needs attention label on Dec 11, 2024
@davrous

davrous commented Dec 24, 2024

Hello! Adding +1 to this issue. I'd love to see the URL exposed in a super visible way so I could just copy/paste it into the project calling the model. Also, even following the doc and calling the URL with curl, I got an error 500. It's not super clear what you need to do to expose the model over HTTP. [screenshot attached]

@davrous

davrous commented Dec 24, 2024

Ok, my bad, the doc: https://github.com/microsoft/vscode-ai-toolkit/tree/main/archive#-use-the-rest-api-in-your-application is wrong. It should be:

curl -v POST http://127.0.0.1:5272/v1/chat/completions -H "Content-Type: application/json" -d @body.json

instead of

curl -v POST http://127.0.0.1:5272/v1/chat/completions -H 'Content-Type: application/json' -d @body.json

@isaacrlevin

I am running into an issue hitting the REST API. I am able to use it with no issue in the Playground.

body.json

{
    "model": "llama2:latest",
    "messages": [
        {
            "role": "user",
            "content": "what is the golden ratio?"
        }
    ],
    "temperature": 0.7,
    "top_p": 1,
    "top_k": 10,
    "max_tokens": 100,
    "stream": true
}

Curl call

curl -v POST http://127.0.0.1:5272/v1/chat/completions -H "Content-Type: application/json" -d @body.json

Error

* Could not resolve host: POST
* shutting down connection #0
curl: (6) Could not resolve host: POST
*   Trying 127.0.0.1:5272...
* Connected to 127.0.0.1 (127.0.0.1) port 5272
* using HTTP/1.x
> POST /v1/chat/completions HTTP/1.1
> Host: 127.0.0.1:5272
> User-Agent: curl/8.10.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 243
>
* upload completely sent off: 243 bytes
< HTTP/1.1 500 Internal Server Error
< Content-Type: application/problem+json
< Date: Wed, 22 Jan 2025 00:36:56 GMT
< Server: Kestrel
< Transfer-Encoding: chunked
<
{"type":"https://tools.ietf.org/html/rfc9110#section-15.6.1","title":"Failed to handle openAI completion","status":500,"detail":"No OpenAIService provider found for modelName:llama2:latest"}* Connection #1 to host 127.0.0.1 left intact
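Two separate problems appear to be visible in that trace. First, `POST` is passed as a bare argument, so curl treats it as a second URL (hence `Could not resolve host: POST`); writing `-X POST` fixes that, though curl already infers POST from `-d`, so the word can simply be dropped. Second, the 500 response says no provider exists for `llama2:latest`, i.e. the `model` value does not match the name the Toolkit expects — which is exactly the discoverability problem this issue is about. A sketch of the equivalent request in Python, with the model name kept as the (evidently wrong) placeholder from the failing request above:

```python
import json
import urllib.request

# Placeholder model name: replace with the exact name the AI Toolkit
# shows for your locally downloaded model. "llama2:latest" is the value
# from the failing request above and is evidently NOT what the server
# expects (the 500 reports no provider for it).
body = {
    "model": "llama2:latest",
    "messages": [{"role": "user", "content": "what is the golden ratio?"}],
    "temperature": 0.7,
    "max_tokens": 100,
    "stream": False,
}

req = urllib.request.Request(
    "http://127.0.0.1:5272/v1/chat/completions",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",  # explicit method, the equivalent of curl's -X POST
)
# urllib.request.urlopen(req) sends the request once the local server
# is running and the model name is correct.
```

Unlike the curl form, there is no way here to accidentally pass the HTTP method as a URL.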

@microsoft-github-policy-service microsoft-github-policy-service bot added the needs attention The issue needs contributor's attention label Jan 22, 2025
Projects: none yet
Development: no branches or pull requests
4 participants