
Improve docs on local inference endpoint and easier to find model name parameter #121

Open
GregorBiswanger opened this issue Nov 30, 2024 · 5 comments
Labels: feature request (The issue is a feature request), needs attention (The issue needs contributor's attention)

Comments

@GregorBiswanger

First off, a huge thank you for creating such an amazing extension! Your work makes developing with AI models so much easier.

I have a feature request:
Would it be possible to trigger loaded models via HTTP?
This would be incredibly useful for local development, as it would allow seamless integration with custom scripts or applications without having to adjust the entire workflow.

Thanks so much for considering this, and keep up the great work!

a1exwang added the feature request label on Dec 4, 2024
a1exwang self-assigned this on Dec 10, 2024
@a1exwang
Collaborator

Hi @GregorBiswanger, we have a local HTTP endpoint that follows the OpenAI contract; please refer to this documentation: https://github.com/microsoft/vscode-ai-toolkit/tree/main/archive#-use-the-rest-api-in-your-application

a1exwang added the needs more info label on Dec 10, 2024
@GregorBiswanger
Author

Thank you very much for the info!

This needs to be made very clear in the extension's UI. It's a nightmare trying to find out the actual model name for the API: you have to hover the mouse pointer over a locally downloaded model and then use a screenshot tool to extract the text from the tooltip. The developer experience is really not good here. Please change that.
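One way to sidestep the hover-and-screenshot dance, assuming the local server also implements the OpenAI-style GET /v1/models route (an assumption — the archived docs only show the chat-completions route, so this may not exist): query it and read the exact model identifiers from the response. A minimal sketch in Python:

```python
import json
import urllib.request

# Assumption: AI Toolkit's local server exposes the OpenAI-style
# model-listing route on the same port as chat completions. This is
# NOT confirmed by the archived docs, which only document
# /v1/chat/completions.
MODELS_URL = "http://127.0.0.1:5272/v1/models"

def extract_model_names(payload):
    """Pull model IDs out of an OpenAI-style model-list response."""
    # OpenAI-compatible servers wrap the list as {"data": [{"id": ...}, ...]}.
    return [entry["id"] for entry in payload.get("data", [])]

def list_model_names(url=MODELS_URL):
    """Fetch and return the model IDs the local endpoint reports."""
    with urllib.request.urlopen(url) as resp:
        return extract_model_names(json.load(resp))
```

If the route exists, the `id` values returned are exactly the strings to pass as the `model` parameter in chat-completion requests.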

The microsoft-github-policy-service bot added the needs attention label and removed the needs more info label on Dec 10, 2024
a1exwang changed the title from "Feature Request: HTTP-API for Loaded Models" to "Improve docs on local inference endpoint and easier to find model name parameter" on Dec 11, 2024
a1exwang removed the needs attention label on Dec 11, 2024
@davrous

davrous commented Dec 24, 2024

Hello! Adding +1 to this issue. I'd love to see the URL exposed in a super visible way so I could just copy/paste it into the project calling the model. Also, even following the doc and calling the URL with curl, I got an error 500. It's not super clear what you need to do to expose the model over HTTP. [screenshot attached]

@davrous

davrous commented Dec 24, 2024

Ok, my bad, the doc: https://github.com/microsoft/vscode-ai-toolkit/tree/main/archive#-use-the-rest-api-in-your-application is wrong. It should be:

curl -v POST http://127.0.0.1:5272/v1/chat/completions -H "Content-Type: application/json" -d @body.json

instead of

curl -v POST http://127.0.0.1:5272/v1/chat/completions -H 'Content-Type: application/json' -d @body.json

@isaacrlevin

I am running into an issue hitting the REST API. I am able to use it with no issue in the Playground.

body.json

{
    "model": "llama2:latest",
    "messages": [
        {
            "role": "user",
            "content": "what is the golden ratio?"
        }
    ],
    "temperature": 0.7,
    "top_p": 1,
    "top_k": 10,
    "max_tokens": 100,
    "stream": true
}

Curl call

curl -v POST http://127.0.0.1:5272/v1/chat/completions -H "Content-Type: application/json" -d @body.json

Error

* Could not resolve host: POST
* shutting down connection #0
curl: (6) Could not resolve host: POST
*   Trying 127.0.0.1:5272...
* Connected to 127.0.0.1 (127.0.0.1) port 5272
* using HTTP/1.x
> POST /v1/chat/completions HTTP/1.1
> Host: 127.0.0.1:5272
> User-Agent: curl/8.10.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 243
>
* upload completely sent off: 243 bytes
< HTTP/1.1 500 Internal Server Error
< Content-Type: application/problem+json
< Date: Wed, 22 Jan 2025 00:36:56 GMT
< Server: Kestrel
< Transfer-Encoding: chunked
<
{"type":"https://tools.ietf.org/html/rfc9110#section-15.6.1","title":"Failed to handle openAI completion","status":500,"detail":"No OpenAIService provider found for modelName:llama2:latest"}* Connection #1 to host 127.0.0.1 left intact
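Two separate problems appear to be visible in that trace. First, `POST` is passed as a bare argument, so curl treats it as a second URL (hence `Could not resolve host: POST`); writing `-X POST` fixes that, though curl already infers POST from `-d`, so the word can simply be dropped. Second, the 500 response says no provider exists for `llama2:latest`, i.e. the `model` value does not match the name the Toolkit expects — which is exactly the discoverability problem this issue is about. A sketch of the equivalent request in Python, with the model name kept as the (evidently wrong) placeholder from the failing request above:

```python
import json
import urllib.request

# Placeholder model name: replace with the exact name the AI Toolkit
# shows for your locally downloaded model. "llama2:latest" is the value
# from the failing request above and is evidently NOT what the server
# expects (the 500 reports no provider for it).
body = {
    "model": "llama2:latest",
    "messages": [{"role": "user", "content": "what is the golden ratio?"}],
    "temperature": 0.7,
    "max_tokens": 100,
    "stream": False,
}

req = urllib.request.Request(
    "http://127.0.0.1:5272/v1/chat/completions",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",  # explicit method, the equivalent of curl's -X POST
)
# urllib.request.urlopen(req) sends the request once the local server
# is running and the model name is correct.
```

Unlike the curl form, there is no way here to accidentally pass the HTTP method as a URL.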

@microsoft-github-policy-service microsoft-github-policy-service bot added the needs attention The issue needs contributor's attention label Jan 22, 2025
Projects: none yet
Development: no branches or pull requests
4 participants