Improve docs on local inference endpoint and easier to find model name parameter #121
Comments
Hi @GregorBiswanger, we have a local HTTP endpoint that follows the OpenAI contract; please see this documentation: https://github.com/microsoft/vscode-ai-toolkit/tree/main/archive#-use-the-rest-api-in-your-application
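For anyone else landing here, a minimal sketch of calling that local endpoint from the command line. The port 5272 and the /v1/chat/completions path are taken from the curl output further down in this thread; the model value is a placeholder and has to be replaced with the exact identifier the extension shows for your downloaded model:

```sh
# Minimal sketch -- "your-local-model-name" is a placeholder, not a real identifier.
curl -X POST http://127.0.0.1:5272/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "your-local-model-name",
        "messages": [
          { "role": "user", "content": "what is the golden ratio?" }
        ],
        "max_tokens": 100,
        "stream": false
      }'
```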
Thank you very much for the info! This needs to be made very clear in the extension's UI. Right now it is a nightmare to find the actual model name for the API: you have to hover the mouse pointer over a locally downloaded model and then use the screenshot tool to extract the text. The developer experience is really not good here. Please change that.
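One possible workaround, stated purely as an assumption: the comment above says the endpoint follows the OpenAI contract, and that contract includes a models listing route. If the local server implements it, this would enumerate the exact identifiers accepted in the model field (worth verifying against the linked docs):

```sh
# Assumption: the local AI Toolkit server implements the standard OpenAI listing route.
# If it does, the response contains the exact model identifiers to use in requests.
curl http://127.0.0.1:5272/v1/models
```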
Ok, my bad, the doc: https://github.com/microsoft/vscode-ai-toolkit/tree/main/archive#-use-the-rest-api-in-your-application is wrong. It should be:
instead of
I am running into an issue hitting the REST API. I am able to use it with no issue in the Playground.

body.json:

{
"model": "llama2:latest",
"messages": [
{
"role": "user",
"content": "what is the golden ratio?"
}
],
"temperature": 0.7,
"top_p": 1,
"top_k": 10,
"max_tokens": 100,
"stream": true
}

Curl call:

curl -v POST http://127.0.0.1:5272/v1/chat/completions -H "Content-Type: application/json" -d @body.json

Error:

* Could not resolve host: POST
* shutting down connection #0
curl: (6) Could not resolve host: POST
* Trying 127.0.0.1:5272...
* Connected to 127.0.0.1 (127.0.0.1) port 5272
* using HTTP/1.x
> POST /v1/chat/completions HTTP/1.1
> Host: 127.0.0.1:5272
> User-Agent: curl/8.10.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 243
>
* upload completely sent off: 243 bytes
< HTTP/1.1 500 Internal Server Error
< Content-Type: application/problem+json
< Date: Wed, 22 Jan 2025 00:36:56 GMT
< Server: Kestrel
< Transfer-Encoding: chunked
<
{"type":"https://tools.ietf.org/html/rfc9110#section-15.6.1","title":"Failed to handle openAI completion","status":500,"detail":"No OpenAIService provider found for modelName:llama2:latest"}* Connection #1 to host 127.0.0.1 left intact |
First off, a huge thank you for creating such an amazing extension! Your work makes developing with AI models so much easier.
I have a feature request:
Would it be possible to trigger loaded models via HTTP?
This would be incredibly useful for local development, as it would allow seamless integration with custom scripts or applications without having to adjust the entire workflow.
Thanks so much for considering this, and keep up the great work!