planning: Migrate Threads, Messages to Cortex, deprecate Conversation Extension #3904
According to this: janhq/cortex.cpp#1567 (comment)

Problems
See a Jan thread:

```json
{
  "id": "jan_1729768043",
  "object": "thread",
  "title": "0.5.8 llama 3.2 1b",
  "assistants": [
    {
      "assistant_id": "jan",
      "assistant_name": "Jan",
      "tools": [
        {
          "type": "retrieval",
          "enabled": true,
          "settings": {
            "top_k": 2,
            "chunk_size": 1024,
            "chunk_overlap": 64,
            "retrieval_template": "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\nCONTEXT: {CONTEXT}\n----------------\nQUESTION: {QUESTION}\n----------------\nHelpful Answer:"
          }
        }
      ],
      "model": {
        "id": "llama3.2-1b-instruct",
        "settings": {
          "engine": "llama-cpp",
          "ctx_len": 3072,
          "ngl": 100,
          "prompt_template": "<|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
          "text_model": false
        },
        "parameters": {
          "engine": "llama-cpp",
          "frequency_penalty": 0,
          "max_tokens": 3072,
          "presence_penalty": 0,
          "stop": [
            "<|eot_id|>"
          ],
          "stream": true,
          "temperature": 0.699999988079071,
          "top_p": 0.949999988079071
        },
        "engine": "llama-cpp"
      },
      "instructions": ""
    }
  ],
  "created": 1729768043312,
  "updated": 1730195853233,
  "metadata": {
    "lastMessage": "Hello!"
  }
}
```

See OpenAI Assistant and Thread:

```json
{
"id": "asst_abc123",
"object": "assistant",
"created_at": 1698984975,
"name": "Math Tutor",
"description": null,
"model": "gpt-4o",
"instructions": "You are a personal math tutor. When asked a question, write and run Python code to answer the question.",
"tools": [
{
"type": "code_interpreter"
}
],
"metadata": {},
"top_p": 1.0,
"temperature": 1.0,
"response_format": "auto"
}
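Only the Assistant object is shown above; for completeness, the Thread object in the OpenAI API is much slimmer (shape per the public OpenAI API reference; the id and timestamp values here are illustrative). Messages live in a separate `/threads/{thread_id}/messages` collection rather than embedded in the thread:

```json
{
  "id": "thread_abc123",
  "object": "thread",
  "created_at": 1699012949,
  "metadata": {},
  "tool_resources": {}
}
```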
So should we:

1. Keep Jan's current thread/assistant structures largely as-is when moving to Cortex, or
2. Adopt the OpenAI-compatible Assistant, Thread, and Message structures?
I think 2 is preferred, since we could take advantage of existing test suites and client SDKs. Otherwise, we would eventually do another migration to scale and decouple.
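To make the "existing client SDKs" point concrete, here is a minimal sketch of what option 2 enables: driving threads and messages through the stock `openai` Node SDK pointed at a local server. The base URL, port, and metadata fields below are assumptions for illustration, not confirmed Cortex values:

```typescript
import OpenAI from "openai";

// Point the stock OpenAI SDK at a local, OpenAI-compatible server.
// NOTE: the base URL and port are assumptions for illustration.
const client = new OpenAI({
  baseURL: "http://127.0.0.1:39281/v1",
  apiKey: "not-needed-locally",
});

async function main() {
  // Create a thread, attach a user message, then list messages back;
  // exactly the calls an OpenAI-compatible /threads API would accept.
  const thread = await client.beta.threads.create({
    metadata: { title: "0.5.8 llama 3.2 1b" },
  });

  await client.beta.threads.messages.create(thread.id, {
    role: "user",
    content: "Hello!",
  });

  const messages = await client.beta.threads.messages.list(thread.id);
  console.log(messages.data);
}

main();
```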
---
Can you elaborate a bit more about:
---
Ah @dan-homebrew, I just mean
---
Got it. Can you proceed to make recommendations for how we can break down the Assistants, Threads/Messages, and Models endpoints (and the related data structures)? I think it's better we bite the bullet and move to the correct data structures.
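For context, a rough sketch of the OpenAI-compatible surface these resources would map to (paths as documented in the public OpenAI API reference):

- Assistants: `POST /v1/assistants`, `GET /v1/assistants`, `GET/POST/DELETE /v1/assistants/{assistant_id}`
- Threads: `POST /v1/threads`, `GET/POST/DELETE /v1/threads/{thread_id}`
- Messages: `POST /v1/threads/{thread_id}/messages`, `GET /v1/threads/{thread_id}/messages`, `GET/POST/DELETE /v1/threads/{thread_id}/messages/{message_id}`
- Models: `GET /v1/models`, `GET /v1/models/{model}`

---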
Update: Scoped down for frontend:
Attached: (Jan Frontend and QA)
Goal
Tasklist
- `/threads`
- `/messages`
cortex.cpp#1567
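As a sketch of what the frontend migration for `/threads` could look like, here is one way to convert a legacy Jan thread record (like the example at the top of this issue) into the OpenAI-style Thread shape. All type and function names here are hypothetical illustrations, not Jan's actual code:

```typescript
// Hypothetical shapes for illustration, not Jan's actual types.
interface LegacyJanThread {
  id: string;
  title: string;
  created: number; // epoch milliseconds, as in the Jan thread above
  updated: number; // epoch milliseconds
  metadata?: { lastMessage?: string };
}

interface OpenAIThread {
  id: string;
  object: "thread";
  created_at: number; // epoch seconds, per the OpenAI convention
  metadata: Record<string, string>;
}

// Map a legacy Jan thread onto the OpenAI Thread shape. Jan-specific
// fields with no OpenAI counterpart are preserved in metadata
// (which the OpenAI API restricts to string values).
function migrateThread(legacy: LegacyJanThread): OpenAIThread {
  return {
    id: legacy.id,
    object: "thread",
    created_at: Math.floor(legacy.created / 1000),
    metadata: {
      title: legacy.title,
      lastMessage: legacy.metadata?.lastMessage ?? "",
    },
  };
}
```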