planning: Migrate Threads, Messages to Cortex, deprecates Conversation Extension #3904

dan-menlo · 2024-10-29T05:53:32Z

Goal

Jan shifts Conversations state to Cortex for management
Deprecate Conversations Extension

Tasklist


louis-jan · 2024-10-29T16:18:20Z

According to this:

janhq/cortex.cpp#1567 (comment)

Problems

/messages is quite straightforward for now, but Jan's /threads are a combination of model preset, assistant parameters, assistant tools and threads.
Also, /assistants is not well designed: it defaults to a hard-coded template.

See a Jan thread.json example:

{
  "id": "jan_1729768043",
  "object": "thread",
  "title": "0.5.8 llama 3.2 1b",
  "assistants": [
    {
      "assistant_id": "jan",
      "assistant_name": "Jan",
      "tools": [
        {
          "type": "retrieval",
          "enabled": true,
          "settings": {
            "top_k": 2,
            "chunk_size": 1024,
            "chunk_overlap": 64,
            "retrieval_template": "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\nCONTEXT: {CONTEXT}\n----------------\nQUESTION: {QUESTION}\n----------------\nHelpful Answer:"
          }
        }
      ],
      "model": {
        "id": "llama3.2-1b-instruct",
        "settings": {
          "engine": "llama-cpp",
          "ctx_len": 3072,
          "ngl": 100,
          "prompt_template": "<|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
          "text_model": false
        },
        "parameters": {
          "engine": "llama-cpp",
          "frequency_penalty": 0,
          "max_tokens": 3072,
          "presence_penalty": 0,
          "stop": [
            "<|eot_id|>"
          ],
          "stream": true,
          "temperature": 0.699999988079071,
          "top_p": 0.949999988079071
        },
        "engine": "llama-cpp"
      },
      "instructions": ""
    }
  ],
  "created": 1729768043312,
  "updated": 1730195853233,
  "metadata": {
    "lastMessage": "Hello!"
  }
}

See OpenAI Assistant and Thread:

{
  "id": "asst_abc123",
  "object": "assistant",
  "created_at": 1698984975,
  "name": "Math Tutor",
  "description": null,
  "model": "gpt-4o",
  "instructions": "You are a personal math tutor. When asked a question, write and run Python code to answer the question.",
  "tools": [
    {
      "type": "code_interpreter"
    }
  ],
  "metadata": {},
  "top_p": 1.0,
  "temperature": 1.0,
  "response_format": "auto"
}

{
  "id": "thread_abc123",
  "object": "thread",
  "created_at": 1699012949,
  "metadata": {},
  "tool_resources": {}
}

So should we:

1. Introduce a new structure similar to an existing one and scoped by /threads and /messages
2. Follow a popular schema such as OpenAI that could scale to /assistants

I think 2 is preferred since we could take advantage of existing test suites and client SDKs. Otherwise, we would eventually do another migration to scale to /assistants and double the workload, such as writing tests.
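
To make option 2 more concrete, here is a minimal Python sketch of how a Jan thread.json could be reduced to an OpenAI-style thread object. The function name `to_openai_thread` and the choice to carry `title` and `lastMessage` inside `metadata` are assumptions for illustration, not a decided design. It also assumes that Jan's `created` is in milliseconds (as the example value suggests), while OpenAI's `created_at` is in Unix seconds.

```python
def to_openai_thread(jan_thread: dict) -> dict:
    """Reduce a Jan thread.json dict to an OpenAI-style thread object (sketch)."""
    # Hypothetical choice: keep Jan-only fields as string metadata.
    metadata = {"title": str(jan_thread.get("title", ""))}
    last_message = jan_thread.get("metadata", {}).get("lastMessage")
    if last_message is not None:
        metadata["lastMessage"] = str(last_message)
    return {
        "id": jan_thread["id"],
        "object": "thread",
        # Assumption: Jan stores milliseconds; OpenAI uses Unix seconds.
        "created_at": jan_thread["created"] // 1000,
        "metadata": metadata,
        "tool_resources": {},
    }


jan_thread = {
    "id": "jan_1729768043",
    "object": "thread",
    "title": "0.5.8 llama 3.2 1b",
    "assistants": [],  # assistant and model data would live elsewhere
    "created": 1729768043312,
    "updated": 1730195853233,
    "metadata": {"lastMessage": "Hello!"},
}
thread = to_openai_thread(jan_thread)
```

The resulting object no longer has an `assistants` key, which is the point of following the thinner schema.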

Decouple `threads` & `/models`

Currently, they are coupled, and a thread behaves much like a preset, which is not well defined. For example, thread.json defines model settings, which creates a side effect: switching between threads also reloads the model. This is an antipattern, and we should find a way to decouple them.

Inference parameters & tools go to /assistants. This lets /assistants scale better: users can have more than one assistant persona (instructions + parameters) instead of a hard-coded one.
Model parameters go to /models, where PUT takes effect (currently it is used nowhere).
The thread is now fairly thin. This also makes it easier to scale to /run: the thread is likely a container that glues components together (assistant, run, file_stores).
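
The split described above can be sketched in Python as follows. This is only an illustration under assumptions: the function name `split_thread`, the shape of the three output objects, and the `assistant_id` reference on the thread are hypothetical and not the agreed Cortex schema. The input mirrors a trimmed version of the thread.json example earlier in this comment.

```python
def split_thread(jan_thread: dict) -> dict:
    """Split one Jan thread.json dict into assistant, model, and thread parts."""
    entry = jan_thread["assistants"][0]  # the example thread has one assistant
    model = entry["model"]
    assistant = {
        "id": entry["assistant_id"],
        "name": entry["assistant_name"],
        "instructions": entry.get("instructions", ""),
        "model": model["id"],
        "tools": entry.get("tools", []),
        # Inference parameters (temperature, top_p, ...) travel with the persona.
        "parameters": {
            k: v for k, v in model.get("parameters", {}).items() if k != "engine"
        },
    }
    model_settings = {
        "id": model["id"],
        "engine": model.get("engine"),
        # Load-time settings (ctx_len, ngl, ...) become per-model, not per-thread.
        "settings": {
            k: v for k, v in model.get("settings", {}).items() if k != "engine"
        },
    }
    thread = {
        "id": jan_thread["id"],
        "object": "thread",
        "assistant_id": assistant["id"],  # hypothetical reference field
        "metadata": {"title": jan_thread.get("title", "")},
    }
    return {"assistant": assistant, "model": model_settings, "thread": thread}


jan_thread = {
    "id": "jan_1729768043",
    "title": "0.5.8 llama 3.2 1b",
    "assistants": [{
        "assistant_id": "jan",
        "assistant_name": "Jan",
        "tools": [{"type": "retrieval", "enabled": True}],
        "model": {
            "id": "llama3.2-1b-instruct",
            "settings": {"engine": "llama-cpp", "ctx_len": 3072, "ngl": 100},
            "parameters": {"engine": "llama-cpp", "temperature": 0.7, "stream": True},
            "engine": "llama-cpp",
        },
        "instructions": "",
    }],
}
parts = split_thread(jan_thread)
```

After the split, the thread part holds only identity, a reference to the assistant, and metadata, which matches the "fairly thin" thread described above.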

Several consequences would affect Jan's UX, such as:

Threads are currently coupled with model settings, which introduces a bad UX where users get their model restarted every time they switch to a new thread, even with the same model.

Moving model configurations to per-model settings would be beneficial. Those settings have a global effect.
Assistants are clearly defined: users can have more than one assistant persona (instructions + parameters).
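
As a rough illustration of why per-model settings avoid the restart, here is a small Python sketch. The `ModelManager` class and its `ensure_loaded` method are hypothetical names, and `load_count` merely stands in for a real model load; none of this is Jan's or Cortex's actual code.

```python
class ModelManager:
    """Tracks which model is loaded, keyed by per-model settings (sketch)."""

    def __init__(self, model_settings: dict):
        self.model_settings = model_settings  # settings keyed by model id
        self.loaded = None                    # (model_id, settings snapshot)
        self.load_count = 0                   # stands in for real (re)loads

    def ensure_loaded(self, model_id: str) -> None:
        wanted = (model_id, dict(self.model_settings[model_id]))
        # Reload only when the model or its per-model settings changed.
        if self.loaded != wanted:
            self.loaded = wanted
            self.load_count += 1


manager = ModelManager({"llama3.2-1b-instruct": {"ctx_len": 3072, "ngl": 100}})

# Two threads that use the same model: switching between them is free.
threads = [
    {"id": "thread_a", "model": "llama3.2-1b-instruct"},
    {"id": "thread_b", "model": "llama3.2-1b-instruct"},
]
for thread in threads:
    manager.ensure_loaded(thread["model"])
loads_after_switching = manager.load_count

# Changing a per-model setting is what triggers a reload, for every thread.
manager.model_settings["llama3.2-1b-instruct"]["ctx_len"] = 4096
manager.ensure_loaded("llama3.2-1b-instruct")
loads_after_setting_change = manager.load_count
```

Because threads no longer carry their own copy of the model settings, a thread switch has nothing to compare except the model id.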

For a new user in this space, a thread's parameters and settings are quite hard to understand. The Assistant Personas (instructions and parameters) and Model Capability Settings (more about hardware explanations) would help onboard users better.

dan-menlo · 2024-10-29T23:39:19Z

As a new user to this space, it's quite hard to get thread's parameters and settings. The Writing Assistant Persona (instructions and parameters) and Model Capability Settings (more about hardware explanations) would help onboard users better.

Can you elaborate a bit more about:

Writing Assistant Persona: is this an Assistant?
Model Capability Settings: is this a FAQ? Or an assistant meant to teach the user how to use Jan Settings?

louis-jan · 2024-10-30T02:51:51Z

ah @dan-homebrew I just mean

Thread's Inference Parameters such as temperature, frequency penalty, and presence penalty are quite incomprehensible. Moving those to the Assistant would make building an assistant persona easier to understand.
Modifying Thread's settings parameters, such as context window and ngl, causes a bad UX. Moving them to per-model settings might help. From there we can add more hardware detection information, such as the recommended GPU layers load and context length based on the user's device specs -> Global effect per model, not per thread.
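
To illustrate what a hardware-based recommendation for GPU layers could look like, here is a deliberately naive Python sketch. The function `recommend_ngl`, its parameters, and the proportional heuristic are all assumptions made up for this example. It ignores the KV cache, context length, and other runtime overhead, so it is not a proposal for the real detection logic.

```python
def recommend_ngl(total_layers: int, model_size_mb: int, free_vram_mb: int) -> int:
    """Suggest how many layers to offload, assuming equally sized layers."""
    if total_layers <= 0 or model_size_mb <= 0:
        return 0
    per_layer_mb = model_size_mb / total_layers
    # Offload as many whole layers as fit, capped at the model's layer count.
    fits = int(free_vram_mb // per_layer_mb)
    return max(0, min(total_layers, fits))


# Hypothetical device specs for a 16-layer, 1600 MB model.
half_fit = recommend_ngl(total_layers=16, model_size_mb=1600, free_vram_mb=800)
full_fit = recommend_ngl(total_layers=16, model_size_mb=1600, free_vram_mb=4000)
no_gpu = recommend_ngl(total_layers=16, model_size_mb=1600, free_vram_mb=0)
```

A value like this would be stored once per model, so every thread that uses the model sees the same recommendation.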

dan-menlo · 2024-10-30T06:59:48Z

ah @dan-homebrew I just mean

Thread's Inference Parameters such as temperature, frequency penalty, and presence penalty are quite incomprehensible. Moving those to the Assistant would make building an assistant persona easier to understand.

Modifying Thread's settings parameters, such as context window and ngl, causes a bad UX. Moving them to per-model settings might help. From there we can add more hardware detection information, such as the recommended GPU layers load and context length based on the user's device specs -> Global effect per model, not per thread.

Got it. Can you proceed to make recommendations for how we can break down the Assistants, Threads/Messages, and Models endpoints (and the related data structures)?

Models:
- Need to recommend changes to current model.yaml and Models table
- Do we need to implement model presets? (I don't think so?)
Threads/Messages: roadmap: Cortex API supports /threads, /messages cortex.cpp#1567
Assistants: planning: Cortex API supports basic /assistants (Jan status quo equivalent) cortex.cpp#1573

I think it's better we bite the bullet and move to the correct data structures.

louis-jan · 2024-12-05T14:20:17Z

Update:

Scoped down for frontend:

Route requests to the backend. From then on, all migrations can be done in one place (the backend).
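
A minimal Python sketch of what "route requests to the backend" could mean for a client is shown below. The base URL, the OpenAI-style paths, and the helper names (`build_request`, `create_thread`, `list_messages`) are assumptions for illustration; the actual Jan frontend and Cortex routes may differ. The helpers only build request objects, so nothing is sent over the network here.

```python
import json
import urllib.request
from typing import Optional

# Assumption: address of a locally running Cortex server.
BASE_URL = "http://127.0.0.1:39281/v1"


def build_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request to the backend."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def create_thread(metadata: dict) -> urllib.request.Request:
    # Hypothetical route, following the OpenAI-style /threads shape.
    return build_request("POST", "/threads", {"metadata": metadata})


def list_messages(thread_id: str) -> urllib.request.Request:
    # Hypothetical route, following the OpenAI-style nested /messages shape.
    return build_request("GET", f"/threads/{thread_id}/messages")


create_req = create_thread({"title": "0.5.8 llama 3.2 1b"})
list_req = list_messages("jan_1729768043")
```

Sending a request would be a matter of passing it to `urllib.request.urlopen`. With this shape the client holds no thread state of its own, so later data migrations only touch the backend.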

dan-menlo · 2024-12-16T02:18:38Z

Completed:

louis-jan · 2024-12-18T06:21:52Z

Attached: (Jan Frontend and QA)

Implementation: Jan uses cortex.cpp to persist /threads, /messages #4288

dan-menlo added this to Menlo Oct 29, 2024

dan-menlo assigned louis-jan Oct 29, 2024

dan-menlo converted this from a draft issue Oct 29, 2024

dan-menlo mentioned this issue Oct 29, 2024

roadmap: Cortex API supports /threads, /messages janhq/cortex.cpp#1567

Open

dan-menlo changed the title ~~planning: Jan migrates Conversational Extensions to Cortex~~ planning: Jan migrates Threads, Messages to Cortex, deprecates Conversation Extension Oct 29, 2024

dan-menlo mentioned this issue Oct 29, 2024

roadmap: Jan refactors /messages, /threads to Cortex #3895

Closed

6 tasks

dan-menlo changed the title ~~planning: Jan migrates Threads, Messages to Cortex, deprecates Conversation Extension~~ planning: Migrates Threads, Messages to Cortex, deprecates Conversation Extension Oct 30, 2024

dan-menlo changed the title ~~planning: Migrates Threads, Messages to Cortex, deprecates Conversation Extension~~ planning: Migrate Threads, Messages to Cortex, deprecates Conversation Extension Oct 30, 2024

dan-menlo assigned nguyenhoangthuan99 Oct 30, 2024

louis-jan assigned namchuai and unassigned nguyenhoangthuan99 Oct 31, 2024

imtuyethan moved this from Investigating to Planning in Menlo Nov 27, 2024

louis-jan moved this from Planning to Scheduled in Menlo Dec 5, 2024

louis-jan moved this from Scheduled to In Progress in Menlo Dec 5, 2024

imtuyethan added this to the v0.5.12 milestone Dec 10, 2024

dan-menlo closed this as completed Dec 16, 2024

github-project-automation bot moved this from In Progress to QA in Menlo Dec 16, 2024

imtuyethan modified the milestones: v0.5.12, v0.5.13 Dec 17, 2024

louis-jan mentioned this issue Dec 18, 2024

Implementation: Jan uses cortex.cpp to persist /threads, /messages #4288

Closed

imtuyethan modified the milestones: v0.5.13, v0.5.12 Dec 18, 2024

imtuyethan moved this from QA to Completed in Menlo Dec 18, 2024
