planning: Migrate Threads, Messages to Cortex, deprecates Conversation Extension #3904

Closed
1 of 2 tasks
Tracked by #3895
dan-menlo opened this issue Oct 29, 2024 · 7 comments

dan-menlo commented Oct 29, 2024

Goal

  • Jan shifts Conversations state to Cortex for management
  • Deprecate Conversations Extension

Tasklist

@dan-menlo dan-menlo added this to Menlo Oct 29, 2024
@dan-menlo dan-menlo converted this from a draft issue Oct 29, 2024
@dan-menlo dan-menlo changed the title planning: Jan migrates Conversational Extensions to Cortex planning: Jan migrates Threads, Messages to Cortex, deprecates Conversation Extension Oct 29, 2024
@louis-jan
Contributor

louis-jan commented Oct 29, 2024

According to this:

janhq/cortex.cpp#1567 (comment)

Problems

/messages is straightforward for now, but Jan's /threads objects combine a model preset, assistant parameters, assistant tools, and the thread itself.
Also, /assistants is not well designed: it defaults to a hard-coded template.

See a Jan thread.json example:

{
  "id": "jan_1729768043",
  "object": "thread",
  "title": "0.5.8 llama 3.2 1b",
  "assistants": [
    {
      "assistant_id": "jan",
      "assistant_name": "Jan",
      "tools": [
        {
          "type": "retrieval",
          "enabled": true,
          "settings": {
            "top_k": 2,
            "chunk_size": 1024,
            "chunk_overlap": 64,
            "retrieval_template": "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\nCONTEXT: {CONTEXT}\n----------------\nQUESTION: {QUESTION}\n----------------\nHelpful Answer:"
          }
        }
      ],
      "model": {
        "id": "llama3.2-1b-instruct",
        "settings": {
          "engine": "llama-cpp",
          "ctx_len": 3072,
          "ngl": 100,
          "prompt_template": "<|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
          "text_model": false
        },
        "parameters": {
          "engine": "llama-cpp",
          "frequency_penalty": 0,
          "max_tokens": 3072,
          "presence_penalty": 0,
          "stop": [
            "<|eot_id|>"
          ],
          "stream": true,
          "temperature": 0.699999988079071,
          "top_p": 0.949999988079071
        },
        "engine": "llama-cpp"
      },
      "instructions": ""
    }
  ],
  "created": 1729768043312,
  "updated": 1730195853233,
  "metadata": {
    "lastMessage": "Hello!"
  }
}

See OpenAI Assistant and Thread:

{
  "id": "asst_abc123",
  "object": "assistant",
  "created_at": 1698984975,
  "name": "Math Tutor",
  "description": null,
  "model": "gpt-4o",
  "instructions": "You are a personal math tutor. When asked a question, write and run Python code to answer the question.",
  "tools": [
    {
      "type": "code_interpreter"
    }
  ],
  "metadata": {},
  "top_p": 1.0,
  "temperature": 1.0,
  "response_format": "auto"
}
{
  "id": "thread_abc123",
  "object": "thread",
  "created_at": 1699012949,
  "metadata": {},
  "tool_resources": {}
}

So should we:

  1. Introduce a new structure similar to an existing one and scoped by /threads and /messages
  2. Follow a popular schema such as OpenAI that could scale to /assistants

I think option 2 is preferable, since we could take advantage of existing test suites and client SDKs. Otherwise, we would eventually need another migration to scale to /assistants, which would double the workload (writing tests again, for example).

Decouple threads & /models

Currently, threads and models are coupled, and a thread behaves much like a preset, which is not really well defined. For example, thread.json defines model settings, which has the side effect that switching between threads also reloads the model. This is an antipattern, and we should find a way to decouple them.

  1. Inference parameters and tools go to /assistants. This lets /assistants scale better, so users can have more than one assistant persona (instructions + parameters) instead of a hard-coded one.
  2. Model parameters go to /models, where PUT takes effect (currently it is not used anywhere).
  3. The thread becomes fairly thin. This also scales better to /run: the thread is likely just a container that glues components together (assistant, run, file_stores).

This leads to several conclusions that affect Jan's UX, such as:

Threads are now coupled with model settings, which introduces a bad UX where users get their model restarted every time they switch to a new thread, even with the same model.

  1. Moving model configurations to per-model settings would be beneficial. Those settings have a global effect.
  2. Assistants become clearly defined, and users can have more than one assistant persona (instructions + parameters).
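
The decoupling proposed above can be sketched roughly as follows. This is only an illustration, not Cortex's actual schema: the input field names follow the thread.json example, while `splitLegacyThread`, the output types, and the choice to keep the title in `metadata` are hypothetical. It also assumes Jan's `created` value is in milliseconds, whereas the OpenAI example uses seconds.

```typescript
// Hypothetical types for illustration; input field names follow the thread.json
// example, but the output shapes are an assumption, not Cortex's schema.
interface LegacyAssistantEntry {
  assistant_id: string;
  assistant_name: string;
  instructions: string;
  tools: unknown[];
  model: {
    id: string;
    settings: Record<string, unknown>; // load-time values such as ctx_len, ngl
    parameters: Record<string, unknown>; // per-request values such as temperature
  };
}

interface LegacyThread {
  id: string;
  title: string;
  assistants: LegacyAssistantEntry[];
  created: number; // assumed to be milliseconds since the epoch
  metadata?: Record<string, unknown>;
}

interface ThinThread {
  id: string;
  object: "thread";
  created_at: number; // seconds, as in the OpenAI example
  metadata: Record<string, unknown>;
}

interface AssistantPersona {
  id: string;
  object: "assistant";
  name: string;
  model: string;
  instructions: string;
  tools: unknown[];
  parameters: Record<string, unknown>;
}

interface SplitResult {
  thread: ThinThread;
  assistants: AssistantPersona[];
  modelSettings: Record<string, Record<string, unknown>>; // keyed by model id
}

function splitLegacyThread(legacy: LegacyThread): SplitResult {
  const modelSettings: Record<string, Record<string, unknown>> = {};

  const assistants = legacy.assistants.map((entry): AssistantPersona => {
    // Load-time settings move to the model, so they no longer vary per thread.
    modelSettings[entry.model.id] = { ...entry.model.settings };
    // Inference parameters and tools move to the assistant persona.
    return {
      id: entry.assistant_id,
      object: "assistant",
      name: entry.assistant_name,
      model: entry.model.id,
      instructions: entry.instructions,
      tools: entry.tools,
      parameters: { ...entry.model.parameters },
    };
  });

  // What remains of the thread is a thin container.
  const thread: ThinThread = {
    id: legacy.id,
    object: "thread",
    created_at: Math.floor(legacy.created / 1000),
    metadata: { ...(legacy.metadata ?? {}), title: legacy.title },
  };

  return { thread, assistants, modelSettings };
}
```

With a split like this, two threads that use the same model would share one entry in `modelSettings`, so switching between them would not by itself change any load-time setting.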

For a new user in this space, a thread's parameters and settings are quite hard to understand. Assistant Personas (instructions and parameters) and Model Capability Settings (with more explanation of the hardware side) would help onboard users better.

@dan-menlo
Contributor Author

As a new user to this space, it's quite hard to get thread's parameters and settings. The Writing Assistant Persona (instructions and parameters) and Model Capability Settings (more about hardware explanations) would help onboard users better.

Can you elaborate a bit more about:

  • Writing Assistant Persona: is this an Assistant?
  • Model Capability Settings: is this a FAQ? Or an assistant meant to teach the user how to use Jan Settings?

@dan-menlo dan-menlo changed the title planning: Jan migrates Threads, Messages to Cortex, deprecates Conversation Extension planning: Migrates Threads, Messages to Cortex, deprecates Conversation Extension Oct 30, 2024
@dan-menlo dan-menlo changed the title planning: Migrates Threads, Messages to Cortex, deprecates Conversation Extension planning: Migrate Threads, Messages to Cortex, deprecates Conversation Extension Oct 30, 2024
@louis-jan
Contributor

ah @dan-homebrew I just mean

  1. A thread's inference parameters, such as temperature, frequency penalty, and presence penalty, are quite incomprehensible. Moving them to the Assistant would make building an assistant persona easier to understand.
  2. Modifying a thread's settings parameters, such as context window and ngl, causes a bad UX. Moving them to per-model settings might help. From there we can add hardware detection information, such as the recommended number of GPU layers to load and the context length based on the device specs. The effect is then global per model, not per thread.
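
To make the "global per model, not per thread" point concrete, here is a minimal sketch. It assumes a hypothetical in-memory `ModelSettingsStore` (not an existing Jan or Cortex class) that counts reloads; `ctx_len` and `ngl` are the setting names from thread.json.

```typescript
// Hypothetical load-time settings; ctx_len and ngl are the names used in thread.json.
interface ModelSettings {
  ctx_len: number;
  ngl: number;
}

// Illustrative store: settings live per model, and threads only reference a model id.
class ModelSettingsStore {
  private settings = new Map<string, ModelSettings>();
  private loadedModel: string | null = null;
  reloads = 0; // counts how often a model load would be triggered

  // Changing settings affects the model globally, so a loaded model must reload.
  update(modelId: string, next: ModelSettings): void {
    this.settings.set(modelId, next);
    if (this.loadedModel === modelId) this.reloads++;
  }

  // Called when the user opens a thread that uses modelId.
  activate(modelId: string): void {
    if (this.loadedModel === modelId) return; // same model: nothing to reload
    this.loadedModel = modelId;
    this.reloads++;
  }

  get(modelId: string): ModelSettings | undefined {
    return this.settings.get(modelId);
  }
}
```

In this sketch, opening a second thread that uses the already loaded model leaves `reloads` unchanged. Only switching to a different model, or editing the loaded model's settings, triggers a reload.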

@dan-menlo
Contributor Author

dan-menlo commented Oct 30, 2024

Got it. Can you proceed to make recommendations for how we can break down the Assistants, Threads/Messages, and Models endpoints (and the related data structures)?

I think it's better we bite the bullet and move to the correct data structures.

@imtuyethan imtuyethan moved this from Investigating to Planning in Menlo Nov 27, 2024
@louis-jan louis-jan moved this from Planning to Scheduled in Menlo Dec 5, 2024
@louis-jan louis-jan moved this from Scheduled to In Progress in Menlo Dec 5, 2024
@louis-jan
Contributor

Update:

Scoped down for frontend:

  • Route requests to the backend, so that all migrations can be done afterward in one place (the backend).
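
As a rough sketch of what "route requests to the backend" could look like on the frontend side: the helpers below only describe HTTP requests and keep no thread state of their own. The function names, the `BackendRequest` shape, and the OpenAI-style paths are assumptions for illustration; the real Cortex routes and base URL may differ.

```typescript
// Hypothetical description of a request the frontend would hand to fetch().
interface BackendRequest {
  method: "GET" | "POST";
  url: string;
  body?: string;
}

// baseUrl and the paths below are assumptions modeled on OpenAI-style routes.
function listThreads(baseUrl: string): BackendRequest {
  return { method: "GET", url: `${baseUrl}/threads` };
}

function createMessage(
  baseUrl: string,
  threadId: string,
  content: string
): BackendRequest {
  return {
    method: "POST",
    url: `${baseUrl}/threads/${encodeURIComponent(threadId)}/messages`,
    body: JSON.stringify({ role: "user", content }),
  };
}
```

Because the frontend only forwards requests, any later change to how threads and messages are stored would stay behind these routes.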

@imtuyethan imtuyethan added this to the v0.5.12 milestone Dec 10, 2024

@imtuyethan imtuyethan modified the milestones: v0.5.13, v0.5.12 Dec 18, 2024
@imtuyethan imtuyethan moved this from QA to Completed in Menlo Dec 18, 2024