
bug: OOM issue - [0.5.10] RAG feature failed with Engine Is Not Loaded Yet error #4207

Open

imtuyethan opened this issue Dec 3, 2024 · 1 comment

Labels: category: experimental features · category: tools (RAG, web search, files, function calling) · type: bug (Something isn't working)


imtuyethan commented Dec 3, 2024

Jan version

0.5.10

Describe the Bug

https://discord.com/channels/1107178041848909847/1313496019475894363

When using the Retrieval (RAG) feature with PDF files, the engine fails to load with error code 322122505 ("Engine Is Not Loaded Yet"). This prevents the PDF analysis functionality from working.

Steps to Reproduce

  1. Set the Embedding Model to Llama 3.2 1B Instruct Q8
  2. Set the Vector Database to HNSWLib
  3. Upload a PDF file (issue persists with files as small as 27.51KB)
  4. Attempt to use RAG/PDF analysis feature

Error Messages

  • "Engine Is Not Loaded Yet"
  • "llama_decode_internal: invalid token"
  • "llama_decode: failed to decode"
  • "Internal error catched Input prompt is too big compared to KV size"

Logs

  • Error code: 322122505 from cortex-server.exe process

Expected behavior

  • Engine should load successfully
  • PDF should be processed for RAG functionality

Screenshots / Logs

Logs
message (4).txt
app (2).log
cortex (2).log

Device specs

  • Machine: x86
  • OS: Windows (based on .exe reference)
  • Memory: 32.00 GB (24.84 GB used)
  • CPU Usage: 18%

What is your OS?

  • MacOS
  • Windows
  • Linux
imtuyethan added the type: bug label on Dec 3, 2024
github-project-automation bot moved this to Investigating in Jan & Cortex on Dec 3, 2024
imtuyethan added the category: tools and category: experimental features labels on Dec 3, 2024
louis-jan (Contributor) commented:

That is an OOM issue, right? @vansangpfiev @nguyenhoangthuan99
ERROR llama_decode_internal: invalid token[0] = 1074917552 - llama_engine.cc:493
ERROR llama_decode: failed to decode, ret = -1 - llama_engine.cc:493
Failed to decode the batch: KV cache is full - try increasing it via the context size: i = 0, n_batch = 2048, ret = -1 - llama_server_context.cc:1653
Internel error catched Input prompt is too big compared to KV size. Please try increasing KV size. - llama_server_context.cc:862
Internel error catched Input prompt is too big compared to KV size. Please try increasing KV size. - llama_server_context.cc:862
ERROR Internel error catched Input prompt is too big compared to KV size. Please try increasing KV size. - llama_server_context.cc:862
ERROR Internel error catched Input prompt is too big compared to KV size. Please try increasing KV size. - llama_server_context.cc:862
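
For context on the log above: in a llama.cpp-based engine, the entire RAG prompt (system prompt + retrieved PDF chunks + user question) has to fit inside the model's context window, i.e. the KV cache, with room left over for the reply. Below is a minimal sketch of that constraint; the names (count_tokens, fits_in_context, trim_chunks) and the 2048-token default are illustrative assumptions, not Jan or cortex APIs.

```python
# Sketch of the check behind "Input prompt is too big compared to KV size".
# All names here are illustrative; they are not part of the Jan or cortex codebases.

def count_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token); a real check would use the
    # model's own tokenizer.
    return max(1, len(text) // 4)

def fits_in_context(system_prompt: str, chunks: list[str], question: str,
                    ctx_len: int = 2048, max_new_tokens: int = 512) -> bool:
    """Return True if the assembled RAG prompt leaves room for the reply."""
    prompt = "\n\n".join([system_prompt, *chunks, question])
    return count_tokens(prompt) + max_new_tokens <= ctx_len

def trim_chunks(system_prompt: str, chunks: list[str], question: str,
                ctx_len: int = 2048, max_new_tokens: int = 512) -> list[str]:
    """Drop lowest-ranked chunks until the prompt fits (assumes ranked order)."""
    kept = list(chunks)
    while kept and not fits_in_context(system_prompt, kept, question,
                                       ctx_len, max_new_tokens):
        kept.pop()
    return kept
```

Even a small PDF can blow past a small context once it is chunked and wrapped in the prompt template, which matches the "KV cache is full - try increasing it via the context size" line in the log; raising the context size when loading the model or retrieving fewer/smaller chunks are the two levers that message points at.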

louis-jan changed the title from "bug: [0.5.10] RAG feature failed with Engine Is Not Loaded Yet error" to "bug: OOM issue - [0.5.10] RAG feature failed with Engine Is Not Loaded Yet error" on Dec 4, 2024