
bug: can enable GPU acceleration with CUDA not installed - model fails to start #3762

Closed
2 of 4 tasks
Tracked by #1165
johnhaire89 opened this issue Aug 13, 2024 · 6 comments
Labels
category: cortex.cpp (Related to cortex.cpp), category: providers (Local & remote inference providers), move to Cortex, type: bug (Something isn't working)


@johnhaire89

  • I have searched the existing issues

Current behavior

I was playing with Jan for the first time and realised that GPU acceleration wasn't enabled.
I toggled the "GPU Acceleration" switch to enable it for my NVIDIA RTX A2000, and no error appeared.

When I next typed into the chat window, Jan wasn't able to start the model.

The problem was that I didn't have the CUDA Toolkit installed.
Per the Stack Overflow answer at https://stackoverflow.com/a/55717476, nvidia-smi shows the CUDA version the driver supports, but nvcc --version should be used to check the installed toolkit version.
I installed the CUDA Toolkit and it's back to working like magic.
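The nvidia-smi vs. nvcc distinction above is the crux of the bug: a driver can advertise CUDA support while no toolkit is installed. As a minimal sketch of the nvcc-based check (not Jan's actual detection code), the snippet below looks for `nvcc` on PATH and reads the `release X.Y` field from `nvcc --version`. It assumes nvcc prints its usual line of the form "Cuda compilation tools, release 12.0, V12.0.140"; the function names are hypothetical.

```python
from __future__ import annotations

import re
import shutil
import subprocess

# Assumed output format: "Cuda compilation tools, release 12.0, V12.0.140"
_NVCC_RELEASE = re.compile(r"release (\d+)\.(\d+)")


def parse_nvcc_version(output: str) -> tuple[int, int] | None:
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = _NVCC_RELEASE.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_toolkit() -> tuple[int, int] | None:
    """Return the installed CUDA Toolkit version, or None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None  # no toolkit on PATH, even if nvidia-smi reports a CUDA version
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True, check=False)
    return parse_nvcc_version(result.stdout)


if __name__ == "__main__":
    version = installed_cuda_toolkit()
    if version is None:
        print("CUDA Toolkit not found (nvcc is not on PATH)")
    else:
        print(f"CUDA Toolkit {version[0]}.{version[1]} installed")
```

Checking PATH first keeps the check cheap and avoids an exception from `subprocess.run` when nvcc is missing, which is exactly the situation in this report.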

This is probably more a feature request than a bug, but that toggle should probably show an error if I try to enable GPU acceleration for an NVIDIA card when the CUDA Toolkit isn't installed.

Minimum reproduction step

Start with a Windows PC with an NVIDIA GPU and no CUDA Toolkit installed (as checked with nvcc --version).

  1. Under Settings > Advanced Settings, select the GPU and toggle the switch. A toast says "Successfully turned on GPU acceleration".
  2. Try to start Mistral Instruct 7B Q4. The model fails to start.

Expected behavior

When I try to enable GPU acceleration for an NVIDIA GPU in an environment where the CUDA Toolkit isn't installed, I should get a helpful error.
Maybe a warning can be displayed next to GPU in the dropdown?
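The expected behavior amounts to a guard that runs before the toggle reports success. A minimal sketch, assuming the detected toolkit version is already available (for example from nvcc --version) and that the bundled engine needs CUDA 12.x, inferred only from the `win-cuda-12-0` folder in this issue's log; `gpu_toggle_warning` and `REQUIRED_CUDA_MAJOR` are hypothetical names, not Jan's API.

```python
from __future__ import annotations

# Hypothetical: CUDA major version the bundled engine was built for,
# inferred from the "win-cuda-12-0" engine folder in this issue's log.
REQUIRED_CUDA_MAJOR = 12


def gpu_toggle_warning(vendor: str, toolkit: tuple[int, int] | None,
                       required_major: int = REQUIRED_CUDA_MAJOR) -> str | None:
    """Return a warning to show next to the GPU toggle, or None if it looks safe."""
    if vendor.lower() != "nvidia":
        return None  # this sketch only covers the CUDA case
    if toolkit is None:
        return ("CUDA Toolkit not detected. Install CUDA Toolkit "
                f"{required_major}.x before enabling GPU acceleration.")
    if toolkit[0] != required_major:
        return (f"CUDA Toolkit {toolkit[0]}.{toolkit[1]} found, but this engine "
                f"needs CUDA {required_major}.x.")
    return None


if __name__ == "__main__":
    # The reporter's setup: NVIDIA GPU, no toolkit installed.
    print(gpu_toggle_warning("NVIDIA", None))
```

Returning a message instead of raising lets the UI decide whether to block the toggle or only show the warning next to the GPU in the dropdown, as suggested above.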

Screenshots / Logs

2024-08-13T02:27:29.268Z [CORTEX]::Debug: Spawn cortex at path: C:\Users\username\jan\extensions\@janhq\inference-cortex-extension\dist\bin\win-cuda-12-0\cortex-cpp.exe, and args: 1,127.0.0.1,3928
2024-08-13T02:27:29.268Z [CORTEX]::Debug: Spawning cortex subprocess...
2024-08-13T02:27:29.268Z [APP]::C:\Users\username\jan\extensions\@janhq\inference-cortex-extension\dist\bin\win-cuda-12-0
2024-08-13T02:27:29.380Z [CORTEX]::Debug: cortex is ready
2024-08-13T02:27:29.380Z [CORTEX]::Debug: Loading model with params {"cpu_threads":15,"ctx_len":2048,"prompt_template":"{system_message} [INST] {prompt} [/INST]","llama_model_path":"C:\\Users\\username\\jan\\models\\mistral-ins-7b-q4\\Mistral-7B-Instruct-v0.3-Q4_K_M.gguf","ngl":33,"system_prompt":"","user_prompt":" [INST] ","ai_prompt":" [/INST]","model":"mistral-ins-7b-q4"}
2024-08-13T02:27:29.391Z [CORTEX]::Debug: 20240813 02:27:29.291000 UTC 34396 INFO  cortex-cpp version: default_version - main.cc:73
20240813 02:27:29.292000 UTC 34396 INFO  cortex.llamacpp version: 0.1.20-30.06.24 - main.cc:78
20240813 02:27:29.292000 UTC 34396 INFO  Server started, listening at: 127.0.0.1:3928 - main.cc:81
20240813 02:27:29.292000 UTC 34396 INFO  Please load your model - main.cc:82
20240813 02:27:29.292000 UTC 34396 INFO  Number of thread is:20 - main.cc:89
20240813 02:27:29.383000 UTC 25336 INFO  CPU instruction set: fpu = 1| mmx = 1| sse = 1| sse2 = 1| sse3 = 1| ssse3 = 1| sse4_1 = 1| sse4_2 = 1| pclmulqdq = 1| avx = 1| avx2 = 1| avx512_f = 0| avx512_dq = 0| avx512_ifma = 0| avx512_pf = 0| avx512_er = 0| avx512_cd = 0| avx512_bw = 0| has_avx512_vl = 0| has_avx512_vbmi = 0| has_avx512_vbmi2 = 0| avx512_vnni = 0| avx512_bitalg = 0| avx512_vpopcntdq = 0| avx512_4vnniw = 0| avx512_4fmaps = 0| avx512_vp2intersect = 0| aes = 1| f16c = 1| - server.cc:277
20240813 02:27:29.392000 UTC 25336 ERROR Could not load engine: Could not load library "C:\Users\username\jan\extensions\@janhq\inference-cortex-extension\dist\bin\win-cuda-12-0/engines/cortex.llamacpp/engine.dll"
The specified module could not be found.

 - server.cc:290

2024-08-13T02:27:29.392Z [CORTEX]::Debug: Load model success with response {}
2024-08-13T02:27:29.398Z [CORTEX]::Debug: Validate model state failed with response "Conflict"
2024-08-13T02:27:29.398Z [CORTEX]::Error: Validate model status failed
2024-08-13T02:27:29.397Z [CORTEX]::Debug: Validate model state with response 409
2024-08-13T02:28:29.958Z [CORTEX]::Debug: Request to kill cortex
2024-08-13T02:28:29.958Z [CORTEX]::Debug: Killing PID 21376
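The key log line is `Could not load library ".../engine.dll"` with "The specified module could not be found." On Windows, this loader message appears both when the DLL itself is missing and when one of its dependencies cannot be found. Since engine.dll ships inside the extension folder and installing the toolkit fixed the problem, a missing CUDA runtime dependency is a plausible cause. The sketch below checks for such dependencies. The DLL names are assumptions (typical CUDA 12 runtime file names), not a confirmed dependency list for cortex.llamacpp, and the search order only approximates the classic Windows DLL search.

```python
from __future__ import annotations

import os
from pathlib import Path

# Hypothetical list: CUDA 12 runtime DLLs a CUDA build of the engine might link against.
# The real list should come from inspecting engine.dll's imports, not from this sketch.
CANDIDATE_DLLS = ["cudart64_12.dll", "cublas64_12.dll"]


def find_missing_dlls(names: list[str], search_dirs: list[str]) -> list[str]:
    """Return the DLL names not present in any of the given directories."""
    missing = []
    for name in names:
        # Skip empty entries, e.g. from a trailing ";" in PATH.
        if not any((Path(d) / name).is_file() for d in search_dirs if d):
            missing.append(name)
    return missing


def default_search_dirs(engine_dir: str) -> list[str]:
    """Approximate the loader search: the engine's own folder, then PATH entries."""
    return [engine_dir] + os.environ.get("PATH", "").split(os.pathsep)


if __name__ == "__main__":
    engine_dir = (r"C:\Users\username\jan\extensions\@janhq\inference-cortex-extension"
                  r"\dist\bin\win-cuda-12-0\engines\cortex.llamacpp")
    for dll in find_missing_dlls(CANDIDATE_DLLS, default_search_dirs(engine_dir)):
        print(f"missing dependency: {dll}")
```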

Jan version

0.5.2

In which operating systems have you tested?

  • macOS
  • Windows
  • Linux

Environment details

Windows 11
NVIDIA RTX A2000 8GB Laptop GPU, 8192 MB VRAM
CUDA toolkit not installed

johnhaire89 added the type: bug label Aug 13, 2024
@louis-jan
Contributor

@Van-QA @imtuyethan Didn't we implement error handling for this in the past? It leads the user to the additional CUDA installation page.

@dan-menlo
Contributor

@johnhaire89 FYI, Jan is in the process of overhauling how we deal with llama.cpp binaries and GPU dependencies.

@Van-QA I will keep this bug open. Once we clean up PM systems, let's link the 2 epics that would solve this bug. My style is to only close bugs once the corresponding feature is shipped.

  • Jan should embed llama.cpp through Cortex + cortex.llamacpp
  • cortex engines llama.cpp install should also pull CUDA dependencies, cc @namchuai (FYI)

freelerobot transferred this issue from janhq/jan Sep 6, 2024
@dan-menlo
Contributor

dan-menlo commented Sep 8, 2024

Handling this bug as part of janhq/cortex.cpp#1165

@louis-jan
Contributor

louis-jan commented Sep 17, 2024

Hi @dan-homebrew @imtuyethan, this is a known issue. The fix in 0.5.4 (#3552) does the following:

  1. Show a corresponding error message.
  2. Allow users to install dependencies.

This step lets users install additional dependencies right in the app, without redirecting them out of it.

In the next update, which integrates the cortex-cpp engine pull, there should be no extra request to install these dependencies. However, this error message would still really help in case a driver or CUDA update does not work with the pulled engine and its dependencies.
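The driver-update case can be sketched as a version comparison: nvidia-smi's header reports the CUDA version the driver supports (the point made in the original report), and the engine folder name (e.g. `win-cuda-12-0` in the log) encodes the CUDA version the engine targets. The function names are hypothetical, parsing the folder name is an assumption about its format, and the "driver must support at least the engine's version" rule is deliberately conservative; CUDA's minor-version compatibility can allow more.

```python
from __future__ import annotations

import re


def parse_driver_cuda(smi_output: str) -> tuple[int, int] | None:
    """Read the 'CUDA Version: X.Y' field from the nvidia-smi header."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def parse_engine_cuda(variant: str) -> tuple[int, int] | None:
    """Read the CUDA version from an engine folder name such as 'win-cuda-12-0'."""
    m = re.search(r"cuda-(\d+)-(\d+)", variant)
    return (int(m.group(1)), int(m.group(2))) if m else None


def engine_mismatch_message(smi_output: str, variant: str) -> str | None:
    """Return a user-facing error if the driver looks too old for the engine."""
    driver = parse_driver_cuda(smi_output)
    engine = parse_engine_cuda(variant)
    if driver is None or engine is None:
        return "Could not determine CUDA versions; check the NVIDIA driver installation."
    # Conservative assumption: the driver must support at least the engine's CUDA version.
    if driver < engine:
        return (f"The installed driver supports CUDA {driver[0]}.{driver[1]}, "
                f"but the engine was built for CUDA {engine[0]}.{engine[1]}. "
                "Update the NVIDIA driver or pull a matching engine.")
    return None
```

Tuple comparison orders by major, then minor, so (11, 8) < (12, 0) flags an old driver while (12, 4) passes for a 12.0 engine.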

github-project-automation bot moved this to Investigating in Menlo Oct 3, 2024
gabrielle-ong transferred this issue from janhq/cortex.cpp Oct 3, 2024
imtuyethan modified the milestone: v0.5.7 Oct 14, 2024
@imtuyethan
Contributor

imtuyethan commented Oct 14, 2024

The fix is included in Jan's path to cortex.cpp: #3690

@imtuyethan
Contributor

Deprecated issue: in 0.5.7, users are prompted to install the CUDA Toolkit when it's not available.

github-project-automation bot moved this from Investigating to Review + QA in Menlo Nov 4, 2024
imtuyethan moved this from Review + QA to Completed in Menlo Nov 4, 2024