Windows 11 ggml_gallocr_reserve_n: failed to allocate Vulkan0 #413

DarkSorrow · 2025-01-07T22:46:21Z

Hello,

I'm trying to load some models and tried a few on Windows.
I had an 8B one that kept showing the error llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
It loaded and gave answers, but generation never ended and it kept writing You are a helpful AI assistant. <</SYS>>[/INST] Bye! <</SYS>>[/INST] Bye!
After that I tried to load several others that I took from Hugging Face (even some 1B ones) and got this error every time:

Creating context
ggml_vulkan: Device memory allocation of size 7981961216 failed.
ggml_vulkan: Requested buffer size exceeds device memory allocation limit: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 7981961216

Since the 8B model was about 8GB and this one is about 800MB, I know it's not that I'm out of memory.
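To put the logged number in perspective, here is a small sketch that converts the failed allocation size from the log into GiB and compares it with the VRAM reported by inspect gpu further down. It assumes the tool prints 1024-based units; the helper name is made up for illustration. The point it suggests is that the failing buffer is far larger than the ~800MB model file yet still smaller than the total VRAM, so it is probably not a plain "model does not fit" situation.

```typescript
// Hypothetical helper for illustration; not part of node-llama-cpp.
const BYTES_PER_GIB = 1024 * 1024 * 1024;

function bytesToGiB(bytes: number): number {
  return bytes / BYTES_PER_GIB;
}

// Size taken from the ggml_vulkan log line above.
const failedAllocationBytes = 7981961216;
// Total VRAM from the inspect gpu output (assumed to be in 1024-based units).
const reportedVramGiB = 11.98;

const failedGiB = bytesToGiB(failedAllocationBytes);

console.log(`Failed allocation: ${failedGiB.toFixed(2)} GiB`);
console.log(`Reported VRAM:     ${reportedVramGiB.toFixed(2)} GiB`);
console.log(
  failedGiB < reportedVramGiB
    ? "The single buffer is smaller than total VRAM, so a per-allocation limit is a plausible cause."
    : "The single buffer is larger than total VRAM."
);
```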

I ran these commands to check:

PS D:\github\chatbot> npx --no node-llama-cpp inspect gpu
OS: Windows 10.0.26100 (x64)
Node: 20.16.0 (x64)
node-llama-cpp: 3.3.2

Vulkan: available

Vulkan device: AMD Radeon RX 6700 XT
Vulkan used VRAM: 0% (0B/11.98GB)
Vulkan free VRAM: 100% (11.98GB/11.98GB)
Vulkan unified memory: 11.98GB (100%)

CPU model: 13th Gen Intel(R) Core(TM) i5-13400F
Math cores: 10
Used RAM: 47.77% (15.2GB/31.82GB)
Free RAM: 52.22% (16.62GB/31.82GB)
Used swap: 80.59% (34GB/42.19GB)
Max swap size: 42.19GB

PS D:\github\chatbot> npx --no node-llama-cpp inspect estimate models/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf
File: D:\github\chatbot\models\Meta-Llama-3.1-8B-Instruct-Q8_0.gguf
GPU info               Type: Vulkan   VRAM: 11.98GB   Name: AMD Radeon RX 6700 XT
Model info             Type: llama 8B MOSTLY_Q8_0   Size: 7.95GB   Train context size: 131K

Resolved config        100% compatibility   Context size: 16K   GPU layers: 33/33 (100%)   VRAM usage: 11.27GB
                       RAM usage: 0B
With flash attention   100% compatibility   Context size: 24K   GPU layers: 33/33 (100%)   VRAM usage: 11.27GB
                       RAM usage: 0B   Flash attention: enabled

File: D:\github\chatbot\models\Llama-3.2-1B-Instruct.Q4_K_M.gguf
GPU: Vulkan (last build)

  | Type    | Layers | Context size | Estimated model VRAM | Model VRAM | Diff                | Estimated context VRAM | Context VRAM | Diff                | VRAM usage
* | Model   | 17     |              | 762.81MB             | 0B         | 762.81MB  (Infin... |                        |              |                     | 00.00% (0B/11.98GB)
  | Error   | 17     | Error: Failed to create context     |            |                     |                        |              |                     |
  | Error   | 17     | 131072       | Error: Failed to create context   |                     |                        |              |                     |
  | Error   | 17     | 124722       | Error: Failed to create context   |                     |                        |              |                     |
  | Error   | 17     | 82397        | Error: Failed to create context   |                     |                        |              |                     |
  | Error   | 17     | 50653        | Error: Failed to create context   |                     |                        |              |                     |
  | Context | 17     | 25258        | 762.81MB             | 0B         | 762.81MB  (Infin... | 2.61GB                 | 0B           | 2.61GB    (Infin... | 00.00% (0B/11.98GB)
  | Context | 17     | 4096         | 762.81MB             | 0B         | 762.81MB  (Infin... | 646.67MB               | 0B           | 646.67MB  (Infin... | 00.00% (0B/11.98GB)
  | Context | 17     | 2048         | 762.81MB             | 0B         | 762.81MB  (Infin... | 450.64MB               | 0B           | 450.64MB  (Infin... | 00.00% (0B/11.98GB)
  | Context | 17     | 1024         | 762.81MB             | 0B         | 762.81MB  (Infin... | 352.63MB               | 0B           | 352.63MB  (Infin... | 00.00% (0B/11.98GB)
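The "Estimated context VRAM" values in the table grow almost linearly with the context size, so a rough back-of-the-envelope check is possible. The sketch below fits a straight line through two rows of the table (1024 and 4096 tokens) and extrapolates. It assumes the tool's MB/GB are 1024-based and that the growth stays linear; the function names are made up for illustration and this is not how node-llama-cpp computes its estimates.

```typescript
interface Sample {
  contextSize: number; // tokens
  estimatedMB: number; // "Estimated context VRAM" column
}

interface Line {
  slope: number;     // MB per token
  intercept: number; // MB at context size 0
}

// Fit a straight line through two samples.
function fitLine(a: Sample, b: Sample): Line {
  const slope = (b.estimatedMB - a.estimatedMB) / (b.contextSize - a.contextSize);
  return { slope, intercept: a.estimatedMB - slope * a.contextSize };
}

function estimateContextMB(line: Line, contextSize: number): number {
  return line.intercept + line.slope * contextSize;
}

// Two rows copied from the table above.
const line = fitLine(
  { contextSize: 1024, estimatedMB: 352.63 },
  { contextSize: 4096, estimatedMB: 646.67 }
);

for (const size of [2048, 25258, 50653, 131072]) {
  const mb = estimateContextMB(line, size);
  console.log(`${size} tokens: ~${(mb / 1024).toFixed(2)}GB`);
}
```

Under these assumptions the line reproduces the 2048 and 25258 rows of the table, and the extrapolated value for the full 131072 context comes out above the 11.98GB of VRAM on its own. The 50653 row would still fit by this estimate, so its failure needs another explanation, such as a limit on the size of a single allocation.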

In the end I created a Docker image based on node22-slim and loaded that model there, and it seems to work well, though a bit slowly. I guess something is wrong with my llama.cpp build, but I'm not sure whether there is a way to rebuild it. I'm open to suggestions.

DarkSorrow (Author) · 2025-01-07T23:02:33Z

OK, doing this:

const llama = await getLlama({
  gpu: false,
});

helps me work without using my GPU. But I guess there is still a problem with the GPU loader and my card (everything is up to date: the AMD (ATI) driver is current and Vulkan is installed too).

giladgd (Maintainer) · 2025-01-07T23:16:39Z

Can you please provide me with a link to the model that didn't stop generating output? It'll help me investigate it.

Regarding the Vulkan error you saw, it doesn't necessarily mean that the context loading failed. Vulkan has a maximum memory allocation size, and currently the only way to find it is to test it, so this is what node-llama-cpp does: it attempts to create a context with a given size, and if that fails, it tries again with a smaller size, and so on. If you wait a bit longer, you'll see that it works in the end.
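For illustration, the retry behaviour described above can be sketched roughly like this. This is not the library's actual code: the `tryCreate` callback, the shrink factor and the minimum size are all assumptions made up for the example, and the real schedule of sizes that node-llama-cpp tries may differ.

```typescript
// Hypothetical callback: resolves with a context if one of `contextSize`
// tokens can be created, rejects otherwise (e.g. on a failed allocation).
type TryCreate<T> = (contextSize: number) => Promise<T>;

interface CreatedContext<T> {
  contextSize: number;
  context: T;
}

async function createWithShrinkingContext<T>(
  tryCreate: TryCreate<T>,
  startSize: number,
  minSize: number = 512,      // assumed lower bound, purely illustrative
  shrinkFactor: number = 0.75 // assumed step, not the library's schedule
): Promise<CreatedContext<T>> {
  let size = startSize;
  let lastError: unknown = undefined;

  while (size >= minSize) {
    try {
      // The first size that can be allocated wins.
      return { contextSize: size, context: await tryCreate(size) };
    } catch (err) {
      // Allocation failed: remember the error and retry with a smaller size.
      lastError = err;
      size = Math.floor(size * shrinkFactor);
    }
  }

  throw new Error(
    `No context size between ${minSize} and ${startSize} could be created: ${String(lastError)}`
  );
}
```

In a real program `tryCreate` might wrap something like `model.createContext({contextSize})`, but since the library already performs this search itself, the sketch is only meant to show why several allocation errors can appear in the log before a context is finally created.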

DarkSorrow Jan 8, 2025
Author

OK, it seems to work as you said; I just need to wait a while.
It was a model I generated from the 8B one earlier, and the generation probably failed. I deleted it, but I'll try to generate it again later if you still want it. Though I think it was more a problem with my Docker generation than with the library. The encoding was bad, as the error said.

DarkSorrow Jan 8, 2025
Author

Well, I think it was a problem with the previous generation of my model. I tried a new model and didn't get the error.
