Bug: SYCL builds >= b4069 have half the context limit of previous builds #10421

0xDEADFED5 · 2024-11-20T08:32:21Z

What happened?

found 1 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                Intel Arc A770 Graphics|    1.5|    512|    1024|   32| 16704M|            1.3.31093|
llama_kv_cache_init:      SYCL0 KV buffer size =  3937.50 MiB
llama_new_context_with_model: KV self size  = 3937.50 MiB, K (f16): 1968.75 MiB, V (f16): 1968.75 MiB
llama_new_context_with_model:  SYCL_Host  output buffer size =    27.82 MiB
ggml_backend_sycl_buffer_type_alloc_buffer: can't malloc 3278637056 Bytes memory on deviceggml_gallocr_reserve_n: failed to allocate SYCL0 buffer of size 7573604352
ggml_backend_sycl_buffer_type_alloc_buffer: can't malloc 3278637056 Bytes memory on deviceggml_gallocr_reserve_n: failed to allocate SYCL0 buffer of size 7573604352
llama_new_context_with_model: failed to allocate compute buffers
common_init_from_params: failed to create context with model 'C:\LLM\Qwen2.5-3B-Instruct_Q4_1.gguf'
srv    load_model: failed to load model, 'C:\LLM\Qwen2.5-3B-Instruct_Q4_1.gguf'
main: exiting due to model loading error
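
For readers who want the failing sizes in the same unit as the KV buffer line, here is a minimal Python sketch that pulls the byte counts out of the failure line and converts them to MiB. The function name `failed_allocations_mib` and the regular expression are illustrative assumptions, not part of llama.cpp; the log text is copied from the output above, where the two messages run together on one line.

```python
import re

# One failure line copied verbatim from the log above (the two messages
# are emitted without a newline between them).
LOG = (
    "ggml_backend_sycl_buffer_type_alloc_buffer: can't malloc 3278637056 "
    "Bytes memory on deviceggml_gallocr_reserve_n: failed to allocate "
    "SYCL0 buffer of size 7573604352\n"
)

MIB = 1024 * 1024


def failed_allocations_mib(log):
    """Return every byte count found in the failure messages, in MiB.

    Hypothetical helper for illustration only; it is not a llama.cpp API.
    """
    pattern = re.compile(r"can't malloc (\d+) Bytes|buffer of size (\d+)")
    sizes = []
    for match in pattern.finditer(log):
        # Exactly one of the two groups is set for each match.
        raw = match.group(1) or match.group(2)
        sizes.append(int(raw) / MIB)
    return sizes


if __name__ == "__main__":
    for size in failed_allocations_mib(LOG):
        print(f"{size:.2f} MiB")
```

Run as a script, this prints 3126.75 MiB and 7222.75 MiB, so the compute buffer request is larger than the 3937.50 MiB KV buffer reported just before the failure.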

command line: llama-server.exe -t 16 --threads-http 8 --mlock -ngl 99 -m C:\LLM\Qwen2.5-3B-Instruct_Q4_1.gguf --port 8888 --ctx-size 112000 -np 48 --sampling-seq mt --min-p 0.1 --temp 1.5 -dt .1

command line works fine on builds < b4069

i have to lower context all the way to 60k with b4069
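
As a rough cross-check of the numbers above, the sketch below estimates the f16 KV cache size for a given `--ctx-size`. The model hyperparameters (36 layers, 2 KV heads, head dimension 128) are assumptions about Qwen2.5-3B that do not appear in this issue, and `kv_cache_mib` is a hypothetical helper, not a llama.cpp function. "60k" is taken to mean 60000 tokens.

```python
MIB = 1024 * 1024


def kv_cache_mib(n_ctx, n_layer=36, n_head_kv=2, head_dim=128, bytes_per_elem=2):
    """Estimate the total K + V cache size in MiB.

    Defaults are ASSUMED Qwen2.5-3B hyperparameters with an f16 cache
    (2 bytes per element); they are not taken from the issue text.
    """
    # Size of the K tensor alone, summed over all layers.
    per_tensor_bytes = n_ctx * n_layer * n_head_kv * head_dim * bytes_per_elem
    # K and V have the same shape, so the total is twice that.
    return 2 * per_tensor_bytes / MIB


if __name__ == "__main__":
    for n_ctx in (112000, 60000):
        print(f"ctx={n_ctx}: {kv_cache_mib(n_ctx):.2f} MiB")
```

Under these assumptions the estimate for 112000 tokens is 3937.50 MiB, matching the `KV self size` line in the log, and 60000 tokens comes to about 2109.38 MiB. The KV cache scales linearly with context, so this only accounts for the KV buffer; the allocation that fails in the log is the compute buffer.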

GPU: Intel Arc A770 (16GB)
OS: Windows

Name and Version

ZE_LOADER_DEBUG_TRACE:Using Loader Library Path:
ZE_LOADER_DEBUG_TRACE:Tracing Layer Library Path: ze_tracing_layer.dll
ggml_sycl_init: GGML_SYCL_FORCE_MMQ: no
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
version: 4069 (2e82ffa)
built with MSVC 19.41.34123.0 for

What operating system are you seeing the problem on?

Windows

Relevant log output

No response

NeoZhangJianyu · 2024-11-20T13:46:41Z

These parameters increase the memory requirement.
Please reduce the --ctx-size and --batch-size values.

Also make sure no other processes are using GPU memory.

qnixsynapse · 2024-11-21T12:33:48Z

@NeoZhangJianyu I think this error message needs to be more user-friendly, rather than using very technical terms such as malloc. I am thinking of opening a PR.

Edit: it also has to be changed to use the GGML_LOG system.

gghfez · 2024-11-22T11:42:00Z

Same issue here on Linux. I ended up reverting to b4068 for now, since the latest version fails to build (most likely because of the oneAPI upgrade requirement, 2024 -> 2025).

0xDEADFED5 · 2024-11-25T03:07:18Z

> These parameters increase the memory requirement. Please reduce the --ctx-size and --batch-size values.
>
> Also make sure no other processes are using GPU memory.

the batch size was a mistake, i don't use that parameter anymore. (though it was close to the default of 2048 for whatever strange reason)

i need that much context, however, as i'm running highly parallel. I have run many millions of requests so far at that context size, and it works fine in builds < b4069.

i could live with slightly less context, but the new context limit is about 50% that of previous builds, so obviously something is wrong, no?

0xDEADFED5 added the bug-unconfirmed and critical severity labels Nov 20, 2024

0xDEADFED5 changed the title ~~Bug: SYCL builds >= b4069 fail to allocate SYCL0 buffer~~ Bug: SYCL builds >= b4069 have half the context limit of previous builds Nov 26, 2024
