Name and Version
ggml_sycl_init: GGML_SYCL_FORCE_MMQ: no
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
version: 4040 (3bcd40b)
built with MSVC 19.41.34123.0 for
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
llama-server
Problem description & steps to reproduce
Tested with an Intel A770 16GB.
My testing method:
.\llama-server.exe -m qwen2.5-0.5b-instruct-q4_0.gguf -ngl 99 -t 11 --threads-http 48 -np 48 --temp 0
query = "Give me a random ten digit number, just the number"
Throw 3000 of those queries at the llama server with an async HTTP client.
Check how many answers I get in 30 seconds.
Run multiple times to verify.
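The steps above can be sketched as a small asyncio harness. This is a hypothetical sketch, not my actual client: `count_completed` and `send_request` are illustrative names, and `send_request` stands in for the real async HTTP POST (e.g. to llama-server's /completion endpoint).

```python
import asyncio

# Fire N identical queries concurrently and count how many complete
# before the deadline. `send_request` is a placeholder coroutine for
# the real HTTP POST carrying the query prompt.

async def count_completed(send_request, n_requests: int, deadline_s: float) -> int:
    """Launch n_requests concurrent calls; return how many finish in deadline_s."""
    done = 0

    async def one() -> None:
        nonlocal done
        await send_request()
        done += 1  # only counts requests that fully completed

    tasks = [asyncio.create_task(one()) for _ in range(n_requests)]
    try:
        await asyncio.wait_for(asyncio.gather(*tasks), timeout=deadline_s)
    except asyncio.TimeoutError:
        # Deadline hit: cancel whatever is still in flight.
        for t in tasks:
            t.cancel()
    return done
```

Running this several times and keeping the best count gives the "Fastest result" figures.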
Results (numbers vary by about ±50 between runs):
version: 3987 (61715d5) = Fastest result: 2040 requests.
version: 4005 (815fe72) = Fastest result: 1920 requests.
version: 4020 (9f40989) = Fastest result: 2015 requests.
version: 4032 (3407364) = Fastest result: 1968 requests.
version: 4038 (b11f9ba) = Fastest result: 2016 requests.
version: 4040 (3bcd40b) = Fastest result: 1582 requests.
version: 4080 (ae8de6d) = Fastest result: 1632 requests.
I've confirmed the slowdown in 4040 with my actual real-world usage (-np 48 with a 3B model).
I think the recommended SYCL build at https://github.com/ggerganov/llama.cpp/blob/master/docs/backend/SYCL.md should probably be b4038 for now.
First Bad Commit
3bcd40b
Relevant log output
No response