Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misc. bug: SYCL builds >= 4040 have lower parallel performance by ~20% #10511

Open
0xDEADFED5 opened this issue Nov 26, 2024 · 0 comments
Open

Comments

@0xDEADFED5
Copy link

0xDEADFED5 commented Nov 26, 2024

Name and Version

ggml_sycl_init: GGML_SYCL_FORCE_MMQ: no
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
version: 4040 (3bcd40b)
built with MSVC 19.41.34123.0 for

Operating systems

Windows

Which llama.cpp modules do you know to be affected?

llama-server

Problem description & steps to reproduce

tested with Intel A770 16GB

my testing method:
.\llama-server.exe -m qwen2.5-0.5b-instruct-q4_0.gguf -ngl 99 -t 11 --threads-http 48 -np 48 --temp 0
query = "Give me a random ten digit number, just the number"

throw 3000 of those queries at the llama server with an async http client
check how many answers i get in 30 seconds
run multiple times to verify

results:

these numbers are +- 50:
version: 3987 (61715d5) = Fastest result: 2040 requests.
version: 4005 (815fe72) = Fastest result: 1920 requests.
version: 4020 (9f40989) = Fastest result: 2015 requests.
version: 4032 (3407364) = Fastest result: 1968 requests.
version: 4038 (b11f9ba) = Fastest result: 2016 requests.
version: 4040 (3bcd40b) = Fastest result: 1582 requests.
version: 4080 (ae8de6d) = Fastest result: 1632 requests.

i've confirmed the slowness of 4040 with my actual real usage of -np 48 with a 3B model

i think the recommended SYCL build at https://github.com/ggerganov/llama.cpp/blob/master/docs/backend/SYCL.md should probably be b4038 for now

First Bad Commit

3bcd40b

Relevant log output

No response

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant