Name and Version
ggml_sycl_init: GGML_SYCL_FORCE_MMQ: no
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
version: 4040 (3bcd40b)
built with MSVC 19.41.34123.0 for
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
llama-server
Problem description & steps to reproduce
Tested with an Intel A770 16GB.
My testing method:
.\llama-server.exe -m qwen2.5-0.5b-instruct-q4_0.gguf -ngl 99 -t 11 --threads-http 48 -np 48 --temp 0
query = "Give me a random ten digit number, just the number"
Throw 3000 of those queries at the llama server with an async HTTP client.
Check how many answers I get in 30 seconds.
Run multiple times to verify.
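The steps above can be sketched as a small asyncio harness. This is a hypothetical sketch, not my actual client: `count_completed` and `send_request` are illustrative names, and `send_request` stands in for the real async HTTP POST (e.g. to llama-server's /completion endpoint).

```python
import asyncio

# Fire N identical queries concurrently and count how many complete
# before the deadline. `send_request` is a placeholder coroutine for
# the real HTTP POST carrying the query prompt.

async def count_completed(send_request, n_requests: int, deadline_s: float) -> int:
    """Launch n_requests concurrent calls; return how many finish in deadline_s."""
    done = 0

    async def one() -> None:
        nonlocal done
        await send_request()
        done += 1  # only counts requests that fully completed

    tasks = [asyncio.create_task(one()) for _ in range(n_requests)]
    try:
        await asyncio.wait_for(asyncio.gather(*tasks), timeout=deadline_s)
    except asyncio.TimeoutError:
        # Deadline hit: cancel whatever is still in flight.
        for t in tasks:
            t.cancel()
    return done
```

Running this several times and keeping the best count gives the "Fastest result" figures.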
Results (numbers vary by about ±50 between runs):
version: 3987 (61715d5) = Fastest result: 2040 requests.
version: 4005 (815fe72) = Fastest result: 1920 requests.
version: 4020 (9f40989) = Fastest result: 2015 requests.
version: 4032 (3407364) = Fastest result: 1968 requests.
version: 4038 (b11f9ba) = Fastest result: 2016 requests.
version: 4040 (3bcd40b) = Fastest result: 1582 requests.
version: 4080 (ae8de6d) = Fastest result: 1632 requests.
I've confirmed the slowdown in 4040 with my actual real-world usage (-np 48 with a 3B model).
I think the recommended SYCL build at https://github.com/ggerganov/llama.cpp/blob/master/docs/backend/SYCL.md should probably be b4038 for now.
First Bad Commit
3bcd40b
Relevant log output
No response