Enable use of the rebar feature to upload buffers to the device. #9251

mtavenrath · 2024-08-30T09:37:50Z

Instead of copying host -> host staging -> device, one can use the ReBAR (Resizable BAR) feature to copy directly host -> device. This skips the extra latency and the second memcpy, which triples memory bandwidth consumption.
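A minimal sketch of the idea, not the code from this PR: in Vulkan, ReBAR typically shows up as a memory type that is both device-local and host-visible, backed by a heap large enough to hold the buffer. The `MemoryType` struct and both function names below are hypothetical stand-ins for illustration; only the flag values are taken from the Vulkan specification. The traffic model is a simplification that counts host-memory reads and writes per uploaded byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Values match VkMemoryPropertyFlagBits in the Vulkan specification.
constexpr uint32_t DEVICE_LOCAL_BIT  = 0x1; // VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
constexpr uint32_t HOST_VISIBLE_BIT  = 0x2; // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
constexpr uint32_t HOST_COHERENT_BIT = 0x4; // VK_MEMORY_PROPERTY_HOST_COHERENT_BIT

// Hypothetical, simplified view of one memory type and the size of its heap.
struct MemoryType {
    uint32_t property_flags;
    uint64_t heap_size;
};

// Returns the index of the first memory type that is device-local AND
// host-visible and whose heap can hold min_bytes, or nullopt if none exists
// (in which case the staging path is still needed).
std::optional<std::size_t> find_rebar_memory_type(const std::vector<MemoryType>& types,
                                                  uint64_t min_bytes) {
    assert(min_bytes > 0); // an empty upload needs no memory type
    constexpr uint32_t wanted = DEVICE_LOCAL_BIT | HOST_VISIBLE_BIT;
    for (std::size_t i = 0; i < types.size(); ++i) {
        if ((types[i].property_flags & wanted) == wanted && types[i].heap_size >= min_bytes) {
            return i;
        }
    }
    return std::nullopt;
}

// Simplified model of host-memory traffic for uploading n bytes.
// Staged: memcpy reads n and writes n into staging, then the device reads n -> 3n.
// Direct: the CPU reads n and writes straight into device memory over PCIe -> n.
uint64_t host_memory_traffic(uint64_t n, bool direct) {
    return direct ? n : 3 * n;
}
```

When `find_rebar_memory_type` finds nothing, or only a small heap as on systems without Resizable BAR enabled, falling back to the staging path remains necessary.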

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

mtavenrath · 2024-08-30T09:38:39Z

Benchmark with Mistral-Nemo-Instruct-2407.Q5_K.gguf

| model                          |       size |     params | backend    | ngl |          nkvo | mmap |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | ---: | ------------: | ---------------: |
| llama 13B Q5_K - Medium        |   8.12 GiB |    12.25 B | Vulkan     | 100 |             1 |    0 |         pp512 |    431.11 ± 3.66 |
| llama 13B Q5_K - Medium        |   8.12 GiB |    12.25 B | Vulkan     | 100 |             1 |    0 |         pp512 |    438.10 ± 0.52 |
| llama 13B Q5_K - Medium        |   8.12 GiB |    12.25 B | Vulkan     | 100 |             1 |    0 |         pp512 |    434.48 ± 1.57 |
| llama 13B Q5_K - Medium        |   8.12 GiB |    12.25 B | Vulkan     | 100 |             1 |    0 |         tg128 |     30.55 ± 0.75 |
| llama 13B Q5_K - Medium        |   8.12 GiB |    12.25 B | Vulkan     | 100 |             1 |    0 |         tg128 |     31.73 ± 0.11 |
| llama 13B Q5_K - Medium        |   8.12 GiB |    12.25 B | Vulkan     | 100 |             1 |    0 |         tg128 |     31.62 ± 0.13 |

With rebar:

| model                          |       size |     params | backend    | ngl |          nkvo | mmap |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | ---: | ------------: | ---------------: |
| llama 13B Q5_K - Medium        |   8.12 GiB |    12.25 B | Vulkan     | 100 |             1 |    0 |         pp512 |    435.98 ± 1.63 |
| llama 13B Q5_K - Medium        |   8.12 GiB |    12.25 B | Vulkan     | 100 |             1 |    0 |         pp512 |    433.19 ± 1.01 |
| llama 13B Q5_K - Medium        |   8.12 GiB |    12.25 B | Vulkan     | 100 |             1 |    0 |         pp512 |    435.82 ± 0.89 |
| llama 13B Q5_K - Medium        |   8.12 GiB |    12.25 B | Vulkan     | 100 |             1 |    0 |         tg128 |     35.07 ± 0.14 |
| llama 13B Q5_K - Medium        |   8.12 GiB |    12.25 B | Vulkan     | 100 |             1 |    0 |         tg128 |     34.89 ± 0.20 |
| llama 13B Q5_K - Medium        |   8.12 GiB |    12.25 B | Vulkan     | 100 |             1 |    0 |         tg128 |     34.36 ± 0.19 |
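To put the two tables side by side, the sketch below averages the three runs per test and computes the relative change. It assumes the first table is the run without rebar, which the comment does not state explicitly; the helper names are hypothetical. With the posted numbers this gives roughly an 11% gain for tg128 and about 0.1% for pp512, which is within the reported run-to-run spread.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty list of t/s measurements.
double mean(const std::vector<double>& v) {
    assert(!v.empty());
    return std::accumulate(v.begin(), v.end(), 0.0) / static_cast<double>(v.size());
}

// Relative change of the candidate mean over the baseline mean, in percent.
double relative_gain_percent(const std::vector<double>& baseline,
                             const std::vector<double>& candidate) {
    return (mean(candidate) / mean(baseline) - 1.0) * 100.0;
}
```

For example, `relative_gain_percent({30.55, 31.73, 31.62}, {35.07, 34.89, 34.36})` evaluates the tg128 rows of the two tables.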

Enable use to the rebar feature to upload buffers to the device.

06e3e3b

github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Aug 30, 2024

mofosyne added the "Review Complexity : Low" label (trivial changes to code that most beginner devs, or those who want a break, can tackle, e.g. a UI fix) on Aug 30, 2024

0cc4m assigned 0cc4m and unassigned 0cc4m Sep 8, 2024

0cc4m self-requested a review September 8, 2024 19:42
