feat: vulkan. #91

b4rtaz · 2024-06-16T09:42:26Z

Experimental Vulkan support for matrix multiplication.

To try Distributed Llama with Vulkan support, clone this branch. You also need a Vulkan development environment installed, both to compile the shaders and to link Distributed Llama against the -lvulkan library.
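Compiling the shaders produces SPIR-V binaries (the .spv files that sit next to the .comp sources in src/vulkan). A Vulkan host program typically reads such a file as 32-bit words and passes them to vkCreateShaderModule through VkShaderModuleCreateInfo::pCode. The sketch below shows only that loading step. loadSpirv is a hypothetical helper, not code from this PR, and it assumes the file and the host share byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Every SPIR-V module starts with this magic word.
constexpr uint32_t kSpirvMagic = 0x07230203;

// Hypothetical helper: reads a compiled shader into 32-bit words, the form
// expected by VkShaderModuleCreateInfo::pCode (codeSize is words.size() * 4).
// Assumes the .spv file has the same endianness as the host.
std::vector<uint32_t> loadSpirv(const std::string& path) {
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) throw std::runtime_error("cannot open " + path);
    std::streamsize size = file.tellg();
    if (size <= 0 || size % 4 != 0)
        throw std::runtime_error("invalid SPIR-V size: " + path);
    std::vector<uint32_t> words(static_cast<size_t>(size) / 4);
    file.seekg(0);
    file.read(reinterpret_cast<char*>(words.data()), size);
    if (!file) throw std::runtime_error("read failed: " + path);
    if (words[0] != kSpirvMagic)
        throw std::runtime_error("bad SPIR-V magic number: " + path);
    return words;
}
```

Checking the size and the magic word up front turns a missing or stale shader build into a clear error instead of a failure deep inside the Vulkan driver.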

Build Distributed Llama:

make dllama DLLAMA_VULKAN=1
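As the build log in this thread shows, DLLAMA_VULKAN=1 adds -DDLLAMA_VULKAN to every g++ invocation, so the Vulkan code path can be gated at compile time. A minimal sketch of such a gate follows; vulkanCompiledIn and acceleratorAvailable are hypothetical names, not functions from this PR.

```cpp
#include <cassert>
#include <cstdio>

// True only when the binary was built with `make dllama DLLAMA_VULKAN=1`,
// which defines the DLLAMA_VULKAN macro for every translation unit.
constexpr bool vulkanCompiledIn() {
#ifdef DLLAMA_VULKAN
    return true;
#else
    return false;
#endif
}

// Hypothetical check: refuse an accelerator request if the Vulkan path
// was not compiled in, instead of failing later at runtime.
bool acceleratorAvailable(bool requested) {
    if (requested && !vulkanCompiledIn()) {
        std::fprintf(stderr, "Rebuild with DLLAMA_VULKAN=1 to use --accelerator\n");
        return false;
    }
    return requested;
}
```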

Run Distributed Llama with the --accelerator 1/1 argument:

./dllama inference --accelerator 1/1 \
  --buffer-float-type q80 --prompt "Hello" --steps 128 --nthreads 1 --model models/llama3_8b_q40/dllama_model_llama3_8b_q40.m \
  --tokenizer models/llama3_8b_q40/dllama_tokenizer_llama3_8b_q40.t

The value of this argument defines what fraction of the computation is moved to the GPU: 1/1 means 100%, 1/2 means 50%, and so on.
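One way to read an N/D value is as a fraction of each matrix multiplication's output rows handed to the GPU. The sketch below assumes that per-matmul row split, which may differ from how this PR actually divides the work; AcceleratorRatio, parseAcceleratorRatio and gpuRows are hypothetical names.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical representation of an --accelerator value such as "1/3".
struct AcceleratorRatio {
    unsigned long numerator;
    unsigned long denominator;
};

// Parses "N/D" with 0 <= N <= D and D > 0 (format assumed from the
// examples 1/1, 1/2 and 1/3 in this PR).
AcceleratorRatio parseAcceleratorRatio(const std::string& text) {
    size_t slash = text.find('/');
    if (slash == std::string::npos || slash == 0 || slash + 1 == text.size())
        throw std::invalid_argument("expected N/D, got: " + text);
    unsigned long n = std::stoul(text.substr(0, slash));
    unsigned long d = std::stoul(text.substr(slash + 1));
    if (d == 0 || n > d)
        throw std::invalid_argument("expected 0 <= N <= D and D > 0: " + text);
    return {n, d};
}

// Number of output rows assigned to the GPU; the remaining rows stay on
// the CPU. Integer arithmetic rounds down, so the GPU never gets more
// than the requested share.
unsigned gpuRows(unsigned rows, AcceleratorRatio r) {
    return static_cast<unsigned>(
        static_cast<unsigned long long>(rows) * r.numerator / r.denominator);
}
```

For a 4096-row matmul, 1/1 gives all 4096 rows to the GPU, 1/2 gives 2048, and 1/3 gives 1365, leaving 2731 rows for the CPU threads.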

The current implementation runs inference on the CPU and GPU simultaneously. You can still control how many CPU threads are used with the --nthreads argument (e.g. --nthreads 4). In practice, the goal is to find the values of --nthreads and --accelerator (e.g. --accelerator 1/3) that give the best speed.
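To see why the two arguments interact, assume (as a sketch, not this PR's exact scheduling) that the GPU takes rows [0, gpuRows) of a matmul while the CPU threads split the rest evenly. Raising the GPU share shrinks each thread's slice, so the fastest setting depends on the relative speed of the CPU and GPU. cpuThreadRanges and RowRange are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical half-open row range [start, end).
struct RowRange {
    unsigned start;
    unsigned end;
};

// Splits the CPU share of a matmul, rows [gpuRows, totalRows), across
// nThreads as evenly as possible. The GPU is assumed to handle [0, gpuRows).
std::vector<RowRange> cpuThreadRanges(unsigned totalRows, unsigned gpuRows, unsigned nThreads) {
    assert(nThreads > 0 && gpuRows <= totalRows);
    unsigned cpuRows = totalRows - gpuRows;
    unsigned base = cpuRows / nThreads;
    unsigned extra = cpuRows % nThreads; // the first `extra` threads get one more row
    std::vector<RowRange> ranges;
    unsigned start = gpuRows;
    for (unsigned t = 0; t < nThreads; t++) {
        unsigned len = base + (t < extra ? 1 : 0);
        ranges.push_back({start, start + len});
        start += len;
    }
    return ranges;
}
```

With 4096 rows, 1365 of them on the GPU and 4 threads, the CPU slices are 683, 683, 683 and 682 rows long.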

unclemusclez · 2024-06-16T22:55:32Z

unclemusclez@ttv:~/distributed-llama/src/vulkan$ ls
matmul-f32-f32.comp  matmul-q40-f32.spv   matmul-q40-q80.spv
matmul-f32-f32.spv   matmul-q40-q80.comp
matmul-q40-f32.comp  matmul-q40-q80.sp
unclemusclez@ttv:~/distributed-llama/src/vulkan$ cd ../..
unclemusclez@ttv:~/distributed-llama$ make dllama DLLAMA_VULKAN=1
Makefile:55: warning: overriding recipe for target 'funcs-test'
Makefile:29: warning: ignoring old recipe for target 'funcs-test'
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/utils.cpp -o utils.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/quants.cpp -o quants.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/funcs.cpp -o funcs.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/commands.cpp -o commands.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/socket.cpp -o socket.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/transformer.cpp -o transformer.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/tasks.cpp -o tasks.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/llama2-tasks.cpp -o llama2-tasks.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/grok1-tasks.cpp -o grok1-tasks.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/mixtral-tasks.cpp -o mixtral-tasks.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/tokenizer.cpp -o tokenizer.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/app.cpp -o app.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/accelerator-vulkan.cpp -o accelerator-vulkan.o
src/accelerator-vulkan.cpp: In constructor 'VulkanContext::VulkanContext()':
src/accelerator-vulkan.cpp:85:37: error: format '%llu' expects argument of type 'long long unsigned int', but argument 3 has type 'vk::DeviceSize' {aka 'long unsigned int'} [-Werror=format=]
   85 |             printf("🌋 heap[%u]: %llu MB\n", h, memoryProperties.memoryHeaps[h].size / (1024 * 1024));
      |                                  ~~~^
      |                                     |
      |                                     long long unsigned int
      |                                  %lu
cc1plus: all warnings being treated as errors
make: *** [Makefile:17: accelerator-vulkan.o] Error 1
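The error comes from vk::DeviceSize being a 64-bit unsigned type that GCC on this 64-bit Linux system spells 'long unsigned int', while %llu expects 'long long unsigned int'; with -Werror the mismatch stops the build. One portable way to print it is the PRIu64 macro from <cinttypes> (casting the value to unsigned long long and keeping %llu also works). The sketch uses a local DeviceSize alias as a stand-in for vk::DeviceSize, and formatHeap is a hypothetical helper.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Stand-in for vk::DeviceSize, which aliases VkDeviceSize (uint64_t).
// On 64-bit Linux uint64_t is `unsigned long`, so "%llu" trips -Wformat.
using DeviceSize = uint64_t;

// Formats a heap size in MB; PRIu64 expands to the correct conversion
// for uint64_t on every platform.
std::string formatHeap(unsigned index, DeviceSize bytes) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "heap[%u]: %" PRIu64 " MB",
                  index, bytes / (1024 * 1024));
    return std::string(buf);
}
```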

Makefile

b4rtaz · 2024-06-28T22:13:00Z

Unfortunately, the approach taken in this PR is not the right direction. I have to revise it.

b4rtaz added 7 commits June 13, 2024 22:39

first steps. (f98cc0a)
vulkan matmul q40xf32, q40xq80, f32x32. (b2a20f3)
rearrange tasks. (449c1b4)
flexible ranges. (1000cfa)
improve shaders. (3f47000)
refactor. (48d4b79)
accelerator manager. (b3fca58)

b4rtaz mentioned this pull request Jun 16, 2024 in Vulkan Acceleration #50 (Open)

b4rtaz added 3 commits June 17, 2024 00:19

improve log. (4913843)
optimization. (3c54be2)
[[unroll]] (e2c254a)
fix: llu. (ffef6b3)

DifferentialityDevelopment reviewed Jun 17, 2024

Makefile

b4rtaz added 4 commits June 17, 2024 11:04

fix: windows. (4f31dd4)
keep memory mapped. (cef50b7)
add glslc to makefile. (0842dec)
a tiny change. (ce513bd)

b4rtaz closed this Jun 28, 2024
