feat: vulkan. #91

Closed
wants to merge 15 commits into from

Conversation

b4rtaz
Owner

@b4rtaz b4rtaz commented Jun 16, 2024

Experimental Vulkan support for matrix multiplication.


To try Distributed Llama with Vulkan support, clone this branch. You also need a Vulkan development environment installed, both to compile the shaders and to link Distributed Llama against the Vulkan library (-lvulkan).

  1. Build Distributed Llama:
make dllama DLLAMA_VULKAN=1
  2. Run Distributed Llama with the --accelerator 1/1 argument:
./dllama inference --accelerator 1/1 \
  --buffer-float-type q80 --prompt "Hello" --steps 128 --nthreads 1 --model models/llama3_8b_q40/dllama_model_llama3_8b_q40.m \
  --tokenizer models/llama3_8b_q40/dllama_tokenizer_llama3_8b_q40.t

The value of this argument defines what fraction of the computation is moved to the GPU: 1/1 means 100%, 1/2 means 50%, and so on.
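
As a rough illustration of how such a fraction could be turned into a work split, here is a minimal sketch. It assumes the split is applied to the output rows of a matrix multiplication, which this PR description does not confirm. The names `AcceleratorShare`, `RowSplit`, `parseAcceleratorShare` and `splitRows` are hypothetical and are not taken from the Distributed Llama source.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical types for illustration only; not from the Distributed Llama source.
struct AcceleratorShare {
    unsigned int numerator;
    unsigned int denominator;
};

struct RowSplit {
    unsigned int gpuRows;
    unsigned int cpuRows;
};

// Parses a value such as "1/3". Returns false on malformed input,
// a zero denominator, or a share greater than 100%.
bool parseAcceleratorShare(const char* text, AcceleratorShare* out) {
    unsigned int n = 0, d = 0;
    char trailing = 0;
    // Exactly two conversions must succeed; a third means trailing characters.
    if (std::sscanf(text, "%u/%u%c", &n, &d, &trailing) != 2) return false;
    if (d == 0 || n > d) return false;
    out->numerator = n;
    out->denominator = d;
    return true;
}

// Assumption: the GPU takes the first share of the rows (rounded down)
// and the CPU threads take whatever remains.
RowSplit splitRows(unsigned int totalRows, AcceleratorShare share) {
    assert(share.denominator != 0 && share.numerator <= share.denominator);
    // The 64-bit intermediate avoids overflow in the multiplication.
    unsigned long long scaled = (unsigned long long)totalRows * share.numerator;
    unsigned int gpuRows = (unsigned int)(scaled / share.denominator);
    return RowSplit{gpuRows, totalRows - gpuRows};
}
```

Under these assumptions, a share of 1/1 assigns every row to the GPU and leaves none for the CPU threads, while 1/3 of 4096 rows assigns 1365 rows to the GPU and 2731 to the CPU.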

The current implementation tries to run inference on the CPU and GPU simultaneously. You can still control how many threads are used with the --nthreads argument (for example, --nthreads 4). The goal is to find the combination of --nthreads and --accelerator values (for example, --accelerator 1/3) that gives the best speed.

@b4rtaz b4rtaz mentioned this pull request Jun 16, 2024
@unclemusclez

unclemusclez commented Jun 16, 2024

unclemusclez@ttv:~/distributed-llama/src/vulkan$ ls
matmul-f32-f32.comp  matmul-q40-f32.spv   matmul-q40-q80.spv
matmul-f32-f32.spv   matmul-q40-q80.comp
matmul-q40-f32.comp  matmul-q40-q80.sp
unclemusclez@ttv:~/distributed-llama/src/vulkan$ cd ../..
unclemusclez@ttv:~/distributed-llama$ make dllama DLLAMA_VULKAN=1
Makefile:55: warning: overriding recipe for target 'funcs-test'
Makefile:29: warning: ignoring old recipe for target 'funcs-test'
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/utils.cpp -o utils.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/quants.cpp -o quants.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/funcs.cpp -o funcs.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/commands.cpp -o commands.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/socket.cpp -o socket.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/transformer.cpp -o transformer.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/tasks.cpp -o tasks.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/llama2-tasks.cpp -o llama2-tasks.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/grok1-tasks.cpp -o grok1-tasks.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/mixtral-tasks.cpp -o mixtral-tasks.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/tokenizer.cpp -o tokenizer.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/app.cpp -o app.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/accelerator-vulkan.cpp -o accelerator-vulkan.o
src/accelerator-vulkan.cpp: In constructor 'VulkanContext::VulkanContext()':
src/accelerator-vulkan.cpp:85:37: error: format '%llu' expects argument of type 'long long unsigned int', but argument 3 has type 'vk::DeviceSize' {aka 'long unsigned int'} [-Werror=format=]
   85 |             printf("🌋 heap[%u]: %llu MB\n", h, memoryProperties.memoryHeaps[h].size / (1024 * 1024));
      |                                  ~~~^
      |                                     |
      |                                     long long unsigned int
      |                                  %lu
cc1plus: all warnings being treated as errors
make: *** [Makefile:17: accelerator-vulkan.o] Error 1
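
The failure above comes from -Werror=format: on this platform vk::DeviceSize is 'long unsigned int', which does not match '%llu'. One possible way to make the format portable is to print the value through the PRIu64 macro from <cinttypes>. The sketch below is not the fix applied in this PR. It uses a stand-in typedef in place of the real Vulkan header (assuming the underlying type is a 64-bit unsigned integer), and `formatHeapSize` is a hypothetical helper name.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Stand-in for vk::DeviceSize so the sketch compiles without Vulkan headers.
// Assumption: the real type is a 64-bit unsigned integer.
typedef std::uint64_t DeviceSize;

// Hypothetical helper: formats a heap size in MB without relying on whether
// the 64-bit type is 'unsigned long' or 'unsigned long long' on this platform.
std::string formatHeapSize(unsigned int heapIndex, DeviceSize sizeBytes) {
    char buffer[64];
    // PRIu64 expands to the correct conversion specifier for std::uint64_t.
    std::snprintf(buffer, sizeof(buffer), "heap[%u]: %" PRIu64 " MB",
                  heapIndex, (std::uint64_t)(sizeBytes / (1024 * 1024)));
    return std::string(buffer);
}
```

An alternative with the same effect would be to keep '%llu' and cast the argument to 'unsigned long long'.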

@b4rtaz
Owner Author

b4rtaz commented Jun 28, 2024

Unfortunately the approach applied in this PR is not a good direction. I have to revise it.

@b4rtaz b4rtaz closed this Jun 28, 2024
3 participants