WIP: Add MultiGPU support #12

YellowRoseCx · 2023-05-16T19:43:34Z

The goal is to let multiple GPUs pool their VRAM during evaluation and quantization. A second goal is to be able to set a maximum amount of memory to use per GPU and offload the rest of the data to CPU RAM.

Motivation: a GPU with 16 GB of VRAM runs out of memory (OOM) when evaluating a 30B model on the C4-new dataset. It also runs OOM during quantization.

I believe combining the current multi-GPU code (used for benchmarks) with the CPU offloading scripts will help achieve this.
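The per-GPU memory cap with CPU overflow described above can be illustrated by a minimal placement sketch. This is not the code in this PR. It assumes that each transformer layer's size in bytes is known ahead of time and that GPUs are filled in index order. `assign_layers` and `GiB` are hypothetical names used only for illustration. Actually moving the layers (for example with torch `module.to(device)`) is left out.

```python
from typing import Dict, List, Union

# A placement target: a GPU index, or "cpu" for offloaded layers.
Device = Union[int, str]

GiB = 1024 ** 3


def assign_layers(layer_bytes: List[int], gpu_budgets: Dict[int, int]) -> List[Device]:
    """Hypothetical helper (not from this PR): greedily place consecutive layers
    on GPUs in index order without exceeding each GPU's byte budget. Once no
    remaining GPU can hold the next layer, it and all later layers go to "cpu".
    """
    gpus = sorted(gpu_budgets)        # fill GPU 0 first, then GPU 1, ...
    remaining = dict(gpu_budgets)     # bytes still free under each GPU's cap
    current = 0                       # index into `gpus` of the GPU being filled
    placement: List[Device] = []
    for size in layer_bytes:
        # Move on to the next GPU until one has room for this layer.
        while current < len(gpus) and remaining[gpus[current]] < size:
            current += 1
        if current < len(gpus):
            device = gpus[current]
            remaining[device] -= size
            placement.append(device)
        else:
            placement.append("cpu")   # every GPU budget is exhausted: offload
    return placement


if __name__ == "__main__":
    # Ten 2 GiB layers, with caps of 7 GiB on GPU 0 and 5 GiB on GPU 1.
    print(assign_layers([2 * GiB] * 10, {0: 7 * GiB, 1: 5 * GiB}))
    # -> [0, 0, 0, 1, 1, 'cpu', 'cpu', 'cpu', 'cpu', 'cpu']
```

The sketch keeps consecutive layers on the same device so a sequential forward pass crosses devices as few times as possible. Hugging Face Transformers/Accelerate expose a comparable idea through the `max_memory` argument used with `device_map="auto"`.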

Begin patch

ddca545

YellowRoseCx changed the title ~~Add MultiGPU support and CPU offloading for Evaluation and Quantization~~ WIP: Add MultiGPU support and CPU offloading for Evaluation and Quantization May 18, 2023

YellowRoseCx added 6 commits May 23, 2023 15:58

Merge branch '0cc4m:latestmerge' into multigpu-gptq

1c92a99

playing around and testing

379bdb3

playing around and testing

146c7c7

Prevent GPU Out Of Memory

10eec3c

Update modelutils.py

a89c4e3

multi gpu work in progress

30512d5

YellowRoseCx changed the title ~~WIP: Add MultiGPU support and CPU offloading for Evaluation and Quantization~~ WIP: Add MultiGPU support May 25, 2023

YellowRoseCx added 4 commits May 25, 2023 04:57

uncomment 2 lines

d1fd82d

Update llama.py

5330d3c

Update llama.py

f97f830

Update llama.py

3e9d3b2
