GPU VRAM Usage during training #58

Open
KevinD777 opened this issue Dec 11, 2023 · 1 comment

KevinD777 commented Dec 11, 2023

Hi,

Thanks for your great work! I have some questions regarding GPU usage when training with Llama 2:

  1. What is the peak VRAM usage when training Unlimiformer with the long-range training methods in both the 8k and 16k settings? (A peak-memory measurement sketch follows this list.)
  2. Since the complexity is linear during training, training at 16k should use roughly double the VRAM of 8k, if I understand correctly. So if I wanted to train Unlimiformer at 80k, would it use roughly 10 times the VRAM of the 8k setting?
  3. I saw in a previous issue that Unlimiformer can currently only be trained on a single GPU, so the training length is limited by the memory of a single GPU, say 80 GB for an A100. Is the 16k training length the maximum possible for now?
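To be concrete about what I mean by peak usage in question 1, something like this minimal PyTorch sketch (run on a CUDA machine; the training step itself is a placeholder):

```python
import torch

# Reset the peak-memory counter before the step being profiled.
torch.cuda.reset_peak_memory_stats()

# ... one full training step goes here (forward + backward + optimizer.step()) ...

# Peak memory allocated by tensors on the current device, in GB.
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak VRAM: {peak_gb:.2f} GB")
```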

Thanks!

abertsch72 (Owner) commented

Thanks for your interest!

  1. Looking back at some old run data, I'm seeing ~45 GB of GPU memory for BART-base with a 16k max length (using retrieval training). I don't have numbers handy for the 8k case right now, but I'd guess somewhere a little less than halfway between that and the cost of finetuning BART without Unlimiformer.
  2. Roughly, yes -- there's some fixed cost for storing the model weights themselves, but most of the memory required comes from the inputs and their computation graphs. So it would be slightly less than 10x more expensive, but that's the right ballpark (see the sketch after this list).
  3. This depends on the model size and your GPU's memory -- in the paper we were using BART-base and a 48 GB GPU, so we were limited to ~16k.
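To make the back-of-envelope math in point 2 concrete, here's a small sketch. Only the ~45 GB figure at 16k is from an actual run; the fixed-cost number is a made-up placeholder you'd want to measure yourself:

```python
# Back-of-envelope estimate: fixed cost (weights, optimizer state) plus a
# roughly linear per-token cost for inputs and their computation graphs.
measured_16k_gb = 45.0  # peak from the old BART-base retrieval-training run
fixed_gb = 5.0          # placeholder fixed cost -- NOT a measured number

per_token_gb = (measured_16k_gb - fixed_gb) / 16_000

def estimate_peak_gb(max_length: int) -> float:
    return fixed_gb + per_token_gb * max_length

for length in (8_000, 16_000, 80_000):
    print(f"{length:>6} tokens -> ~{estimate_peak_gb(length):.0f} GB")
```

With these placeholder numbers, 80k comes out around ~205 GB versus ~25 GB at 8k -- a bit over 8x rather than a full 10x, because the fixed cost doesn't scale with length.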
