Thanks for your great work! I have some questions regarding GPU usage when training with LLaMa 2:
1. What is the peak VRAM usage when training Unlimiformer with the long-range training methods in the 8k and 16k settings?
2. Since the complexity is linear during training, training at 16k should use roughly double the VRAM of 8k, if I understand correctly. So if I wanted to train Unlimiformer at 80k, would it use roughly 10 times the VRAM of the 8k setting?
3. I saw in a previous issue that Unlimiformer can currently only be trained on a single GPU, so the training length is limited by the memory of a single GPU, say 80 GB for an A100. Is 16k the maximum possible training length for now?
Thanks!
Looking back at some old run data, I'm seeing ~45 GB of GPU memory for BART-base with 16k max length (using retrieval training). I don't have numbers handy for the 8k case right now, but I'd guess it sits a little below the halfway point between the cost of finetuning BART without Unlimiformer and that 45 GB.
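If you want to reproduce numbers like these on your own hardware, here's a minimal sketch using PyTorch's built-in memory counters. The training-step placeholder is hypothetical and not part of the Unlimiformer codebase:

```python
import torch

# Reset the peak-memory counter before the step you want to profile.
torch.cuda.reset_peak_memory_stats()

# ... run one full training step here (forward + backward + optimizer.step()) ...

# Peak memory allocated by tensors on the current device, in GB.
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak VRAM: {peak_gb:.1f} GB")
```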
Roughly, yes -- there's a fixed cost for storing the model weights themselves, but most of the required memory comes from the inputs and the computation graph, which grow with input length. So it would be slightly less than 10x as expensive, but that's the right ballpark.
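To make that ballpark concrete, here's a back-of-the-envelope sketch that fits a linear model (total = fixed + slope * length) to two measured points and extrapolates. Only the 16k number comes from the comment above; the 8k figure is a made-up placeholder:

```python
# Hypothetical linear memory model: total_gb = fixed + slope * length.
l1, m1 = 8_000, 25.0    # (tokens, GB) at 8k -- placeholder, not a real measurement
l2, m2 = 16_000, 45.0   # (tokens, GB) at 16k, from the run data above

slope = (m2 - m1) / (l2 - l1)  # GB per additional input token
fixed = m1 - slope * l1        # weights, optimizer state, etc.

# Extrapolating to 80k: ~205 GB with these numbers,
# i.e. slightly less than 10x the (assumed) 8k cost of 25 GB.
print(fixed + slope * 80_000)
```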
This depends on the model size and your GPU's memory -- in the paper we used BART-base and a 48-GB GPU, so we were limited to ~16k.
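Inverting the same hypothetical model gives a rough ceiling on trainable length for a given GPU. The constants below carry over from the sketch above and are illustrative only:

```python
# Invert total_gb = fixed + slope * length to find the longest trainable input.
fixed, slope = 5.0, 0.0025  # GB and GB/token -- illustrative values from above
capacity = 80.0             # GB, e.g. a single A100

max_len = (capacity - fixed) / slope
print(int(max_len))  # ~30k tokens under these assumptions
```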