Hi, I successfully ran inference with Llama-2-7b and Unlimiformer, but ran into memory errors when I jumped to larger models. What are the minimum GPU memory requirements for running the 13b and 70b models? Thank you!
Two things determine the total. The first is the base memory needed for the model itself (as you'd expect!). I haven't personally tried the 70b model, but this NVIDIA guide gives numbers that look pretty reasonable to me:
The file size of the model varies with how large the model is:
Llama2-7B-Chat requires about 30GB of storage.
Llama2-13B-Chat requires about 50GB of storage.
Llama2-70B-Chat requires about 150GB of storage.
The second factor is the number of layers you apply Unlimiformer at. The good news here is that the additional cost from Unlimiformer doesn't depend on the model size (since we're only saving hidden states, and the models all have the same hidden dimension). You can calculate this for your input/use case by comparing the GPU memory used by your 7b Llama+Unlimiformer setup against the base 7b model.
As a general recipe, I'd guess (amount of memory for the model) + (2-3 GB per layer you'd like to apply Unlimiformer at) will get you pretty close to the amount needed, but this depends on how long your inputs are and whether you choose flat or trained indices.
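To make the recipe concrete, here is a minimal back-of-the-envelope sketch. The base figures are the storage numbers from the NVIDIA guide above, used as a rough proxy for memory, and the `per_layer_gb` default is the 2-3 GB per-layer guess from this thread; the function name and the numbers are illustrative assumptions, not measured values.

```python
# Rough estimator for the recipe: (memory for the model)
# + (~2-3 GB per layer Unlimiformer is applied at).
# Figures below are the approximate sizes quoted above, not measurements.
BASE_MODEL_GB = {
    "llama2-7b": 30,
    "llama2-13b": 50,
    "llama2-70b": 150,
}

def estimate_gpu_memory_gb(model: str, unlimiformer_layers: int,
                           per_layer_gb: float = 2.5) -> float:
    """Ballpark GPU memory in GB; actual usage depends on input length
    and on whether flat or trained indices are used."""
    return BASE_MODEL_GB[model] + unlimiformer_layers * per_layer_gb

print(estimate_gpu_memory_gb("llama2-13b", unlimiformer_layers=4))  # → 60.0
```

Treat the result as a lower-bound sanity check rather than a guarantee: longer inputs and flat (unretrained) indices can push the per-layer overhead toward the high end of the range.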