Replies: 1 comment
-
Marking as stale. No activity in 60 days.
-
Hi,
I am using a single-node multi-GPU cluster (6x A100) and would like to use Megatron-LM to train a Llama 2 model on it.
In this environment, I cannot use Docker, and the CUDA version is fixed at 11.6.
My question is whether it is possible to install Megatron-LM in this environment.
After installing required packages (including apex), I did
and ran a shell script that basically just calls pretrain_gpt.py.
It fails with an error message saying "no module named transformer_engine".
Then I found https://github.com/NVIDIA/TransformerEngine, but it looks like it requires CUDA >= 11.8.
It would be very helpful if you could give me advice on how to sort this out.
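The mismatch described above can be sketched as a small check. This is only an illustration under stated assumptions: the helper names `has_module`, `parse_version`, and `cuda_meets_minimum` are hypothetical, the 11.8 minimum is the value I read from the TransformerEngine page rather than something verified here, and 11.6 is hard-coded instead of being detected from the system.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be located (without importing it)."""
    return importlib.util.find_spec(name) is not None


def parse_version(text: str) -> tuple:
    """Turn a version string such as '11.6' into a tuple of ints, e.g. (11, 6)."""
    return tuple(int(part) for part in text.strip().split("."))


def cuda_meets_minimum(installed: str, minimum: str = "11.8") -> bool:
    """Compare versions numerically; the default minimum is an assumption taken from the post."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    installed_cuda = "11.6"  # assumption: fixed CUDA version on the cluster
    print("transformer_engine importable:", has_module("transformer_engine"))
    print("CUDA meets assumed minimum:", cuda_meets_minimum(installed_cuda))
```

Running this only reports the two conditions; it does not install anything or change the environment.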