-
arguments:
throughput log:
-
Can anyone help? Thanks.
-
I am trying to run the 70B model on 16 GPUs, but I keep getting OOM errors. How did you manage to do it?
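For a rough sense of why OOM is easy to hit here, the sketch below estimates the per-GPU memory taken by model state alone. It is a back-of-envelope calculation, not a Megatron utility: `model_state_gib` is a hypothetical helper, the 70e9 parameter count is taken from "the 70B" above, and the 16 bytes per parameter figure assumes the usual mixed-precision Adam accounting (2 bytes for weights, 2 for gradients, 12 for the fp32 master copy and the two Adam moments). It also assumes parameters split evenly across TP × PP and ignores activations, communication buffers, and fragmentation.

```python
def model_state_gib(num_params: float, tp: int, pp: int, dp: int,
                    shard_optimizer: bool = True) -> float:
    """Estimate per-GPU model-state memory in GiB (hypothetical helper).

    Assumes 4 bytes/param for 16-bit weights plus gradients and
    12 bytes/param of fp32 optimizer state (master weights + 2 Adam moments).
    """
    # Tensor and pipeline parallelism both split the parameters.
    params_per_gpu = num_params / (tp * pp)
    # A distributed optimizer shards optimizer state across data-parallel ranks.
    optimizer_shards = dp if shard_optimizer else 1
    bytes_per_param = 4 + 12 / optimizer_shards
    return params_per_gpu * bytes_per_param / 2**30


# Assumed setup: 70e9 parameters with TP=8, PP=2, DP=1 on 16 GPUs.
sharded = model_state_gib(70e9, tp=8, pp=2, dp=1, shard_optimizer=True)
unsharded = model_state_gib(70e9, tp=8, pp=2, dp=1, shard_optimizer=False)
print(f"model state per GPU: {sharded:.1f} GiB")
print(f"same without optimizer sharding: {unsharded:.1f} GiB")
```

Under these assumptions the model state alone is already a large share of an 80 GB A100 before any activations are stored, and with DP=1 the distributed optimizer has only one data-parallel rank to shard across, so it does not reduce the footprint.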
-
Your question
Machine: 2 nodes * 8 A100
TP=8
PP=2
DP=1
CP=1
seq_length=4096
micro_batch_size=1
global_batch_size=1
Enabled: activation recomputation, flash attention, distributed optimizer
Megatron version: core_v0.7.0
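
As a sanity check on the settings above, here is a minimal sketch that derives the implied GPU count, the number of microbatches per step, and an idealized pipeline-bubble fraction. `ParallelConfig` is a hypothetical helper written for this post, not a Megatron class. The bubble estimate assumes a non-interleaved 1F1B schedule in which every microbatch takes the same time on every stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelConfig:
    """Hypothetical container for the parallelism settings listed above."""
    tp: int
    pp: int
    dp: int
    cp: int
    micro_batch_size: int
    global_batch_size: int

    @property
    def world_size(self) -> int:
        # Total GPUs is the product of all parallel dimensions.
        return self.tp * self.pp * self.dp * self.cp

    @property
    def num_microbatches(self) -> int:
        # Each step processes micro_batch_size * dp samples per microbatch.
        per_microbatch = self.micro_batch_size * self.dp
        if self.global_batch_size % per_microbatch != 0:
            raise ValueError("global_batch_size must be divisible by "
                             "micro_batch_size * dp")
        return self.global_batch_size // per_microbatch

    @property
    def bubble_fraction(self) -> float:
        # Idealized idle share of a step: (p - 1) / (m + p - 1).
        m = self.num_microbatches
        return (self.pp - 1) / (m + self.pp - 1)


cfg = ParallelConfig(tp=8, pp=2, dp=1, cp=1,
                     micro_batch_size=1, global_batch_size=1)
print("GPUs required:", cfg.world_size)
print("microbatches per step:", cfg.num_microbatches)
print("idealized bubble fraction:", cfg.bubble_fraction)
```

With these values the layout fills exactly the 16 GPUs of the two nodes, but a global batch of 1 yields a single microbatch per step, so under the idealized model each pipeline stage sits idle for about half of every step.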
Thanks for your help!