Lazily computing the partial gradients of the weights with the aid of a queue is really smart! @ufotalent
However, I don't believe you need to support sequence parallelism: it does not reduce the total number of tokens processed on a single machine, and it brings only minor improvements for batchnorm and dropout.
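
For anyone reading along, here is how I understand the queue trick mentioned above, as a rough sketch only. This is not the repository's actual code: `WeightGradQueue`, `Linear`, and the helper functions are hypothetical names I made up for illustration, and it uses plain Python lists instead of real tensors. The idea is that `backward` returns the input gradient right away (the previous stage needs it) and only enqueues the weight gradient computation, which can run later when the device would otherwise be idle.

```python
from collections import deque


def matmul(a, b):
    # Naive matrix product on nested lists: (n x k) @ (k x m) -> (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


class WeightGradQueue:
    """Hypothetical FIFO of deferred weight-gradient computations."""

    def __init__(self):
        self._pending = deque()

    def put(self, fn):
        self._pending.append(fn)

    def flush(self):
        # Run every deferred computation in the order it was enqueued.
        while self._pending:
            self._pending.popleft()()


class Linear:
    """Toy layer computing y = x @ W, with a lazily computed weight gradient."""

    def __init__(self, weight, queue):
        self.weight = weight
        self.grad_weight = None
        self.queue = queue

    def forward(self, x):
        self._x = x  # saved for the deferred weight gradient
        return matmul(x, self.weight)

    def backward(self, grad_out):
        x = self._x

        def compute_weight_grad():
            # dL/dW = x^T @ dL/dy, computed only when the queue is flushed.
            self.grad_weight = matmul(transpose(x), grad_out)

        self.queue.put(compute_weight_grad)
        # dL/dx = dL/dy @ W^T, returned immediately.
        return matmul(grad_out, transpose(self.weight))


queue = WeightGradQueue()
layer = Linear([[1.0, 2.0], [3.0, 4.0]], queue)
y = layer.forward([[1.0, 1.0]])
grad_in = layer.backward([[1.0, 1.0]])
grad_weight_before_flush = layer.grad_weight  # still None at this point
queue.flush()
grad_weight_after_flush = layer.grad_weight
```

In a real pipeline schedule the `flush` call (or a partial pop) would presumably be placed where a bubble would otherwise occur, but where exactly is a scheduling decision this sketch does not model.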