Support sequence parallel on main branch #13

ufotalent · 2023-12-26T06:12:40Z

No description provided.

yiakwy-xpu-ml-framework-team · 2024-03-27T04:10:31Z

Lazy computation of the weights' partial gradients with the aid of a queue is really smart! @ufotalent

However, I don't believe you need to support sequence parallelism: it does nothing to reduce the total number of tokens processed on a single machine, and it brings only small improvements for batchnorm and dropout.

Context parallelism is much preferred.
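
For readers unfamiliar with the queue idea praised above, here is a minimal sketch of deferring weight-gradient computation. It is not the code from this PR: `DeferredLinear`, `flush`, and the pure-Python `matmul` helper are hypothetical names made up for illustration, and a real implementation would operate on framework tensors instead of nested lists.

```python
from collections import deque


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


class DeferredLinear:
    """Linear layer y = x @ W whose weight gradient is computed lazily (hypothetical)."""

    def __init__(self, weight, queue):
        self.weight = weight      # shape [in_features, out_features]
        self.grad_weight = None   # filled in only when the queue is flushed
        self.queue = queue        # shared FIFO of pending weight-gradient jobs

    def forward(self, x):
        self.x = x                # saved for the deferred weight gradient
        return matmul(x, self.weight)

    def backward(self, grad_out):
        # The input gradient is needed by the previous layer, so compute it now.
        grad_in = matmul(grad_out, transpose(self.weight))
        # The weight gradient is not needed upstream, so enqueue it for later.
        x = self.x

        def job():
            self.grad_weight = matmul(transpose(x), grad_out)

        self.queue.append(job)
        return grad_in


def flush(queue):
    """Run all pending weight-gradient jobs in FIFO order."""
    while queue:
        queue.popleft()()


pending = deque()
layer = DeferredLinear([[1.0, 2.0], [3.0, 4.0]], pending)
y = layer.forward([[1.0, 1.0]])
grad_in = layer.backward([[1.0, 0.0]])
assert layer.grad_weight is None  # weight gradient is still deferred here
flush(pending)                    # e.g. run when the device would otherwise idle
```

The design point is that `backward` returns the input gradient immediately, while the weight gradient waits in the queue until the scheduler decides to run it.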
