Pull requests: microsoft/Megatron-DeepSpeed

Pull requests list

support split qkv linear and sp overlap comm
#415 opened Jul 5, 2024 by inkcherry
fix NAN loss of rope long context training
#399 opened Jun 5, 2024 by inkcherry
convert mds checkpoint to Hf Llama model
#394 opened May 31, 2024 by vksastry
ds-sequence-parallel(ulysses) for rope.
#392 opened May 30, 2024 by inkcherry
add HFTokenizer option for preprocess_data
#388 opened May 17, 2024 by Jianhong-Zhang
Add layer norm weight plus 1
#378 opened Apr 18, 2024 by Yejing-Lai
Support Llama2Tokenizer
#375 opened Apr 11, 2024 by jinyouzhi
collect grad_norm for non pipeline path
#370 opened Mar 21, 2024 by inkcherry
optimize the generation of attention mask
#331 opened Jan 13, 2024 by imh966
Enable torch.compile
#322 opened Dec 28, 2023 by tohtana Draft
support transfer llama hf weight to megatron weight
#246 opened Sep 12, 2023 by uygnef
add vit training with TP/PP
#146 opened Jun 9, 2023 by etoilestar
attempt at pipelining
#78 opened Aug 18, 2022 by siddharth9820
Add support for DS comms
#50 opened Jun 13, 2022 by Quentin-Anthony