NVIDIA / Megatron-LM Public

Pull requests: NVIDIA/Megatron-LM

156 Open 258 Closed

Pull requests list

add .sh file

#1373 opened Feb 4, 2025 by umsanmaru

KV-cache for T5 model

#1358 opened Jan 17, 2025 by YK-Fu

fix typo

#1352 opened Jan 10, 2025 by Jintao-Huang

fix param overwrite problem in saver_mcore

#1351 opened Jan 9, 2025 by Force1ess

Fix typo

#1347 opened Jan 4, 2025 by deep-sci

Update theoretical memory footprint formula

#1345 opened Jan 3, 2025 by okoge-kaz

Fix type annotation and checkpoint conversion script

#1344 opened Jan 3, 2025 by okoge-kaz

fix bugs of data preprocessing with multiple json keys

#1337 opened Dec 25, 2024 by junjzhang

Create python-package.yml

#1332 opened Dec 21, 2024 by invisiblepancake

Fix: prevent double accumulation of load balancing loss and z-loss wi…

#1331 opened Dec 20, 2024 by thuwzt

Add Mamba TRTLLM support

#1320 opened Dec 12, 2024 by meatybobby

update network interface env

#1319 opened Dec 12, 2024 by lizamd

fix args.mock_data bug caused by func get_blend_and_blend_per_split (label: stale; no activity in 60 days on issue or PR)

#1306 opened Nov 29, 2024 by 1195343015

[Update] Print training log in rank0

#1296 opened Nov 21, 2024 by shijungg

support qwen2 hf<->mcore ckpt converter

#1290 opened Nov 19, 2024 by wenyujin333

Fix: Resolve multimodal model errors and update README usage instructions (label: stale)

#1286 opened Nov 13, 2024 by singleheart

Set torch.multiprocessing start method as 'spawn' (label: stale)

#1285 opened Nov 12, 2024 by hxdtest

Fix a bug in optimizer's mix_lr/max_lr when args.override_opt_param_scheduler==True

#1284 opened Nov 12, 2024 by lyuwen

Huvu/update t5 attentionmasktype (label: stale)

#1273 opened Nov 4, 2024 by huvunvidia

Update t5_model.py (label: stale)

#1271 opened Nov 2, 2024 by huvunvidia

Enable huggingface tokenizer (label: stale)

#1268 opened Oct 30, 2024 by msiddaiah

fix: remove unnecessary trailing comma in statement (label: stale)

#1265 opened Oct 29, 2024 by singleheart

Enabling LR scaling for a specific layer (ex. down-projection...) during pretraining

#1262 opened Oct 28, 2024 by dhia680

[ENHANCEMENT] Add support for Apex RMSNorm for use in qk-norm (label: stale)

#1261 opened Oct 28, 2024 by wdevazelhes

Add support to process gzip files (label: stale)

#1260 opened Oct 28, 2024 by puneeshkhanna
