[QUESTION] How can a checkpoint saved in one parallel configuration (tensor/pipeline/data parallelism) be loaded in a different parallel configuration? #1242
Unanswered
polisettyvarma asked this question in Q&A

Based on this doc: https://github.com/NVIDIA/Megatron-LM/blob/main/docs/source/api-guide/dist_checkpointing.rst
There are some conflicting statements there. Can you provide a working end-to-end example to showcase this feature? Thanks.
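The linked guide describes Megatron Core's dist_checkpointing package, which stores global tensors plus sharding metadata instead of per-rank files, so a checkpoint written under one tensor/pipeline-parallel layout can be read back under another. Below is a minimal sketch of that API, based on the guide; the checkpoint path, shapes, and hard-coded ranks are illustrative assumptions, and a real run needs an initialized torch.distributed process group with ranks taken from the parallel state, so treat this as a starting point rather than a verified end-to-end recipe.

```python
# Minimal sketch of megatron.core.dist_checkpointing; paths, shapes,
# and ranks below are assumptions for illustration only.
import torch
from megatron.core import dist_checkpointing

ckpt_dir = "/path/to/dist_ckpt"  # hypothetical checkpoint directory

# Each rank declares which fragment of the global tensor it holds.
# Here: a (1024, 1024) weight split along dim 0 across tp_size ranks.
tp_rank, tp_size = 0, 2  # in a real job, query these from the parallel state
local_shard = torch.zeros(512, 1024)  # this rank's slice

sharded_state_dict = {
    "weight": dist_checkpointing.ShardedTensor.from_rank_offsets(
        "weight",               # unique key in the checkpoint
        local_shard,            # this rank's local data
        (0, tp_rank, tp_size),  # (axis, this rank's fragment index, #fragments)
    )
}

# Save writes a parallelism-agnostic checkpoint (requires an initialized
# torch.distributed process group).
dist_checkpointing.save(sharded_state_dict, ckpt_dir)

# A job with a *different* TP/PP layout builds its own sharded_state_dict
# (different shard shapes/offsets) and loads the same directory.
loaded = dist_checkpointing.load(sharded_state_dict, ckpt_dir)
```

Because the on-disk format is independent of the saving job's layout, loading in a new configuration only requires constructing a sharded state dict that describes the new layout.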
Replies: 2 comments 1 reply

- You can check out the checkpoint conversion scripts in the tools folder; see the sketch below.
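For converting between parallel configurations with those scripts, an invocation along the following lines may work. The script path (tools/checkpoint/convert.py; named util.py in older releases) and the flag names are assumptions based on recent Megatron-LM checkouts, so verify them against your version first.

```python
# Hypothetical driver for Megatron-LM's checkpoint converter; the script
# path and flags are assumptions, confirm them with
# `python tools/checkpoint/convert.py --help` in your checkout.
import subprocess

subprocess.run(
    [
        "python", "tools/checkpoint/convert.py",
        "--model-type", "GPT",
        "--loader", "megatron",   # read an existing Megatron checkpoint
        "--saver", "megatron",    # write a Megatron checkpoint back out
        "--load-dir", "/path/to/ckpt_tp2_pp2",  # source: e.g. TP=2, PP=2
        "--save-dir", "/path/to/ckpt_tp4_pp1",  # destination layout
        "--target-tensor-parallel-size", "4",
        "--target-pipeline-parallel-size", "1",
    ],
    check=True,
)
```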
1 reply
- Can someone answer this?