
Create a mini version containing only ZB-H1 and essential changes so other Megatron forks can easily integrate #10

Open
ufotalent opened this issue Dec 15, 2023 · 5 comments

Comments

@ufotalent

@ufotalent To implement a version using our own running engine and async IO
@QPHutu To implement a version by modifying 1f1b schedule using sync IO

@robotsp

robotsp commented Mar 2, 2024

I ran a LLaMA-7B instance in pipeline-parallel mode (pp size = 8, tp size = 1) using ZB-H1, but I did not see any clear performance improvement over the original schedule: the duration of each step is the same in both cases.

Does it make sense? @QPHutu @ufotalent @P2333

@ufotalent
Author


Hi, thanks for trying out ZB-H1. The result looks problematic, because in this situation ZB-H1 should provide some acceleration. Could you share some details of the setup: which code repo you used, roughly what changes you made to enable ZB-H1, and the number of microbatches in the pipeline?

@robotsp

robotsp commented Mar 4, 2024

I ran the latest version of Megatron-LM and applied the quick ZB-H1 implementation patch (commit id: 95212f7). The global batch size and the mini-batch size of the test are 256 and 4, respectively. @ufotalent @QPHutu @P2333
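For reference, a rough back-of-the-envelope check of how many microbatches the pipeline sees per step, assuming "mini-batch size" here means Megatron's --micro-batch-size and assuming a data-parallel size of 1 (neither is stated in the thread). The 1F1B bubble fraction (p-1)/(m+p-1) is the standard estimate and bounds how much a bubble-reduction schedule can save per step:

```python
# Hypothetical back-of-the-envelope check (not from the thread): estimate the
# number of microbatches per step and the classic 1F1B bubble fraction.
# Assumes "mini-batch size" means Megatron's --micro-batch-size and that the
# data-parallel size `dp` is 1 (neither assumption is confirmed above).
global_batch_size = 256
micro_batch_size = 4
dp = 1  # assumed data-parallel size
p = 8   # pipeline-parallel size from the comment above

m = global_batch_size // (micro_batch_size * dp)  # microbatches per step
bubble_fraction_1f1b = (p - 1) / (m + p - 1)      # idle fraction under plain 1F1B

print(f"microbatches per step: {m}")                               # 64 with dp = 1
print(f"approx. 1F1B bubble fraction: {bubble_fraction_1f1b:.1%}")  # ~9.9%
```

With this many microbatches the baseline bubble is already fairly small, so the per-step speedup from shrinking it is bounded by roughly that fraction; it should still be measurable, but not dramatic.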

@ufotalent
Author


@robotsp Is it possible to share the training script?
One possible suspect is that the call path goes through megatron.core.models.gpt.GPTModel instead of megatron.model.GPTModel; currently the patch only takes effect on megatron.model.GPTModel.
Another suspect is that you're using interleaved 1F1B (by setting the num_layers_per_virtual_pipeline_stage flag). Our current simple patch targets the plain 1F1B schedule, not interleaved 1F1B.
Since your patched code runs but produces identical step times, my guess is that the patched code path is simply not being executed.
Thanks
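A minimal diagnostic sketch for the first suspect above: print which GPTModel implementation the training script actually builds. The helper name and the unwrapping loop are hypothetical, not part of Megatron-LM or the ZB-H1 patch; only the two module paths come from the comment above.

```python
# Hypothetical diagnostic (not part of the patch): report which GPTModel class
# the training script instantiates, since the ZB-H1 patch only hooks
# megatron.model.GPTModel, not megatron.core.models.gpt.GPTModel.
def report_model_class(model):
    # The model may be wrapped (e.g. by a DDP or fp16 wrapper); unwrap conservatively.
    unwrapped = model
    while hasattr(unwrapped, "module"):
        unwrapped = unwrapped.module
    cls = type(unwrapped)
    print(f"GPT model class in use: {cls.__module__}.{cls.__name__}")
    # A module path starting with "megatron.model" suggests the patched (legacy)
    # path is active; "megatron.core.models.gpt" suggests the patch is bypassed.
```

Calling this once on the built model (before training starts) should quickly confirm or rule out the unpatched code path.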
