[QUESTION] Whether to split bw when send_backward_recv_forward is not enabled #17
Comments
Hi @AndSonder, thanks for your interest. I'm not sure I understand your question correctly. For zbh1 rank 0, the schedule pattern is that W always comes right after B, and there is no communication after a B, so a B-W split won't make the bubble smaller here.
@ufotalent Thanks for your reply. If there is communication after a B (as in the figure in your paper), will the bubble be smaller?
@AndSonder Hi, on other ranks a [B, send_B, W] schedule (with split) is theoretically better than [B, W, send_B] (without split). However, note that the send_B implementation in Megatron-LM is synchronous and will probably wait until its recv peer completes. This means send_B can delay W a lot if B and W are split. If we really want to split, we should use an async send here.
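To make the trade-off above concrete, here is a minimal toy timeline model of a single rank. It is only a sketch under stated assumptions, not Megatron-LM or zero-bubble code: `simulate`, `Timeline`, `peer_ready`, and the unit durations are hypothetical names and values chosen for illustration. A synchronous send is modeled as blocking until the peer has posted its recv and the transfer finishes; an async send is modeled as not blocking the local compute stream at all.

```python
from dataclasses import dataclass


@dataclass
class Timeline:
    peer_recv_time: float  # when the peer rank has received the gradient
    w_done_time: float     # when this rank has finished its W pass


def simulate(order, t_b, t_w, t_send, peer_ready, async_send=False):
    """Toy model of one rank executing the ops in `order`.

    order      -- sequence of "B", "W", "send_B" (hypothetical op names)
    t_b, t_w   -- assumed durations of the B and W passes
    t_send     -- assumed transfer time once both sides are ready
    peer_ready -- assumed time at which the peer posts its recv
    async_send -- if False, send_B blocks the rank until delivery
    """
    clock = 0.0
    peer_recv_time = None
    w_done_time = None
    for op in order:
        if op == "B":
            clock += t_b
        elif op == "W":
            clock += t_w
            w_done_time = clock
        elif op == "send_B":
            # Transfer cannot start before both sender and receiver are ready.
            delivered = max(clock, peer_ready) + t_send
            peer_recv_time = delivered
            if not async_send:
                # A blocking send stalls everything scheduled after it.
                clock = delivered
        else:
            raise ValueError(f"unknown op: {op}")
    return Timeline(peer_recv_time, w_done_time)


if __name__ == "__main__":
    no_split = ["B", "W", "send_B"]
    split = ["B", "send_B", "W"]
    for peer_ready in (0.0, 3.0):
        print("peer_ready =", peer_ready)
        print("  no split, sync :", simulate(no_split, 1.0, 1.0, 0.5, peer_ready))
        print("  split, sync    :", simulate(split, 1.0, 1.0, 0.5, peer_ready))
        print("  split, async   :", simulate(split, 1.0, 1.0, 0.5, peer_ready, True))
```

In this toy model, when the peer is ready immediately (`peer_ready = 0.0`), splitting lets the peer receive the gradient at 1.5 instead of 2.5. When the peer is late (`peer_ready = 3.0`), the split schedule with a blocking send pushes the end of W from 2.0 to 4.5, while the async variant keeps W finishing at 2.0 and delivers no later than the other two schedules.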
OK, I understand. Thank you very much for your answer.
Hi, I really appreciate your work. I have a question about the zbh1 mode.
This is one part of your code:
You said that there is no need to split bw for the BWF pattern.
My question is: if we do not enable send_backward_recv_forward, is it better to split bw? A finer granularity should make the bubble smaller, shouldn't it?