When running in distributed mode, I have four GPUs, each running one client. During training, memory usage differs hugely across the GPUs, and two of them even ran out of memory. I also noticed that training on the GPUs that overflowed was extremely slow, with GPU utilization close to zero.
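The issue doesn't include the launch code, but one common cause of this pattern (lopsided memory use, with extra allocations piling up on some devices) is that every client process initializes CUDA on the default device before being pinned to its own GPU. A minimal sketch of per-process device pinning, assuming a PyTorch `torch.distributed` setup launched with `torchrun`; the model and tensor sizes here are placeholders, not the actual training code:

```python
import os

import torch
import torch.distributed as dist


def main():
    # torchrun sets LOCAL_RANK for each process it spawns.
    local_rank = int(os.environ["LOCAL_RANK"])

    # Pin this client's process to its own GPU *before* any CUDA work;
    # otherwise each process can create a CUDA context on cuda:0 and
    # inflate memory there while other GPUs stay underused.
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    device = torch.device(f"cuda:{local_rank}")
    model = torch.nn.Linear(128, 10).to(device)  # placeholder model
    model = torch.nn.parallel.DistributedDataParallel(
        model, device_ids=[local_rank]
    )

    # ... training loop for this client goes here ...

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched as, for example, `torchrun --nproc_per_node=4 train.py`, each of the four clients then allocates only on its own GPU. If memory is still unbalanced after pinning, the imbalance more likely comes from the workload itself (e.g. uneven data or batch sizes per client).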