I am curious whether anyone here has successfully trained a basic two-tower retrieval model in a distributed fashion, using Horovod or any other method. I am seeing consistently poor results when training distributed, and better results with a single GPU (although still not great). For the Horovod method I am sharding the interaction data across 8 GPUs. Has anyone seen good performance when distributing this computation?