Is there an existing issue for this?
Contact Details
[email protected]
What should this feature add?
I have noticed the enhancement request to use multiple GPUs for parallel or concatenated generation, as LLM tooling can easily do, but I'd like to offer a slightly different request that still uses multiple GPUs. Parallel/concatenated generation would be a nice feature to have if it can be made efficient; in the meantime, however, and possibly even in lieu of distributed generation, it would be nice to use multiple GPUs in a round-robin fashion. This distributes the load, and in my case the temperature, across the available GPUs. The same configuration options for indicating the available GPUs would still exist, but adding a config item that tells the application to use a different GPU on each invocation would be useful. In my case, with three identical RTX 3060s, this would allow them to cool down between invocations and would spread the load more evenly, instead of hammering on the same first GPU all the time. I don't think this conflicts with using multiple GPUs in parallel; it is an alternative approach that may have more value for some users. VRAM concatenation would be nice to have, don't get me wrong, but for many this approach would be helpful as well.
Alternatives
Not much more to consider: the application just needs to select the next available GPU in the pool and use it. It would only need to keep track of the previously used GPU id and then move on to the next one in order. The strategy could be called round-robin or something similar.
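The selection logic described above is simple enough that a minimal sketch may help. The class and function names below are hypothetical, not part of any existing codebase; the pool of GPU ids is assumed to come from the application's existing multi-GPU configuration.

```python
import itertools

class RoundRobinGpuSelector:
    """Hypothetical sketch: cycle through a fixed pool of GPU ids,
    returning one id per invocation and wrapping back to the first."""

    def __init__(self, gpu_ids):
        # itertools.cycle remembers position, so there is no need to
        # track the previously used id explicitly.
        self._cycle = itertools.cycle(gpu_ids)

    def next_gpu(self):
        # Return the next GPU id in order.
        return next(self._cycle)

# Example with a pool of three GPUs (e.g. three RTX 3060s at ids 0, 1, 2):
selector = RoundRobinGpuSelector([0, 1, 2])
picks = [selector.next_gpu() for _ in range(5)]
# picks == [0, 1, 2, 0, 1]
```

Each generation request would ask the selector for the next id and bind that device before running, which is all the round-robin behavior requires.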
Additional Content
For example, the configuration could look like:
multi-gpu-strategy: round-robin | all