Add a launch argument for non_blocking=True #5268
Comments
Can you check if it's better now?
I can confirm that, as of your commit 6715899 from 40 minutes ago, ComfyUI is much faster for me at both loading and model patching. Loading the base model has dropped from 40s to about 5s. Overall that's probably a 3-8x speedup for patching and loading, which were huge bottlenecks.
By the way, have you ever thought about / looked into combining similar layer patches into one tensor before casting them and transferring to the device in this section of
They're all done individually because the layers have different shapes, and that approach is simple. But if similar layers were combined so that each group was sent over as one stacked tensor, the number of calls for Flux's unet would drop from something like 429 to ~14, with much more data per call. I'm not sure how efficient sending lots of small tensors is compared with grouping them and sending them as one larger block, but the difference could be significant.
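As a rough illustration of the grouping idea, here is a minimal sketch, assuming weights are grouped by (shape, dtype) so that each group could be stacked and transferred in one call. The layer names, shapes and the helpers group_by_shape and transfer_counts are hypothetical, and the sketch only counts transfers. It does not measure whether fewer, larger copies are actually faster.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical stand-in for a layer weight: (name, shape, dtype).
# Only shape and dtype matter for deciding what can be stacked together.
Layer = Tuple[str, Tuple[int, ...], str]
GroupKey = Tuple[Tuple[int, ...], str]


def group_by_shape(layers: List[Layer]) -> Dict[GroupKey, List[str]]:
    """Group layer names by (shape, dtype).

    Every layer in a group has an identical shape and dtype, so the group
    could be stacked into one tensor and sent to the device in one call.
    """
    groups: Dict[GroupKey, List[str]] = defaultdict(list)
    for name, shape, dtype in layers:
        groups[(shape, dtype)].append(name)
    return dict(groups)


def transfer_counts(layers: List[Layer]) -> Tuple[int, int]:
    """Return (one-transfer-per-layer count, one-transfer-per-group count)."""
    return len(layers), len(group_by_shape(layers))


# Toy example with made-up shapes (not Flux's real layer sizes).
example: List[Layer] = [
    ("blocks.0.attn", (64, 64), "bf16"),
    ("blocks.1.attn", (64, 64), "bf16"),
    ("blocks.0.mlp", (256, 64), "bf16"),
    ("blocks.1.mlp", (256, 64), "bf16"),
    ("final.norm", (64,), "bf16"),
]
print(transfer_counts(example))  # (5, 3)
```

In PyTorch, each group could be combined with torch.stack before the transfer and split back into per-layer tensors with torch.unbind afterwards. Whether this beats per-layer copies would need to be benchmarked.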
Feature Idea
In ComfyUI/comfy/model_management.py, line 834 in 7390ff3:
Changing this function back to its pre-TODO state gives a large speedup in model patching (19s -> 6s for Flux LoRAs on my computer). It probably speeds up loading in other areas too.
This is because the largest bottleneck is the blocking, one-by-one transfer of each unet layer to the GPU, which becomes much faster with non_blocking=True.
Are there still memory issues? Changes made since this TODO was written, such as 39f114c, may mean that the problems that used to cause memory issues are now less relevant or no longer exist.
Please consider re-adding support for non_blocking=True as a launch argument so users can start trying it out again.
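A minimal sketch of how such a launch argument might be wired up, assuming an argparse-based CLI. The flag name --non-blocking-transfers and the helper move_weight are hypothetical and are not ComfyUI's actual API. The only real PyTorch call assumed is Tensor.to(device, non_blocking=...). Note that a host-to-device copy can only run asynchronously with respect to the host when the source tensor is in pinned (page-locked) memory. Results copied back to the host must be synchronized, for example with torch.cuda.synchronize(), before they are read.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flag name; ComfyUI's real argument, if one exists, may differ.
    parser.add_argument(
        "--non-blocking-transfers",
        action="store_true",
        help="Pass non_blocking=True when moving weights to the device (experimental).",
    )
    return parser


def move_weight(weight, device, args: argparse.Namespace):
    """Move a tensor-like object to `device`, honoring the launch flag.

    With a PyTorch tensor this calls Tensor.to(device, non_blocking=...).
    The copy is only asynchronous w.r.t. the host if the source is pinned;
    synchronize before reading device-to-host results on the CPU.
    """
    return weight.to(device, non_blocking=args.non_blocking_transfers)


# Usage: argparse maps --non-blocking-transfers to args.non_blocking_transfers.
args = build_parser().parse_args(["--non-blocking-transfers"])
print(args.non_blocking_transfers)  # True
```

The flag defaults to off (store_true), so the existing blocking behavior is kept unless a user opts in. That matches the request to let users start trying non_blocking=True again without changing the default.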
Existing Solutions
No response
Other
No response