
Add a launch argument for non_blocking=True #5268

Open
mturnshek opened this issue Oct 17, 2024 · 3 comments
Labels
Feature A new feature to add to ComfyUI.

Comments

@mturnshek
Contributor

mturnshek commented Oct 17, 2024

Feature Idea

In model_management.py

def device_should_use_non_blocking(device):
    if not device_supports_non_blocking(device):
        return False
    return False
    # return True #TODO: figure out why this causes memory issues on Nvidia and possibly others

Changing this function back to its pre-TODO state results in a large speedup in model patching. (19s -> 6s for Flux LoRAs on my computer.) It probably also speeds up loading in other areas.

This is because the largest bottleneck is the one-by-one blocking transfer of each layer of the unet to the GPU, which is massively accelerated if non_blocking=True.
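
For illustration, here's a minimal PyTorch sketch (not ComfyUI code, and the tensor count and sizes are made up) of why the non-blocking path helps so much when many layers are moved one at a time:

import torch

# Host tensors in pinned (page-locked) memory; pinning is what allows
# CUDA host-to-device copies to run asynchronously.
layers = [torch.randn(1024, 1024).pin_memory() for _ in range(64)]

# Blocking transfers: each .to() waits for the previous copy to finish.
on_gpu_blocking = [t.to("cuda") for t in layers]

# Non-blocking transfers: the copies are queued and can overlap, and we
# synchronize once at the end instead of once per layer.
on_gpu_async = [t.to("cuda", non_blocking=True) for t in layers]
torch.cuda.synchronize()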

Are there still memory issues? Changes made since the TODO was written, such as 39f114c, could mean the problems that originally caused them are less relevant now, or gone entirely.

Please consider re-adding support for non_blocking=True as a launch argument so users can start trying it out again.

Existing Solutions

No response

Other

No response

@mturnshek mturnshek added the Feature A new feature to add to ComfyUI. label Oct 17, 2024
@mturnshek mturnshek changed the title Add a launch argument for nonblocking=True Add a launch argument for non_blocking=True Oct 17, 2024
@comfyanonymous
Owner

Can you check if it's better now?

@mturnshek
Contributor Author

I can confirm that ComfyUI is way faster for both loading and model patching now, as of your commit from 40 minutes ago (6715899).

Loading the base model has gone from about 40s to about 5s for me. That's probably a 3-8x overall speedup for patching and loading, which are huge bottlenecks.

@mturnshek
Contributor Author

mturnshek commented Oct 17, 2024

By the way, have you ever thought about or looked into combining similar layer patches into one tensor before casting them and transferring them to the device in this section of model_patcher.py?

        load_completely.sort(reverse=True)
        for x in load_completely:
            n = x[1]
            m = x[2]
            weight_key = "{}.weight".format(n)
            bias_key = "{}.bias".format(n)
            if hasattr(m, "comfy_patched_weights"):
                if m.comfy_patched_weights == True:
                    continue

            self.patch_weight_to_device(weight_key, device_to=device_to)
            self.patch_weight_to_device(bias_key, device_to=device_to)
            logging.debug("lowvram: loaded module regularly {} {}".format(n, m))
            m.comfy_patched_weights = True

The transfers are all done individually since the layers have different shapes and that's the simplest approach. But if similarly shaped layers were combined and sent over as one stacked tensor, the number of calls would drop from something like 429 to ~14 for Flux's unet, with much more data per call.

I'm not sure how efficient sending lots of small tensors is compared to grouping them and sending one larger block, but the difference could be significant. A rough sketch of the idea is below.
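
Something along these lines is what I'm imagining (a hypothetical sketch, not the actual model_patcher code; transfer_grouped and weights are made-up names):

from collections import defaultdict
import torch

def transfer_grouped(weights, device):
    # Group weights by shape so same-shaped layers can travel together.
    groups = defaultdict(list)
    for name, w in weights.items():
        groups[w.shape].append((name, w))

    out = {}
    for shape, items in groups.items():
        # One stack and one transfer per shape group instead of one per layer.
        stacked = torch.stack([w for _, w in items]).to(device, non_blocking=True)
        for (name, _), moved in zip(items, stacked):
            out[name] = moved  # a view into the stacked device tensor
    return out

The torch.stack does add an extra host-side copy, so I'm not sure whether the fewer-but-larger transfers win out in practice.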
