Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Dynamic Typing #5293

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from
Draft

Conversation

guill
Copy link
Contributor

@guill guill commented Oct 20, 2024

RFC: Dynamic Typing

This Draft PR contains a proposal and initial implementation for adding official support for dynamic inputs/outputs to ComfyUI. This is intended to remove the UX barriers to adding "Loop", "Switch", and other nodes to the default ComfyUI.

ComfyUIDynamicTypingExample.mp4

Note: Getting the benefits of this change will also require the front-end changes located at Comfy-Org/ComfyUI_frontend#1271

The version of the execution-inversion-demo node pack (with loops and switches and the like) updated for this PR is located here: https://github.com/BadCafeCode/execution-inversion-demo-comfyui/tree/rfc/dynamic_typing

Functionality

The primary goal of this design is two-fold:

  1. Dynamic Typing - Enable the enforcement of interrelated type constraints when using the equivalent of "*" inputs and outputs.
  2. Variadic Inputs/Outputs - Officially support nodes with a variable number of inputs and outputs.

Why current solutions aren't sufficient

Use of "*" types

The most common solution to the lack of dynamic typing is to use "*" types. While this functions properly, the user experience is far from ideal. Once you're using a wildcard type, nothing is preventing you from connecting incompatible sockets. When you do make a mistake, the result is a Python error in some node (that may not even be the node where the issue occurred).

Custom Frontend Extensions - Dynamic Types

While I haven't seen it done, a custom frontend extension can technically enforce its own type constraints in the UI. While this would work with a single custom node pack in isolation, the propagation of node types through multiple dynamically typed nodes would cause issues. If we're going to start including nodes (like While Loops) in the base ComfyUI, we need a system that allows different node packs to play well with each other.

Custom Frontend Extensions - Variadic Inputs

Custom frontend extensions are frequently used (along with a kwargs argument) to allow for a dynamic number of inputs. The issue is that the backend knows nothing at all about these inputs. This means that any functionality that relies on input flags (like lazy evaluation) can't work with these inputs without terrifying hacks (like looking at the callstack to return different results from INPUT_TYPES depending on the caller).

Design Goals

There were a couple goals going into this:

  1. Make the common cases clean and easy to implement for node authors.
  2. Make the less common (and more complicated cases -- like End While loops needing types that match the linked Begin While node) possible to implement.
  3. Don't require the default frontend (or custom frontend extensions) for this functionality.
  4. Use a syntax that allows front-ends (particularly the default front-end) to do type resolution in the 99% case without a round trip to the back-end. (Note - this is not yet implemented.)
  5. Allow front-ends to gracefully fall back to letting the back-end perform type resolution in an efficient way (either because an alternative front-end hasn't implemented full type resolution or because there's a case the front-end can't handle).
  6. Don't break existing nodes. If people want to keep using "*" types, they don't need to change anything.

I know that Goal 5 is going to be the most controversial due to the extra call to the back-end, but I believe that it's necessary if we don't want to end up with the ComfyUI back-end being tied inextricably to the default front-end.

Architecture Overview

In order to accomplish the above goals, I've implemented this using a number of layers. The top layer is the easiest to use for custom node authors, but is also the least flexible. Custom nodes that require more complicated behavior can use the same API that the higher layers are built on top of.

Layer 1 - Template Type Syntax

Template type syntax can be activated by using the @TemplateTypeSupport decorator imported from comfy_execution.node_utils. The functionality it supports is:

  1. Dynamic input/output types (e.g. <T>)
  2. Wrapped input/output types (e.g. ACCUMULATION<T>)
  3. Dynamic number of inputs with the same type
  4. Dynamic number of inputs with different types

Dynamic Types

When specifying a type for an input or output, you can wrap an arbitrary string in angle brackets to indicate that it is dynamic. For example, the type <FOO> will be the equivalent of * (with the commonly used hacks) with the caveat that all inputs/outputs with the same template name (FOO in this case) must have the same type. Use multiple different template names if you want to allow types to differ. Note that this only applies within a single instance of a node -- different nodes can have different type resolutions

from comfy_execution.node_utils import TemplateTypeSupport

@TemplateTypeSupport()
class SimpleSwitch:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "switch": ("BOOLEAN",),
                "on_false": ("<T>", {}),
                "on_true": ("<T>", {}),
            },
        }

    RETURN_TYPES = ("<T>",)
    RETURN_NAMES = ("result",)
    FUNCTION = "switch"

    CATEGORY = "Examples"

    def switch(self, switch, on_false = None, on_true = None):
        value = on_true if switch else on_false
        return (value,)

Wrapped Types

Rather than using JUST a template type, you can also use a template type with a wrapping type. For example, if you have a node that takes two inputs with the types <FOO> and ACCUMULATION<FOO>, any output can be connected to the <FOO> input. Once that input has a value (let's say an IMAGE), the other input will resolve as well (to ACCUMULATION<IMAGE> in this example).

@TemplateTypeSupport()
class AccumulateNode:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "to_add": ("<T>", {}),
            },
            "optional": {
                "accumulation": ("ACCUMULATION<T>", {}),
            },
        }

    RETURN_TYPES = ("ACCUMULATION<T>",)
    RETURN_NAMES = ("accumulation",)
    FUNCTION = "accumulate"

    CATEGORY = "Examples"

    def accumulate(self, to_add, accumulation = None):
        if accumulation is None:
            value = [to_add]
        else:
            value = accumulation["accum"] + [to_add]
        return ({"accum": value},)

Dynamic Input Count (Same Type)

Sometimes, you want a node to take a dynamic number of inputs. To do this, create an input value that has a name followed by a number sign and a string (e.g. input#COUNT). This will cause additional inputs to be added and removed as the user attaches to those sockets. The string after the '#' can be used to ensure that you have the same number of sockets for two different inputs. For example, having inputs named image#FOO and mask#BAR will allow the number of images and the number of masks to dynamically increase independently. Having inputs named image#FOO and mask#FOO will ensure that there are the same number of images as masks.

The current dynamic count can be accessed from the node definition.

@TemplateTypeSupport()
class MakeListNode:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {},
            "optional": {
                "value#COUNT": ("<T>", {}),
            },
            "hidden": {
                "node_def": "NODE_DEFINITION",
            },
        }

    RETURN_TYPES = ("<T>",)
    RETURN_NAMES = ("list",)
    FUNCTION = "make_list"
    OUTPUT_IS_LIST = (True,)

    CATEGORY = "Examples"

    def make_list(self, node_def, **kwargs):
        result = []
        for i in range(node_def.get("dynamic_counts", {}).get("COUNT", 0)):
            if "value%d" % i in kwargs:
                result.append(kwargs["value%d" % i])
        return (result,)

Dynamic Input Count (Different Types)

If you want to have a variadic input with a dynamic type, you can combine the syntax for the two. For example, if you have an input named "input#COUNT" with the type "<FOO#COUNT>", each socket for the input can have a different type. (Internally, this is equivalent to making the type <FOO1> where 1 is the index of this input.)

@TemplateTypeSupport()
class ConcatAsString:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {},
            "optional": {
                "value#COUNT": ("<T#COUNT>", {}),
            },
            "hidden": {
                "node_def": "NODE_DEFINITION",
            },
        }

    RETURN_TYPES = ("STRING",)
    RETURN_NAMES = ("string",)
    FUNCTION = "concat"

    CATEGORY = "Examples"

    def concat(self, node_def, **kwargs):
        inputs = []
        for i in range(node_def.get("dynamic_counts", {}).get("COUNT", 0)):
            if "value%d" % i in kwargs:
                inputs.append(kwargs["value%d" % i])
        return ("\n".join(str(obj) for obj in objects_list))

Layer 2 - resolve_dynamic_types

Behind the scenes, Layer 1 (TemplateType syntax) is implemented using Layer 2. For the more complicated cases where TemplateType syntax is insufficient, custom nodes can use Layer 2 as well.

Layer 2 is used by defining a class function named resolve_dynamic_types on your node. This function can only make use of the following information when determining what inputs/outputs it should have:

  1. What the types are of outputs which are connected to this node's inputs (contained in the input_types argument)
  2. What the types are of inputs which are connected to this node's outputs (contained in the output_types argument)
  3. The input/output types of a node which is "entangled" via a direct connection on a socket defined as "entangleTypes": True.

The return value of resolve_dynamic_types should be a dictionary in the form:

return {
    "input": {
        # Same format as the return value of INPUT_TYPES
        "required": {}
    },
    "output": ("IMAGE", "MASK"),
    "output_name": ("My Image", "My Mask"),
}

Example

Here's an example of a 'switch' node.

from comfy_execution.node_utils import type_intersection
class SimpleSwitch:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "switch": ("BOOLEAN",),
                "on_false": ("*", {"forceInput": True}),
                "on_true": ("*", {"forceInput": True}),
            },
        }

    @classmethod
    def resolve_dynamic_types(cls, input_types, output_types, entangled_types):
        resolved_type = "*"
        if "on_false" in input_types:
            resolved_type = type_intersection(resolved_type, input_types["on_false"])
        if "on_true" in input_types:
            resolved_type = type_intersection(resolved_type, input_types["on_true"])
        if "result" in output_types:
            # Note that output_types contains a list of types since outputs can be connected
            # to multiple inputs.
            for output_type in output_types["result"]:
                resolved_type = type_intersection(resolved_type, output_type)

        return {
            "input": {
                # Same format as the return value of INPUT_TYPES
                "required": {
                    "switch": ("BOOLEAN",),
                    "on_false": (resolved_type, {"forceInput": True}),
                    "on_true": (resolved_type, {"forceInput": True}),
                },
            },
            "output": (resolved_type,),
            "output_name": ("result",),
        }

    RETURN_TYPES = ("*",)
    RETURN_NAMES = ("result",)
    FUNCTION = "switch"

    CATEGORY = "Examples"

    def switch(self, switch, on_false = None, on_true = None):
        value = on_true if switch else on_false
        return (value,)

Note - I don't currently try to handle "unstable" resolve_dynamic_types functions. While it would be relatively easy to cause unstable configurations to "fail", identifying the exact node responsible to give a useful error message would be a lot more difficult.

Layer 3 (Internal) - Node Definitions

Back-end

Internally to the ComfyUI back-end, I've turned the "node definition" (as returned from the /object_info endpoint) into a first-class object. Instead of directly calling INPUT_TYPES in multiple places, the execution engine makes use of a node definition that is calculated and cached at the beginning of execution (or as part of node expansion in the case of nodes that are created at runtime).

Theoretically, this could be extended in the future to making any other part of the node definition dynamic (e.g. whether it's an OUTPUT_NODE).

These node definitions are iteratively settled across the graph, with a maximum of O(sockets) iterations (though you'd have to try hard to actually approach that). The same function is used for both resolving types in response to /resolve_dynamic_types requests and prior to the beginning of execution, ensuring that the two are consistent.

Front-end

The frontend now hits the /resolve_dynamic_types endpoint each time edges are created or removed from the graph. This call is non-blocking, but type changes and the addition/removal of inputs/outputs won't occur until it completes. My hope is that by implementing something like the TemplateType syntax on the default front-end, we can make 99% of these calls no-ops.

Areas For Improvement

While my back-end changes are solid and could be code reviewed today, my front-end changes are hacky and would almost certainly need some attention from someone who has more experience with the front-end. While I'm posting this PR Draft now to start getting input, there are the following areas for improvement (mostly on the front-end):

  1. Dynamic inputs currently require "forceInput": True as I'm not currently creating/destroying widgets as appropriate. This also means that Primitives nodes won't connect to them.
  2. I added a displayOrder option for inputs. This is just intended to sort inputs on the front-end, but it doesn't seem to always work.
  3. Improved error handling when a custom node defines an unstable resolve_dynamic_types function. (Right now, it'll just infinitely loop.)
  4. Implementation of TemplateType syntax (or whatever syntax we land on) on the front-end to avoid the round trip time for most use-cases.

This is a proof of concept to get feedback. Note that it requires the
frontend branch of the same name.
@AustinMroz
Copy link

The design is delightfully thorough, but Goal 5 does leave me conflicted. Some current musings:

  • Making an effort to support alternative frontends and keep as much code as possible on the python side is good
  • The responsiveness of the request was worrisome, but testing has defied expectations. It performs well even when a workflow is actively executing.
  • The structuring of these requests seem to be minimal in use cases, but broad in its consequences.
    • All node connections trigger a backend request currently. This should probably be made opt in. If 99% of dynamic type cases can be resolved without need for a backend request and the vast majority of connections won't utilize dynamic types, then only a fraction of a fraction of connections should require a request.
    • It establishes precedence for frontend display calling into a custom node's python code. Limiting scope to just connections is good for this pull requests, but developers frequently seek help on dynamic widgets and are disappointed to learn the answer is js.
  • The frontend graph is not expected to be valid at all times and it's not unusual for a graph to be rejected as invalid when a workflow is submitted
    • For the rare fraction of a fraction case where things can't be handled by the string type name alone and resolving incorrect types is uncertain, is it better to wait until a workflow is submitted?
    • There is already support for highlighting a node and input as invalid when a submitted graph is rejected.

@mijuku233
Copy link

Can the backend obtain what type Dynamic Typing has become?
For example, I want to execute different codes by judging "what type Dynamic Typing has become"

@guill
Copy link
Contributor Author

guill commented Oct 22, 2024

@AustinMroz Yeah, for people working on local setups (or those who have low latency to the instance running in the cloud), the delay shouldn't be noticeable at all. Whatever delay they do see is likely thanks to a 500ms debounce I have that can probably be lowered significantly (or entirely removed once a workflow is fully loaded): https://github.com/Comfy-Org/ComfyUI_frontend/pull/1271/files#diff-4755cb4339a71140d692c520a82ab9afd9038b2b8223a0b2c19ebcb264addd42R94-R95

This should probably be made opt in. If 99% of dynamic type cases can be resolved without need for a backend request and the vast majority of connections won't utilize dynamic types, then only a fraction of a fraction of connections should require a request.

We could definitely do smarter things to reduce the number of calls. There is some complexity, however, in determining whether we should be making a call to the back-end. Let's say that Node X sometimes needs back-end evaluation. We don't just need to make a back-end request if an edge is added directly to (or removed from) Node X -- we also need to make a request if any of those edge types are modified through propagation. It's certainly possible and wouldn't even be particularly difficult -- but it would add complexity that I'm not sure is necessary.

Do you have any specific concerns around issues that could be caused by making those extra HTTP requests (assuming most of them will end up as no-ops)? Is it mostly around the CPU usage that could be incurred while also executing a graph?

The frontend graph is not expected to be valid at all times and it's not unusual for a graph to be rejected as invalid when a workflow is submitted

Resolving dynamic types doesn't need the graph to be valid; despite the identical format of the HTTP request, none of the existing execution/validation logic runs when hitting the resolve_dynamic_types endpoint. (If I'm totally missing the point you're making, could you explain it a bit more?)

For the rare fraction of a fraction case where things can't be handled by the string type name alone and resolving incorrect types is uncertain, is it better to wait until a workflow is submitted?

Just to clarify, it isn't that a back-end request is needed for 1% of all cases evenly spread -- it's that 1% of custom nodes would need a back-end request 100% of the time. The most immediate example is For/While loops that need to share types between the Begin and End versions of the node. While we may be able to add full support for entangled nodes to the TemplateType syntax in some way, I'm both hesitant to add that extra complexity to the commonly used syntax and concerned that there are likely other cases that we wouldn't be supporting.

The question of how complex it's worth making the declarative TemplateType syntax is definitely a subjective one though -- I might be leaning too far towards keeping it simple 🤔. Really appreciate the thoughtful feedback!

@guill
Copy link
Contributor Author

guill commented Oct 22, 2024

@mijuku233 Yeah, you can see in my Dynamic Input Count examples that I've added a new hidden input of type NODE_DEFINITION. This will contain the full definition of the node in the exact same format it appears when hitting the /object_info endpoint, but with all dynamic inputs/outputs fully resolved.

With that said, I would somewhat discourage making logic decisions in this way. If the dynamically typed node is connected to a node using old-style * inputs/outputs, the resolved type may just be *. Having an IMAGE behave differently depending on whether it's passed as an IMAGE or as a * would likely go against user expectations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants