Modular Diffusers
This PR experiments with some initial designs for a modular pipeline building system that we plan to support in diffusers officially.
Key components
CustomPipelineBuilder
`CustomPipelineBuilder` is the main user interface for creating and running custom pipelines.
PipelineBlock
`PipelineBlock` is the building block for custom pipelines. Each user-defined block has to inherit from `PipelineBlock` and should implement a `__call__` method that performs a specific part of the pipeline's operation. The method should always take and return the same two variables: pipeline (`CustomPipeline`) and state (`PipelineState`).
Define a PipelineBlock
You will need to implement new features using `PipelineBlock`. Here is an example of how to write a `PipelineBlock` that encodes the text prompts.
We will also release pre-defined pipeline blocks so that you can build all the official pipelines that we maintain with the pipeline builder system. For example, for SDXL, this PR has a set of pipeline blocks such as
`InputStep`, `TextEncoderStep`, `SetTimestepsStep`, `PrepareLatentsStep`, `PrepareAdditionalConditioningStep`, `PrepareGuidance`, `DenoiseStep`, `DecodeLatentsStep`.
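The block contract described above (a `__call__` that takes and returns the pipeline and the state) can be illustrated with a toy stand-in. This is a minimal sketch, not the PR's implementation: `ToyPipeline`, `ToyState`, and `ToyTextEncoderStep` are hypothetical names, and the "encoder" only measures word lengths instead of running a text model.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class ToyState:
    """Hypothetical stand-in for PipelineState: inputs plus intermediates."""
    inputs: Dict[str, Any] = field(default_factory=dict)
    intermediates: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ToyPipeline:
    """Hypothetical stand-in for CustomPipeline: a container for components."""
    components: Dict[str, Any] = field(default_factory=dict)


class ToyTextEncoderStep:
    """Toy block: reads `prompt` from the state, writes `prompt_embeds`."""

    def __call__(self, pipeline: ToyPipeline, state: ToyState) -> Tuple[ToyPipeline, ToyState]:
        tokenizer = pipeline.components["tokenizer"]
        tokens: List[str] = tokenizer(state.inputs["prompt"])
        # Fake "embedding": one number per token (an assumption for illustration).
        state.intermediates["prompt_embeds"] = [float(len(t)) for t in tokens]
        return pipeline, state


pipeline = ToyPipeline(components={"tokenizer": str.split})
state = ToyState(inputs={"prompt": "a photo of a cat"})
pipeline, state = ToyTextEncoderStep()(pipeline, state)
print(state.intermediates["prompt_embeds"])  # [1.0, 5.0, 2.0, 1.0, 3.0]
```

Because every block shares the same signature, any block written this way can be chained after any other without adapter code.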
PipelineState
A new `PipelineState` is created when you run `builder.run_pipeline()`. It is then passed through each block and holds the state of the pipeline throughout its execution, including inputs, outputs, and all intermediate inputs/outputs.
CustomPipeline
`CustomPipeline` is the base class for all custom pipelines built using `CustomPipelineBuilder`. It is used only as a container for pipeline components, pipeline-level config/attributes, and common pipeline methods.
Note that unlike `DiffusionPipeline`, `CustomPipeline` does not handle loading and saving. It also does not implement a `__call__` method and should only be run through the `CustomPipelineBuilder.run_pipeline` method.
At run time, the builder passes the `CustomPipeline` object to each block so that these pipeline-level methods and components can be used within the blocks.
The main motivations for having this class are:
[...] `# Copied from` statement to copy all the methods to the `SDXLCustomPipeline` without re-implementing them for blocks.
[...] `ModularDiffusionPipeline` that can build itself - we can iterate on this design later
Overall Objectives
a user-friendly API to compose a pipeline
performance
community
Testing this PR
To build a pre-defined pipeline block
We will implement a `from_pretrained()` method on `PipelineBlock` that allows you to load a pipeline block from a Hub repo, similar to how you would load a `DiffusionPipeline`. For now, we need to load a `DiffusionPipeline` first and reuse its components and configuration to initialize a pipeline block.
Once we have that, there are two ways to create a `PipelineBlock`. We will use the `TextEncoderStep` as an example.
method 1: create it by passing all the `__init__` arguments
method 2: use the `from_pipe` API (recommended)
You can print out the `PipelineBlock` object to get information about its components and configuration, as well as its inputs/outputs.
To run a pipeline block
Our pipeline blocks are designed to be composed with other pipeline blocks, so we designed a `CustomPipelineBuilder` class that is responsible for composing the blocks together and running them in the correct order with the correct inputs. You can also use the builder to run a single pipeline block on its own.
encode_prompt example
Let's first take a look at how to run the `encode_prompt` block on its own, because it is pretty common to generate prompt embeddings as a separate step.
First, create the builder and add the block
`print(builder)` will give you information about what the builder has built so far: the pipeline blocks and each block's outputs, as well as their components. It also puts together a list of "call parameters" so you know which arguments you need to pass to run that block.

CustomPipeline Configuration:
==============================

Pipeline Blocks:
----------------
1. TextEncoderStep -> prompt_embeds, negative_prompt_embeds, pooled_prompt_embeds, negative_pooled_prompt_embeds

Registered Components:
----------------------
text_encoder: CLIPTextModel
text_encoder_2: CLIPTextModelWithProjection
tokenizer: CLIPTokenizer
tokenizer_2: CLIPTokenizer

Default Call Parameters:
------------------------
prompt: None
prompt_2: None
negative_prompt: None
negative_prompt_2: None
cross_attention_kwargs: None
prompt_embeds: None
negative_prompt_embeds: None
pooled_prompt_embeds: None
negative_pooled_prompt_embeds: None
num_images_per_prompt: 1
guidance_scale: 5.0
clip_skip: None

Required Call Parameters:
--------------------------

Note: These are the default values. Actual values may be different when running the pipeline.
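One way a builder could assemble such a summary is to have every block declare its outputs and default call parameters, then merge them in block order. This is a hedged sketch: `add_block` and `default_call_parameters` mirror names used in this description, but `ToyBlock`, `ToyBuilder`, and all method bodies are assumptions, not the PR's code.

```python
from typing import Any, Dict, List


class ToyBlock:
    """Hypothetical block that declares its outputs and call-parameter defaults."""

    def __init__(self, name: str, outputs: List[str], defaults: Dict[str, Any]):
        self.name = name
        self.outputs = outputs
        self.defaults = defaults


class ToyBuilder:
    """Hypothetical builder that merges block metadata into one summary."""

    def __init__(self) -> None:
        self.blocks: List[ToyBlock] = []

    def add_block(self, block: ToyBlock) -> None:
        self.blocks.append(block)

    @property
    def default_call_parameters(self) -> Dict[str, Any]:
        merged: Dict[str, Any] = {}
        for block in self.blocks:
            for key, value in block.defaults.items():
                # The first block to declare a parameter wins (an assumption).
                merged.setdefault(key, value)
        return merged

    def __repr__(self) -> str:
        lines = ["Pipeline Blocks:"]
        for i, block in enumerate(self.blocks, start=1):
            lines.append(f"{i}. {block.name} -> {', '.join(block.outputs)}")
        lines.append("Default Call Parameters:")
        lines += [f"{k}: {v}" for k, v in self.default_call_parameters.items()]
        return "\n".join(lines)


builder = ToyBuilder()
builder.add_block(ToyBlock("TextEncoderStep", ["prompt_embeds"], {"prompt": None, "guidance_scale": 5.0}))
builder.add_block(ToyBlock("DecodeLatentsStep", ["images"], {"output_type": "pil"}))
print(builder)

# The property builds a fresh dict on every access, so it can be edited before a run.
params = builder.default_call_parameters
params["prompt"] = "a photo of a cat"
```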
Let's define a `prompt` and use the `run_blocks` method to run the block!
The `run_blocks` method always returns the entire pipeline state. You can get the specific tensor outputs using `state.get_intermediates()`, e.g., we can use `state.get_intermediate("prompt_embeds")` to get `prompt_embeds`.
You can get the default call parameters in a dict with `builder.default_call_parameters`, edit it, and pass it directly to `run_blocks`.
decode_latent example
Let's take another example, `decode_latent`, where we pass the generated latents to the block to get the image.
Based on the printed-out builder info, we know that `DecodeLatentsStep` takes `latents` as input and outputs `images`. It comes with a `vae` component and has two optional call arguments, `output_type` and `return_dict` (these are our standard pipeline call parameters). It also has a required argument, `latents`, which we have to pass to `run_blocks`.
To decode and get the image, you can run
Build a Modular Pipeline Incrementally
Using `CustomPipelineBuilder`, you can build a pipeline block, test it using the process described in the last section (`builder.add_block()` + `builder.run_blocks()`), and then move on to the next block and repeat the same process. Note that the `run_blocks` method also takes a `state` argument, so you can take the `state` output from the last step and pass it to the next step. We recommend using `PipelineState` to manage your inputs between pipeline block runs.
Here is an example of how to build the SDXL text2img pipeline incrementally
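The incremental workflow (run one block, inspect the result, feed the returned state into the next run) can be sketched with toy stand-ins. Everything here is an assumption made for illustration: the state is a plain dict, the blocks are plain functions that omit the pipeline argument for brevity, and `run_blocks` is a free function rather than the builder method from this PR.

```python
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]
Block = Callable[[State], State]


def encode_step(state: State) -> State:
    """Toy block: turn the prompt into a fake embedding (its word count)."""
    state["prompt_embeds"] = len(state["prompt"].split())
    return state


def prepare_latents_step(state: State) -> State:
    """Toy block: build fake latents whose size depends on the earlier output."""
    state["latents"] = [0.0] * state["prompt_embeds"]
    return state


def run_blocks(blocks: List[Block], state: Optional[State] = None, **inputs: Any) -> State:
    """Run `blocks` in order, starting from `state` (or a fresh one) plus new inputs."""
    state = dict(state or {})
    state.update(inputs)
    for block in blocks:
        state = block(state)
    return state


# Step 1: test the first block on its own and inspect its output.
state = run_blocks([encode_step], prompt="a photo of a cat")
assert state["prompt_embeds"] == 5

# Step 2: reuse the returned state instead of re-running step 1.
state = run_blocks([prepare_latents_step], state=state)
print(state["latents"])  # [0.0, 0.0, 0.0, 0.0, 0.0]
```

Carrying the state forward means an expensive early step, such as text encoding, runs once while later blocks are still being developed.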
You can also build a "partial pipeline" with a subset of blocks and test as you go.
A complete SDXL text2img pipeline looks like this after it is built
builder.run_pipeline()
If you have already built all the pipeline blocks and want to use them to run inference, you can add all the blocks to the builder at once using `add_blocks()` and use `run_pipeline` to get the generated image.
More Pipeline Examples: text2img, img2img, controlnet, etc.
text2Img
build + run a text-to-image pipeline
Img2Img
img2img has its own blocks for `set_timesteps`, `prepare_latents`, and `prepare_add_cond`:
Image2ImageSetTimestepsStep
Image2ImagePrepareLatentsStep
Image2ImagePrepareAdditionalConditioningStep
To build an Img2img pipeline with our pipeline building system, this is the only code change needed
to make the pipeline:
to run the modular img2img pipeline
controlnet
To add controlnet to any existing pipeline, you need to replace the `DenoiseStep` with `ControlnetDenoiseStep`.
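Swapping one block for another while leaving the rest of the pipeline untouched can be sketched as a list replacement. The `replace_block` helper is hypothetical, and strings stand in for the real block objects; this description does not say which builder method performs the swap.

```python
from typing import List


def replace_block(blocks: List[str], old: str, new: str) -> List[str]:
    """Return a copy of `blocks` with the first occurrence of `old` replaced by `new`."""
    if old not in blocks:
        raise ValueError(f"{old!r} is not in the pipeline")
    index = blocks.index(old)
    return blocks[:index] + [new] + blocks[index + 1:]


# Block names are taken from the SDXL list in this description.
text2img_blocks = [
    "InputStep", "TextEncoderStep", "SetTimestepsStep", "PrepareLatentsStep",
    "PrepareAdditionalConditioningStep", "PrepareGuidance", "DenoiseStep", "DecodeLatentsStep",
]
controlnet_blocks = replace_block(text2img_blocks, "DenoiseStep", "ControlnetDenoiseStep")
print(controlnet_blocks[6])  # ControlnetDenoiseStep
```

Returning a copy keeps the original text-to-image block list usable alongside the controlnet variant.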
to add controlnet to text-to-image pipeline
to run the controlnet text-to-image pipeline
to build + run controlnet with img2img
To-Dos
The PR is at a very early stage, so there are a lot of to-dos left. I will just list a few that I'm working on next:
`from_pipe` for `PipelineBlock`
`from_pretrained` for `PipelineBlock`
`from_pipe`/`from_pretrained` method for `CustomPipelineBuilder`, so that you can create a regular pipeline and pass it to the builder. The builder will map the pipeline to a set of pre-defined pipeline blocks and automatically create the pipeline. You can then use it as a starting point to build your custom pipeline.
[...] `guide` class to see if we can make it robust and generalize to different guidance methods, e.g. PAG
[...] `PipelineState` - this feature should only apply to `run_pipeline` (not `run_blocks`). Since custom pipeline blocks may not be written in the most memory-efficient way (e.g., a user could add too many intermediate variables that are not needed), we can add some guard rails to prevent that.