It appears the focus of the community has largely shifted to Flux.1-dev. So the main purpose of this PR is to demonstrate the capability of Candle and to serve as a smoke test for the MMDiT implementation (#2397).
As such, I intend to minimize intrusive changes to the existing stable-diffusion codebase, for example by using a renaming function to adapt the VAE var-builder to the official safetensors weights of the SD3 VAE. Still, there are some changes I have to make to `candle_nn::stable_diffusion` to support the CLIP and VAE of SD3 (minimal sketches of each follow the list below), including:

- Adding `forward_until_encoder_layer` to `ClipTextTransformer`. The Comfy implementation for SD3 uses the penultimate hidden layer of CLIP-l and CLIP-g instead of the final layer (see sd3_clip.py and sdxl_clip.py). This practice, although not mentioned in the SD3 tech report, is specified in Chapter 2.1 of the SDXL tech report.
- Adding `use_quant_conv` and `use_post_quant_conv` options to the `AutoEncoderKL`, as SD3's VAE does not have those layers. These changes might be considered not specific to SD3, as `diffusers` already supports these options.
- Adding `get_qkv_linear` to load the attention block in `candle_nn::stable_diffusion::attention`, as some linear-layer weights of the VAE in the official SD3 Medium safetensors follow the dimension convention of `(channel, channel, 1, 1)` instead of the regular `(channel, channel)` that is naturally supported by the `nn::linear` constructor.

These changes allow reusing the existing CLIP and VAE implementations, but inevitably add complexity to the existing codebase. @LaurentMazare Let me know if these intrusive changes are justified. We may consider alternatives like re-implementing the VAE and CLIP from scratch.
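First, the renaming approach for the VAE var-builder. This is a minimal sketch assuming `candle_nn`'s `VarBuilder::rename_f` hook; the concrete replacement rules below are illustrative placeholders, not the actual SD3 key mapping:

```rust
use candle_nn::VarBuilder;

// Hypothetical rename rule mapping the diffusers-style keys the existing
// AutoEncoderKL expects onto the key layout of the official SD3 safetensors.
// The string replacements are placeholders for illustration only.
fn sd3_vae_vb_rename(name: &str) -> String {
    name.replace("mid_block", "mid")
        .replace("down_blocks", "down")
}

fn adapt_vae_vb(vb: VarBuilder) -> VarBuilder {
    // Wrap the var-builder so every weight lookup is renamed before it
    // reaches the safetensors backend; the VAE code itself is untouched.
    vb.rename_f(sd3_vae_vb_rename)
}
```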
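Next, a rough sketch of the idea behind `forward_until_encoder_layer`, with simplified stand-in types rather than the actual `ClipTextTransformer` internals:

```rust
use candle::{Result, Tensor};

// Simplified stand-in for a CLIP encoder block (attention + MLP elided).
struct EncoderLayer;
impl EncoderLayer {
    fn forward(&self, xs: &Tensor) -> Result<Tensor> {
        Ok(xs.clone()) // placeholder for the real block
    }
}

struct Encoder {
    layers: Vec<EncoderLayer>,
}

impl Encoder {
    // Run the encoder but also capture the hidden state after `until_layer`
    // blocks. With `until_layer = self.layers.len() - 1` this yields the
    // penultimate hidden layer that SD3 takes from CLIP-l and CLIP-g.
    fn forward_until(&self, xs: &Tensor, until_layer: usize) -> Result<(Tensor, Option<Tensor>)> {
        let mut xs = xs.clone();
        let mut intermediate = None;
        for (i, layer) in self.layers.iter().enumerate() {
            xs = layer.forward(&xs)?;
            if i + 1 == until_layer {
                intermediate = Some(xs.clone());
            }
        }
        Ok((xs, intermediate))
    }
}
```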
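The VAE option change amounts to making the (post-)quant convolutions optional. A sketch of the pattern, with field and config names that are illustrative rather than the exact `AutoEncoderKL` layout:

```rust
use candle::{Result, Tensor};
use candle_nn::{conv2d, Conv2d, Conv2dConfig, Module, VarBuilder};

struct VaeConfig {
    use_quant_conv: bool, // false for SD3, whose VAE lacks this layer
    latent_channels: usize,
}

struct Vae {
    quant_conv: Option<Conv2d>,
}

impl Vae {
    fn new(cfg: &VaeConfig, vb: VarBuilder) -> Result<Self> {
        let quant_conv = if cfg.use_quant_conv {
            let c = 2 * cfg.latent_channels;
            Some(conv2d(c, c, 1, Conv2dConfig::default(), vb.pp("quant_conv"))?)
        } else {
            None // skip the weight lookup entirely for SD3
        };
        Ok(Self { quant_conv })
    }

    fn quantize(&self, xs: &Tensor) -> Result<Tensor> {
        match &self.quant_conv {
            Some(conv) => conv.forward(xs),
            None => Ok(xs.clone()), // identity when the layer is absent
        }
    }
}
```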
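Finally, the weight-layout workaround is essentially to fall back to the 4-D layout and flatten it before constructing the layer. The helper name follows this PR; the fallback logic is my simplified assumption of how it can work:

```rust
use candle::Result;
use candle_nn::{Linear, VarBuilder};

// Load a linear layer whose weight may be stored either as a regular
// (channel, channel) matrix or in the conv-style (channel, channel, 1, 1)
// layout found for some VAE attention weights in the SD3 safetensors.
fn get_qkv_linear(channel: usize, vb: VarBuilder) -> Result<Linear> {
    let weight = match vb.get((channel, channel), "weight") {
        Ok(w) => w,
        // Shape mismatch: retry with the 4-D layout and flatten it to 2-D.
        Err(_) => vb
            .get((channel, channel, 1, 1), "weight")?
            .reshape((channel, channel))?,
    };
    let bias = vb.get(channel, "bias")?;
    Ok(Linear::new(weight, Some(bias)))
}
```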
On top of these changes, I added flash-attention support for the MMDiT, gated on whether the `flash-attn` feature is enabled (see the sketch below). I also ran a simple performance benchmark on GPUs like the 3090 Ti and 4090.

A side note: the T5 implementation on the current main branch does not support FP16 yet. I attempted to insert simple clamping within the FP16 dynamic range, but it didn't work well on my GPUs. It looks like I need to wait for a more sophisticated implementation such as #2481. For now, I use two different VarBuilders: one maps the safetensors weights to FP32 specifically for T5, and the other handles the remaining components (also sketched below).
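The flash-attention dispatch follows the compile-time gating pattern used elsewhere in candle; the naive fallback below is a simplified sketch that elides head reshaping and masking:

```rust
use candle::{Result, Tensor};

// With the `flash-attn` feature, dispatch to the fused CUDA kernel.
#[cfg(feature = "flash-attn")]
fn attn(q: &Tensor, k: &Tensor, v: &Tensor, softmax_scale: f32) -> Result<Tensor> {
    // candle_flash_attn expects (batch, seq_len, num_heads, head_dim);
    // `false` disables the causal mask, as MMDiT attention is bidirectional.
    candle_flash_attn::flash_attn(q, k, v, softmax_scale, false)
}

// Without the feature, fall back to a naive softmax(QK^T * scale) V.
#[cfg(not(feature = "flash-attn"))]
fn attn(q: &Tensor, k: &Tensor, v: &Tensor, softmax_scale: f32) -> Result<Tensor> {
    let att = (q.matmul(&k.t()?)? * softmax_scale as f64)?;
    candle_nn::ops::softmax_last_dim(&att)?.matmul(v)
}
```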
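And the dtype workaround looks roughly like the following; the file name and weight prefixes are placeholders, not the exact ones used in the example:

```rust
use candle::{DType, Device, Result};
use candle_nn::VarBuilder;

fn main() -> Result<()> {
    let device = Device::cuda_if_available(0)?;
    // Placeholder file name for the official SD3 Medium checkpoint.
    let files = ["sd3_medium.safetensors"];

    // T5 weights are upcast to F32 at load time for numerical stability.
    let vb_t5 = unsafe { VarBuilder::from_mmaped_safetensors(&files, DType::F32, &device)? };
    // The remaining components (CLIP, MMDiT, VAE) stay in F16.
    let vb_f16 = unsafe { VarBuilder::from_mmaped_safetensors(&files, DType::F16, &device)? };

    // Hypothetical prefixes; each component is built from its own builder:
    // let t5 = T5EncoderModel::new(..., vb_t5.pp("text_encoders.t5xxl"))?;
    // let mmdit = MMDiT::new(..., vb_f16.pp("model.diffusion_model"))?;
    let _ = (vb_t5, vb_f16);
    Ok(())
}
```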