Merge branch 'main' into erikk/outlines-core
ErikKaum authored Oct 22, 2024
2 parents dc66c5a + 157b2f8 commit 2ee917b
Showing 6 changed files with 589 additions and 2 deletions.
22 changes: 21 additions & 1 deletion _blog.yml
@@ -4847,4 +4847,24 @@
tags:
- structured generation
- llm
- open-source
- local: sd3-5
title: "🧨 Diffusers welcomes Stable Diffusion 3.5 Large"
author: diffusers
thumbnail: /blog/assets/sd3-5/thumbnail.png
date: October 22, 2024
tags:
- diffusers
- guide
- sd3-5
- local: transformersjs-v3
title: "Transformers.js v3: WebGPU support, new models & tasks, and more…"
author: Xenova
thumbnail: /blog/assets/transformersjs-v3/thumbnail.png
date: October 22, 2024
tags:
- announcement
- transformers.js
- transformers
- javascript
- webgpu
Binary file added assets/sd3-5/thumbnail.png
Binary file added assets/transformersjs-v3/thumbnail.png
2 changes: 1 addition & 1 deletion s2s_endpoint.md
@@ -117,7 +117,7 @@ Using a custom docker image just requires a slightly different configuration, fe
Pre-Steps
1. Login: https://huggingface.co/login
2. Request access to [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)
3. Create a Fine-Grained Token: https://huggingface.co/settings/tokens/new?tokenType=fineGrained
![Fine-Grained Token](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/s2s_endpoint/fine-grained-token.png)
- Select access to gated repos
- If you are using the API make sure to select permissions to Manage Inference Endpoints
220 changes: 220 additions & 0 deletions sd3-5.md
@@ -0,0 +1,220 @@
---
title: "Diffusers welcomes Stable Diffusion 3.5 Large"
thumbnail: /blog/assets/sd3-5/thumbnail.png
authors:
- user: YiYiXu
- user: a-r-r-o-w
- user: dn6
- user: sayakpaul
- user: linoyts
- user: multimodalart
- user: OzzyGT
- user: ariG23498
---

# 🧨 Diffusers welcomes Stable Diffusion 3.5 Large

Stable Diffusion 3.5 is the improved successor to [Stable Diffusion 3](https://huggingface.co/blog/sd3).
As of today, the models are available on the Hugging Face Hub and can be used with 🧨 Diffusers.

The release comes with [two checkpoints](https://huggingface.co/collections/stabilityai/stable-diffusion-35-671785cca799084f71fa2838):

- A large (8B) model
- A large (8B) timestep-distilled model enabling few-step inference

In this post, we will focus on how to use Stable Diffusion 3.5 (SD3.5) with Diffusers, covering both inference and training.

## Table Of Contents

- [Architectural changes](#architectural-changes)
- [Using SD3.5 with Diffusers](#using-sd35-with-diffusers)
- [Running inference with quantization](#running-inference-with-quantization)
- [Training LoRAs with SD3.5 Large with quantization](#training-loras-with-sd35-large-with-quantization)
- [Using single-file loading](#using-single-file-loading-with-the-stable-diffusion-35-transformer)
- [Important links](#important-links)

## Architectural changes
The transformer architecture of SD3.5 (large) is very similar to SD3 (medium), with the following changes:

- QK normalization: For training large transformer models, [QK normalization](https://research.google/blog/scaling-vision-transformers-to-22-billion-parameters/) has now become a standard, and SD3.5 Large is no exception (see the sketch at the end of this section).
- Dual attention layers: Instead of using a single attention layer for each modality stream in the MMDiT blocks, SD3.5 uses dual attention layers.

The rest of the details in terms of the text encoders, VAE, and noise scheduler stay exactly the same as in SD3 Medium. For more on SD3, we recommend checking out the [original paper](https://arxiv.org/abs/2403.03206).
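
To make the QK-normalization idea concrete, here is a minimal, illustrative PyTorch sketch (not the actual SD3.5 implementation): the queries and keys are RMS-normalized per attention head before the dot product, which keeps attention logits well-scaled during large-scale training. It assumes `torch.nn.RMSNorm` (available in PyTorch 2.4+).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    """Self-attention with RMS-normalized queries and keys (illustrative sketch)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        # Normalizing Q and K per head bounds the attention logits
        self.q_norm = nn.RMSNorm(self.head_dim)
        self.k_norm = nn.RMSNorm(self.head_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, _ = x.shape
        q = self.to_q(x).view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.to_k(x).view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.to_v(x).view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        q, k = self.q_norm(q), self.k_norm(k)  # QK normalization
        out = F.scaled_dot_product_attention(q, k, v)
        # Output projection omitted for brevity
        return out.transpose(1, 2).reshape(b, n, -1)
```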

## Using SD3.5 with Diffusers
Make sure you install the latest version of diffusers:

```bash
pip install -U diffusers
```

As the model is gated, before using it with Diffusers, you first need to go to the [Stable Diffusion 3.5 Large Hugging Face page](https://huggingface.co/stabilityai/stable-diffusion-3.5-large), fill in the form, and accept the gate.
Once access is granted, you need to log in so that your system knows you’ve accepted the gate. Use the command below to log in:

```bash
huggingface-cli login
```
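
If you prefer to authenticate from Python (for example, in a notebook), the `huggingface_hub` library provides an equivalent:

```python
from huggingface_hub import login

# Prompts for a token interactively; alternatively pass login(token="hf_...")
login()
```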

The following snippet will download the 8B parameter version of SD3.5 in `torch.bfloat16` precision.
This is the format used in the original checkpoint published by Stability AI, and is the recommended way to run inference.

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    prompt="a photo of a cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=40,
    height=1024,
    width=1024,
    guidance_scale=4.5,
).images[0]

image.save("sd3_hello_world.png")
```
![hello_world_cat](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/sd3-5/hello_world_cat.png)

The release also comes with a **“timestep-distilled”** model that eliminates classifier-free guidance (hence `guidance_scale=1.0` in the snippet below) and lets us generate images in fewer steps, typically 4-8.
```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large-turbo", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    prompt="a photo of a cat holding a sign that says hello world",
    num_inference_steps=4,
    height=1024,
    width=1024,
    guidance_scale=1.0,
).images[0]

image.save("sd3_hello_world.png")
```
![hello_world_cat_2](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/sd3-5/hello_world_cat_2.png)

All the examples shown in our [SD3 blog post](https://huggingface.co/blog/sd3) and the [official Diffusers documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_3) should already work with SD3.5.
In particular, both of those resources dive deep into optimizing the memory requirements to run inference.
Since SD3.5 Large is significantly larger than SD3 Medium, memory optimization becomes crucial for running inference on consumer hardware.
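
As a concrete example, model CPU offloading, one of the simplest optimizations covered in those resources, applies to SD3.5 unchanged. This is a sketch; the actual memory savings depend on your hardware:

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
)
# Keep sub-models on the CPU and move each one to the GPU only while it runs
pipe.enable_model_cpu_offload()

image = pipe("a photo of a cat holding a sign that says hello world").images[0]
```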

## Running inference with quantization
Diffusers natively supports working with [`bitsandbytes`](https://github.com/bitsandbytes-foundation/bitsandbytes) quantization, which optimizes memory usage even further.

First, make sure to install all the necessary libraries:
```bash
pip install -Uq git+https://github.com/huggingface/transformers@main
pip install -Uq bitsandbytes
```
Then load the transformer in [“NF4” precision](https://huggingface.co/blog/4bit-transformers-bitsandbytes):
```python
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel
import torch

model_id = "stabilityai/stable-diffusion-3.5-large"
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model_nf4 = SD3Transformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16
)
```
And now, we’re ready to run inference:
```python
from diffusers import StableDiffusion3Pipeline

pipeline = StableDiffusion3Pipeline.from_pretrained(
    model_id,
    transformer=model_nf4,
    torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()

prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. It features the distinctive, bulky body shape of a hippo. However, instead of the usual grey skin, the creature's body resembles a golden-brown, crispy waffle fresh off the griddle. The skin is textured with the familiar grid pattern of a waffle, each square filled with a glistening sheen of syrup. The environment combines the natural habitat of a hippo with elements of a breakfast table setting, a river of warm, melted butter, with oversized utensils or plates peeking out from the lush, pancake-like foliage in the background, a towering pepper mill standing in for a tree. As the sun rises in this fantastical world, it casts a warm, buttery glow over the scene. The creature, content in its butter river, lets out a yawn. Nearby, a flock of birds take flight"
image = pipeline(
    prompt=prompt,
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=512,
).images[0]
image.save("whimsical.png")
```
![happy_hippo](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/sd3-5/hippo.png)

You can control other knobs in the `BitsAndBytesConfig`. Refer to the [documentation](https://huggingface.co/docs/diffusers/main/en/quantization/bitsandbytes) for details.
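
For example, nested (“double”) quantization, which also quantizes the quantization constants, can shave off a bit more memory. This is a sketch of one such knob, not an exhaustive list:

```python
import torch
from diffusers import BitsAndBytesConfig

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,  # quantize the quantization constants too
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```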

It is also possible to directly load a model quantized with the same `nf4_config` as above.
This is particularly helpful for machines with low RAM. Refer to [this Colab Notebook](https://colab.research.google.com/drive/1nK5hOCPY3RoGi0yqddscGdKvo1r-rHqE?usp=sharing) for an end-to-end example.
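
As a sketch of that workflow (the local directory name here is hypothetical, and it assumes your `diffusers`/`bitsandbytes` versions support 4-bit serialization), you would serialize the quantized transformer once and then load it directly, skipping on-the-fly quantization:

```python
import torch
from diffusers import SD3Transformer2DModel

# One-time: persist the NF4-quantized transformer from the snippet above
model_nf4.save_pretrained("sd3.5-large-nf4")

# Later, on a low-RAM machine: load it directly, no quantization config needed
model_nf4 = SD3Transformer2DModel.from_pretrained(
    "sd3.5-large-nf4", torch_dtype=torch.bfloat16
)
```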

## Training LoRAs with SD3.5 Large with quantization
Thanks to libraries like `bitsandbytes` and `peft`, it is possible to fine-tune large models like SD3.5 Large on consumer GPUs with 24GB of VRAM. Our existing [SD3 training script](https://huggingface.co/blog/sd3#dreambooth-and-lora-fine-tuning) can already be leveraged for training LoRAs.
The training command below already works:

```bash
accelerate launch train_dreambooth_lora_sd3.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-3.5-large" \
  --dataset_name="Norod78/Yarn-art-style" \
  --output_dir="yart_art_sd3-5_lora" \
  --mixed_precision="bf16" \
  --instance_prompt="Frog, yarn art style" \
  --caption_column="text" \
  --resolution=768 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=4e-4 \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=700 \
  --rank=16 \
  --seed="0" \
  --push_to_hub
```
However, to make it work with quantization, we need to tweak a couple of knobs. Below, we provide pointers on how to do that:

- We initialize `transformer` either with a quantization config or load a quantized checkpoint directly.
- Then, we prepare it using `prepare_model_for_kbit_training()` from `peft` (see the sketch after this list).
- The rest of the process remains the same, thanks to `peft`'s strong support for `bitsandbytes`!
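
A minimal sketch of those two steps (the LoRA hyperparameters and target module names below are illustrative assumptions, not the script’s exact configuration):

```python
import torch
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel
from peft import LoraConfig, prepare_model_for_kbit_training

# 1. Load the transformer in NF4, exactly as in the inference section above
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = SD3Transformer2DModel.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)

# 2. Prepare the quantized model for k-bit training, then attach LoRA adapters
transformer = prepare_model_for_kbit_training(transformer)
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    init_lora_weights="gaussian",
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # assumed attention projections
)
transformer.add_adapter(lora_config)
```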

Refer to [this script](https://gist.github.com/sayakpaul/05afd428bc089b47af7c016e42004527) for a complete example.

## Using single-file loading with the Stable Diffusion 3.5 Transformer
You can load the Stable Diffusion 3.5 Transformer model using the original checkpoint files published by Stability AI with the `from_single_file` method:
```python
import torch
from diffusers import SD3Transformer2DModel, StableDiffusion3Pipeline

transformer = SD3Transformer2DModel.from_single_file(
    "https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/sd3.5_large.safetensors",
    torch_dtype=torch.bfloat16,
)
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
image = pipe("a cat holding a sign that says hello world").images[0]
image.save("sd35.png")
```
## Important links
- Stable Diffusion 3.5 Large [collection](https://huggingface.co/collections/stabilityai/stable-diffusion-35-671785cca799084f71fa2838) on the Hub
- Official Diffusers [documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_3) for Stable Diffusion 3.5
- [Colab Notebook](https://colab.research.google.com/drive/1nK5hOCPY3RoGi0yqddscGdKvo1r-rHqE?usp=sharing) to run inference with quantization
- [Training LoRAs](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_sd3.md)
- Stable Diffusion 3 [paper](https://arxiv.org/abs/2403.03206)
- Stable Diffusion 3 [blog post](https://huggingface.co/blog/sd3)

_Acknowledgements: [Daniel Frank](https://www.pexels.com/@fr3nks/) for the background photo used in the thumbnail of this blog post. Thanks to [Pedro Cuenca](https://huggingface.co/pcuenq) and [Tom Aarsen](https://huggingface.co/tomaarsen) for their reviews on the post draft._
