
[Feat]: Attention backend selection for Diffusers #1090

@zzlol63

Description


Describe your use-case.

The latest version of Diffusers supports selecting a specific attention backend, such as FlashAttention-2 or FlashAttention-3 (which supports the backward pass).

OneTrainer could potentially see a performance benefit if this were togglable via the provided API, as described in the documentation below, using a context manager:
https://huggingface.co/docs/diffusers/main/en/optimization/attention_backends

```python
# Exported by recent Diffusers versions; check your installed version's exports.
from diffusers import attention_backend

with attention_backend("flash"):
    ...  # training/inference goes here
```

It is also possible to force OneTrainer to use a particular attention backend via an environment variable (example using FA2):

```
DIFFUSERS_ATTN_BACKEND=flash
```
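For the env-var route, a minimal sketch of how a dropdown selection could be applied before Diffusers is imported (the helper name and the backend list are illustrative assumptions; only the variable name `DIFFUSERS_ATTN_BACKEND` comes from the Diffusers docs):

```python
import os

# Backend names here are assumed examples from the Diffusers docs;
# the real set depends on the installed Diffusers version and hardware.
SUPPORTED_BACKENDS = {"native", "flash", "_flash_3", "sage", "xformers"}


def apply_attention_backend(choice):
    """Hypothetical helper: set or clear DIFFUSERS_ATTN_BACKEND.

    Must run before Diffusers dispatches any attention calls, since the
    variable is read at dispatch time, not at UI-selection time.
    """
    if choice is None or choice == "default":
        # Fall back to Diffusers' own default backend selection.
        os.environ.pop("DIFFUSERS_ATTN_BACKEND", None)
        return
    if choice not in SUPPORTED_BACKENDS:
        raise ValueError(f"unknown attention backend: {choice!r}")
    os.environ["DIFFUSERS_ATTN_BACKEND"] = choice
```

A dropdown in the UI could simply pass its current value to this helper, with "default" leaving the decision to Diffusers.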

The feature itself is marked as experimental, and the potential drawbacks of using it for training are not yet fully known. So far I have observed that, when training a Chroma LoRA with batch size 3 on an RTX 5090, the time per iteration dropped from 9.5 s/it to 6.4 s/it on my machine.

Each backend has associated checks (e.g. shape and dtype validation) which are skipped by default. Depending on the training types selected in OneTrainer, things can easily break if these are not aligned. The checks can be enabled in Diffusers, which incurs extra overhead per attention call, but they can serve as a quick sanity check that the current configuration is sane:

```
DIFFUSERS_ATTN_CHECKS=1
```

I tried enabling this flag and it failed one of the assertions:

```
Attention mask must match the key's second to last dimension.
```

This probably needs more investigation.

What would you like to see as a solution?

Look into adding a dropdown to select supported attention backends (or document the feature, with the necessary caveats and tested configurations/models).

This feature is currently marked as experimental by Hugging Face.

Have you considered alternatives? List them here.

No response

Metadata

Labels

enhancement (New feature or request)