Description
Describe your use-case.
The latest version of Diffusers allows configuring or selecting a specific attention backend such as FlashAttention-2/FlashAttention-3 (which supports the backward pass).
OneTrainer could potentially see a performance benefit if this were togglable via the provided API, per the documentation below, using a context manager:
https://huggingface.co/docs/diffusers/main/en/optimization/attention_backends
```python
with attention_backend("flash"):
    # training/inference goes here
```
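As a rough illustration, a training step could be wrapped like this (a sketch only; the import path is my assumption and should be checked against the installed Diffusers version, and the training-loop names are placeholders, not OneTrainer code):

```python
# Sketch only: wrapping a single training step in the attention-backend
# context manager. The import path is an assumption on my side; model,
# batch, loss_fn and optimizer are placeholders for the real training loop.
from diffusers.models.attention_dispatch import attention_backend

def training_step(model, batch, loss_fn, optimizer):
    with attention_backend("flash"):  # FA2, same name as the env-var example below
        loss = loss_fn(model(batch))
        loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.detach()
```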
It is also possible to force OneTrainer to use a particular attention backend by setting the following environment variable (example using FA2):
DIFFUSERS_ATTN_BACKEND=flash
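If OneTrainer went the environment-variable route instead, the variable presumably needs to be set before Diffusers performs its first attention dispatch; I have not verified the exact timing, so the ordering below is an assumption:

```python
import os

# Set before importing diffusers so the dispatcher can pick it up;
# "flash" selects FlashAttention-2 as in the example above.
os.environ.setdefault("DIFFUSERS_ATTN_BACKEND", "flash")

import diffusers  # noqa: E402
```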
The feature itself is marked as experimental, and I have yet to find out the potential drawbacks of using it for training. So far I have observed that, for training a Chroma LoRA with batch size 3 on an RTX 5090, the time per iteration dropped from 9.5 s/it to 6.4 s/it on my machine.
There are checks associated with each of the backends (i.e. shapes, dtypes, etc.) which are skipped by default. Depending on the selected training types in OneTrainer, things can easily break if these are not aligned. It is possible to enable the checks in Diffusers, which incurs extra overhead per attention call but can serve as a quick sanity check that the current configuration is sane:
DIFFUSERS_ATTN_CHECKS=1
I tried enabling this flag and it failed one of the assertions:
Attention mask must match the key's second to last dimension.
Probably needs some more investigation.
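For reference, my reading of what that assertion checks, with made-up tensor shapes (this is not the actual Diffusers check code):

```python
import torch

# SDPA-style shapes: query [B, H, q_len, d], key [B, H, kv_len, d],
# attn_mask broadcastable to [B, H, q_len, kv_len].
B, H, q_len, kv_len, d = 2, 8, 77, 64, 64
key = torch.randn(B, H, kv_len, d)
attn_mask = torch.ones(B, H, q_len, kv_len, dtype=torch.bool)

# The failing check presumably enforces this relationship:
assert attn_mask.shape[-1] == key.shape[-2], \
    "Attention mask must match the key's second to last dimension."
```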
What would you like to see as a solution?
Look into adding a dropdown to select supported attention backends (or at least document the feature, along with the necessary caveats and tested configurations/models).
The feature is currently marked as experimental by Hugging Face.
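To sketch what such a dropdown could map to on the OneTrainer side (entirely hypothetical code; only the backend name and the context manager come from the Diffusers docs, and the import path is the same assumption as above):

```python
from contextlib import nullcontext

from diffusers.models.attention_dispatch import attention_backend  # assumed path

# "default" leaves Diffusers' normal dispatch untouched; additional entries
# would need per-backend install checks and per-model testing first.
ATTENTION_BACKEND_CHOICES = ["default", "flash"]

def attention_backend_ctx(selection: str):
    """Return the context to wrap the training step with, based on the dropdown."""
    if selection == "default":
        return nullcontext()
    return attention_backend(selection)
```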
Have you considered alternatives? List them here.
No response