Avoid attention masks for Qwen and Chroma #1109
Conversation
For now it doesn't make sense to extend this PR to the other models, because they all use a fixed sequence length, so their attention mask is never a no-op.
Looks good. I just did a quick test combined with #1107 to include the FlashAttention code path on Windows and am seeing a speedup with Chroma, more so when the batch size is 1, which avoids the padded-text-token scenario that requires attention masking. Tested configs:
#chroma LoRA 24GB + bs2 + res 1024
#chroma LoRA 24GB + bs1 + res 1024
It would be more advantageous to train with batch size 1 and use accumulation steps to make up for the smaller batch, in order to take advantage of the performance gains from FlashAttention.
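To illustrate why batch size 1 avoids masking (a rough sketch with made-up caption lengths, not OneTrainer code): a single prompt needs no padding, while a batch of prompts with different lengths must be padded to a common length and therefore needs a padding mask.

```python
import torch

# Hypothetical token counts for two captions (illustrative values only).
prompt_lengths = [77, 102]

# Batch size 1: each sequence is used as-is, so the mask can simply be None
# and torch SDPA is free to dispatch to a FlashAttention kernel.
masks_bs1 = [None for _ in prompt_lengths]

# Batch size 2: pad both captions to the longest length and build a padding
# mask (True = real token, False = padded token that must be ignored).
max_len = max(prompt_lengths)
mask_bs2 = torch.zeros(len(prompt_lengths), max_len, dtype=torch.bool)
for i, n in enumerate(prompt_lengths):
    mask_bs2[i, :n] = True

print(masks_bs1)             # [None, None] -> no mask needed at all
print(bool(mask_bs2.all()))  # False -> the mask actually masks something
```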
@dxqb I just tested this and a previous commit, and it seems avoiding the attention mask started giving me banding issues in my images, like long line artifacts, using the default config for Qwen in OneTrainer. I did not have these on a previous commit.
@dxqb Is the issue due to the removal of the tensor multiplication on the mask for Qwen?
It's not possible that this PR is the reason for this. Please look for other causes, or make a direct comparison. This PR is mathematically identical before and after, both in theory and in tests:
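A minimal check of that kind (an illustrative sketch, not the exact test run for this PR) is to compare SDPA with an all-True mask against SDPA with no mask at all:

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

# Boolean mask that allows every position: a "no-op" mask.
noop_mask = torch.ones(1, 1, 128, 128, dtype=torch.bool)

out_with_noop_mask = F.scaled_dot_product_attention(q, k, v, attn_mask=noop_mask)
out_without_mask = F.scaled_dot_product_attention(q, k, v, attn_mask=None)

# The two results agree up to kernel-level floating point differences.
print(torch.allclose(out_with_noop_mask, out_without_mask, atol=1e-5))
```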
Was this generated with a Lightning 8-step or 4-step LoRA?
@dxqb Hi, no, this was a OneTrainer sample, but the LoRA did the same in ComfyUI.
I've just been using an older commit and it works fine now.
I've never seen such an artifact in a OneTrainer sample. If you can reproduce this, please open an issue or join the Discord and show it there.

torch SDPA automatically uses a flash attention kernel if possible.
While torch automatically uses flash attention if no attention mask is given at all, it does not recognize the case where a no-op attention mask is passed (a mask is given, but no tokens are actually masked).
This PR detects that case and passes no mask instead of a no-op mask, resulting in a significant speed-up of 20-25% in those cases. The mask is always a no-op when the batch size is 1, and less often when the batch size is greater than 1.
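A minimal sketch of the idea (the helper name, shapes, and mask layout are hypothetical, not the exact OneTrainer implementation): check whether a boolean mask masks anything at all, and if it does not, pass None so torch SDPA can pick the flash attention kernel.

```python
import torch
import torch.nn.functional as F

def sdpa_without_noop_mask(q, k, v, attn_mask=None):
    # If a boolean mask allows every position, it is a no-op: drop it so that
    # torch SDPA can dispatch to a flash attention kernel instead of the
    # slower masked code path.
    if attn_mask is not None and attn_mask.dtype == torch.bool and bool(attn_mask.all()):
        attn_mask = None
    return F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)

# With batch size 1 the padding mask is typically all True, so this call
# takes the unmasked (flash-capable) path.
q = k = v = torch.randn(1, 8, 256, 64)
mask = torch.ones(1, 1, 256, 256, dtype=torch.bool)
out = sdpa_without_noop_mask(q, k, v, attn_mask=mask)
```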
In principle this can be combined with other upcoming features that improve performance further; see the last two lines.
Thank you to @FurkanGozukara for pointing out this speed difference.