Warning: 1Torch was not compiled with flash attention.
First of all, the good news: this failure usually does not stop the program from running; it just runs more slowly.
The warning appears because, since the torch 2.2 update, FlashAttention-2 is supposed to be used as the preferred attention backend, but it fails to start and PyTorch silently falls back to another implementation.
Normally the backend priority for scaled_dot_product_attention is FlashAttention > Memory-Efficient Attention (xformers) > the PyTorch C++ implementation (math).
(I don't understand why it was designed this way, and the actual cause is completely unclear from the warning text. I hope the next official release improves it.)
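If you want to see which of these backends is actually usable on your machine, here is a minimal sketch (assuming torch 2.0–2.2, where `torch.backends.cuda.sdp_kernel` is the relevant context manager; the tensor shapes are arbitrary):

```python
import torch
import torch.nn.functional as F

# Arbitrary (batch, heads, seq_len, head_dim) tensors; the flash backend
# requires fp16/bf16 inputs on a CUDA device.
q = k = v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)

backends = {
    "flash":         dict(enable_flash=True,  enable_mem_efficient=False, enable_math=False),
    "mem_efficient": dict(enable_flash=False, enable_mem_efficient=True,  enable_math=False),
    "math":          dict(enable_flash=False, enable_mem_efficient=False, enable_math=True),
}

for name, flags in backends.items():
    # Restrict SDPA to a single backend so we can tell which ones work.
    with torch.backends.cuda.sdp_kernel(**flags):
        try:
            F.scaled_dot_product_attention(q, k, v)
            print(f"{name}: OK")
        except RuntimeError as e:
            print(f"{name}: unavailable ({e})")
```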
But the pitfalls I want to address are the following:
FlashAttention-2 is supported in PyTorch and is the first choice; the logic is that this warning is issued whenever the FlashAttention-2 path fails. (Some people have benchmarked it and found that FlashAttention-2 does not bring much improvement anyway.)
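If the fallback is acceptable to you, one possible way to quiet things, assuming the warning is emitted only when the flash path is attempted, is to disable the flash backend up front, or simply filter this specific warning:

```python
import warnings
import torch

# Option 1: never try the flash backend, so the fallback happens silently.
torch.backends.cuda.enable_flash_sdp(False)

# Option 2: keep the dispatch logic as-is and just hide this one warning.
warnings.filterwarnings(
    "ignore", message="1Torch was not compiled with flash attention"
)
```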
The hardware requirement is at least an RTX 30-series card: FlashAttention only supports Ampere GPUs or newer. In other words, it can run on a 3060.
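To confirm whether a given card meets the Ampere requirement, you can check its compute capability (Ampere is SM 8.x; an RTX 3060 reports 8.6):

```python
import torch

major, minor = torch.cuda.get_device_capability()
print(torch.cuda.get_device_name(), f"-> compute capability {major}.{minor}")
# FlashAttention needs Ampere (SM 8.0) or newer.
print("meets FlashAttention hardware requirement:", (major, minor) >= (8, 0))
```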
There is also a small possibility that the CUDA version in your environment is incompatible with the CUDA version torch was compiled against. The official torch builds are compiled against CUDA 12.1 (torch 2.* + cu121).
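A quick way to spot this kind of mismatch is to print the CUDA version torch was built against and compare it with what your environment (e.g. nvidia-smi) reports:

```python
import torch

print("torch version:        ", torch.__version__)             # e.g. 2.2.0+cu121
print("built against CUDA:   ", torch.version.cuda)            # e.g. 12.1
print("cuDNN version:        ", torch.backends.cudnn.version())
print("CUDA device available:", torch.cuda.is_available())
```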
Loading pipeline components...: 100%|██████████| 7/7 [00:02<00:00, 2.48it/s]
0%| | 0/50 [00:00<?, ?it/s]D:\ProgramData\envs\pytorch\Lib\site-packages\diffusers\models\attention_processor.py:1279: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
hidden_states = F.scaled_dot_product_attention(
100%|██████████| 50/50 [00:05<00:00, 8.44it/s]