[BUG] AMD has updated their FA2 fork quite some time ago #272

Open
IMbackK opened this issue Jan 13, 2025 · 4 comments
Labels
bug (Something isn't working), help wanted (Extra attention is needed)

Comments


IMbackK commented Jan 13, 2025

OS

Windows

GPU Library

CUDA 12.x

Python version

3.12

Describe the bug

"(30 series) or newer. AMD GPUs are not supported."
is in appropriate as a blanket statement and condition, fa2 is up to date and works fine on amd gpus (CDNA only) with exllamav2

Reproduction steps

The upstream https://github.com/Dao-AILab/flash-attention repository now contains AMD support.

Expected behavior

AMD CDNA GPUs should be considered supported just as well as Ampere and newer.

Logs

No response

Additional context

No response

Acknowledgements

  • I have looked for similar issues before submitting this one.
  • I have read the disclaimer, and this issue is related to a code bug. If I have a question, I will use the Discord server.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will ask my questions politely.
IMbackK added the bug label on Jan 13, 2025
DocShotgun (Member) commented

It's true that FA2 does have AMD ROCm support now (I've used it for model finetuning purposes); however, there are several issues that will limit its usability:

  1. There are no official wheels. While we could build wheels for ROCm, the last time I tried (when I was working on training), the build time exceeded the maximum allowed runtime of 6 hours for the free GH Actions runner. Perhaps there would be a way to optimize around this; otherwise it would need to be self-built by power users.
  2. Only certain AMD GPUs are supported, and there would need to be some kind of architecture check for this as well (an illustrative probe is sketched after this comment). It's not supported on consumer-grade AMD GPUs IIRC, which is rather limiting. For larger enterprise server type setups, there are other more ideal inference backends besides TabbyAPI/ExLlamaV2.

I think overall it would be a fairly niche use case for power users that would probably be too low-yield to integrate into the automated TabbyAPI installation. Perhaps there could be a way to detect a working AMD+FA2 setup and not force compatibility mode.
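
As a rough illustration of the kind of architecture check mentioned in point 2, here is a sketch only: it assumes a ROCm build of PyTorch, and both the gfx allow-list and the gcnArchName property are assumptions that would need to be verified against the flash-attention ROCm documentation.

```python
import torch

# Illustrative allow-list; the real set of supported architectures should be
# taken from the flash-attention ROCm docs (CDNA2/CDNA3 shown here as examples).
SUPPORTED_GFX = ("gfx90a", "gfx942")

def amd_arch_supported(device_index: int = 0) -> bool:
    # torch.version.hip is None on CUDA builds and a version string on ROCm builds
    if torch.version.hip is None:
        return False
    props = torch.cuda.get_device_properties(device_index)
    # ROCm builds of PyTorch expose the arch name, e.g. "gfx90a:sramecc+:xnack-"
    arch = getattr(props, "gcnArchName", "")
    return any(arch.startswith(gfx) for gfx in SUPPORTED_GFX)
```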

IMbackK (Author) commented Jan 14, 2025

I'm not worried about automatic installation, but TabbyAPI should not refuse to use FA2 just because it's an AMD GPU. I think on AMD we should not try to install anything, but if FA2 is installed on an AMD system we should assume it's the correct AMD-compiled version and use it. This could be achieved by checking for HIP, trying to import flash_attn, and using it if that works, or choosing compat mode if the import raises a ModuleNotFoundError (see the sketch below).
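
A minimal sketch of that fallback logic, assuming TabbyAPI can key off torch.version.hip to detect a ROCm build; the helper name is hypothetical, not TabbyAPI's actual code:

```python
import torch

def should_use_flash_attn() -> bool:
    # Hypothetical helper: on ROCm, never try to install flash-attn
    # automatically; just use it if the user has already installed a working
    # AMD build, otherwise fall back to compatibility mode.
    if torch.version.hip is not None:
        try:
            import flash_attn  # noqa: F401
            return True
        except ModuleNotFoundError:
            return False
    # NVIDIA path: keep the existing compute-capability checks here.
    return False
```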

IMbackK (Author) commented Jan 14, 2025

Btw, flash_attn does support RDNA3, so there is support for at least some consumer GPUs.

bdashore3 (Member) commented

FA2 being supported on ROCm is a big step forward for the AMD side of AI.

However, the important thing for FA2 and tabby is whether there's paged attention support. This is what allows use of the batching engine.

IIRC the ROCm version has batching, but I have neither an AMD card nor wheels to test with.

Therefore, this will have to be a PR'd feature with the goal of autodetection (a rough runtime probe for paged attention is sketched below).
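
One possible probe for the paged-attention question, as a sketch only: it assumes the ROCm build of flash-attn exposes the same flash_attn_with_kvcache entry point as the CUDA wheels, which is exactly the detail that would need to be confirmed.

```python
import importlib

def fa2_supports_paged_attention() -> bool:
    # Best-effort probe: the batching engine needs paged attention, which in
    # practice means the installed flash_attn must expose flash_attn_with_kvcache.
    try:
        fa = importlib.import_module("flash_attn")
    except ModuleNotFoundError:
        return False
    return hasattr(fa, "flash_attn_with_kvcache")
```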

bdashore3 added the help wanted label on Jan 25, 2025