[BUG] AMD has updated their FA2 fork quite some time ago #272
Comments
It's true that FA2 does have AMD ROCm support now (I've used it for model finetuning), but there are several issues that will limit its usability. I think overall it would be a fairly niche use case for power users, and probably too low-yield to integrate into the automated TabbyAPI installation. Perhaps there could be a way to detect a working AMD+FA2 setup and not force compatibility mode.
I'm not worried about automatic installation, but tabbyAPI should not refuse to use FA2 just because it's an AMD GPU. I think on AMD we should not try to install anything, but if FA2 is installed on an AMD system we should assume it's the correct AMD-compiled build and use it. This could be achieved by checking for HIP, trying to import flash_attn, and using it if that works, or choosing compat mode if the import raises a ModuleNotFoundError; see the sketch below.
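A minimal sketch of that detection flow, assuming a ROCm build of PyTorch (which sets `torch.version.hip`) and the standard `flash_attn` package name; the function name is illustrative, not tabbyAPI's actual API:

```python
import torch

def can_use_fa2_on_amd() -> bool:
    """Hypothetical check: use FA2 on AMD if HIP is present and
    flash_attn imports cleanly; otherwise fall back to compat mode."""
    # ROCm builds of PyTorch set torch.version.hip; CUDA builds leave it None
    if getattr(torch.version, "hip", None) is None:
        return False  # not a HIP/ROCm environment
    try:
        # Assume any flash_attn installed on a HIP system is the AMD build
        import flash_attn  # noqa: F401
    except ModuleNotFoundError:
        return False  # no FA2 installed: choose compatibility mode
    return True
```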
By the way, flash_attn does support RDNA3, so there is support for at least some consumer GPUs.
FA2 being supported on ROCm is a big step forward for the AMD side of AI. However, the important thing for FA2 and tabby is whether there's paged attention support, which allows use of the batching engine. IIRC the ROCm version has batching, but I have neither an AMD card nor wheels to test with. Therefore, this will have to be a PRed feature with the goal of autodetection.
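One way autodetection could probe for this at runtime is to check whether the installed flash_attn build exposes the KV-cache attention entry point; `flash_attn_with_kvcache` is the name upstream exports, though whether a given ROCm wheel ships it is an assumption to verify:

```python
def has_paged_attention() -> bool:
    """Return True if the installed flash_attn exposes the KV-cache
    attention kernel the batching engine needs (assumption: its
    presence implies working paged attention on this build)."""
    try:
        from flash_attn import flash_attn_with_kvcache  # noqa: F401
    except ImportError:
        return False
    return True
```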
OS
Windows
GPU Library
CUDA 12.x
Python version
3.12
Describe the bug
tabbyAPI/backends/exllamav2/utils.py, line 23 in bd16681
Reproduction steps
Upstream https://github.com/Dao-AILab/flash-attention contains AMD support by now.
Expected behavior
AMD CDNA GPUs should be considered supported just as well as Ampere and newer.
Logs
No response
Additional context
No response