AWQ is not working #1240
Comments
Is it supposed to work on Gaudi?
The primary goal is to get llama405b on a single Gaudi node. I had originally read that Hugging Face TGI was supposed to use AWQ, but I was unable to use any quantization method at all from the Hugging Face quants, including GPTQ, uint4, etc. It's just spread across different issues.
I think GPTQ should work on Gaudi, no?
No. Neither quantized models generated with the Intel Neural Compressor nor https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4 work on tgi_gaudi, and fp8 with INC does not work on a single node either.
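For context on why quantization matters for the single-node goal, here is a rough weight-only memory estimate. It is a sketch, not something stated in the thread: the node size (8 accelerators at 96 GB each) is an assumption, the parameter count is approximate, and KV cache, activations, and runtime overhead are ignored.

```python
# Rough weight-only memory estimate for a 405B-parameter model.
# Ignores KV cache, activations, and runtime overhead.

PARAMS = 405e9  # approximate parameter count

# Assumption (not from the thread): one node with 8 accelerators, 96 GB each.
NODE_MEMORY_GB = 8 * 96

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Return the weight footprint in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def fits(dtype: str) -> bool:
    """True if the weights alone fit in the assumed node memory."""
    return weight_gb(PARAMS, BYTES_PER_PARAM[dtype]) < NODE_MEMORY_GB


if __name__ == "__main__":
    for dtype, nbytes in BYTES_PER_PARAM.items():
        size = weight_gb(PARAMS, nbytes)
        print(f"{dtype}: {size:.1f} GB of weights, fits in {NODE_MEMORY_GB} GB: {fits(dtype)}")
```

Under these assumptions the bf16 weights alone exceed the node's memory, while fp8 or 4-bit weights leave headroom, which is why a working quantization path is needed for this setup.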