Can I use NNCF to do INT4 AWQ on an ONNX/DirectML LLAMA2-7B model? #2621
I want to use NNCF to quantize the LLAMA2-7B model that runs on ONNX and DirectML (https://github.com/microsoft/Olive/tree/main/examples/directml/llama_v2). Is this supported?

Replies: 1 comment

@vmadananth, NNCF can compress llama2-7b, but as far as I understand you are referring to a modified version of llama2 from Olive, which we have never tried. If your goal is to run this model on OpenVINO, you can try converting it to OpenVINO IR first and then compressing it with NNCF. You can find examples here (https://github.com/openvinotoolkit/nncf/tree/develop/examples/llm_compression/openvino) or in the OpenVINO notebooks (https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot, https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-question-answering).
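For reference, a minimal sketch of that route is below, assuming a recent NNCF (one with the `awq=` option of `nncf.compress_weights`) plus optimum-intel for the IR export. The model id, calibration dataset, and the `ratio`/`group_size` values are illustrative choices, not values from this thread, and the exact calibration inputs depend on the exported model's signature:

```python
# Sketch: export llama2-7b to OpenVINO IR, then apply INT4 AWQ weight
# compression with NNCF. Assumes nncf>=2.9, openvino, optimum-intel,
# transformers, and datasets are installed.
import numpy as np
import nncf
from datasets import load_dataset
from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer

# Assumed HF checkpoint, not the Olive/DirectML-modified one.
MODEL_ID = "meta-llama/Llama-2-7b-hf"

# 1. Export the checkpoint to OpenVINO IR with full-precision weights.
model = OVModelForCausalLM.from_pretrained(MODEL_ID, export=True, load_in_8bit=False)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# 2. AWQ is data-aware, so compression needs a small calibration set
#    of dicts matching the exported model's inputs.
calibration_data = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
calibration_data = calibration_data.filter(lambda item: len(item["text"]) > 1)

def transform_fn(item):
    tokens = tokenizer(item["text"], return_tensors="np")
    attention_mask = tokens["attention_mask"]
    # LLaMA-style exports usually take position_ids derived from the mask.
    position_ids = np.cumsum(attention_mask, axis=1) - 1
    position_ids[attention_mask == 0] = 1
    inputs = {
        "input_ids": tokens["input_ids"],
        "attention_mask": attention_mask,
        "position_ids": position_ids,
    }
    # Stateful exports also declare beam_idx; drop this if your IR does not.
    inputs["beam_idx"] = np.arange(tokens["input_ids"].shape[0], dtype=np.int32)
    return inputs

# 3. INT4 AWQ weight compression on the underlying ov.Model.
model.model = nncf.compress_weights(
    model.model,
    mode=nncf.CompressWeightsMode.INT4_SYM,
    ratio=0.8,        # share of weights compressed to INT4; the rest stay INT8
    group_size=128,
    awq=True,
    dataset=nncf.Dataset(calibration_data, transform_fn),
)

# 4. Save the compressed IR for inference with OpenVINO.
model.save_pretrained("llama-2-7b-int4-awq-ov")
```

The TinyLlama example in the NNCF repo linked above follows the same pattern and is a good template for adapting the transform function to your model's inputs.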