🚀 Feature request
INT4 weight compression is widely used to compress LLMs and optimize their inference. OpenVINO effectively optimizes inference of models with INT4 weights, which results in significantly faster execution.
The feature request is to add INT4 weight compression for torch.fx.GraphModule models in nncf.compress_weights, enabling the creation of models with INT4-compressed weights and their inference through torch.compile with the OpenVINO backend.
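For context, here is a minimal sketch of what asymmetric INT4 quantization does to a weight tensor. The function name, group size, and per-group scale/zero-point scheme are illustrative assumptions for this sketch, not NNCF's actual implementation or defaults:

import torch

def int4_asym_quantize(w: torch.Tensor, group_size: int = 64):
    # Hypothetical helper: group-wise asymmetric quantization to 16 levels (0..15).
    # Assumes w.numel() is divisible by group_size.
    groups = w.reshape(-1, group_size)
    w_min = groups.min(dim=1, keepdim=True).values
    w_max = groups.max(dim=1, keepdim=True).values
    scale = ((w_max - w_min) / 15.0).clamp_min(1e-8)
    zero_point = torch.round(-w_min / scale).clamp(0, 15)
    q = torch.round(groups / scale + zero_point).clamp(0, 15)
    # Dequantization reconstructs an approximation of the original weights
    w_dq = (q - zero_point) * scale
    return q.to(torch.uint8), scale, zero_point, w_dq.reshape(w.shape)

w = torch.randn(128, 128)
q, scale, zp, w_dq = int4_asym_quantize(w)
print((w - w_dq).abs().max())  # small per-group quantization error

Storing q (4 bits per value) plus a scale and zero point per group is what yields the memory savings; the OpenVINO backend can additionally execute such weights efficiently at inference time.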
Feature Use Case
import torch
from torch._export import capture_pre_autograd_graph
import nncf

# toy stand-in model (illustrative; any nn.Module works here)
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(128, 128)
    def forward(self, x):
        return self.linear(x)

# initialize a floating point model
float_model = M().eval()
example_inputs = (torch.randn(1, 128),)

# program capture
# NOTE: this API will be updated to the torch.export API in the future,
# but the captured result should mostly stay the same
model = capture_pre_autograd_graph(float_model, example_inputs)

# compress weights to INT4
compressed_model = nncf.compress_weights(model, mode=nncf.CompressWeightsMode.INT4_ASYM)

# compile the compressed model with the OpenVINO backend
compiled_model = torch.compile(compressed_model, backend="openvino")
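After compilation, the model is called like any other torch.compile'd module; the first call triggers backend compilation. The input shape below matches the toy M defined above and is purely illustrative:

output = compiled_model(torch.randn(1, 128))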
Are you going to submit a PR?
Yes, I'd like to help by submitting a PR!