🚀 Feature request
INT4 weight compression is widely used to compress LLMs and optimize their inference. OpenVINO effectively optimizes inference of models with INT4 weights, which results in significantly faster execution.
The feature request is to add INT4 weight compression for torch.fx.GraphModule models in nncf.compress_weights, enabling the creation of models with INT4-compressed weights and their inference through torch.compile with the OpenVINO backend.
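For context, here is a minimal sketch of what asymmetric INT4 quantization does to a weight tensor. The function name, group size, and per-group scale/zero-point scheme are illustrative assumptions for this sketch, not NNCF's actual implementation or defaults:

import torch

def int4_asym_quantize(w: torch.Tensor, group_size: int = 64):
    # Hypothetical helper: group-wise asymmetric quantization to 16 levels (0..15).
    # Assumes w.numel() is divisible by group_size.
    groups = w.reshape(-1, group_size)
    w_min = groups.min(dim=1, keepdim=True).values
    w_max = groups.max(dim=1, keepdim=True).values
    scale = ((w_max - w_min) / 15.0).clamp_min(1e-8)
    zero_point = torch.round(-w_min / scale).clamp(0, 15)
    q = torch.round(groups / scale + zero_point).clamp(0, 15)
    # Dequantization reconstructs an approximation of the original weights
    w_dq = (q - zero_point) * scale
    return q.to(torch.uint8), scale, zero_point, w_dq.reshape(w.shape)

w = torch.randn(128, 128)
q, scale, zp, w_dq = int4_asym_quantize(w)
print((w - w_dq).abs().max())  # small per-group quantization error

Storing q (4 bits per value) plus a scale and zero point per group is what yields the memory savings; the OpenVINO backend can additionally execute such weights efficiently at inference time.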
Feature Use Case
import torch
from torch._export import capture_pre_autograd_graph
import nncf

# toy stand-in model (illustrative; any nn.Module works here)
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(128, 128)
    def forward(self, x):
        return self.linear(x)

# initialize a floating point model
float_model = M().eval()
example_inputs = (torch.randn(1, 128),)

# program capture
# NOTE: this API will be updated to the torch.export API in the future,
# but the captured result should mostly stay the same
model = capture_pre_autograd_graph(float_model, example_inputs)

# compress weights to INT4
compressed_model = nncf.compress_weights(model, mode=nncf.CompressWeightsMode.INT4_ASYM)

# compile the compressed model with the OpenVINO backend
compiled_model = torch.compile(compressed_model, backend="openvino")
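After compilation, the model is called like any other torch.compile'd module; the first call triggers backend compilation. The input shape below matches the toy M defined above and is purely illustrative:

output = compiled_model(torch.randn(1, 128))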
Are you going to submit a PR?
Yes, I'd like to help by submitting a PR!