[Torch FX] INT4 data-free weights compression #3005

Open
1 task
alexsu52 opened this issue Oct 9, 2024 · 0 comments

alexsu52 commented Oct 9, 2024

🚀 Feature request

INT4 weight compression is widely used to shrink LLMs and speed up their inference. OpenVINO optimizes inference for models with INT4 weights, which makes them run significantly faster.

The request is to add INT4 weight compression for torch.fx.GraphModule models to nncf.compress_weights, so that models with INT4-compressed weights can be created and run with torch.compile using the OpenVINO backend.

Feature Use Case

import torch
from torch._export import capture_pre_autograd_graph

import nncf

# initialize a floating-point model (M is a placeholder for the user's torch.nn.Module)
float_model = M().eval()

# program capture
# NOTE: this API will be updated to the torch.export API in the future, but the captured result should mostly stay the same
# example_inputs is a placeholder: a tuple of example input tensors for the model
model = capture_pre_autograd_graph(float_model, example_inputs)

# compress weights
compressed_model = nncf.compress_weights(model, mode=nncf.CompressWeightsMode.INT4_ASYM)

# compile the compressed model with the OpenVINO backend
compiled_model = torch.compile(compressed_model, backend='openvino')
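
For readers unfamiliar with what an asymmetric INT4 mode means for a weight tensor, here is a minimal pure-Python sketch of data-free asymmetric 4-bit quantization of a single group of weights. It is a generic illustration of the technique, not NNCF's implementation: the function names, the per-group min/max range, and the rounding choices are all assumptions made for this example.

```python
from typing import List, Tuple

# Unsigned 4-bit code range assumed for an asymmetric scheme.
INT4_MIN, INT4_MAX = 0, 15


def quantize_int4_asym(weights: List[float]) -> Tuple[List[int], float, int]:
    """Map one group of float weights to 4-bit codes plus (scale, zero_point).

    Hypothetical helper for illustration; needs only the weights (data-free).
    """
    # Extend the range to include 0.0 so that zero is exactly representable.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / (INT4_MAX - INT4_MIN)
    if scale == 0.0:
        scale = 1.0  # all-zero group: any positive scale works
    zero_point = round(-w_min / scale)
    codes = [
        max(INT4_MIN, min(INT4_MAX, round(w / scale) + zero_point))
        for w in weights
    ]
    return codes, scale, zero_point


def dequantize_int4_asym(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Reconstruct approximate float weights from the 4-bit codes."""
    return [(q - zero_point) * scale for q in codes]


group = [-0.5, 0.0, 0.3, 1.0]
codes, scale, zero_point = quantize_int4_asym(group)
restored = dequantize_int4_asym(codes, scale, zero_point)
```

Each weight is stored as a 4-bit code, while the scale and zero point are kept once per group, which is where the memory saving comes from. No calibration dataset is involved, which is the sense in which such a scheme is data-free.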

Are you going to submit a PR?

  • Yes I'd like to help by submitting a PR!
@alexsu52 alexsu52 added the enhancement New feature or request label Oct 9, 2024
@alexsu52 alexsu52 changed the title [Torch FX] Support IN4 weights compression [Torch FX] Support INT4 weights compression Oct 9, 2024
@alexsu52 alexsu52 changed the title [Torch FX] Support INT4 weights compression [Torch FX] INT4 weights compression Oct 9, 2024
@alexsu52 alexsu52 self-assigned this Oct 9, 2024
@MaximProshin MaximProshin changed the title [Torch FX] INT4 weights compression [Torch FX] INT4 data-free weights compression Oct 10, 2024