
Model size after quantization #1701

Open
TaylorYangX opened this issue Feb 11, 2025 · 3 comments
Labels: quantize, question (Further information is requested)

Comments

TaylorYangX commented Feb 11, 2025

Why are the resulting model sizes inconsistent with expectations after I apply these three quantization methods to the same model?

# int8 weight-only (W8)
from torchao.quantization import quantize_, int8_weight_only
quantize_(new_model, int8_weight_only())


# int8 dynamic activation + int8 weight (A8W8)
# from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight
# quantize_(new_model, int8_dynamic_activation_int8_weight())


# int8 dynamic activation + int4 weight (A8W4)
# from torchao.quantization import int8_dynamic_activation_int4_weight
# quantize_(new_model, int8_dynamic_activation_int4_weight())

The resulting file sizes:

20786584 Feb  5 13:46 a8w4SWaT.pte
20373272 Feb  5 13:45 a8w8SWaT.pte
29685120 Oct  5 13:12 pytorch_checkpoint.pth
20262664 Feb  5 13:44 w8onlySWaT.pte

In theory the model quantized with A8W4 should be the smallest, but the actual results say otherwise.
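
For comparison, here is a minimal sketch (using new_model from the snippet above) that measures the torch.save size of the state_dict before and after quantize_; note this is not the same path that produces the .pte files, it is just a quick way to see whether the weights themselves shrank:

import io
import torch

def serialized_size_bytes(model: torch.nn.Module) -> int:
    # Serialize the state_dict into an in-memory buffer and count the bytes.
    buf = io.BytesIO()
    torch.save(model.state_dict(), buf)
    return buf.getbuffer().nbytes

# e.g. call once before and once after quantize_(new_model, int8_weight_only())
# print(serialized_size_bytes(new_model))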

drisspg added the question (Further information is requested) and quantize labels on Feb 12, 2025
jerryzh168 (Contributor) commented:
int8_dynamic_activation_int4_weight uses a q/dq representation for the quantized model (float_weight -> quantize -> dequantize, see https://pytorch.org/tutorials/prototype/pt2e_quant_ptq.html#convert-the-calibrated-model-to-a-quantized-model). It is meant to be lowered to ExecuTorch, and I think you will only see the size reduction after delegation. cc @mcr229 @digantdesai @metascroy
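
A rough illustration of the point (using eager-mode quantize/dequantize ops rather than the exact PT2E decomposed q/dq graph): the q/dq form still carries a dequantized float tensor, so serializing it does not shrink the weight; the size win only appears once a backend/delegate keeps just the integer data.

import torch

w = torch.randn(128, 128)                                               # original float weight
q = torch.quantize_per_tensor(w, scale=0.05, zero_point=0, dtype=torch.qint8)
dq = q.dequantize()                                                     # float again, same element count

print(w.element_size(), dq.element_size())   # 4, 4 -> no size reduction yet
print(q.int_repr().element_size())           # 1 -> this is what a delegate would keep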

supriyar (Contributor) commented:
@jerryzh168 that doesn't explain the sizes for the other configurations, i.e. int8 weight-only and int8_dynamic_activation_int8_weight, does it?

jerryzh168 (Contributor) commented Feb 12, 2025

@supriyar assuming you mean that the size reduction is less than 2x, whether the reduction for int8wo and int8dyn needs further explanation depends on the actual model: (1) we only quantize linear weights, and there may be other weights that are not quantized; (2) we use per-channel weight quantization by default for both int8wo and int8dyn, so extra space is needed to store the scale/zero_point as well. Both of these mean the size reduction will be less than 2x.

Also, int8_dynamic_activation_int4_weight uses int8 as the target_dtype (so the int4 weights are stored in an int8 container), which I think explains why all three files end up with similar sizes.
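
To make the arithmetic concrete, a back-of-the-envelope sketch with made-up layer counts (not measured from the model in this issue), assuming a 4-byte scale and an 8-byte zero_point per output channel; it shows how unquantized parameters plus per-channel metadata pull the overall reduction well below the ideal for int8 weights:

linear_params = 4_000_000   # parameters in nn.Linear weights (these get quantized)
other_params  = 3_000_000   # everything else, left in fp32
out_channels  = 8_192       # total output channels across all quantized linears

fp32_total  = (linear_params + other_params) * 4
quant_total = linear_params * 1 + other_params * 4 + out_channels * (4 + 8)

print(fp32_total / quant_total)   # ~1.7x overall, far less than the 4x you'd expect for the weights alone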
