
Model size after quantization #1701

Open
TaylorYangX opened this issue Feb 11, 2025 · 3 comments
Labels: quantize, question (Further information is requested)

Comments

TaylorYangX commented Feb 11, 2025

Why are the resulting model sizes inconsistent with expectations after I apply these three quantization methods to the same model?

# int8 weight-only (W8)
from torchao.quantization import quantize_, int8_weight_only
quantize_(new_model, int8_weight_only())


# int8 dynamic activation + int8 weight (A8W8)
# from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight
# quantize_(new_model, int8_dynamic_activation_int8_weight())


# int8 dynamic activation + int4 weight (A8W4)
# from torchao.quantization import int8_dynamic_activation_int4_weight
# quantize_(new_model, int8_dynamic_activation_int4_weight())

The resulting file sizes:

20786584 Feb  5 13:46 a8w4SWaT.pte
20373272 Feb  5 13:45 a8w8SWaT.pte
29685120 Oct  5 13:12 pytorch_checkpoint.pth
20262664 Feb  5 13:44 w8onlySWaT.pte

In theory the model quantized with A8W4 should be the smallest, but the actual results say otherwise.
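
For comparison, here is a minimal sketch (using new_model from the snippet above) that measures the torch.save size of the state_dict before and after quantize_; note this is not the same path that produces the .pte files, it is just a quick way to see whether the weights themselves shrank:

import io
import torch

def serialized_size_bytes(model: torch.nn.Module) -> int:
    # Serialize the state_dict into an in-memory buffer and count the bytes.
    buf = io.BytesIO()
    torch.save(model.state_dict(), buf)
    return buf.getbuffer().nbytes

# e.g. call once before and once after quantize_(new_model, int8_weight_only())
# print(serialized_size_bytes(new_model))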

drisspg added the question (Further information is requested) and quantize labels on Feb 12, 2025
jerryzh168 (Contributor) commented:
int8_dynamic_activation_int4_weight uses a q/dq representation for the quantized model (float_weight -> quantize -> dequantize, see https://pytorch.org/tutorials/prototype/pt2e_quant_ptq.html#convert-the-calibrated-model-to-a-quantized-model). It is meant to be lowered to ExecuTorch, and I think you will only see the size reduction after delegation. cc @mcr229 @digantdesai @metascroy
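
A rough illustration of the point (using eager-mode quantize/dequantize ops rather than the exact PT2E decomposed q/dq graph): the q/dq form still carries a dequantized float tensor, so serializing it does not shrink the weight; the size win only appears once a backend/delegate keeps just the integer data.

import torch

w = torch.randn(128, 128)                                               # original float weight
q = torch.quantize_per_tensor(w, scale=0.05, zero_point=0, dtype=torch.qint8)
dq = q.dequantize()                                                     # float again, same element count

print(w.element_size(), dq.element_size())   # 4, 4 -> no size reduction yet
print(q.int_repr().element_size())           # 1 -> this is what a delegate would keep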

supriyar (Contributor) commented:
@jerryzh168 that doesn't explain the sizes for the other configurations, i.e. int8 weight-only and int8_dynamic_activation_int8_weight, does it?

jerryzh168 (Contributor) commented Feb 12, 2025

@supriyar assuming you mean that the size reduction is less than 2x, whether the reduction for int8wo and int8dyn needs further explanation depends on the actual model: (1) we only quantize linear weights, and there may be other weights that are not quantized; (2) we use per-channel weight quantization by default for both int8wo and int8dyn, so extra space is needed to store the scale/zero_point as well. Both of these mean the size reduction will be less than 2x.

Also, int8_dynamic_activation_int4_weight uses int8 as the target_dtype (so the int4 weights are stored in an int8 container), which I think explains why all three files end up with similar sizes.
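
To make the arithmetic concrete, a back-of-the-envelope sketch with made-up layer counts (not measured from the model in this issue), assuming a 4-byte scale and an 8-byte zero_point per output channel; it shows how unquantized parameters plus per-channel metadata pull the overall reduction well below the ideal for int8 weights:

linear_params = 4_000_000   # parameters in nn.Linear weights (these get quantized)
other_params  = 3_000_000   # everything else, left in fp32
out_channels  = 8_192       # total output channels across all quantized linears

fp32_total  = (linear_params + other_params) * 4
quant_total = linear_params * 1 + other_params * 4 + out_channels * (4 + 8)

print(fp32_total / quant_total)   # ~1.7x overall, far less than the 4x you'd expect for the weights alone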
