@supriyar assuming you meant the size reduction is less than 2x: whether the size reduction for int8wo and int8dyn needs explanation depends on the actual model, since (1) we only quantize linear weights, so any other weights stay unquantized, and (2) we use per-channel weight quantization by default for both int8wo and int8dyn, so extra space is needed to store the scale/zero_point tensors as well. Both effects mean the size reduction will be less than 2x. A back-of-envelope sketch is below.
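Here is a minimal sketch of effect (2) for a single linear weight. The 4096x4096 shape is a made-up example, and the fp32 scale / int64 zero_point dtypes are assumptions; the exact dtypes depend on the torchao version and config.

```python
# Back-of-envelope estimate of per-channel quantization overhead for one
# linear weight (hypothetical 4096x4096 shape; scale/zero_point dtypes assumed).
out_ch, in_ch = 4096, 4096

bf16_bytes  = out_ch * in_ch * 2   # original bf16 weight
int8_bytes  = out_ch * in_ch * 1   # quantized int8 values
scale_bytes = out_ch * 4           # one fp32 scale per output channel (assumed)
zp_bytes    = out_ch * 8           # one int64 zero_point per output channel (assumed)

ratio = bf16_bytes / (int8_bytes + scale_bytes + zp_bytes)
print(f"{ratio:.3f}x")  # just under 2x; any unquantized embeddings/norms shrink it further
```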
Also, int8_dynamic_activation_int4_weight uses int8 as the target_dtype (the int4 values are stored in an int8 container), which I think explains why all three end up with similar sizes.
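You can peek at the packed weight to see this. This is a sketch only: the attribute chain (`original_weight_tensor` / `tensor_impl` / `int_data`) follows recent torchao versions and may be named differently in yours (e.g. `layout_tensor` in older releases), and the toy model is a placeholder.

```python
# Sketch: inspect the storage dtype of an A8W4-quantized weight.
# Internal attribute names vary across torchao versions -- illustrative only.
import torch
from torch import nn
from torchao.quantization import quantize_, int8_dynamic_activation_int4_weight

m = nn.Sequential(nn.Linear(64, 64)).to(torch.bfloat16)
quantize_(m, int8_dynamic_activation_int4_weight())

w = m[0].weight  # tensor subclass wrapping the quantized weight
inner = w.original_weight_tensor.tensor_impl.int_data  # version-dependent internals
print(inner.dtype)  # torch.int8, even though the quant range is 4-bit
```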
Why is the relationship between the model sizes not what I expect after applying these three quantization methods to the same model?
The result:
Theoretically, the model quantized with A8W4 should be the smallest, but the actual results are different.
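For reference, here is a minimal repro sketch of the comparison. The two-layer toy model and the `/tmp` paths are placeholders; substitute your own model to reproduce the reported sizes.

```python
# Sketch: quantize copies of the same model with the three methods and
# compare the on-disk size of each state dict (placeholder model/paths).
import copy
import os
import torch
from torch import nn
from torchao.quantization import (
    quantize_,
    int8_weight_only,
    int8_dynamic_activation_int8_weight,
    int8_dynamic_activation_int4_weight,
)

base = nn.Sequential(nn.Linear(4096, 4096), nn.Linear(4096, 4096)).to(torch.bfloat16)

configs = {
    "int8wo": int8_weight_only(),
    "int8dyn": int8_dynamic_activation_int8_weight(),
    "a8w4": int8_dynamic_activation_int4_weight(),
}

for name, config in configs.items():
    model = copy.deepcopy(base)   # fresh copy so each method starts from bf16
    quantize_(model, config)
    path = f"/tmp/{name}.pt"
    torch.save(model.state_dict(), path)
    print(f"{name}: {os.path.getsize(path) / 1e6:.2f} MB")
```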