Advantage of the 4-bit Quantization #4

Open · amitsrivastava78 opened this issue Jun 12, 2019 · 8 comments

@amitsrivastava78
Hi @submission2019,
First of all, I would like to congratulate you for coming up with this paper and opening the GitHub project for analysis. I have gone through your paper and the GitHub project in depth, and I would like to know the following:

  1. What is the advantage of this approach over 8-bit quantization? Since all operations should be byte aligned, the mathematical operations should be at least 8-bit, and the storage also seems to be 8-bit aligned, so I cannot see where the advantage lies in doing 4-bit quantization. Also, I can see there is a drop in accuracy of about 2~3% compared to 8-bit quantization.

So maybe there is a bigger picture that I am not able to see; can you please point me in the right direction?

Regards
Amit

@submission2019
Owner

Hello.
The advantage of 4-bit weights and activations comes from the 2x reduction in bandwidth relative to 8-bit. Lots of neural network workloads are bandwidth bound, so reducing the number of bits increases throughput and reduces power consumption.

Of course, in order to benefit from 4-bit quantization we need dedicated HW that supports manipulation at resolutions lower than a byte (8 bits). Some HW vendors already offer experimental HW/features for enthusiasts to experiment with int4. For example, NVidia added support for the int4/uint4 datatypes as part of the CUDA 10 TensorCore HW.
On the other hand, a lot of academic and industrial research is focused on methods that bring the accuracy of int4 inference close to that of int8. The goal of our work is to suggest and evaluate such methods, allowing int4 inference of convolutional neural networks with relatively small degradation in accuracy.
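
As a rough illustration of the accuracy cost being discussed here, below is a minimal per-tensor symmetric fake-quantization sketch in plain PyTorch. This is not the repository's code; the function name and the simple per-tensor scaling are my own simplification, just to show how the 16 levels of int4 compare with the 256 levels of int8.

```python
import torch

def fake_quantize(x, num_bits=4):
    # Symmetric uniform quantization to 2**num_bits levels, simulated in
    # float ("fake quantization"), as commonly used to estimate the
    # accuracy impact of low-precision inference.
    qmax = 2 ** (num_bits - 1) - 1          # 7 for int4, 127 for int8
    scale = x.abs().max() / qmax            # single per-tensor scale
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q * scale                        # dequantize back to float

x = torch.randn(1, 64, 56, 56)
print((x - fake_quantize(x, 4)).abs().mean())   # quantization error at 4 bits
print((x - fake_quantize(x, 8)).abs().mean())   # noticeably smaller at 8 bits
```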

@amitsrivastava78
Author

@submission2019 , @ynahshan , thanks for pointing me in the right direction. The paper looks promising; have you thought about commercializing this solution in any product?
Also, when using your algorithm on MobileNet the accuracy is very low; can you throw some light on this?

Regards
Amit

@submission2019
Owner

Hi.
We didn't try to apply our methods to MobileNet, so I don't know the reason for the poor results you observe. It could be related to the depthwise convolutions that MobileNet mostly consists of. Unfortunately, given the diversity of deep learning models, it is often necessary to analyse the model and fine-tune the quantization methods for the specific model.
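
One way to see why the depthwise layers are a plausible culprit is to look at how much the per-channel weight ranges vary inside them. Below is a small diagnostic sketch using torchvision's mobilenet_v2, not this repository's code; a large spread suggests a single per-tensor scale wastes most of the 16 levels available at 4 bits.

```python
import torch
from torchvision.models import mobilenet_v2

# Depthwise convolutions have groups == in_channels; print the spread of
# per-output-channel weight ranges for each of them.
model = mobilenet_v2(pretrained=True)
for name, m in model.named_modules():
    if isinstance(m, torch.nn.Conv2d) and m.groups == m.in_channels and m.in_channels > 1:
        per_ch = m.weight.detach().abs().amax(dim=(1, 2, 3))
        print(f"{name}: max/min per-channel range = {(per_ch.max() / per_ch.min()).item():.1f}")
```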

@amitsrivastava78
Author

@submission2019 , @ynahshan thanks for the reply, I am closing this issue. If I manage to improve the MobileNet accuracy I will post the code and method here as well.

Regards
Amit

@amitsrivastava78
Author

@submission2019 , thanks for the reply regarding the MobileNet part. Yes, we are facing the same low-accuracy issue with MobileNetV2. Can you please describe the measures you have taken? For us the Top-1 accuracy with 4-bit for mobilenet_v2 comes to ~49%; can you please tell us the exact steps for getting it to 70%?

Regards
Amit

@limerainne

Dear @amitsrivastava78,

In my previous comment, I made a mistake in the test (I accidentally set the bit width to 8-bit), which produced an incorrectly high accuracy.

Sorry for the wrong information and for deleting my comment without proper notice.

P.S. To avoid confusion (since the authors were referred to in your comment): I'm not related to the authors.

@jonathanbonnard

jonathanbonnard commented Nov 7, 2019

Hi,
I have encountered the same problem with MobileNetV2 and I think I know where the problem is.
In fact, the program is quantizing the 3rd sub-layer (aka the linear bottleneck), but it should not. The output of this sub-layer has to be kept at 4 bits + 4 bits + log2(nb_out_channels), otherwise the dynamic range will be clipped and this leads to wrong input values for the next 1x1 convolution.
However, I don't know where the program should be modified to change this behaviour... Maybe the authors can help?
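
One possible way to experiment with this, sketched with torchvision's MobileNetV2 rather than this repository's code (the `should_quantize` hook below is hypothetical, only meant to show which modules would be excluded from 4-bit quantization):

```python
import torch
from torchvision.models import mobilenet_v2

# Collect the projection ("linear bottleneck") 1x1 convolutions of each
# inverted residual block, so a quantization wrapper could leave their
# outputs at higher precision instead of clipping them to 4 bits.
model = mobilenet_v2(pretrained=True)
skip = set()
for block in model.modules():
    if type(block).__name__ == "InvertedResidual":
        convs = [m for m in block.conv if isinstance(m, torch.nn.Conv2d)]
        skip.add(id(convs[-1]))   # the last bare Conv2d in the block is the linear projection

def should_quantize(module):
    # Hypothetical hook: a quantization wrapper would call this before
    # wrapping a layer, leaving the linear-bottleneck outputs alone.
    return id(module) not in skip
```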

@ghost

ghost commented Feb 8, 2020

Hi,
I want to save the quantized model and analyze its metrics such as inference time, model size, FLOPs, and parameter count. Can anyone give me some advice? Or have you already finished this?
@amitsrivastava78 @submission2019 @limerainne @jonathanbonnard @ynahshan
Thanks a lot!
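
Not an answer from the authors, but a minimal sketch of the size/parameter/latency part in plain PyTorch (the model below is a stand-in; swap in the quantized one). Note that fake-quantized weights stored as float32 will not show a smaller file on disk; real savings require packing the int4 values. FLOPs are usually counted with a separate tool such as ptflops or thop.

```python
import os, time, torch
from torchvision.models import mobilenet_v2

model = mobilenet_v2()      # stand-in; replace with the quantized model
model.eval()

# Model size: serialize the state dict and check the file size on disk.
torch.save(model.state_dict(), "model.pth")
print("size (MB):", os.path.getsize("model.pth") / 1e6)

# Parameter count.
print("params (M):", sum(p.numel() for p in model.parameters()) / 1e6)

# Rough CPU latency: warm up once, then average over a few runs.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    model(x)                                    # warm-up
    t0 = time.time()
    for _ in range(20):
        model(x)
print("latency (ms):", (time.time() - t0) / 20 * 1000)
```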
