This dynamic quantization simply leverages two facts:
(1) the global max of the matrix $P$ is not necessarily the max value of each row of $P$;
(2) you know the max and min values of each row on the fly during the softmax computation. This is an inherent property of softmax: once the row max is subtracted from the exponent, the numerators in every row lie in $[0, 1]$.
So you can exploit this without passing additional global quantization information about the matrix $P$.
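
To make the point concrete, here is a minimal NumPy sketch, not the actual kernel. It assumes the "safe" softmax form where the row max is subtracted before exponentiation, and it assumes an unsigned 8-bit target with 255 levels. The function names `quantize_softmax_numerators` and `quantize_with_global_max` are hypothetical and exist only for illustration.

```python
import numpy as np

def quantize_softmax_numerators(scores, levels=255):
    """Quantize exp(s - rowmax) with a fixed scale (illustrative sketch)."""
    # Safe softmax: subtracting the row max puts every numerator in [0, 1],
    # and the largest numerator in each row is exactly exp(0) = 1.
    row_max = scores.max(axis=-1, keepdims=True)
    numerators = np.exp(scores - row_max)
    # The range is known in advance, so the scale is the constant 1/levels;
    # no statistics of the whole matrix are needed.
    q = np.rint(numerators * levels).astype(np.uint8)
    return q, 1.0 / levels

def quantize_with_global_max(p, levels=255):
    """Baseline: one scale derived from the global max of the normalized P."""
    scale = p.max() / levels
    q = np.rint(p / scale).astype(np.uint8)
    return q, scale

# Row 0 is sharply peaked, row 1 is flat (assumed toy inputs).
scores = np.array([[8.0, 0.0, 0.0, 0.0],
                   [1.0, 1.0, 1.0, 1.0]])

q_row, scale_row = quantize_softmax_numerators(scores)

# Normalized softmax, used only for the global-scale baseline.
e = np.exp(scores - scores.max(axis=-1, keepdims=True))
p = e / e.sum(axis=-1, keepdims=True)
q_glob, scale_glob = quantize_with_global_max(p)

print("per-row numerators, max code per row:", q_row.max(axis=-1))
print("global scale on P, max code per row: ", q_glob.max(axis=-1))
```

With the fixed scale, every row reaches the top integer code, whereas the global scale leaves the flat row using only a fraction of the available codes. The normalization by the row sum can still be applied after dequantization, since it is a per-row scalar.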