Currently, the attention weights are quantized with an unsigned quantizer by default, i.e., `Uint8ActPerTensorFloat`. I can see why this is the case, as the attention weights can never be negative anyway. However, when exporting to FINN, there is currently no support for unsigned quantizers; see https://github.com/Xilinx/finn/blob/dev/src/finn/transformation/qonnx/qonnx_activation_handlers.py#L461

In general, nothing should prevent us from simply specifying a signed quantizer at initialization: the option is there and it seems to work (see the sketch below). Still, I would suggest setting a signed quantizer (e.g., `Int8ActPerTensorFloat`) as the default, to make `QuantMultiheadAttention` more FINN-compatible out of the box.

For more context on the effort of streamlining the Brevitas-exported `QuantMultiheadAttention`, please see Xilinx/finn#878.
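For illustration, here is a minimal sketch of specifying a signed quantizer at initialization. It assumes the constructor keyword for the softmax-output quantizer is `attn_output_weights_quant` (check the signature of your installed Brevitas version) and uses arbitrary dimensions:

```python
# Sketch: overriding the attention-weight quantizer of QuantMultiheadAttention.
# The keyword name attn_output_weights_quant is assumed here; verify it against
# the constructor of the installed Brevitas version.
import torch
from brevitas.nn import QuantMultiheadAttention
from brevitas.quant import Int8ActPerTensorFloat

mha = QuantMultiheadAttention(
    embed_dim=64,
    num_heads=4,
    # Swap the default unsigned quantizer (Uint8ActPerTensorFloat) on the
    # softmax output for a signed one that the FINN QONNX activation
    # handlers can absorb.
    attn_output_weights_quant=Int8ActPerTensorFloat,
)

# Shapes follow torch.nn.MultiheadAttention defaults: (seq_len, batch, embed_dim).
x = torch.randn(10, 2, 64)
out, attn_weights = mha(x, x, x)
```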
@Giuseppe5: As discussed, this seems to be more of a FINN issue with supporting this type of quantizer, and it indeed makes more sense to keep the unsigned quantizer after the softmax, since its output cannot be negative. Even if it cannot be resolved over there, this should be documented on the FINN side, as Brevitas aims for a broader user base. Thus, closing this.