
QuantMultiheadAttention: Use signed quantizer for attention weights? #755

Closed
iksnagreb opened this issue Nov 14, 2023 · 1 comment

@iksnagreb

Currently, the attention weights are quantized using an unsigned quantizer by default, i.e., Uint8ActPerTensorFloat. I can see why this is the case, as the attention weights will never be negative anyway. However, when exporting to FINN, unsigned quantizers are currently not supported; see https://github.com/Xilinx/finn/blob/dev/src/finn/transformation/qonnx/qonnx_activation_handlers.py#L461

In general, nothing prevents us from simply specifying a signed quantizer at initialization - the option is there and it seems to work. I would still like to suggest setting a signed quantizer (e.g., Int8ActPerTensorFloat) by default, to make it FINN-compatible out of the box.
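To illustrate the trade-off discussed here, the following is a minimal sketch (plain Python, not Brevitas code) of 8-bit affine quantization applied to softmax outputs in [0, 1]: an unsigned quantizer spends all 256 levels on that range, while a signed quantizer still represents the values correctly but only ever uses its 128 non-negative levels. The `quantize` helper and the scale choices are illustrative assumptions, not Brevitas internals.

```python
def quantize(x, scale, qmin, qmax):
    """Round-to-nearest affine quantization with clamping, then dequantize."""
    q = max(qmin, min(qmax, round(x / scale)))
    return q * scale

# Softmax outputs (attention weights) are never negative.
weights = [0.0, 0.1, 0.25, 0.5, 0.9, 1.0]

# Unsigned 8-bit: all 256 levels of [0, 255] cover [0, 1].
uint8 = [quantize(w, scale=1 / 255, qmin=0, qmax=255) for w in weights]

# Signed 8-bit: only the 128 non-negative levels of [-128, 127] are used,
# so the step size is roughly twice as large - but it still works.
int8 = [quantize(w, scale=1 / 127, qmin=-128, qmax=127) for w in weights]

for w, u, s in zip(weights, uint8, int8):
    print(f"{w:.2f} -> uint8 {u:.5f} | int8 {s:.5f}")
```

This is why the unsigned default is the natural choice accuracy-wise, and switching to a signed quantizer is purely an export-compatibility concession.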

For more context on the effort of streamlining the Brevitas exported QuantMultiheadAttention, please see Xilinx/finn#878

@iksnagreb (Author)

@Giuseppe5: As discussed, this is more of a FINN issue with supporting this type of quantizer, and it indeed makes more sense to have an unsigned quantizer following the softmax (whose outputs cannot be negative). Even if it cannot be resolved over there, this should be documented on the FINN side, as Brevitas aims for a broader user base. Thus closing this.
