
QuantMultiheadAttention: Use signed quantizer for attention weights? #755

Closed
iksnagreb opened this issue Nov 14, 2023 · 1 comment

@iksnagreb

Currently, the attention weights are quantized using an unsigned quantizer by default, i.e., Uint8ActPerTensorFloat. I can see why this is the case, as the attention weights will never be negative anyway. However, when exporting to FINN, unsigned quantizers are currently not supported; see https://github.com/Xilinx/finn/blob/dev/src/finn/transformation/qonnx/qonnx_activation_handlers.py#L461

In general, nothing prevents us from simply specifying a signed quantizer at initialization - the option is there and it seems to work. I would still like to suggest setting a signed quantizer (e.g., Int8ActPerTensorFloat) by default, to make it FINN-compatible out of the box.
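To illustrate the trade-off discussed here, the following is a minimal sketch (plain Python, not Brevitas code) of 8-bit affine quantization applied to softmax outputs in [0, 1]: an unsigned quantizer spends all 256 levels on that range, while a signed quantizer still represents the values correctly but only ever uses its 128 non-negative levels. The `quantize` helper and the scale choices are illustrative assumptions, not Brevitas internals.

```python
def quantize(x, scale, qmin, qmax):
    """Round-to-nearest affine quantization with clamping, then dequantize."""
    q = max(qmin, min(qmax, round(x / scale)))
    return q * scale

# Softmax outputs (attention weights) are never negative.
weights = [0.0, 0.1, 0.25, 0.5, 0.9, 1.0]

# Unsigned 8-bit: all 256 levels of [0, 255] cover [0, 1].
uint8 = [quantize(w, scale=1 / 255, qmin=0, qmax=255) for w in weights]

# Signed 8-bit: only the 128 non-negative levels of [-128, 127] are used,
# so the step size is roughly twice as large - but it still works.
int8 = [quantize(w, scale=1 / 127, qmin=-128, qmax=127) for w in weights]

for w, u, s in zip(weights, uint8, int8):
    print(f"{w:.2f} -> uint8 {u:.5f} | int8 {s:.5f}")
```

This is why the unsigned default is the natural choice accuracy-wise, and switching to a signed quantizer is purely an export-compatibility concession.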

For more context on the effort of streamlining the Brevitas exported QuantMultiheadAttention, please see Xilinx/finn#878

@iksnagreb (Author)

@Giuseppe5: As discussed, this is more of a FINN issue with supporting this type of quantizer, and it indeed makes more sense to have an unsigned quantizer following the softmax (whose outputs cannot be negative). Even if it cannot be resolved over there, this should be documented on the FINN side, as Brevitas aims for a broader user base. Thus closing this.
