Add LayerNorm support for Vivado #1110
Conversation
First in line on my PR review TODO list; I expect to have time for this early next week.
Thank you very much!
pre-commit.ci autofix
@vloncar I hope this goes in the direction of what you had in mind for the performance validation. I ran synthesis in Vitis 2023.1, 2024.1, and 2025.1 for different input sizes to the LayerNorm and plotted FFs, LUTs, DSPs, BRAMs, latency, and II as a function of that input size. 2024.1 and 2025.1 are basically identical, whereas 2023.1 uses somewhat fewer resources but has worse latency. This is with the default ap_fixed<16,6> and the default target part.
Thanks, looks good. Do all reports say the timing is met? (No scheduling warnings, clock uncertainty met, etc.)
    static const unsigned dim = CONFIG_T::n_in / CONFIG_T::seq_len;
    data_T in_val[dim];
    res_T outval[dim];
    // Use a function_instantiate in case it helps to explicitly optimize unchanging weights/biases
It does not. I think this can be removed from new code.
test/pytest/test_layernorm.py (outdated)
    hls_model = hls4ml.converters.convert_from_keras_model(
        custom_epsilon_model, backend=backend, hls_config=custom_config, io_type='io_parallel', output_dir=output_dir
    )
    hls_model.compile()
This test would be faster if we used hls_model.write(), or skipped the writing/linking step entirely; we don't use the compiled model here. The later accuracy test already checks that the produced code is compilable.
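A minimal sketch of the suggested change, assuming the test keeps the conversion call quoted above (hls4ml models expose both write() and compile(); write() only emits the project files, while compile() also builds the C simulation library, which is the slow step this comment targets):

```python
# Sketch: use write() instead of compile() when the compiled library is unused
hls_model = hls4ml.converters.convert_from_keras_model(
    custom_epsilon_model,
    backend=backend,
    hls_config=custom_config,
    io_type='io_parallel',
    output_dir=output_dir,
)
hls_model.write()  # emits the HLS project without the slow shared-library build
```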
test/pytest/test_layernorm.py (outdated)
    # Predict
    y_keras = model.predict(data).flatten()
    y_hls = hls_model.predict(data).flatten()
    np.testing.assert_allclose(y_keras, y_hls, rtol=0, atol=atol, verbose=True)
Why is atol a global variable?
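One way to address this, sketched below; the tolerance value and the stand-in predictions are placeholders, not the actual test contents:

```python
import numpy as np
import pytest


@pytest.mark.parametrize('atol', [5e-2])  # placeholder tolerance, passed per-test
def test_layernorm_accuracy(atol):
    # Stand-ins for model.predict(...) and hls_model.predict(...) from the real test
    y_keras = np.zeros(8)
    y_hls = np.zeros(8)
    np.testing.assert_allclose(y_keras, y_hls, rtol=0, atol=atol, verbose=True)
```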
hls4ml/converters/pytorch/core.py (outdated)
|
|
    if not ((len(input_shapes[0])) == 3):
        raise Exception(
            'input size is not currently supported by hls4ml; '
Would be good to say "Input shape <some shape> is not supported, only ...".
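A sketch of what that message could look like (the exact wording and the description of the supported shape are assumptions):

```python
# Hypothetical rewording following the suggestion above
if len(input_shapes[0]) != 3:
    raise Exception(
        f'Input shape {input_shapes[0]} is not supported by hls4ml; '
        'only 3-dimensional (batch, seq_len, features) inputs are supported.'
    )
```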
    node.get_output_variable().dim_names = dim_names
    elif (
        isinstance(node, LayerNormalization)
        and not model.config.config['HLSConfig']['Model']['ChannelsLastConversion'] == "off"
With the proposed change in #1352 we'll never get to the "off" check here.
Commit history:

* paser_mht
* change parser and modify keras_to_hls
* IR_mutihead_attention
* IR done
* create mha file in template
* mha .h file dummy algo
* config of mha
* update mha config
* dummy mha
* add transpose into mha
* projection_of_qkv_in_mha
* mha_first_draft
* able to predict model correct
* delete some unnassary comments
* delete comments
* resource strategy of transformer
* change sm lagacy
* update MHA, optimized
* support resource
* update
* dense_muti_dim_support
* parallel execute dense
* updates
* add_layerNorm_support
* MHA updated
* LayerNorm_bug_fix
* update bit precision
* config update
* add some comment
* run pre-commit
* Added support on QMultiHeadAttention, QLayerNormalization, and quantized_softmax
* updated on hls4ml transformer
* trying to clean the diff
* trying to clean the diff
* trying to clean the diff
* trying to clean the diff
* trying to clean the diff
* undo vhdl -> verilog change
* halfway working layernorm + test
* layernorm is now pretty functional
* layernorm on pytorch also
* minor cleanup
* more cleanup, pre-commit
* test for mha which kinda works maybe if you squint
* multihead attention working on keras and pytorch
* fiddly precision / accuracy changes for layernorm
* fix lookup table and label loops
* remove dense_seq
* undo qkeras changes
* fix merge conflict residue
* remove non-layernorm changes
* change to uniform LUT and fix precision
* [pre-commit.ci] auto fixes from pre-commit hooks
* fix encodings issue with dos2unix
* add Vitis as another tested backend
* Address PR feedback
* [pre-commit.ci] auto fixes from pre-commit hooks
* fix too-long lines
* fix merge issue
* trigger pre-commit
* re-add missing math import
* [pre-commit.ci] auto fixes from pre-commit hooks
* addressing Vladimir's latest comments
* change also pytorch test for layernorm and revert change to build command
* sideport changes to channels-last converter from 1352

Co-authored-by: Ethan <[email protected]>
Co-authored-by: Jan-Frederik Schulte <[email protected]>
Co-authored-by: LostEcho365 <[email protected]>
Co-authored-by: Rian Flynn <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Description
This PR adds support for Layer Normalization using either Keras or PyTorch with the Vivado backend in io_parallel mode. The implementation uses a lookup table for the inverse square root; the inputs to the lookup table are spaced logarithmically for better accuracy.
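To illustrate the idea, here is a minimal NumPy sketch of a logarithmically spaced inverse-square-root table; the table depth, input range, and indexing scheme are assumptions for illustration and do not mirror the actual HLS code:

```python
import numpy as np

N_TABLE = 1024                    # table depth (assumption)
X_MIN, X_MAX = 2.0**-16, 2.0**8   # covered input range (assumption)

# 1/sqrt(x) changes fastest for small x, so logarithmic spacing puts
# more table entries where the function needs them most.
points = np.geomspace(X_MIN, X_MAX, N_TABLE)
inv_sqrt_table = 1.0 / np.sqrt(points)


def inv_sqrt_lut(x: float) -> float:
    # Map x to its position in log space and read the nearest entry
    frac = (np.log2(x) - np.log2(X_MIN)) / (np.log2(X_MAX) - np.log2(X_MIN))
    idx = int(np.clip(round(frac * (N_TABLE - 1)), 0, N_TABLE - 1))
    return inv_sqrt_table[idx]


# LayerNorm-style use: normalize with the LUT instead of an exact 1/sqrt
v = np.array([0.5, -1.0, 2.0, 0.25])
eps = 1e-5
print((v - v.mean()) * inv_sqrt_lut(v.var() + eps))
```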
Tests have been added for both Keras and PyTorch parsing.
Credit is due to @Ethan0Jiang and @LostEcho365 (Zhixing Jiang and Dennis Yin) for their Vivado implementation and Keras parsing support; my contributions were changing the inverse square root lookup table implementation, implementing PyTorch support, and adding unit tests. (Here's a link to their pre-print.) The original code authors have given permission for their code to be merged into hls4ml.
Linked issue: #1109
Type of change
Tests
Two unit tests added:
test/pytest/test_layernorm.py and test/pytest/test_layernorm_pytorch.py

Checklist
I have run pre-commit on the files I edited or added.