MobileNetV1 - "size of variable 'srl' is too large to handle" from Q_srl.v at step "step_out_of_context_synthesis" #1260

Open · kalahel opened this issue Jan 10, 2025 · 4 comments
Labels: bug (Something isn't working)

@kalahel

kalahel commented Jan 10, 2025

Prerequisites

Please make sure to check off these prerequisites before submitting a bug report.

  • The bug appears on the current main-branch.
  • Check that the issue hasn't already been reported, by checking the currently open issues.
  • If there are steps to reproduce the problem, make sure to write them down below.
  • If relevant, please include the ONNX files, which were created directly before and/or after the bug.

Quick summary

I'm currently working on splitting the MobileNetV1 from the finn-examples repository. I split off its first part so that it fits on a Zynq 7020 (ONNX below). The problem is that I get "size of variable 'srl' is too large to handle" at step "step_out_of_context_synthesis".

Configuration

  • Vivado 2022.2 (Linux)
  • FINN running on Ubuntu under WSL
  • Targeting a Zynq 7020

ONNX model

https://file.io/DivU9fhs8KuU

Details and steps

To get to the model above, I ran the steps from the finn-examples scripts (the same way as the README):

    "step_mobilenet_streamline",
    "step_mobilenet_lower_convs",
    "step_mobilenet_convert_to_hw_layers_separate_th",
    "step_create_dataflow_partition",
    "step_specialize_layers",
    "step_apply_folding_config",
    "step_minimize_bit_width",

I then split the model to fit on a Zynq 7020 (not a Pynq, but similar).

Then I ran it through a script containing only these build steps (a minimal builder-configuration sketch follows the list):

    "step_generate_estimate_reports",
    "step_hw_codegen",
    "step_hw_ipgen",
    "step_set_fifo_depths",
    "step_create_stitched_ip",
    "step_measure_rtlsim_performance",
    "step_out_of_context_synthesis",
    "step_synthesize_bitfile",

Everything runs fine until step_out_of_context_synthesis, where the script seems to stop. I investigated the generated Vivado project for this step and got this report:

vivado.log

In short, the error is:
size of variable 'srl' is too large to handle; the size of the variable is 1204216, the limit is 1000000 [/synth_out_of_context_6dkwtifv/results_finn_design_wrapper/Q_srl.v:100]

I think it is caused by the input size of the model: 224 x 224 x 3 x 8 bits = 1,204,224 bits.
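
As a quick sanity check of that arithmetic (plain Python; the assumption that the auto-sized FIFO buffers roughly one full input frame is mine):

    # 224 x 224 pixels, 3 channels, 8 bits each, assumed to sit in a single FIFO
    frame_bits = 224 * 224 * 3 * 8
    print(frame_bits)              # 1204224
    print(frame_bits > 1_000_000)  # True -> exceeds the reported limit of 1000000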

I tried fixing it manually with the Tcl command set_param synth.elaboration.rodinMoreOptions "rt::set_parameter var_size_limit 4194304" (source), which seemingly fixed it, but now I get the error:
ERROR: [Synth 8-403] loop limit (65536) exceeded [/synth_out_of_context_moj0vvpc/results_finn_design_wrapper/Q_srl.v:179].

How can I fix this behavior? I assume that, since the example works on the Alveo, Vivado should be able to synthesize loops and arrays this big.

Thank you for your time,
Mat

@kalahel added the bug (Something isn't working) label Jan 10, 2025
@fpjentzsch (Collaborator)

Hi,
it seems like you have a very deep FIFO in your accelerator. Maybe you can optimize your folding or FIFO sizing to avoid this in the first place; it should not be necessary to buffer the whole input frame for a MobileNetV1.

Otherwise you could try the builder option split_large_fifos.
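
For what it's worth, split_large_fifos is a field of the builder configuration; a rough sketch of where it would go (the other values below are placeholders, not taken from this issue):

    import finn.builder.build_dataflow_config as build_cfg

    # Sketch only: keep automatic FIFO sizing, but let the builder split very
    # deep FIFOs into several smaller ones, so no single Q_srl instance exceeds
    # Vivado's variable size limit.
    cfg = build_cfg.DataflowBuildConfig(
        output_dir="output_mobilenet_split",  # placeholder
        synth_clk_period_ns=5.0,              # placeholder
        auto_fifo_depths=True,
        split_large_fifos=True,
        generate_outputs=[build_cfg.DataflowOutputType.STITCHED_IP],
    )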

@sn0wst0rm

@fpjentzsch
Hello,
I had a related problem during step_set_fifo_depths with automatic FIFO sizing searching for the best fit. I have a StreamingMaxPool that processes an input image of 416x416x8, and the estimated cycles are ~215,000. When performing the simulation to analyze the FIFOs, the build failed because the simulation period was set lower than the time the layer actually takes to simulate, which was reported to be at least double that (over 500,000 cycles).
To proceed I had to manually patch the minimum period, and I think this is a known issue, because in your code for the StreamingMaxPool custom op's clock-cycle estimation method I found a TODO saying the calculation was wrong.

Now, I get that it could be a bit off, but in this case it was halving the performance.

I noticed that the StreamingMaxPool_Precision implemented in HLS first processes the whole image, pixel by pixel, and then outputs it, without supporting any SIMD (since all the channels are already processed in parallel) or PE.

I found an old PR, #789, made by you that talks about also parallelizing over pixels (MMV). Did you ever accomplish this in any way? The code in that PR is pretty old and I don't even think it can still be merged, but it would be nice to have such an option, since for the first layers (typically bigger, as for my YOLO) we are basically limited by this slow computation.

If you have any tips or advice on how to improve the performance of this layer, they would be very welcome.

Thank you!

@kalahel (Author)

kalahel commented Jan 13, 2025

Hi,

Thanks for your responses.
@fpjentzsch setting auto_fifo_depths=False did solve the issue, thank you a lot!
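
For anyone hitting the same error, the change was just this builder option; a minimal sketch (the other values are placeholders, and as I understand it the FIFO depths from the folding config are then used instead of the automatically derived ones):

    import finn.builder.build_dataflow_config as build_cfg

    # Sketch only: disable automatic FIFO sizing for the build.
    cfg = build_cfg.DataflowBuildConfig(
        output_dir="output_mobilenet_split",  # placeholder
        synth_clk_period_ns=5.0,              # placeholder
        auto_fifo_depths=False,
        generate_outputs=[build_cfg.DataflowOutputType.BITFILE],
    )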

I can close the thread now, or leave it open if you want to answer @sn0wst0rm.

Best regards,
Mat

@sn0wst0rm

@kalahel if you don't mind, I'd wait a bit more for an answer from @fpjentzsch.

Thanks!
