LCE on Raspberry Pi Zero (ARMv6) #620

nathankopp · 2021-03-17T02:42:53Z


I'm trying to find a way to run a neural network at a reasonable speed on an ARM1176JZF processor (i.e. the ARM processor used in the Raspberry Pi Zero). This is a 32-bit ARMv6 core, and the real kicker is that it does not have NEON.

I've been able to cross-compile and run LCE successfully on my Pi Zero W by adding a "rpi0" config to .bazelrc with appropriate compiler options.

Unfortunately, QuickNet inference time is about 8.5 seconds per frame, and QuickNetSmall takes around 5.1 seconds. It looks like it is using the "Portable C++" kernels (found in kernels.h), since NEON is not supported.
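To illustrate what a portable (non-SIMD) path means for a binarized network, here is a minimal sketch of a bit-packed dot product using XOR and popcount. It is not taken from kernels.h; the function name `BinaryDot`, the constant `kHasNeon`, and the bit encoding (0 means +1, 1 means -1) are assumptions made for this example. `__ARM_NEON` is the standard ACLE feature macro, which a compiler targeting ARMv6 without NEON leaves undefined.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True only when the compiler targets a CPU with NEON. On an ARMv6 target
// without NEON (such as the Pi Zero) this evaluates to false.
constexpr bool kHasNeon =
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    true;
#else
    false;
#endif

// Portable dot product of two bit-packed {-1, +1} vectors.
// Assumed encoding for this sketch: bit 0 -> +1, bit 1 -> -1.
// Every differing bit contributes -1 and every matching bit +1, so
// dot = total_bits - 2 * popcount(a XOR b).
int BinaryDot(const std::vector<std::uint32_t>& a,
              const std::vector<std::uint32_t>& b) {
  assert(a.size() == b.size());
  int mismatches = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    // std::bitset::count is a portable popcount; no SIMD is required.
    mismatches += static_cast<int>(std::bitset<32>(a[i] ^ b[i]).count());
  }
  const int total_bits = static_cast<int>(a.size()) * 32;
  return total_bits - 2 * mismatches;
}
```

A NEON build can process many such words per instruction, which is one plausible reason the scalar fallback is so much slower on this CPU.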

Does anyone know if there are any options to create optimized kernels for the ARM1176JZF to improve the speed of LCE on this processor?


lgeiger (Maintainer) · 2021-03-17T10:42:11Z

Does anyone know if there are any options to create optimized kernels for the ARM1176JZF to improve the speed of LCE on this processor?

We are definitely happy to accept PRs that add optimised kernels for ARMv6; however, support for ARMv6 is currently not on our roadmap.

You could also try changing the following line to return Register_BCONV_2D_OPT_INDIRECT_BGEMM() instead, which might be faster on the Pi Zero thanks to a better-optimised portable kernel implementation:

compute-engine/larq_compute_engine/tflite/kernels/bconv2d.cc, line 603 at commit 747b6b7:

return Register_BCONV_2D_OPT_BGEMM();
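The suggested one-line change can be sketched in isolation as follows. This is a stand-alone mock, not the real file: the `Registration` struct is a hypothetical stand-in for TensorFlow Lite's registration type, the two `Register_..._BGEMM` functions are stubbed out here, and the name of the enclosing wrapper (`Register_BCONV_2D`) is an assumption, since the thread only quotes the `return` statement.

```cpp
// Hypothetical stand-in for the TFLite registration struct; the real type
// carries init/prepare/invoke callbacks rather than a label.
struct Registration {
  const char* kernel_name;
};

// Stubs for the two variants named in the thread.
Registration* Register_BCONV_2D_OPT_BGEMM() {
  static Registration r{"OPT_BGEMM"};
  return &r;
}

Registration* Register_BCONV_2D_OPT_INDIRECT_BGEMM() {
  static Registration r{"OPT_INDIRECT_BGEMM"};
  return &r;
}

// Assumed wrapper that decides which implementation the op resolves to.
Registration* Register_BCONV_2D() {
  // Before: return Register_BCONV_2D_OPT_BGEMM();
  return Register_BCONV_2D_OPT_INDIRECT_BGEMM();  // suggested change
}
```

After editing the real line, LCE has to be rebuilt for the change to take effect.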

1 reply

nathankopp Mar 17, 2021
Author

Thanks! That actually improved the inference time to around 5 seconds for QuickNet and 3 seconds for QuickNetSmall.
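For anyone repeating this comparison, a minimal timing sketch could look like the one below. The helper names `AverageSeconds` and `Speedup` are made up for this example, and `run_once` is a placeholder for whatever invokes a single inference. With the approximate figures reported in this thread, 8.5 s to 5 s and 5.1 s to 3 s, the speed-up comes to roughly 1.7x for both models.

```cpp
#include <chrono>
#include <functional>

// Average wall-clock seconds per call of `run_once` over `runs` calls.
// `run_once` is a placeholder for a single model inference.
double AverageSeconds(const std::function<void()>& run_once, int runs) {
  using Clock = std::chrono::steady_clock;
  const auto start = Clock::now();
  for (int i = 0; i < runs; ++i) {
    run_once();
  }
  const std::chrono::duration<double> elapsed = Clock::now() - start;
  return elapsed.count() / runs;
}

// Speed-up factor between two per-frame latencies (before / after).
double Speedup(double before_seconds, double after_seconds) {
  return before_seconds / after_seconds;
}
```

Averaging over several runs helps smooth out one-off costs such as the first inference after loading the model.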
