LCE on Raspberry Pi Zero (ARMv6) #620
I'm trying to find a way to run a neural network at a reasonable speed on an ARM1176JZF processor (i.e. the ARM processor used in the Raspberry Pi Zero). This is a 32-bit ARMv6 core, but the real kicker is that it does not have NEON. I've been able to cross-compile and run LCE successfully on my Pi Zero W by adding an "rpi0" config to .bazelrc with appropriate compiler options. Unfortunately, QuickNet inference takes about 8.5 seconds per frame, and QuickNetSmall takes around 5.1 seconds. It looks like LCE is using the "Portable C++" kernels (found in kernels.h), since NEON is not supported. Does anyone know of any options for creating optimized kernels for the ARM1176JZF to improve the speed of LCE on this processor?
Replies: 1 comment 1 reply
We are definitely happy to accept PRs that add optimised kernels for ARMv6; however, support for ARMv6 is currently not on our roadmap.
You could also try changing the following line to use Register_BCONV_2D_OPT_INDIRECT_BGEMM, which might be faster on the Pi Zero due to a better-optimised portable kernel implementation:
compute-engine/larq_compute_engine/tflite/kernels/bconv2d.cc, line 603 in 747b6b7
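For anyone wondering what a portable (non-NEON) binary kernel boils down to, here is a minimal sketch of a bit-packed dot product using XOR and a software popcount. This is not LCE's actual kernel code: the names popcount32 and binary_dot are hypothetical, and the encoding (bit 1 means -1, bit 0 means +1) is an assumption made for illustration. It only shows the kind of inner loop a hand-optimised ARMv6 kernel would need to speed up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable population count (number of set bits) for a 32-bit word,
// using the classic bit-parallel reduction, so it does not depend on
// NEON or a hardware popcount instruction.
inline int popcount32(std::uint32_t x) {
  x = x - ((x >> 1) & 0x55555555u);                  // 2-bit partial sums
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  // 4-bit partial sums
  x = (x + (x >> 4)) & 0x0F0F0F0Fu;                  // 8-bit partial sums
  return static_cast<int>((x * 0x01010101u) >> 24);  // add the four bytes
}

// Dot product of two {-1, +1} vectors packed 32 elements per word.
// Assumed encoding (illustrative only): bit 1 = -1, bit 0 = +1.
// n_bits is the number of valid elements; any padding bits beyond n_bits
// must be identical in a and b (e.g. zero) so they never count as mismatches.
inline int binary_dot(const std::uint32_t* a, const std::uint32_t* b,
                      std::size_t n_words, int n_bits) {
  assert(n_bits >= 0 && static_cast<std::size_t>(n_bits) <= n_words * 32);
  int mismatches = 0;
  for (std::size_t i = 0; i < n_words; ++i) {
    // XOR sets a bit wherever the two signs differ.
    mismatches += popcount32(a[i] ^ b[i]);
  }
  // matches - mismatches = (n_bits - mismatches) - mismatches
  return n_bits - 2 * mismatches;
}
```

Each 32-bit word handles 32 multiply-accumulates at once, so on a core without SIMD the cost is dominated by the popcount and the memory traffic around it; those are the parts an ARMv6-specific kernel would target.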