RISC-V 32-bit vector intrinsics kernels #3280
base: main
Conversation
…dition and reduce to scalar only once per channel
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up-to-date status, view the checks section at the bottom of the pull request.
Signed
A group of fellow students and I have spent the last 12 months developing a set of optimized kernels for RISC-V edge devices running AI applications, using 32-bit vector intrinsics, as part of our senior undergraduate project.
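To illustrate the general style of optimization behind the commit above ("reduce to scalar only once per channel"), here is a minimal sketch, not code taken from this PR: the function name, shapes, and LMUL choice are assumptions, and it targets the ratified `__riscv_`-prefixed RVV intrinsics. Partial sums stay in a vector accumulator, and the vector is reduced to a scalar only once per output channel.

```c
#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical sketch (not the PR's actual kernel): int8 dot product for one
// output channel. Products are accumulated in vector registers; the
// vector-to-scalar reduction happens only once, after the loop.
static int32_t DotProductOneChannel(const int8_t* input, const int8_t* filter,
                                    size_t depth) {
  const size_t vl = __riscv_vsetvlmax_e8m1();
  vint32m4_t acc = __riscv_vmv_v_x_i32m4(0, __riscv_vsetvlmax_e32m4());

  size_t i = 0;
  // Main loop processes full vectors only, so every lane of acc stays valid.
  for (; i + vl <= depth; i += vl) {
    vint8m1_t va = __riscv_vle8_v_i8m1(input + i, vl);
    vint8m1_t vb = __riscv_vle8_v_i8m1(filter + i, vl);
    vint16m2_t prod = __riscv_vwmul_vv_i16m2(va, vb, vl);  // int8*int8 -> int16
    acc = __riscv_vwadd_wv_i32m4(acc, prod, vl);           // accumulate as int32
  }

  // Single vector-to-scalar reduction per channel.
  vint32m1_t zero = __riscv_vmv_s_x_i32m1(0, 1);
  vint32m1_t vsum =
      __riscv_vredsum_vs_i32m4_i32m1(acc, zero, __riscv_vsetvlmax_e32m4());
  int32_t sum = __riscv_vmv_x_s_i32m1_i32(vsum);

  // Scalar tail for any leftover elements.
  for (; i < depth; ++i) {
    sum += (int32_t)input[i] * (int32_t)filter[i];
  }
  return sum;
}
```

Compared with reducing inside the loop, this keeps the relatively expensive cross-lane reduction off the hot path.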
We developed and tested the kernels using the Spike RISC-V simulator and benchmarked them by running modified versions of the person_detection and micro_speech examples, measuring the resulting cycle counts.
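For reference, one common way to take such cycle measurements on RV32 (a sketch, not necessarily how our benchmarks read the counter) is to combine the cycle and cycleh CSRs:

```c
#include <stdint.h>

// Sketch for RV32: read the 64-bit cycle counter from the cycle/cycleh CSRs,
// retrying if the low half wrapped between the two reads. Assumes counter
// access is enabled for the current privilege level.
static inline uint64_t ReadCycles(void) {
  uint32_t lo, hi, hi2;
  do {
    __asm__ volatile("rdcycleh %0" : "=r"(hi));
    __asm__ volatile("rdcycle  %0" : "=r"(lo));
    __asm__ volatile("rdcycleh %0" : "=r"(hi2));
  } while (hi != hi2);
  return ((uint64_t)hi << 32) | lo;
}
```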
Our new kernels are built under a new build target called riscv_vector.

Using our kernels, our results for one iteration of person_detection showed a 4.1x reduction in cycle count compared to the default scalar implementations, and a 2.8x reduction compared to the compiler-vectorized default implementations. Our results for one iteration of micro_speech (including signal functions) showed a 1.47x reduction in cycle count compared to the default scalar implementations, and a 2.5x reduction compared to the compiler-vectorized default implementations. Interestingly, we found that the compiler's auto-vectorization in this instance increased the cycle count rather than reducing it.
We were unsure whether this would be welcomed upstream, but decided to open this pull request in case it is useful to anyone else.