RISC-V 32-bit vector intrinsics kernels #3280
base: main
Conversation
…dition and reduce to scalar only once per channel
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up-to-date status, view the checks section at the bottom of the pull request.
Signed
A group of fellow students and I have spent the last 12 months developing a set of optimized kernels for RISC-V edge devices running AI applications, using 32-bit vector intrinsics, as part of our senior undergraduate project.
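To illustrate the general style of optimization behind the commit above ("reduce to scalar only once per channel"), here is a minimal sketch, not code taken from this PR: the function name, shapes, and LMUL choice are assumptions, and it targets the ratified `__riscv_`-prefixed RVV intrinsics. Partial sums stay in a vector accumulator, and the vector is reduced to a scalar only once per output channel.

```c
#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical sketch (not the PR's actual kernel): int8 dot product for one
// output channel. Products are accumulated in vector registers; the
// vector-to-scalar reduction happens only once, after the loop.
static int32_t DotProductOneChannel(const int8_t* input, const int8_t* filter,
                                    size_t depth) {
  const size_t vl = __riscv_vsetvlmax_e8m1();
  vint32m4_t acc = __riscv_vmv_v_x_i32m4(0, __riscv_vsetvlmax_e32m4());

  size_t i = 0;
  // Main loop processes full vectors only, so every lane of acc stays valid.
  for (; i + vl <= depth; i += vl) {
    vint8m1_t va = __riscv_vle8_v_i8m1(input + i, vl);
    vint8m1_t vb = __riscv_vle8_v_i8m1(filter + i, vl);
    vint16m2_t prod = __riscv_vwmul_vv_i16m2(va, vb, vl);  // int8*int8 -> int16
    acc = __riscv_vwadd_wv_i32m4(acc, prod, vl);           // accumulate as int32
  }

  // Single vector-to-scalar reduction per channel.
  vint32m1_t zero = __riscv_vmv_s_x_i32m1(0, 1);
  vint32m1_t vsum =
      __riscv_vredsum_vs_i32m4_i32m1(acc, zero, __riscv_vsetvlmax_e32m4());
  int32_t sum = __riscv_vmv_x_s_i32m1_i32(vsum);

  // Scalar tail for any leftover elements.
  for (; i < depth; ++i) {
    sum += (int32_t)input[i] * (int32_t)filter[i];
  }
  return sum;
}
```

Compared with reducing inside the loop, this keeps the relatively expensive cross-lane reduction off the hot path.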
We developed and tested the kernels using the Spike RISC-V simulator and benchmarked them by running modified versions of the person_detection and micro_speech examples, measuring the resulting cycle counts.
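For reference, one common way to take such cycle measurements on RV32 (a sketch, not necessarily how our benchmarks read the counter) is to combine the cycle and cycleh CSRs:

```c
#include <stdint.h>

// Sketch for RV32: read the 64-bit cycle counter from the cycle/cycleh CSRs,
// retrying if the low half wrapped between the two reads. Assumes counter
// access is enabled for the current privilege level.
static inline uint64_t ReadCycles(void) {
  uint32_t lo, hi, hi2;
  do {
    __asm__ volatile("rdcycleh %0" : "=r"(hi));
    __asm__ volatile("rdcycle  %0" : "=r"(lo));
    __asm__ volatile("rdcycleh %0" : "=r"(hi2));
  } while (hi != hi2);
  return ((uint64_t)hi << 32) | lo;
}
```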
Our new kernels are built under a new build target called riscv_vector.

Using our kernels, our results for one iteration of person_detection showed a 4.1x reduction in cycle count compared to the default scalar implementations, and a 2.8x reduction compared to the compiler-vectorized default implementations. Our results for one iteration of micro_speech (including signal functions) showed a 1.47x reduction in cycle count compared to the default scalar implementations, and a 2.5x reduction compared to the compiler-vectorized default implementations. Interestingly, we found that the compiler's auto-vectorization in this instance increased the cycle count rather than reducing it.
We were unsure whether this would be welcomed upstream, but decided to open this pull request in case it is useful to anyone else.