Proof-of-concept draft: Add a simple vector extension to femtorv #53

mbitsnbites · 2022-01-10T13:03:15Z

Here is a draft of my ideas that I came up with yesterday.

Caveat emptor

Take it for what it is: A sketchbook proof of concept experiment, totally untested. I did not even try to build it, and I am pretty sure that some state transitions will not work properly for certain vector operations.

Functionality

As described in the code comment, this change tries to:

Map vector registers on top of the scalar register file.
Add the VSETVL instruction (from the V extension) to manipulate the vector length (VL) register.
Add logic for iterating over the vector register elements while staying in the EXECUTE or EXECUTE+WAIT_ALU_OR_MEM states, until VL vector elements have been processed.
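
To make the three points above concrete, here is a rough software model in C. It is only a sketch under assumptions that are not taken from the patch itself: 32 scalar registers, a maximum vector length of 4, and a layout where element `elem` of vector register `vreg` lives in scalar register `vreg * 4 + elem`. The names `Core`, `vec_elem_reg`, `set_vl` and `vec_add` are hypothetical.

```c
#include <stdint.h>

/* Assumed parameters (not taken from the patch): 32 scalar registers,
   viewed as 8 vector registers of 4 elements each. */
enum { NUM_SCALAR_REGS = 32, MAX_VL = 4 };

typedef struct {
    uint32_t regs[NUM_SCALAR_REGS]; /* shared scalar/vector register file */
    uint32_t vl;                    /* current vector length (VL) */
} Core;

/* Scalar register index that backs element `elem` of vector register `vreg`
   (assumed layout). */
unsigned vec_elem_reg(unsigned vreg, unsigned elem) {
    return vreg * MAX_VL + elem;
}

/* VSETVL-like behaviour: clamp the requested length to MAX_VL and return it. */
uint32_t set_vl(Core *c, uint32_t requested) {
    c->vl = requested < MAX_VL ? requested : MAX_VL;
    return c->vl;
}

/* Element-wise add: one element per step until VL elements are processed,
   mirroring the idea of looping inside the EXECUTE state. */
void vec_add(Core *c, unsigned vd, unsigned vs1, unsigned vs2) {
    for (uint32_t vec_idx = 0; vec_idx < c->vl; ++vec_idx) {
        c->regs[vec_elem_reg(vd, vec_idx)] =
            c->regs[vec_elem_reg(vs1, vec_idx)] +
            c->regs[vec_elem_reg(vs2, vec_idx)];
    }
}
```

Under this assumed layout, vector register 7 occupies scalar registers 28 to 31, and vector register 0 overlaps the zero register.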

Instruction encoding

To encode vector instructions, the two least significant bits of the instruction word are used (in RV32I these bits are always 11, so anything else indicates a vector operation). This is not compatible with the C extension, for instance, so some other encoding trick must be used if you want to support that (I am not well versed in RISC-V instruction encoding, but the CUSTOM_0 - CUSTOM_3 opcode pages could be a possibility).
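
As a small illustration of the rule above, the check could look like this in C. The function names are hypothetical, and the sketch deliberately does not assign any meaning to the individual non-11 bit patterns, since that is not spelled out here.

```c
#include <stdbool.h>
#include <stdint.h>

/* In RV32I (without the C extension) every instruction word ends in 0b11. */
bool is_scalar_rv32i(uint32_t instr) {
    return (instr & 0x3u) == 0x3u;
}

/* Any other value in the two low bits is treated as a vector operation. */
bool is_vector_op(uint32_t instr) {
    return (instr & 0x3u) != 0x3u;
}

/* Recover the underlying scalar instruction by restoring the low bits to 0b11. */
uint32_t base_instruction(uint32_t instr) {
    return instr | 0x3u;
}
```

For example, `0x0045a381` is classified as a vector operation, and its base instruction is `0x0045a383`, a plain `lw`.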

Bugs / refactoring

I think that the source register lookup and the destination register index (rdId) are broken for multi-cycle instructions (load/store/div). Specifically, vecIdx is not always updated in the right state/cycle.

Furthermore, the source register lookup is currently done in two different places (really it needs to be done in three different places, IIUC). It feels like this part can be refactored to solve the out-of-sync vecIdx problem and possibly also reduce LUT usage.

Possible improvements

More vector registers

The current implementation only provides eight vector registers, of which 3-5 are usable in practice (V0 can never be used, and some scalar registers must be spared for scalar operations). It would be very simple, and valuable, to add more vector registers. All that is required is to double (or quadruple?) the number of scalar registers in registerFile. It is mostly a matter of balancing the size of the core (e.g. the number of LUTs).

Stride based load/store

Another feature that I have not added, but that is quite powerful, is support for on-the-fly generation of address strides. I think that a feasible solution would be to add special handling of the case when src2IsVec = 1 and src2 is an immediate value (e.g. isLoad | isStore | isALUimm), such that the immediate value is replaced by an incrementing (registered) value as follows: [0, IMM, 2*IMM, 3*IMM, ...].
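
The offset sequence could be modelled in C roughly as follows. This is just a sketch of the idea, not of the hardware: `strided_addresses` is a hypothetical name, and the running `offset` variable stands in for the registered value that is incremented by IMM for each element.

```c
#include <stdint.h>

/* Fill `addrs` with the element addresses base + [0, IMM, 2*IMM, ...]. */
void strided_addresses(uint32_t base, int32_t imm, uint32_t vl, uint32_t *addrs) {
    uint32_t offset = 0; /* models the registered, incrementing value */
    for (uint32_t i = 0; i < vl; ++i) {
        addrs[i] = base + offset;
        offset += (uint32_t)imm; /* wraps modulo 2^32, so negative strides work too */
    }
}
```

An accumulator avoids a multiplier: each element only needs one extra addition on top of the normal address calculation.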

Writing programs

This is of course a major problem at the moment. No compiler / toolchain supports the new vector instructions (except for VSETVL).

For prototyping purposes I would personally only write vectorized code directly in assembly language (that also gives better control over scalar register allocation), by first compiling the corresponding scalar code, and then hand-modifying the generated machine code to use the encoding for vector instructions (i.e. modify the two LSBs), and emitting them as .word directives.

For instance, the following C code:

void foo(int* dst, const int* src, int num) {
    for (int i = 0; i < num; ++i) {
        dst[i] = src[i];
    }
}

...could be implemented in RISC-V assembly with vector instructions (assuming that stride based load/store is supported):

foo:
	blez	a2, .L2
.L1:
	vsetvl	a4, a2, zero
	.word	0x0045a381	# lw	v7, +4(a1)
	.word	0x00752221	# sw	v7, +4(a0)
	sub	a2, a2, a4
	addi	a0, a0, 16
	addi	a1, a1, 16
	bnez	a2, .L1
.L2:
	ret
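
For readers who prefer C, the loop above behaves roughly like the following model. It assumes a maximum vector length of 4 (matching the fixed 16-byte pointer increments) and that VSETVL returns min(requested, maximum); `foo_vector_model` and `MODEL_MAX_VL` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

enum { MODEL_MAX_VL = 4 }; /* assumed maximum vector length */

void foo_vector_model(int32_t *dst, const int32_t *src, int num) {
    size_t base = 0;                       /* element offset of a0/a1 from the start */
    while (num > 0) {                      /* blez before the loop, bnez at the end */
        int vl = num < MODEL_MAX_VL ? num : MODEL_MAX_VL; /* vsetvl a4, a2, zero */
        int32_t v7[MODEL_MAX_VL];          /* stands in for vector register v7 */
        for (int i = 0; i < vl; ++i)
            v7[i] = src[base + (size_t)i]; /* vector lw, 4-byte stride */
        for (int i = 0; i < vl; ++i)
            dst[base + (size_t)i] = v7[i]; /* vector sw, 4-byte stride */
        num -= vl;                         /* sub a2, a2, a4 */
        base += MODEL_MAX_VL;              /* addi a0/a1, 16 (four 32-bit words) */
    }
}
```

The pointers always advance by a full vector, even on the last, shorter iteration. That is harmless because the loop exits as soon as the remaining count reaches zero.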

As it's far from convenient, in the long run you probably want to patch some toolchain (e.g. binutils/as) to support these instructions to some degree.
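
Until then, a tiny helper can at least take the hex arithmetic out of the manual process. The sketch below encodes `lw` and `sw` using the standard RV32I I-type and S-type layouts and then replaces the two low bits. The value 01 for the low bits is simply read off the two .word values in the listing above; the function names are hypothetical.

```c
#include <stdint.h>

/* Standard RV32I encoding of: lw rd, imm(rs1)  (I-type, opcode 0x03, funct3 2). */
uint32_t encode_lw(unsigned rd, unsigned rs1, int32_t imm) {
    uint32_t u = (uint32_t)imm & 0xFFFu;
    return (u << 20) | ((rs1 & 31u) << 15) | (2u << 12) | ((rd & 31u) << 7) | 0x03u;
}

/* Standard RV32I encoding of: sw rs2, imm(rs1)  (S-type, opcode 0x23, funct3 2). */
uint32_t encode_sw(unsigned rs2, unsigned rs1, int32_t imm) {
    uint32_t u = (uint32_t)imm & 0xFFFu;
    return ((u >> 5) << 25) | ((rs2 & 31u) << 20) | ((rs1 & 31u) << 15) |
           (2u << 12) | ((u & 0x1Fu) << 7) | 0x23u;
}

/* Replace the two least significant bits to mark the word as a vector operation. */
uint32_t as_vector_word(uint32_t scalar_word, unsigned low_bits) {
    return (scalar_word & ~0x3u) | (low_bits & 0x3u);
}
```

With a0 = x10 and a1 = x11, `as_vector_word(encode_lw(7, 11, 4), 1)` gives `0x0045a381` and `as_vector_word(encode_sw(7, 10, 4), 1)` gives `0x00752221`, matching the listing.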

mbitsnbites added 2 commits January 9, 2022 22:31

femtorv32-electron: Add vector support

4068fde

This is a proof of concept implementation that maps virtual vector registers onto the scalar register file.

fixup! femtorv32-electron: Add vector support

f2f6125

Implement VSETVL and vector op state transitions. Also make the vector length build-time configurable.
