Proof-of-concept draft: Add a simple vector extension to femtorv #53
Here is a draft of my ideas that I came up with yesterday.
Caveat emptor
Take it for what it is: A sketchbook proof of concept experiment, totally untested. I did not even try to build it, and I am pretty sure that some state transitions will not work properly for certain vector operations.
Functionality
As described in the code comment, this change tries to:
- Support the VSETVL instruction (from the V extension) to manipulate the vector length (VL) register.
- Repeat each vector instruction by staying in the EXECUTE or EXECUTE+WAIT_ALU_OR_MEM states until VL vector elements have been processed.
Instruction encoding
To encode vector instructions, the two least significant bits of the instruction word are used (in RV32I these bits are always 11, so anything else indicates a vector operation). This is not compatible with the C extension, for instance, which uses the other values of these two bits for 16-bit compressed instructions, so some other encoding trick must be used if you want to support that (I am not very versed in RISC-V instruction encoding, but the CUSTOM_0 - CUSTOM_3 pages could be a possibility).
Bugs / refactoring
I think that the source register lookup and the destination register index (rdId) are broken for multi-cycle instructions (load/store/div). Specifically, vecIdx is not always updated in the right state/cycle.
Furthermore, the source register lookup is currently done in two different places (really it needs to be done in three different places, IIUIC). It feels like this part can be refactored to both solve the out-of-sync vecIdx problem and possibly reduce LUT usage.
Possible improvements
More vector registers
The current implementation only provides eight vector registers, of which 3-5 are usable in practice (V0 can never be used, and some scalar registers must be spared for scalar operations). It would be very simple, and valuable, to add more vector registers. All that is required is to double (or quadruple?) the number of scalar registers in registerFile. It is mostly a matter of balancing the size of the core (e.g. the number of LUTs).
Stride based load/store
Another functionality that I have not added, but that is quite powerful, is support for on-the-fly generation of address strides. I think that a feasible solution would be to add special handling of the case when src2IsVec = 1 and src2 is an immediate value (e.g. isLoad | isStore | isALUimm), such that the immediate value is replaced by an incrementing (registered) value as follows: [0, IMM, 2*IMM, 3*IMM, ...].
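The offset sequence could be modelled in software roughly as follows. This is only a sketch in C, not the Verilog the core would need. The names stride_gen, stride_start and stride_next are hypothetical, and it assumes that the registered value is cleared when a vector instruction starts and advanced by IMM once per processed element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of the proposed stride generator.
   In hardware this would be a register that is cleared when a vector
   instruction starts and incremented by IMM after each element. */
typedef struct {
    int32_t imm;    /* immediate value taken from the instruction word */
    int32_t offset; /* registered running value: 0, IMM, 2*IMM, ... */
} stride_gen;

/* Called once when a vector instruction with an immediate src2 starts. */
void stride_start(stride_gen *g, int32_t imm) {
    assert(g != NULL);
    g->imm = imm;
    g->offset = 0;
}

/* Returns the value that replaces the immediate for the current element,
   then advances the register for the next element. */
int32_t stride_next(stride_gen *g) {
    assert(g != NULL);
    int32_t current = g->offset;
    g->offset += g->imm;
    return current;
}
```

With IMM = 4, four consecutive calls to stride_next yield 0, 4, 8 and 12, i.e. the byte offsets of four consecutive 32-bit words.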
Writing programs
This is of course a major problem at the moment. No compiler / toolchain supports the new vector instructions (except for VSETVL).
For prototyping purposes I would personally only write vectorized code directly in assembler language (that also gives better control over scalar register allocation), by first compiling the corresponding scalar code, and then hand-modifying the generated machine code to use the encoding for vector instructions (i.e. modify the 2 LSBs), and emitting them as .word directives.
For instance, the following C code:
...could be implemented in RISC-V assembler with vector instructions (assuming that stride based load/store is supported):
As it's far from convenient, in the long run you probably want to patch some toolchain (e.g. binutils/as) to support these instructions to some degree.
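Until then, the hand-patching step could be scripted. Below is a minimal sketch in C, assuming only what is stated above: scalar RV32I words end in 11, and any other value of the two low bits marks a vector instruction. The function names and the vec_bits parameter are hypothetical, and which of the three remaining values selects which vector form is left open here, since that mapping lives in the code comment rather than in this description.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* In RV32I every 32-bit instruction has its two lowest bits set to 11. */
#define RV32_LOW_BITS 0x3u

/* Returns 1 if the word uses the draft's vector encoding (low bits != 11). */
int is_vector_word(uint32_t word) {
    return (word & RV32_LOW_BITS) != RV32_LOW_BITS;
}

/* Replaces the two low bits of a scalar instruction word.
   vec_bits must be 0, 1 or 2. What each value means is an assumption
   the caller has to supply; it is not defined in this sketch. */
uint32_t to_vector_word(uint32_t scalar_word, uint32_t vec_bits) {
    assert((scalar_word & RV32_LOW_BITS) == RV32_LOW_BITS);
    assert(vec_bits < RV32_LOW_BITS);
    return (scalar_word & ~RV32_LOW_BITS) | vec_bits;
}

/* Formats the patched word as an assembler directive, e.g. for pasting
   into a .s file. Returns the number of characters written. */
int format_word_directive(char *buf, size_t len, uint32_t word) {
    return snprintf(buf, len, ".word 0x%08x", (unsigned)word);
}
```

For example, the scalar instruction add x1, x2, x3 assembles to 0x003100b3; calling to_vector_word on it with vec_bits = 2 gives 0x003100b2, which would be emitted as ".word 0x003100b2".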