Vortex uses the SIMT (Single Instruction, Multiple Threads) execution model with a single warp issued per cycle.
- Threads
- Smallest unit of computation
- Each thread has its own register file (32 int + 32 fp registers)
- Threads execute in parallel
- Warps
- A logical cluster of threads
- Each thread in a warp executes the same instruction
- The PC is shared; a thread mask is maintained for writeback
- Warp execution is time-multiplexed across cycles
- Ex. warp 0 executes at cycle 0, warp 1 executes at cycle 1
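The execution model above can be sketched as a small behavioral model. This is a minimal illustration, not Vortex's RTL: the `Warp` class, the `run` helper, the strict round-robin order, and the 4-byte PC increment are all assumptions made for the example. It shows one PC shared per warp, a thread mask gating register writes, and one warp issuing per cycle.

```python
class Warp:
    """Hypothetical model of a warp: one shared PC plus a per-thread mask."""

    def __init__(self, wid, num_threads):
        self.wid = wid
        self.num_threads = num_threads
        self.pc = 0                          # single PC shared by all threads
        self.tmask = [True] * num_threads    # per-thread active bits

    def step(self, regs):
        """Execute one dummy instruction (increment a register) per active thread."""
        for tid in range(self.num_threads):
            if self.tmask[tid]:              # the mask gates writeback
                regs[self.wid][tid] += 1
        self.pc += 4                         # assumed 4-byte instruction size


def run(num_warps=2, num_threads=4, cycles=4):
    warps = [Warp(w, num_threads) for w in range(num_warps)]
    regs = [[0] * num_threads for _ in range(num_warps)]
    warps[1].tmask = [True, False, True, False]   # mask off threads 1 and 3 of warp 1
    trace = []
    for cycle in range(cycles):
        warp = warps[cycle % num_warps]      # assumed round-robin: one warp per cycle
        warp.step(regs)
        trace.append((cycle, warp.wid))
    return trace, regs, [w.pc for w in warps]


if __name__ == "__main__":
    trace, regs, pcs = run()
    print(trace)   # (cycle, warp id) pairs
    print(regs)    # per-warp, per-thread register values
    print(pcs)     # one PC per warp
```

In this sketch, masked-off threads keep their old register value while the warp's PC still advances, which is the effect a thread mask has on writeback.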
- Thread Mask Control
- Control the number of threads to activate during execution
- TMC count: activate count threads
- Warp Scheduling
- Control the number of warps to activate during execution
- WSPAWN count, addr: activate count warps and jump to addr location
- Control-Flow Divergence
- Control thread activation when a branch diverges
- SPLIT taken, predicate: apply predicate thread mask and save current state into IPDOM stack
- JOIN: pop IPDOM stack to restore thread mask
- PRED predicate, restore_mask: thread predicate instruction
- Warp Synchronization
- BAR id, count: stall warps entering barrier id until count is reached
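The thread-mask, divergence, and barrier instructions listed above can be illustrated with a simplified Python model. Everything here is a sketch under stated assumptions, not the hardware's exact behavior: `WarpMask` and `Barrier` are hypothetical names, `tmc` is assumed to activate the lowest-numbered threads, and `split` always pushes two entries (a real implementation may skip the push when the branch does not diverge).

```python
class WarpMask:
    """Hypothetical model of a warp's thread mask and IPDOM stack."""

    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.tmask = [True] * num_threads
        self.ipdom = []                       # stack of saved thread masks

    def tmc(self, count):
        # TMC count: activate `count` threads (assumed: the lowest-numbered ones)
        self.tmask = [t < count for t in range(self.num_threads)]

    def split(self, predicate):
        # SPLIT: save the current state, then apply the predicate to the mask
        taken = [a and p for a, p in zip(self.tmask, predicate)]
        not_taken = [a and not p for a, p in zip(self.tmask, predicate)]
        self.ipdom.append(list(self.tmask))   # mask to restore at reconvergence
        self.ipdom.append(not_taken)          # mask for the other branch side
        self.tmask = taken

    def join(self):
        # JOIN: pop the IPDOM stack to restore a thread mask
        self.tmask = self.ipdom.pop()


class Barrier:
    """BAR id, count: stall arriving warps until `count` of them have arrived."""

    def __init__(self):
        self.waiting = {}                     # barrier id -> stalled warp ids

    def arrive(self, bar_id, count, wid):
        stalled = self.waiting.setdefault(bar_id, [])
        stalled.append(wid)
        if len(stalled) < count:
            return []                         # warp stays stalled
        del self.waiting[bar_id]
        return stalled                        # every waiting warp is released


def demo(data):
    """Compute `2*x if x > 0 else -x` per thread using split/join."""
    warp = WarpMask(len(data))
    out = [0] * len(data)
    warp.split([x > 0 for x in data])
    for t, x in enumerate(data):
        if warp.tmask[t]:
            out[t] = 2 * x                    # "then" side
    warp.join()                               # switch to the not-taken threads
    for t, x in enumerate(data):
        if warp.tmask[t]:
            out[t] = -x                       # "else" side
    warp.join()                               # reconverge: original mask restored
    return out, warp.tmask
```

Both sides of the branch run one after the other under complementary masks, and the second `join` brings all threads back together.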
Vortex has a 6-stage pipeline:
- Schedule
- Warp Scheduler
- Schedule the next PC into the pipeline
- Track stalled, active warps
- IPDOM Stack
- Save split/join states for divergent threads
- Inflight Tracker
- Track in-flight instructions
- Fetch
- Retrieve instructions from memory
- Handle I-cache requests/responses
- Decode
- Decode fetched instructions
- Notify warp scheduler on control instructions
- Issue
- IBuffer
- Store decoded instructions in separate per-warp queues
- Scoreboard
- Track in-use registers
- Check register use for decoded instructions
- Operands Collector
- Fetch the operands for issued instructions from the register file
- Execute
- ALU Unit
- Handle arithmetic and branch operations
- FPU Unit
- Handle floating-point operations
- LSU Unit
- Handle load/store operations
- SFU Unit
- Handle warp control operations
- Handle Control and Status Register (CSR) operations
- Commit
- Write result back to the register file and update the Scoreboard.
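The interaction between the Issue-stage Scoreboard and the Commit stage can be sketched as follows. This is an illustrative model only: the `Scoreboard` class, its method names, and the choice to track `(warp id, register)` pairs in a set are assumptions for the example, not the actual Vortex implementation.

```python
class Scoreboard:
    """Hypothetical per-warp register scoreboard."""

    def __init__(self):
        self.in_use = set()     # (warp id, register) pairs awaiting writeback

    def can_issue(self, wid, srcs, dst=None):
        """True if no source or destination register is still pending."""
        regs = list(srcs)
        if dst is not None:
            regs.append(dst)
        return all((wid, r) not in self.in_use for r in regs)

    def issue(self, wid, dst=None):
        """Mark the destination register as in use when an instruction issues."""
        if dst is not None:
            self.in_use.add((wid, dst))

    def commit(self, wid, dst=None):
        """Release the destination register once the result is written back."""
        self.in_use.discard((wid, dst))


if __name__ == "__main__":
    sb = Scoreboard()
    sb.issue(0, dst=5)                          # warp 0 writes register 5
    print(sb.can_issue(0, srcs=[5], dst=6))     # dependent instruction must wait
    print(sb.can_issue(1, srcs=[5], dst=6))     # another warp has its own registers
    sb.commit(0, dst=5)
    print(sb.can_issue(0, srcs=[5], dst=6))     # dependency resolved after commit
```

Keying entries by warp reflects that each warp's threads have their own registers, so a pending write in one warp does not block another warp from issuing.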
- Sockets
- Grouping of multiple cores sharing L1 cache
- Clusters
- Grouping of sockets sharing L2 cache
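As a rough illustration of this hierarchy, the sketch below maps a flat core index to its cluster and socket and checks which cache level two cores share. The function names, the parameters `cores_per_socket` and `sockets_per_cluster`, and the contiguous core numbering are assumptions made for the example; they are not Vortex configuration options.

```python
def locate_core(core_id, cores_per_socket, sockets_per_cluster):
    """Return (cluster, socket within cluster, core within socket)."""
    socket_id, core_in_socket = divmod(core_id, cores_per_socket)
    cluster_id, socket_in_cluster = divmod(socket_id, sockets_per_cluster)
    return cluster_id, socket_in_cluster, core_in_socket


def shared_cache(core_a, core_b, cores_per_socket, sockets_per_cluster):
    """Return the closest cache level two cores share: 'L1', 'L2', or None."""
    ca, sa, _ = locate_core(core_a, cores_per_socket, sockets_per_cluster)
    cb, sb, _ = locate_core(core_b, cores_per_socket, sockets_per_cluster)
    if ca != cb:
        return None             # different clusters: no shared L1 or L2
    if sa != sb:
        return "L2"             # same cluster, different sockets
    return "L1"                 # same socket


if __name__ == "__main__":
    # Assumed layout: 2 cores per socket, 2 sockets per cluster
    print(locate_core(5, 2, 2))
    print(shared_cache(4, 5, 2, 2))
    print(shared_cache(4, 6, 2, 2))
    print(shared_cache(3, 4, 2, 2))
```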
More details about the cache subsystem are provided here.