Update CHANGELOG.md and README.md for 2022.1.0 (#154)
jzuckerman committed Mar 8, 2022
1 parent 3034e55 commit a8863f8
Showing 2 changed files with 184 additions and 125 deletions.
303 changes: 181 additions & 122 deletions CHANGELOG.md
@@ -7,6 +7,65 @@ Changelog](https://keepachangelog.com/en/1.0.0/), and this project
adheres to [Calendar Versioning](https://calver.org/) with format
`YYYY.MINOR.MICRO`.
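
As an illustration of the `YYYY.MINOR.MICRO` format, here is a minimal Python sketch that parses release tags such as `2021.2.0` and `2022.1.0` and orders them. The `CalVer` type and the `parse_calver` helper are hypothetical names introduced for this example; they are not part of the project.

```python
import re
from typing import NamedTuple


class CalVer(NamedTuple):
    """A release identifier in the YYYY.MINOR.MICRO calendar-versioning format."""
    year: int
    minor: int
    micro: int


# Four-digit year, then two non-negative integers, separated by dots.
_CALVER_RE = re.compile(r"^(\d{4})\.(\d+)\.(\d+)$")


def parse_calver(text: str) -> CalVer:
    """Parse a string such as '2022.1.0'; raise ValueError if it does not match."""
    match = _CALVER_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a YYYY.MINOR.MICRO version: {text!r}")
    return CalVer(*(int(part) for part in match.groups()))


# Tuples compare field by field, so releases sort chronologically.
releases = ["2022.1.0", "2021.2.0"]
latest = max(releases, key=parse_calver)
print(latest)
```

Comparing parsed tuples rather than raw strings avoids misordering a two-digit field, for example `2022.10.0` against `2022.9.0`.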

## [2022.1.0]

### Added

- **Architecture**
- Ariane SMP: enabled by adding ACE bus for L1 invalidations and modifying the ESP cache hierarchy (#146)
- Hardware monitoring system: new implementation to enable easy access from software (#140)
- Coherence modes for third-party accelerators: non-coherent DMA, llc-coherent-DMA, coherent-DMA

- **Accelerators**
- Stratus HLS flow
- _FFT2_: improves upon the _FFT_ accelerator with support for batching, FFT sizes larger than accelerator private local memory, and inverse FFT

- **Accelerator design flows**
- RTL accelerator design flow (#123)

- **Software**
- Python3 support (#124)
- Monitors API for performance evaluation (#140)
- OpenSBI support for Ariane-based SoCs (#146)
- Creation of a baremetal test library for baremetal applications not tied to an ESP accelerator
- NVDLA with coherence selection
- Monitors API
- SLM tile test

### Improved

- **Architecture**
- Move NoC and JTAG to the top level of the tile (#122)
- Reset of asynchronous FIFOs

- **Accelerators**
- Stratus HLS flow
- _FFT_: add batching
- _Nightvision_: handle larger image sizes (#130)
- Increase number of accelerator configuration registers from 14 to 48
- Ensure accelerator reset is synchronous by adding register to DMA FSM

- **Infrastructure**
- Use local paths for toolchain installation (#119)
- Standardize selection of the number of LLC sets across cache implementations

- **Software**
- Upgrade _riscv-pk_ and update baremetal probe library (#120)

### Fixed
- **Architecture**
- Overflow issue in _axislv2noc_
- CPU DMA to SLM tile
- Proxies for ASIC memory link
- Various latches and incomplete sensitivity lists

- **Infrastructure**
- Xcelium compilation

- **Software**
- RCU stall issue during Linux boot on Ariane mitigated with new kernel configuration
- Various accelerator applications

## [2021.2.0]

### Added
@@ -83,135 +142,135 @@ adheres to [Calendar Versioning](https://calver.org/) with format
### Added

- **Accelerator design flows**
- Keras/Pytorch/ONNX with [hls4ml](https://fastmachinelearning.org/hls4ml/)
- Accelerator templates
- Accelerator and test applications generation with AccGen
- [Tutorial](https://www.esp.cs.columbia.edu/docs/hls4ml/)
- C/C++ with Xilinx Vivado HLS
- Accelerator templates
- Accelerator and test applications skeleton generation with AccGen
- [Tutorial](https://www.esp.cs.columbia.edu/docs/cpp_acc/)
- Sample accelerators: adder (element-wise addition)
- C/C++ with Mentor Catapult HLS
- [Tutorial](https://www.esp.cs.columbia.edu/docs/mentor_cpp_acc/)
- SystemC with Cadence Stratus HLS
- Accelerator templates ([includes](https://github.com/sld-columbia/esp-accelerator-templates), skeleton templates)
- Accelerator and test applications skeleton generation with AccGen
- [Tutorial](https://www.esp.cs.columbia.edu/docs/systemc_acc/)
- Sample accelerators: dummy (identity mapping), fft (Fast Fourier Transform 1D), sort, spmv (sparse matrix-vector multiplication), synth (synthetic traffic generator), nightvision (night-vision kernels), vitbfly2 (Viterbi butterfly), vitdodec (Viterbi decoder)
- Chisel
- [Accelerator templates](https://github.com/sld-columbia/esp-chisel-accelerators)
- Sample accelerators: adder (element-wise addition), counter, fft (Fast Fourier Transform 1D)
- **Third-party accelerator integration flow**
- Supported accelerator interfaces: AXI for the memory interface, AXI-Lite and APB for the configuration interface
- [Tutorial](https://www.esp.cs.columbia.edu/docs/thirdparty_acc/)
- Sample accelerators: [NVDLA](http://nvdla.org/)
- **SoC design flow**
- High-level SoC configuration (batch or GUI)
- Automatic SoC generation
- Push-button full-system RTL simulation of bare-metal programs
- Supported simulators: Mentor Modelsim SE, Cadence Incisive, Cadence Xcelium
- Push-button FPGA bitstream generation
- Supported FPGA tools: Xilinx Vivado
- **Architecture**
- NoC
- Packet-switched NoC with lookahead routing, single-cycle hop, and configurable bitwidth
- ESP SoCs use 6 bidirectional physical NoC planes
- 3 for cache coherence messages (32-bits or 64-bits based on processor architecture)
- 2 for DMA messages (32-bits or 64-bits based on processor architecture)
- 1 32-bit plane for the other messages (interrupts, memory-mapped IO and configuration registers)
- Processor tile
- Processor
- Available options: 32-bit [Leon3](https://www.gaisler.com/index.php/products/processors/leon3) (Sparc v8) with ESP FPU, 64-bit [Ariane](https://github.com/openhwgroup/cva6) (RISC-V), 32-bit [Ibex](https://github.com/lowRISC/ibex) (RISC-V)
- L2 private cache (optional)
- NoC-based directory-based MESI protocol
- Available implementations: [SystemVerilog](https://github.com/sld-columbia/esp-caches/tree/master/l2), [SystemC](https://github.com/sld-columbia/esp-caches/tree/master/systemc/l2)
- Bus
- Memory request bus options: AXI, AHB
- Memory-mapped IO requests bus options: APB
- Support for SoCs with multiple processor tiles
- Accelerator tile
- Accelerator (see accelerator design flow options above)
- Accelerator socket
- Accelerator configuration registers (default registers + user-defined registers)
- Miss-free accelerator TLB for low-overhead virtual memory support
- Accelerator DMA engine
- Private cache (optional)
- Same as the L2 private cache in the processor tile
- Cache coherence
- Supported options: coherent with private cache, coherent DMA, LLC-coherent DMA, non-coherent DMA
- Configurable at run-time
- Point-to-point accelerator communication
- Configurable at run-time
- Support for SoCs with multiple accelerator tiles
- Third-party accelerator tile
- Accelerator socket
- Bus-to-NoC bridges
- Memory requests bus options: AXI
- Memory-mapped IO requests bus options: AXI-Lite, APB
- Support for SoCs with multiple third-party accelerator tiles
- Memory tile
- Last-level cache (LLC) slice (optional)
- NoC-based directory-based MESI protocol
- Support for coherent DMA and LLC-coherent DMA
- Available implementations: [SystemVerilog](https://github.com/sld-columbia/esp-caches/tree/master/llc), [SystemC](https://github.com/sld-columbia/esp-caches/tree/master/systemc/llc)
- Memory channel
- Optionally include AHB bus and memory controller in the memory tile
- Memory simulation model for full-system RTL simulation
- Support for all accelerator cache-coherence options
- Support for SoCs with multiple memory tiles
- Up to 2 memory tiles on proFPGA Virtex7 XC7V2000T FPGA module and up to 4 memory tiles on proFPGA Virtex UltraScale XCVU440 FPGA module
- Auxiliary tile
- Peripherals: Ethernet, UART, DVI (only on proFPGA FPGA modules with DVI interface board)
- ESP Link debug unit
- SoC initialization unit
- Interrupt controller: Leon3 multiprocessor interrupt controller or RISC-V platform interrupt controller
- Timer: GRLIB general-purpose timer or [RISC-V core-local interrupt controller](https://github.com/sld-columbia/ariane/tree/master/src/clint)
- Frame buffer
- Scratchpad (shared-local memory) tile
- Shared software-managed addressable memory
- Support for multiple SLM tiles
- SLM can replace external memory when configuring ESP with no memory tiles and selecting the Ibex core
- Additional SoC services
- ESP tile CSRs: memory mapped and accessible from software
- Configuration registers: PADs configuration, clock generators configuration, tile ID configuration, core ID configuration (processor tile only), Ethernet and UART scalers configuration (auxiliary tile only), soft reset
- Performance counters: accelerators activity, caches hit and miss rates, memory accesses, NoC routers traffic, dynamic voltage-frequency scaling operation
- With proFPGA FPGA modules, performance counters can be accessed via Ethernet as well through an MMI64-based monitor interface (see ESP software tools below)
- NoC adapters: AXI (to-NoC), AHB (to-NoC, from-NoC), APB (to-NoC, from-NoC), DMA (to/from-NoC), interrupt line (to-NoC, from-NoC)
- Other adapters: APB-to-AXI-Lite, custom memory link for ESP instances w/o integrated DDR controller (link-to-AHB, cache/DMA-to-link)
- NoC queues in every tile (processor, accelerator, memory, auxiliary, scratchpad)
- Dynamic Voltage-Frequency Scaling controller in every tile
- Single-tile test unit in every tile
- **ESP software stack**
- Support for Ariane, Leon3, and Ibex processors
- Linux SMP support (Ariane and Leon3 only)
- Bare-metal support
- Multi-core support (Leon3 only)
- Leon3 bare-metal multi-core test suite
- Accelerator-specific software
- ESP accelerator device driver
- LibESP: the ESP accelerator invocation API
- 3 functions: `esp_alloc`, `esp_run`, `esp_free`
- Manage the execution of multiple accelerators in parallel and/or in a pipeline
- Bare-metal unit-test sample applications for accelerators
- Linux unit-test sample applications for accelerators
- Multi-accelerator Linux applications examples
- **ESP software tools**
- AccGen: accelerator skeleton generator, including testbench, device driver and test applications
- PLMGen: multi-port and multi-bank memory generator for SystemC accelerators
- SoCGen: configure and generate an ESP SoC (batch or GUI)
- SocketGen: generate the RTL for some of the ESP tile sockets
- ESPLink: debug link via Ethernet from a host machine
- ESPMon: collection of hardware performance monitors accessed via Ethernet through the proFPGA MMI64 interface (batch or GUI)
- **Supported FPGA development boards**
- Xilinx Virtex UltraScale+ FPGA VCU118
- Xilinx Virtex UltraScale+ FPGA VCU128
- Xilinx Virtex-7 FPGA VC707
- proFPGA [Virtex7 XC7V2000T](https://www.profpga.com/products/fpga-modules-overview/virtex-7-based/profpga-xc7v2000t)
- proFPGA [Virtex Ultrascale XCVU440](https://www.profpga.com/products/fpga-modules-overview/virtex-ultrascale-based/profpga-xcvu440)
- Xilinx Zynq UltraScale+ MPSoC ZCU102 (WIP)
- Xilinx Zynq UltraScale+ MPSoC ZCU106 (WIP)
- **Supported OS**
- CentOS 7 (recommended)
- Red Hat Enterprise Linux 7.8
- Ubuntu 18.04 (Cadence Stratus HLS not fully supported)
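
The NoC plane allocation listed under **Architecture** above (3 coherence planes, 2 DMA planes, and 1 32-bit plane for other messages) can be sketched as a small Python model. This is only an illustration: the `NocPlane` type and the `noc_planes` function are hypothetical names, and the sketch assumes that "based on processor architecture" means the coherence and DMA planes take the word width of the selected processor (32 bits for Leon3 and Ibex, 64 bits for Ariane).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NocPlane:
    """One bidirectional physical NoC plane (hypothetical model, not ESP code)."""
    purpose: str
    width_bits: int


# Processor word widths as stated in the changelog.
PROCESSOR_WORD_BITS = {"leon3": 32, "ariane": 64, "ibex": 32}


def noc_planes(processor: str) -> List[NocPlane]:
    """Return the 6 planes of an ESP SoC built around the given processor.

    Assumption: coherence and DMA planes follow the processor word width.
    """
    word = PROCESSOR_WORD_BITS[processor]
    planes = [NocPlane("coherence", word) for _ in range(3)]
    planes += [NocPlane("dma", word) for _ in range(2)]
    # Interrupts, memory-mapped IO, and configuration registers share one 32-bit plane.
    planes.append(NocPlane("misc", 32))
    return planes


for plane in noc_planes("ariane"):
    print(plane.purpose, plane.width_bits)
```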
6 changes: 3 additions & 3 deletions README.md
@@ -90,6 +90,6 @@ please refer to the READMEs inside each of them for more information.

## Stay tuned for the new features under development:

- Multi-core RISC-V [Ariane](https://github.com/openhwgroup/cva6)
- Accelerator design flow in C/C++ and SystemC with Catapult HLS
- Regression testing
- Accelerator design flow in SystemC with Catapult HLS
- Dynamic partial reconfiguration SoC flow
- New machine learning and cryptography accelerators
