Skip to content

Latest commit

 

History

History
410 lines (341 loc) · 13.6 KB

README.md

File metadata and controls

410 lines (341 loc) · 13.6 KB

Logo

StreamBlocks Platforms Repository

Welcome to the StreamBlocks Platforms repository. This repository contains the code generators for the StreamBlocks dataflow compiler. If you are using StreamBlocks for the first time, this is the readme you need to follow to get a sense of the overall workflow and the different tools used within StreamBlocks.

StreamBlocks compiler provides a unified compilation framework for CPU-FPGA platforms. The figure below shows the compiler flow in full.

Compiler

StreamBlock transpiles dataflow programs written in CAL to C++ for multicore + FPGA platforms. Tycho is the frontend that takes CAL and produces and internal representation called the Actor Machine IR. Tycho's code is found in the streamblocks-tycho repositry.

The two hardware and software backends generate heterogeneous C++ code for execution. The code generators along with the runtime are part of the streamblocks-platform repository (i.e., this one!).

The partitioning tool is yet another repository, not surprisingly called streamblocks-partitioning.

The first two repositories are essential for code generation and execution. The last one is for design space exploration. streamblocks-partitioning uses profiling information obtained from either simulation or real execution to suggest pseudo-optimal hardware-software partitions.

This readme only walk you through setting up the first two repositories. A tutorial on setting up the partitioning is found in the corresponding streamblocks-partitioning repository. There is a 4th streamblocks-examples repository with a good number CAL programs.

The rest of this guide is organized as follows:

  1. The compiler basics
  2. Basic setup and dependencies
  3. Compiling a simple example

1. The compiler basics


The StreamBlocks dataflow compiler offers code-generation for multicore generic platforms and FPGAs through High-level synthesis. This repository contains the backends of the StreamBlocks-tycho dataflow frontend compiler.

To use the StreamBlocks platforms first you need to compile and install StreamBlocks-Tycho compiler streamblocks-tycho. We will go through installation in the next step.

The Tycho frontend does not get too far, it can generate some basic C code that is then compiled down to single-thread binaries for software execution but we never use that. Rather, we use Tycho's internal representation of an Actor network to generate heterogeneous C++ code.

This is where the current repository comes into play. Since we are targeting heterogeneous execution, we need to generate code for both software and hardware execution. We do this through platforms or basically different code generators or backends. You can find a brief description of each platform below:

Platform Description
platform-generic-c/ Generic monocore C code generation (deprecated, found in tycho)
platform-multicore/ Code generation for multi-threaded software execution
platform-vivadohls/ Code generation for Xilinx FPGAs by using Vivado HLS, SDAccel & Vitis
platform-node/ Code generation for multicore and multi-node execution, incomplete and experimental
platform-orcc/ Unused code software code generator based on the Orcc compiler
platform-core/ Basic utilities used by all the other platforms, does not really generate code

We basically just need to understand what platform-vivadohls and platform-multicore do. Each take a (part of) dataflow program and generate HLS or software C++ respectively. They could while platform-multicore can be used completely independently but platform-vivadohls is not standalone. This is because any HLS code needs some software code that feeds it data and intercepts its output.

2. Basic setup and dependencies


Dependencies

  • StreamBlocks platforms are written with Java 8, you will need a compatible Java SE Development Kit 8 (or later), Apache Maven and Git.

  • The generated C multithreaded source code of StreamBlocks has the following dependencies: CMake, libxml2 and (optionaly) libsdl2.

  • The generated C++ for Vivado HLS source code of StreamBlocks, needs the Xilinx Vivado Design Suite. You also need xilinx run time or XRT installed with the FPGA platforms you want to use.

Setup

Once you have all the dependencies set up. Create a directory called streamblocks somewhere in your system and go to that directory.

> mkdir streamblocks
> cd streamblocks

Clone streamblocks-tycho and install it using maven (this will install some jar files somewhere in your home directory which is picked up by streamblocks-platforms).

> git clone https://github.com/streamblocks/streamblocks-tycho
> cd streamblocks-tycho &&  mvn install -DskipTests && cd ..

Maven should succeed, then clone this repository and install it using maven.

> git clone https://github.com/streamblocks/streamblocks-platforms
> cd streamblocks-platforms &&  mvn install

3. Running a simple example


In the streamblocks-platforms directory running streamblocks --help should welcome you with some basic command line options. We will first go through the software execution flow and then we will give you a tour of how you can compile code for heterogeneous platforms (slightly more complicated). This guide does not cover our partitioning methodology though, for that refer to the streamblocks-partitioning repository.

Software execution

To actually execute something, let's write a simple CAL program:

> echo '
namespace hetero.simple:
  actor Source(int payload_size) ==> int Out:
    int counter := 0;
    transmit: action ==> Out:[t]
    guard counter < payload_size
    var t = counter
    do
      println("Tx: " + t);
      counter := counter + 1;
    end
  end

  actor Sink() int In ==>:
    action In:[t] ==>
    do
        println("Rx: " + t);
    end
  end

  actor Pass() int In ==> int Out:
    action In:[t] ==> Out:[t]
    end
  end

  network PassThrough() ==> :
  entities
    source = Source(payload_size = 20);
    pass  = Pass() { partition = "hw"; };
    sink = Sink();
  structure
    source.Out --> pass.In { bufferSize = 1; };
    pass.Out --> sink.In { bufferSize = 1; } ;
  end

end' > simple.cal

You can compile this program program using:

> ./streamblocks multicore --source-path simple.cal --target-path myproject hetero.simple.PassThrough

Note that we have to specify the source files, an output directory, and the name of the top network (which does not have any inputs or outputs) to the compiler. Once streamblocks finishes successfully, you see a new directory myproject with the following structure:

myproject
├── bin
│   ├── configuration.xcf
│   ├── PassThrough.py
│   ├── PassThrough.script
│   └── streamblocks.py
├── build
├── CMakeLists.txt
├── code-gen
│   ├── auxiliary
│   ├── CMakeLists.txt
│   ├── include
│   └── src
└── lib
    ├── art-genomic
    ├── art-native
    ├── art-node
    ├── art-runtime
    ├── cmake
    └── CMakeLists.txt

Thebin does not contain the final executable at this point. The python files are not currently used but generated nonetheless and you should just ignore them. To get an executable, you should compile the generated C++ files down to binary, this is done quite simply:

> mkdir -p myproject/build
> cd myproject/build
> cmake ..
> cmake --build .

This will create an executable bin/PassThrough:

> cd ../bin
> ./PassThrough
Tx: 0
Tx: 1
Rx: 0
Tx: 2
Rx: 1
Tx: 3
Rx: 2
Tx: 4
Rx: 3
Tx: 5
Rx: 4
Tx: 6
Rx: 5
Tx: 7
...

Note that here we used a single thread to execute the three actors in software. Using multiple threads is quite simple, you need a configuration file to specify the actor to thread mapping. The PassThrough executable can generate this file, and you can then simply modify it:

> ./PassThrough --generate=threads.xml

Or you can write it yourself (e.g., use one thread per actor):

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
	<partitioning>
		<partition id="0" scheduling="ROUND_ROBIN">
			<instance id="source"/>
        </partition>
        <partition id="1" scheduling="ROUND_ROBIN">
			<instance id="pass"/>
        </partition>
        <partition id="2" scheduling="ROUND_ROBIN">
			<instance id="sink"/>
		</partition>
	</partitioning>
</configuration>

And use for multi-thread execution:

> ./PassThrough --cfile=threads.xml

Heterogeneous execution

For heterogeneous code we have to call the compiler twice with an extra --set partitioning=on argument:

> ./streamblocks multicore --source-path simple.cal --target-path myproject --set partitioning=on hetero.simple.PassThrough
> ./streamblocks vivado-hls --source-path simple.cal --target-path myproject --set partitioning=on hetero.simple.PassThrough

Note that the two commands above merely generate C++ code for hardware and software. Like the software-only flow, we have to further build the FPGA and host binaries using cmake.

Here is an example of generated directories:

myproject
├── CMakeLists.txt
├── multicore
│   ├── bin
│   ├── build
│   ├── CMakeLists.txt
│   ├── code-gen
│   │   ├── auxiliary
│   │   ├── CMakeLists.txt
│   │   ├── include
│   │   └── src
│   └── lib
│       ├── art-genomic
│       ├── art-native
│       ├── art-node
│       ├── art-plink
│       ├── art-runtime
│       ├── cmake
│       └── CMakeLists.txt
└── vivado-hls
    ├── bin
    │   ├── xclbin
    │   └── xrt.ini
    ├── build
    ├── cmake
    │   ├── FindSDAccel.cmake
    │   ├── FindVitis.cmake
    │   ├── FindVitisHLS.cmake
    │   ├── FindVivado.cmake
    │   ├── FindVivadoHLS.cmake
    │   ├── FindXRT.cmake
    │   └── Helper.cmake
    ├── CMakeLists.txt
    ├── code-gen
    │   ├── auxiliary
    │   ├── host
    │   ├── include
    │   ├── include-tb
    │   ├── rtl
    │   ├── rtl-tb
    │   ├── src
    │   ├── src-tb
    │   ├── tcl
    │   ├── wcfg
    │   └── xdc
    ├── output
    │   ├── fifo-traces
    │   └── kernel
    ├── scripts
    └── systemc
        ├── include
        └── src

In these rather large directory of files, what matters is the top-level CMakeLists.txt files which can be used to build a heterogeneous executable.

To build hardware targets, you need to have a working Vitis and Vivaod HLS installation. Ideally use the 2019.2 versions (newer versions may work but have not been really tested). Vitits is installed in ${VITIS_DIR} you can make it available in the ${PATH} by:

source ${VITIS_DIR}/settings64.sh

You also need to have ${XILINX_XRT} set to where XRT. Assuming you have XRT installed in /opt/xilinx/xrt:

export XILINX_XRT=/opt/xilinx/xrt

With these environment variables set, you can proceed to building an FPGA binary

> mkdir -p myproject/build
> cd myproject/build
> cmake .. -DHLS_CLOCK_PERIOD=3.3 -DFPGA_NAME=xcu200-fsgd2104-2-e -DPLATFORM=xilinx_u200_xdma_201830_2 -DUSE_VITIS=on -DCMAKE_BUILD_TYPE=Debug
> cmake --build . --target PassThrough_kernel_xclbin -- -j 4

This can take several hours. To get a simulation binary instead you can use -DTARGET=hw_emu to use the hardware emulation mode in which the hardware execution is simulated in software (see hardware emulation mode in Vitis). This results in a faster compilation time but orders of magnitude slower execution time.

Likewise, you can build the software binary by

> cmake --build . --target PassThrough -- -j 4

Once both targets are ready, you can execute the program:

> cd ../bin
> ./PassThrough

Note that there is no cmake dependency between the software and hardware binary. Therefore, if you don't build the hardware binary you will end up with a runtime error regarding a missing file. The hardware binary is be placed in bin/xclbin/ and is supposed to be present when you call ./PassThrough.

This rather tedious flow can easily be scripted. We plan to streamline compilation in the future by integrating all the steps in one place. But for now consider writing your own scripts. You can checkout the streamblocks-example repository to ge inspired by how you can use cmake to fully automate the process.