This is a short guide that walks you through how to compile the benchmarks/hexagon/halide/add
benchmark (taken from the Hannk app inside Halide's repository) using Halide+Rake.
The first step is to install our modified version of Halide found here. This version of Halide is a fork of Halide master that bypasses the built-in HexagonOptimizer and instead leverages Rake for instruction selection. The build procedure and dependencies are identical to Halide master and instructions on how to build Halide can be found in the Halide Repo.
While it is possible to do an end-to-end push-button compilation of a benchmark, I advise against it during development or experimentation. Instead, it is best to perform compilation in three individual steps.
The first step is to use our Halide fork to generate specification files for Rake. A specification file (spec file for short) is an input to Rake generated by Halide for each qualifying expression in the input benchmark. It expresses the original Halide IR expression that needs to be optimized in Rosette, any available axioms that the synthesizer may use, as well as any configuration parameters (such as the target ISA).
To generate the spec files without actually running the synthesizer, use the following commands:
$ export LLVM_CONFIG=/path/to/llvm-config # Make sure LLVM_CONFIG points to your LLVM build
$ export HALIDE_RAKE_GENSPEC=1 # Tell Halide to generate specs but not run Rake
$ export HALIDE_DIR=/path/to/rake-halide/distrib # Set path to the Halide installation (or edit the Makefile below)
$ cd benchmarks/hexagon/halide # Go to benchmarks directory
$ make add # Run make for the add benchmark
If everything was setup correctly, this should generate four spec files expr[0..3].rkt
in the working directory. A copy of these spec files is also available in the test/add
directory.
Rosette is a solver-aided programming language that extends Racket with language constructs for program synthesis, verification, and more. We have a (slightly) modified version of Rosette that we use for Rake, although the goal is to eventually rely on Rosette master for simplicity.
To install Rosette, first download and install Racket 8.0 or later from http://racket-lang.org. Then, run the following commands:
$ cd /path/to/repo/rosette # Go to the rosette directory inside the repo
$ raco pkg install # Run raco package installation
Once Rosette is successfully installed, we can install Rake using the following commands:
$ cd /path/to/repo/rake # Go to the rake directory inside the repo
$ raco pkg install # Run raco package installation
With Rake installed, we are now ready to run the spec files generated earlier:
$ racket expr_0.rkt
On my desktop PC (AMD 5950X), this step requires about 3 minutes for each spec file. Upon (successful) completion of a spec file, Rake will print the synthesized HVX expression and also write it to the file sexp_0.out
. In addition, Rake will generate a expr_0.runtimes
file that logs all the synthesis and verification queries executed and the time taken for each, which is helpful for evaluation.
A copy of the generated .out
and .runtime
files is available in the test/add
directory.
Finally, we can now compile the add benchmark to generate an optimized executable:
$ unset HALIDE_RAKE_GENSPEC # Unset the flag so that Halide runs Rake
$ export HALIDE_DIR=/path/to/rake-halide/distrib # Set path to the Halide installation (or edit the Makefile below)
$ cd benchmarks/hexagon/halide # Go to benchmarks directory
$ make add # Run make for the add benchmark
Note: When compiling a benchmark, our version of Halide looks for the corresponding .out
file (in the working directory) for each spec file. For example, when optimizing the first expression (i.e. expr_0.rkt
), Halide will look for the file sexp_0.out
. If such a file is not present, Halide will run Rake to generate HVX expressions (like we did manually above). If such a file is present, Halide will not run Rake but instead re-use the previously generated output. This is helpful when debugging the codegen step as it allows us to bypass the synthesis stage that is usually quite expensive. The re-use logic is dumb and only checks for the matching filename and does not even verify if the .out
file was generated from the same benchmark.
Since we have already generated the .out
files for the add
benchmark, we can avoid re-synthesizing the expressions by moving them to the working directory before running the Makefile.
Now that we have a compiled binary, we can either run it on a Hexagon device or Qaulcomm's simulator. Here I only share instructions for doing the latter, though more information (as well as examples) on how to run Hexagon apps on the device are available in the Halide repo.
Qualcomm's SDK is available here: https://developer.qualcomm.com/software/hexagon-dsp-sdk/tools
First, make sure environment variables HEXAGON_SIM
and HEXAGON_CLANG
point to your SDK installation correctly:
$ export HEXAGON_SIM=/path/to/Qualcomm/Hexagon_SDK/3.5.2/tools/HEXAGON_Tools/8.3.07/Tools/bin/hexagon-sim
$ export HEXAGON_CLANG=/path/to/Qualcomm/Hexagon_SDK/3.5.2/tools/HEXAGON_Tools/8.3.07/Tools/bin/hexagon-clang++
Then, run the simulator:
$ cd benchmarks/hexagon/halide # Go to benchmarks directory
$ make add sim=true # Run make for the add benchmark
cpp-types := int8_t | int16_t | int32_t | int64_t
| uint8_t | uint16_t | uint32_t | uint64_t
lib-fns := abs8 | abs16 | abs32 | absu8 | absu16 | absu32
| min8 | min16 | min32 | minu8 | minu16 | minu32
| max8 | max16 | max32 | maxu8 | maxu16 | maxu32
misc-fns := eval : (cpp-expr) -> bitvector
| eval-to-int : (cpp-expr) -> integer
| mk-cpp-type : (bit-wdith, signed?) -> cpp-type
| mk-cpp-expr : (bitvector, cpp-type) -> cpp-expr
| cpp-type : (cpp-expr) -> cpp-type
| signed-expr? : (cpp-expr) -> bool
| unsigned-expr? : (cpp-expr) -> bool
| expr-bw : (cpp-expr) -> int
| cpp-type-str : (cpp-type) -> str
| signed-type? : (cpp-type) -> bool
| unsigned-type? : (cpp-type) -> bool
| type-bw : (cpp-type) -> int
types := buffer (data elemT)
| vector (data)
constructors := load (buffer indexes align) | ramp (base stride len)
| x32 (scalar) | x64 (scalar) | x128 (scalar) | x256 (scalar)
casts := uint8x1 | uint16x1 | uint32x1 | uint64x1
| uint1x32 | uint1x64 | uint1x128 | uint1x256
| uint8x32 | uint8x64 | uint8x128 | uint8x256
| uint16x32 | uint16x64 | uint16x128 | uint16x256
| uint32x32 | uint32x64 | uint32x128 | uint32x256
| uint64x32 | uint64x64 | uint64x128 | uint64x256
| int8x1 | int16x1 | int32x1 | int64x1
| int8x32 | int8x64 | int8x128 | int8x256
| int16x32 | int16x64 | int16x128 | int16x256
| int32x32 | int32x64 | int32x128 | int32x256
| int64x32 | int64x64 | int64x128 | int64x256
scalar-ops := sca-add | sca-sub | sca-mul | sca-div | sca-min | sca-max
vector-ops := vec-add | vec-sub | vec-mul | vec-div | vec-min | vec-max
| vec-if | vec-lt | vec-le | vec-absd | vec-shl | vec-shr
shuffle-ops := slice_vectors (vec base stride len) | concat_vectors (vec1 vec2)
| dynamic_shuffle (vec idxs st end)
lib-fns := halide-elem-type : (halide-expr) -> cpp-type
| halide-vec-len : (halide-expr) -> int
| halide-sub-exprs : (halide-expr) -> list<halide-expr>
| halide-cast-op? : (halide-expr) -> bool
analysis := extract-live-buffers-halide : (halide-expr) -> set<halide-buffer>
| extract-buffer-reads-halide : (halide-expr) -> hash-table<int,set<buffer-ref>>**
| extract-add-consts-halide : (halide-expr) -> set<int>
| extract-sub-consts-halide : (halide-expr) -> set<int>
| extract-mul-consts-halide : (halide-expr) -> set<int>
| extract-div-consts-halide : (halide-expr) -> set<int>
| extract-shr-consts-halide : (halide-expr) -> set<int>
** For each lane of the vector computed by the Halide expression, the hash-table maps to the set of live buffer-reads
involved in its computation
Currently truncated, re-introducing uber-instructions as I make progress with refactoring.
types := load-data : (live-reads) -> vector
| broadcast : (scalar) -> vector
| combine : (ir-expr ir-expr) -> ir-expr
| cast : (ir-expr type) -> ir-expr
| vs-mpy-add : (ir-expr weight-matrix output-type saturate?) -> ir-expr
| vs-mpy-add-acc : (ir-expr ir-expr weight-matrix output-type saturate?) -> ir-expr
| vv-mpy-add : (ir-expr weight-matrix output-type saturate?) -> ir-expr
| add-const : (ir-expr const) -> ir-expr
| shift-right : (ir-expr shift saturate? round? arithmetic? output-type) -> ir-expr
| divide-by-const : (ir-expr const) -> ir-expr
| average : (ir-expr round? output-type?) -> ir-expr
| modulo-by-const : (ir-expr const) -> ir-expr
| maximum : (ir-expr ir-expr) -> ir-expr
| minimum : (ir-expr ir-expr) -> ir-expr
| saturate : (ir-expr round? output-type) -> ir-expr
| abs-diff : (ir-expr ir-expr) -> ir-expr
| less-than : (ir-expr ir-expr) -> ir-expr
| less-than-eq : (ir-expr ir-expr) -> ir-expr
| select : (ir-expr ir-expr ir-expr) -> ir-expr
To enable debug-print
statements inside rake, set RAKE_DEBUG_CODEGEN=1
.
To generate spec files using Halide without running the synthesizer, set HALIDE_RAKE_GENSPEC
environment variable.