Skip to content

Latest commit

 

History

History
173 lines (144 loc) · 11.1 KB

Tutorial.md

File metadata and controls

173 lines (144 loc) · 11.1 KB

Running a benchmark (Under Construction)

This is a short guide that walks you through how to compile the benchmarks/hexagon/halide/add benchmark (taken from the Hannk app inside Halide's repository) using Halide+Rake.

Installing Halide

The first step is to install our modified version of Halide found here. This version of Halide is a fork of Halide master that bypasses the built-in HexagonOptimizer and instead leverages Rake for instruction selection. The build procedure and dependencies are identical to Halide master and instructions on how to build Halide can be found in the Halide Repo.

Generating Rake specification files

While it is possible to do an end-to-end push-button compilation of a benchmark, I advise against it during development or experimentation. Instead, it is best to perform compilation in three individual steps.

The first step is to use our Halide fork to generate specification files for Rake. A specification file (spec file for short) is an input to Rake generated by Halide for each qualifying expression in the input benchmark. It expresses the original Halide IR expression that needs to be optimized in Rosette, any available axioms that the synthesizer may use, as well as any configuration parameters (such as the target ISA).

To generate the spec files without actually running the synthesizer, use the following commands:

$ export LLVM_CONFIG=/path/to/llvm-config                # Make sure LLVM_CONFIG points to your LLVM build
$ export HALIDE_RAKE_GENSPEC=1                           # Tell Halide to generate specs but not run Rake
$ export HALIDE_DIR=/path/to/rake-halide/distrib         # Set path to the Halide installation (or edit the Makefile below)
$ cd benchmarks/hexagon/halide                           # Go to benchmarks directory
$ make add                                               # Run make for the add benchmark

If everything was setup correctly, this should generate four spec files expr[0..3].rkt in the working directory. A copy of these spec files is also available in the test/add directory.

Installing Rosette

Rosette is a solver-aided programming language that extends Racket with language constructs for program synthesis, verification, and more. We have a (slightly) modified version of Rosette that we use for Rake, although the goal is to eventually rely on Rosette master for simplicity.

To install Rosette, first download and install Racket 8.0 or later from http://racket-lang.org. Then, run the following commands:

$ cd /path/to/repo/rosette                               # Go to the rosette directory inside the repo
$ raco pkg install                                       # Run raco package installation

Installing Rake

Once Rosette is successfully installed, we can install Rake using the following commands:

$ cd /path/to/repo/rake                                  # Go to the rake directory inside the repo
$ raco pkg install                                       # Run raco package installation

Running Rake to generate HVX expressions

With Rake installed, we are now ready to run the spec files generated earlier:

$ racket expr_0.rkt

On my desktop PC (AMD 5950X), this step requires about 3 minutes for each spec file. Upon (successful) completion of a spec file, Rake will print the synthesized HVX expression and also write it to the file sexp_0.out. In addition, Rake will generate a expr_0.runtimes file that logs all the synthesis and verification queries executed and the time taken for each, which is helpful for evaluation.

A copy of the generated .out and .runtime files is available in the test/add directory.

Compiling the benchmark

Finally, we can now compile the add benchmark to generate an optimized executable:

$ unset HALIDE_RAKE_GENSPEC                              # Unset the flag so that Halide runs Rake
$ export HALIDE_DIR=/path/to/rake-halide/distrib         # Set path to the Halide installation (or edit the Makefile below)
$ cd benchmarks/hexagon/halide                           # Go to benchmarks directory
$ make add                                               # Run make for the add benchmark

Note: When compiling a benchmark, our version of Halide looks for the corresponding .out file (in the working directory) for each spec file. For example, when optimizing the first expression (i.e. expr_0.rkt), Halide will look for the file sexp_0.out. If such a file is not present, Halide will run Rake to generate HVX expressions (like we did manually above). If such a file is present, Halide will not run Rake but instead re-use the previously generated output. This is helpful when debugging the codegen step as it allows us to bypass the synthesis stage that is usually quite expensive. The re-use logic is dumb and only checks for the matching filename and does not even verify if the .out file was generated from the same benchmark.

Since we have already generated the .out files for the add benchmark, we can avoid re-synthesizing the expressions by moving them to the working directory before running the Makefile.

Running the benchmark

Now that we have a compiled binary, we can either run it on a Hexagon device or Qaulcomm's simulator. Here I only share instructions for doing the latter, though more information (as well as examples) on how to run Hexagon apps on the device are available in the Halide repo.

Installing Qualcomm's SDK

Qualcomm's SDK is available here: https://developer.qualcomm.com/software/hexagon-dsp-sdk/tools

Running the simulator

First, make sure environment variables HEXAGON_SIM and HEXAGON_CLANG point to your SDK installation correctly:

$ export HEXAGON_SIM=/path/to/Qualcomm/Hexagon_SDK/3.5.2/tools/HEXAGON_Tools/8.3.07/Tools/bin/hexagon-sim
$ export HEXAGON_CLANG=/path/to/Qualcomm/Hexagon_SDK/3.5.2/tools/HEXAGON_Tools/8.3.07/Tools/bin/hexagon-clang++

Then, run the simulator:

$ cd benchmarks/hexagon/halide                           # Go to benchmarks directory
$ make add sim=true                                      # Run make for the add benchmark

Documentation (Under construction)

C++ Support

cpp-types  :=  int8_t | int16_t | int32_t | int64_t 
           |   uint8_t | uint16_t | uint32_t | uint64_t

  lib-fns  :=  abs8 | abs16 | abs32 | absu8 | absu16 | absu32
           |   min8 | min16 | min32 | minu8 | minu16 | minu32
           |   max8 | max16 | max32 | maxu8 | maxu16 | maxu32

 misc-fns  :=  eval           :    (cpp-expr) -> bitvector 
           |   eval-to-int    :    (cpp-expr) -> integer 
           |   mk-cpp-type    :    (bit-wdith, signed?) -> cpp-type 
           |   mk-cpp-expr    :    (bitvector, cpp-type) -> cpp-expr
           |   cpp-type       :    (cpp-expr) -> cpp-type
           |   signed-expr?   :    (cpp-expr) -> bool
           |   unsigned-expr? :    (cpp-expr) -> bool
           |   expr-bw        :    (cpp-expr) -> int
           |   cpp-type-str   :    (cpp-type) -> str
           |   signed-type?   :    (cpp-type) -> bool
           |   unsigned-type? :    (cpp-type) -> bool
           |   type-bw        :    (cpp-type) -> int

Halide IR Support

       types  :=   buffer (data elemT) 
              |   vector (data)

constructors  :=   load (buffer indexes align) | ramp (base stride len)
              |    x32 (scalar) | x64 (scalar) | x128 (scalar) | x256 (scalar)

       casts  :=  uint8x1 | uint16x1 | uint32x1 | uint64x1 
              |   uint1x32 | uint1x64 | uint1x128 | uint1x256
              |   uint8x32 | uint8x64 | uint8x128 | uint8x256
              |   uint16x32 | uint16x64 | uint16x128 | uint16x256
              |   uint32x32 | uint32x64 | uint32x128 | uint32x256
              |   uint64x32 | uint64x64 | uint64x128 | uint64x256
              |   int8x1 | int16x1 | int32x1 | int64x1
              |   int8x32 | int8x64 | int8x128 | int8x256 
              |   int16x32 | int16x64 | int16x128 | int16x256 
              |   int32x32 | int32x64 | int32x128 | int32x256 
              |   int64x32 | int64x64 | int64x128 | int64x256
              
  scalar-ops  :=  sca-add | sca-sub | sca-mul | sca-div | sca-min | sca-max
  
  vector-ops  :=  vec-add | vec-sub | vec-mul | vec-div | vec-min | vec-max
              |   vec-if  | vec-lt  | vec-le | vec-absd | vec-shl | vec-shr

 shuffle-ops  :=  slice_vectors (vec base stride len) | concat_vectors (vec1 vec2)
              |   dynamic_shuffle (vec idxs st end)

     lib-fns  :=  halide-elem-type : (halide-expr) -> cpp-type
              |   halide-vec-len : (halide-expr) -> int
              |   halide-sub-exprs : (halide-expr) -> list<halide-expr>
              |   halide-cast-op? : (halide-expr) -> bool

    analysis  :=  extract-live-buffers-halide : (halide-expr) -> set<halide-buffer>
              |   extract-buffer-reads-halide : (halide-expr) -> hash-table<int,set<buffer-ref>>**
              |   extract-add-consts-halide : (halide-expr) -> set<int>
              |   extract-sub-consts-halide : (halide-expr) -> set<int>
              |   extract-mul-consts-halide : (halide-expr) -> set<int>
              |   extract-div-consts-halide : (halide-expr) -> set<int>
              |   extract-shr-consts-halide : (halide-expr) -> set<int>
  
** For each lane of the vector computed by the Halide expression, the hash-table maps to the set of live buffer-reads 
involved in its computation

HVX Lifting IR

Currently truncated, re-introducing uber-instructions as I make progress with refactoring.

       types  :=   load-data : (live-reads) -> vector
              |    broadcast : (scalar) -> vector
              |    combine : (ir-expr ir-expr) -> ir-expr
              |    cast : (ir-expr type) -> ir-expr
              |    vs-mpy-add : (ir-expr weight-matrix output-type saturate?) -> ir-expr
              |    vs-mpy-add-acc : (ir-expr ir-expr weight-matrix output-type saturate?) -> ir-expr
              |    vv-mpy-add : (ir-expr weight-matrix output-type saturate?) -> ir-expr
              |    add-const : (ir-expr const) -> ir-expr
              |    shift-right : (ir-expr shift saturate? round? arithmetic? output-type) -> ir-expr
              |    divide-by-const : (ir-expr const) -> ir-expr
              |    average : (ir-expr round? output-type?) -> ir-expr
              |    modulo-by-const : (ir-expr const) -> ir-expr
              |    maximum : (ir-expr ir-expr) -> ir-expr
              |    minimum : (ir-expr ir-expr) -> ir-expr
              |    saturate : (ir-expr round? output-type) -> ir-expr              
              |    abs-diff : (ir-expr ir-expr) -> ir-expr
              |    less-than : (ir-expr ir-expr) -> ir-expr
              |    less-than-eq : (ir-expr ir-expr) -> ir-expr
              |    select : (ir-expr ir-expr ir-expr) -> ir-expr

Environment Variables

To enable debug-print statements inside rake, set RAKE_DEBUG_CODEGEN=1.

To generate spec files using Halide without running the synthesizer, set HALIDE_RAKE_GENSPEC environment variable.