We intend on creating a more accessible api for integrating the SpMM kernels into other projects and recommend waiting for that (over working with this codebase) if possible.
ensure your machine has avx512vl
lscpu | grep avx512vl
clone the repo
git clone https://github.com/SpRegTiling/sparse-register-tiling.git --recurse-submodules
cd sparse-register-tiling
If cloning submodules fails, e.g. due to github key setup, you may clone the the submodule manually by:
git clone https://github.com/LucasWilkinson/spmm-nano-kernels.git spmm_nano_kernels
Download the dlmc dataset, this will download it to ../dlmc
sh download_dlmc.sh
pip3 install torch
pip3 install -e .
pip3 install pandas \
matplotlib \
scipy \
pyyaml
export PYTHONPATH=$(pwd):$PYTHONPATH
wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB && \
add-apt-repository -y "deb https://apt.repos.intel.com/oneapi all main" && \
apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB && \
rm GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB && \
apt-get update -y && \
apt-get install -y intel-oneapi-mkl-devel-2021.4.0
from spmm_nano_kernels
, run the following:
cd spmm_nano_kernels
python3 -m codegen.generate_ukernels
cd ..
mkdir release-build
cmake -Brelease-build -DCMAKE_BUILD_TYPE=Release -DENABLE_AVX512=True .
make -j 16 -Crelease-build SPMM_demo
python3 run_matrix.py -m ../dlmc/transformer/magnitude_pruning/0.8/body_decoder_layer_0_ffn_conv1_fully_connected.smtx -t 8 -b 512 -o results.csv -d
see:
python3 run_matrix.py --help
for more details
cpp_testbed/demo/SPMM_demo.cpp
: benchmarking code (best to use thepython3 artifact/run_matrix.py
wrapper)spmm_nano_kernels/codegen/generate_ukernels.py
: entry point for generating scheduler/executor pairsspmm_nano_kernels/codegen/base_ukernel_codegen.py
: code for generating mini-register tiles, this code in conjunction withspmm_nano_kernels/include/Executor.h
forms the executor codespmm_nano_kernels/include/MicroKernelPacker.h
: templated scheduler + data-compressor codespmm_nano_kernels/include/MatMulSpecialized.h
: wrapper that contains a scheduler/executor pairsbench/ilp_pad/nano_solver.py
contains the code for the mathematical model