mpirun -n 4 ./uspmv <matrix_name>.mtx <kernel_format> <options>
./uspmv <matrix_name>.mtx scs -c 16 -s 512 -mode b
./uspmv <matrix_name>.mtx crs -mode s -block_vec_size 2 -verbose 1
- kernel_format can be any one of: crs, scs (and by extension: ell and sell-p)
- -block_vec_size (int: width of block vectors for SpMMV)
- -c (int: chunk size (required for scs))
- -s (int: sigma (required for scs))
- -rev (int: number of back-to-back revisions to perform)
- -rand_x (0/1: random x vector option)
- -dp / sp / hp / ap[dp_sp] / ap[dp_hp] / ap[sp_hp] / ap[dp_sp_hp] [%s] (numerical precision of matrix data)
- -seg_metis / seg_nnz / seg_rows [%s] (global matrix partitioning for MPI)
- -validate (0/1: check results against MKL)
- -verbose (0/1: verbose validation of results)
- -mode ('s'/'b': either in solve mode or bench mode)
- -bench_time (float: minimum number of seconds for SpMV benchmark)
- -ba_synch (0/1: synchronize processes in each benchmark loop)
- -comm_halos (0/1: communicate halo elements in each benchmark loop)
- -par_pack (0/1: pack elements contiguously for MPI_Isend in parallel)
- -no_pack (0/1: skip packing remote elements to assess performance penalty)
- -print_comm_vol (0/1: report the number of elements received on this MPI process per SpMV (requires bench mode))
- -equilibrate (0/1: normalize rows of matrix)
- --------------------------- Adaptive Precision Options ---------------------------
- -ap_threshold_1 (float: threshold for two-way matrix partitioning for adaptive precision)
- -ap_threshold_2 (float: threshold for three-way matrix partitioning for adaptive precision)
- -dropout (0/1: enable dropout of elements below the designated threshold)
- -dropout_threshold (float: remove matrix elements below this threshold)
- The -c and -s options are only relevant when the scs kernel is selected
- If interested in only single-vector SpMV, please select vector layout
- Select compiler in Makefile (gcc, icc, icx, llvm, nvcc)
- icc is legacy and not advised
- VECTOR_LENGTH for SIMD instructions is also defined at the top of the Makefile, useful for non-SELL_C_SIGMA kernels
- If using AVX512 on Ice Lake, I currently work around the Downfall performance bug by using the icx compiler from oneAPI 2023.2.0
- The par_pack option typically yields better performance for MPI+OpenMP with poorly load-balanced matrices
- Thresholds for adaptive precision are expected in the order