Project Structure

CUTLASS is arranged as a header-only library along with Utilities, Tools, Examples, and unit tests. Doxygen documentation provides a complete list of files, classes, and template concepts defined in the CUTLASS project.

A detailed explanation of the source code organization may be found in the CUTLASS documentation, but several main components are summarized below.

CUTLASS Template Library

include/                     # client applications should target this directory in their build's include paths

  cutlass/                   # CUDA Templates for Linear Algebra Subroutines and Solvers - headers only

    arch/                    # direct exposure of architecture features (including instruction-level GEMMs)

    conv/                    # code specialized for convolution

    gemm/                    # code specialized for general matrix product computations

    layout/                  # layout definitions for matrices, tensors, and other mathematical objects in memory

    platform/                # CUDA-capable Standard Library components

    reduction/               # bandwidth-limited reduction kernels that do not fit the "gemm" model
    
    transform/               # code specialized for layout, type, and domain transformations

    *                        # core vocabulary types, containers, and basic numeric operations
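
The `layout/` entry above refers to functions that map a logical (row, column) coordinate to a linear offset in memory. As a rough illustration of that idea only, the sketch below defines two hypothetical functors, `RowMajorSketch` and `ColumnMajorSketch`. These names are assumptions made for this example and are not CUTLASS's own layout classes.

```cpp
#include <cstddef>

// Hypothetical stand-ins for illustration; NOT the layout classes shipped in cutlass/layout/.

// Row-major: elements of one row are contiguous in memory.
struct RowMajorSketch {
  std::size_t ld;  // leading dimension: number of elements between consecutive rows

  std::size_t operator()(std::size_t row, std::size_t col) const {
    return row * ld + col;
  }
};

// Column-major: elements of one column are contiguous in memory.
struct ColumnMajorSketch {
  std::size_t ld;  // leading dimension: number of elements between consecutive columns

  std::size_t operator()(std::size_t row, std::size_t col) const {
    return col * ld + row;
  }
};
```

For a densely packed 3-by-4 matrix, `RowMajorSketch{4}` maps coordinate (1, 2) to offset 6, while `ColumnMajorSketch{3}` maps the same coordinate to offset 7.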

CUTLASS SDK Examples

CUTLASS SDK examples apply CUTLASS templates to implement basic computations.

examples/
  00_basic_gemm/                   # launches a basic GEMM with single precision inputs and outputs

  01_cutlass_utilities/            # demonstrates CUTLASS Utilities for allocating and initializing tensors
  
  02_dump_reg_smem/                # debugging utilities for printing register and shared memory contents
  
  03_visualize_layout/             # utility for visualizing all layout functions in CUTLASS

  04_tile_iterator/                # example demonstrating an iterator over tiles in memory

  05_batched_gemm/                 # example demonstrating CUTLASS's batched strided GEMM operation

  06_splitK_gemm/                  # example demonstrating CUTLASS's Split-K parallel reduction kernel

  07_volta_tensorop_gemm/          # example demonstrating mixed precision GEMM using Volta Tensor Cores

  08_turing_tensorop_gemm/         # example demonstrating integer GEMM using Turing Tensor Cores

  09_turing_tensorop_conv2dfprop/  # example demonstrating integer implicit GEMM convolution (forward propagation) using Turing Tensor Cores

  10_planar_complex/               # example demonstrating planar complex GEMM kernels

  11_planar_complex_array/         # example demonstrating planar complex kernels with batch-specific problem sizes

  12_gemm_bias_relu/               # example demonstrating GEMM fused with bias and relu

  13_fused_two_gemms/              # example demonstrating two GEMMs fused in one kernel

  22_ampere_tensorop_conv2dfprop/  # example demonstrating integer implicit GEMM convolution (forward propagation) using Ampere Tensor Cores

  31_basic_syrk                    # example demonstrating symmetric rank-K update

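  32_basic_trmm                    # example demonstrating triangular matrix-matrix multiplication (TRMM)

  33_ampere_3xtf32_tensorop_symm   # example demonstrating symmetric matrix-matrix multiplication (SYMM) with 3xTF32 on Ampere Tensor Cores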

  35_gemm_softmax                  # example demonstrating GEMM fused with Softmax in mixed precision using Ampere Tensor Cores

  40_cutlass_py                    # example demonstrating CUTLASS with CUDA Python
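
To give a feel for what the Split-K example (`06_splitK_gemm/`) is about, here is a minimal host-side sketch of the general idea: the K dimension of a GEMM is partitioned into slices, each slice accumulates a partial product into its own workspace, and a final reduction sums the partial results. This is plain C++ on row-major `std::vector<float>` data, written under that assumption. The function names `gemm_k_slice` and `split_k_gemm` are hypothetical and do not appear in CUTLASS, and the code does not reflect how the CUTLASS kernels are implemented on the GPU.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: accumulates A[:, k_begin:k_end] * B[k_begin:k_end, :] into C.
// A is M x K, B is K x N, C is M x N, all row-major.
void gemm_k_slice(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C, std::size_t M, std::size_t N, std::size_t K,
                  std::size_t k_begin, std::size_t k_end) {
  for (std::size_t i = 0; i < M; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      float acc = 0.0f;
      for (std::size_t k = k_begin; k < k_end; ++k) {
        acc += A[i * K + k] * B[k * N + j];
      }
      C[i * N + j] += acc;
    }
  }
}

// Hypothetical sketch of split-K: one partial result per K slice, then a reduction.
// Assumes slices >= 1.
std::vector<float> split_k_gemm(const std::vector<float>& A, const std::vector<float>& B,
                                std::size_t M, std::size_t N, std::size_t K,
                                std::size_t slices) {
  // Each slice gets its own M x N workspace, so slices could run independently.
  std::vector<std::vector<float>> partials(slices, std::vector<float>(M * N, 0.0f));
  const std::size_t chunk = (K + slices - 1) / slices;  // ceiling division
  for (std::size_t s = 0; s < slices; ++s) {
    const std::size_t k_begin = std::min(s * chunk, K);
    const std::size_t k_end = std::min(k_begin + chunk, K);
    gemm_k_slice(A, B, partials[s], M, N, K, k_begin, k_end);
  }

  // Reduction step: sum the partial products element-wise.
  std::vector<float> C(M * N, 0.0f);
  for (const auto& partial : partials) {
    for (std::size_t idx = 0; idx < C.size(); ++idx) {
      C[idx] += partial[idx];
    }
  }
  return C;
}
```

Splitting along K mainly pays off when M and N are small relative to K, since the output tiles alone would then expose too little parallelism. The cost is the extra workspace and the reduction pass.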

Tools

tools/
  library/                   # CUTLASS Instance Library - contains instantiations of all supported CUTLASS templates
    include/
      cutlass/
        library/

  profiler/                  # CUTLASS Profiler         - command-line utility for executing operations in the
                             #                            CUTLASS Library
  
  util/                      # CUTLASS Utilities        - contains numerous helper classes for
    include/                 #                            managing tensors in device memory, reference
      cutlass/               #                            implementations for GEMM, random initialization
        util/                #                            of tensors, and I/O.

Test

The test/unit/ directory consists of unit tests implemented with Google Test. They demonstrate basic usage of Core API components and provide complete tests of the CUTLASS GEMM computations.

Instructions for building and running the unit tests are given in the Quickstart guide.
