ETI System and file structure
This wiki page is based on the following issue: https://github.com/kokkos/kokkos-kernels/issues/31. The ETI system was designed with the following goals:
- Pre-compile functions, and prevent them from being implicitly instantiated (ETI)
- Even with ETI on, allow other input types (say for example extended precision, or nonstandard data layouts)
- Call TPLs (MKL, CUBLAS etc.) for input types which allow it
- Disallow anything other than ETI types if requested (eti-only)
- Check what type of instantiation gets hit in apps (ETI, Non-ETI, TPL)
In order to do all this we came up with a design which has 3 functionality layers (details later):
- User Interface: void foo(ViewType a, Scalar alpha): accepts views of all kinds and combinations; calls the specialization layer
- Specialization Layer: struct Foo { static void foo(ViewInternalType a, Scalar alpha); }; makes sure that only the minimally necessary number of instantiations exists, serves as ETI specialization layer, serves as TPL specialization layer
- Implementation Layer: This is called by the specialization layer, and has the actual functors etc.
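The layering can be sketched without Kokkos. The following is only a minimal illustration under stated assumptions: `SketchView` is a hypothetical stand-in for a rank-1 view, and the `sketch` namespace and `example()` helper do not exist in KokkosKernels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical stand-in for a rank-1 view; it only wraps a std::vector.
template <class Scalar>
struct SketchView {
  static constexpr int rank = 1;
  std::vector<Scalar>* storage;
  std::size_t extent(int) const { return storage->size(); }
  Scalar& operator()(std::size_t i) const { return (*storage)[i]; }
};

namespace Impl {

// Implementation layer: the actual kernel (a plain loop in this sketch).
template <class ViewType, class Scalar>
void execute_foo(const ViewType& a, Scalar alpha) {
  for (std::size_t i = 0; i < a.extent(0); ++i) {
    a(i) = alpha * static_cast<Scalar>(i);
  }
}

// Specialization layer: a struct with a static member function.
// ETI and TPL variants would be attached here as specializations.
template <class ViewType, class Scalar>
struct Foo {
  static void foo(const ViewType& a, Scalar alpha) { execute_foo(a, alpha); }
};

}  // namespace Impl

// User interface: checks its arguments, then forwards to the
// specialization layer.
template <class ViewType, class Scalar>
void foo(const ViewType& a, Scalar alpha) {
  static_assert(ViewType::rank == 1, "foo expects a rank-1 view");
  Impl::Foo<ViewType, Scalar>::foo(a, alpha);
}

// Usage: fills the vector with alpha * i by going through all three layers.
inline std::vector<double> example() {
  std::vector<double> data(4, 0.0);
  SketchView<double> view{&data};
  foo(view, 2.0);
  assert(data[3] == 6.0);
  return data;
}

}  // namespace sketch
```

The user only ever sees `sketch::foo`; everything under `Impl` can be specialized or replaced without changing the public signature.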
The following sections go through the file structure and then each of these layers in turn.
src/blas:
KokkosBlas.hpp: includes all the KokkosBlas function header files
KokkosBlas1_foo.hpp (contains user interface functions for foo)
src/blas/impl:
KokkosBlas1_foo_impl.hpp: The actual implementation of the functions (Functors etc.)
KokkosBlas1_foo_spec.hpp: The specialization layer
src/impl/tpls:
KokkosBlas1_foo_tpl_spec_avail.hpp: Availability of TPLs for particular types
KokkosBlas1_foo_tpl_spec_decl.hpp: The specialization declarations for using TPLs
src/impl/generated_specializations_hpp:
KokkosBlas1_foo_eti_spec_avail.hpp: Availability declarations for ETI types. Generated during configure from a template .in file.
KokkosBlas1_foo_eti_spec_decl.hpp: Specialization declarations for ETI types. Generated during configure from a template .in file.
src/impl/generated_specializations_cpp/foo:
KokkosBlas1_foo_eti_spec_inst.cpp.in: a template file that will be used to create a source file for each extern template instantiation enabled.
- Add a new function
  - Add all those files based on the template provided
- Modify the implementation of a function
  - Only src/{blas,sparse,graph,...}/impl/KokkosBlas1_foo_impl.hpp needs to be modified
- Add a new ETI type
  - Modify src/CMakeLists.txt to generate the required header and source files from the ETI templates. For example:
KOKKOSKERNELS_GENERATE_ETI(Blas1_dot_mv dot
HEADER_LIST HEADERS
SOURCE_LIST SOURCES
TYPE_LISTS FLOATS LAYOUTS DEVICES
)
The first argument is the full kernel name, which will result in files such as KokkosBlas1_dot_mv_eti_spec_avail.hpp and KokkosBlas1_dot_mv_eti_spec_decl.hpp. The second argument is the subfolder in src and impl where the template files exist. The KOKKOSKERNELS_GENERATE_ETI function takes the TYPE_LISTS keyword argument. These lists define the template arguments that will have ETI versions generated. DEVICES provides two template arguments: an execution space and a corresponding valid memory space. The dot function expects 4 ETI parameters. Other functions may require more. This should match the number of arguments to macros like KOKKOSBLAS1_IMPL_FOO_ETI_SPEC_AVAIL (see below).
- Add a new TPL variant
  - Modify the files in src/impl/tpls/ to add the new TPL (declare its availability, and provide the implementation of how to call it)
The user interface file, src/blas/KokkosBlas1_foo.hpp, provides the public API for the function foo. The function internally calls the specialization layer after explicitly filling in all the necessary template arguments for the ViewTypes etc. For example, for a dot(a,b) product, const modifiers should be added to the scalar type if they are not already there. Otherwise the code could be compiled up to 4 times:
dot(View<double*>, View<double*>);
dot(View<double*>, View<const double*>);
dot(View<const double*>, View<double*>);
dot(View<const double*>, View<const double*>);
If you then factor in explicit vs implicit specification of Layout, Memory Space, and MemoryTraits we end up with over 100 possible instantiations for something which is technically the exact same thing!
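A minimal sketch of the const normalization, assuming a hypothetical `SketchView` (a pointer plus a length) in place of `Kokkos::View`. The counter `dot_instantiations` is only there to make the effect visible and is not part of KokkosKernels.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {

// Hypothetical stand-in for a rank-1 view: a pointer plus a length.
template <class T>
struct SketchView {
  T* ptr;
  int n;
  SketchView(T* p, int len) : ptr(p), n(len) {}
  // Allow SketchView<double> -> SketchView<const double>.
  template <class U, class = typename std::enable_if<
                         !std::is_const<U>::value &&
                         std::is_same<const U, T>::value>::type>
  SketchView(const SketchView<U>& other) : ptr(other.ptr), n(other.n) {}
};

namespace Impl {

// Number of distinct Dot<...> instantiations that have been executed.
inline int& dot_instantiations() {
  static int count = 0;
  return count;
}

template <class XView, class YView>
struct Dot {
  static double dot(const XView& x, const YView& y) {
    // Runs once per instantiation: each instantiation owns its own static.
    static const int registered = ++dot_instantiations();
    (void)registered;
    double sum = 0.0;
    for (int i = 0; i < x.n; ++i) sum += x.ptr[i] * y.ptr[i];
    return sum;
  }
};

}  // namespace Impl

// User interface: accepts any const/non-const combination, but always
// forwards views of const scalars to the specialization layer.
template <class TX, class TY>
double dot(const SketchView<TX>& x, const SketchView<TY>& y) {
  typedef SketchView<typename std::add_const<TX>::type> XInternal;
  typedef SketchView<typename std::add_const<TY>::type> YInternal;
  return Impl::Dot<XInternal, YInternal>::dot(XInternal(x), YInternal(y));
}

// Usage: calls dot with all four const combinations and returns how many
// Impl::Dot instantiations were needed to serve them.
inline int example() {
  double a[3] = {1.0, 2.0, 3.0};
  double b[3] = {4.0, 5.0, 6.0};
  SketchView<double> x(a, 3), y(b, 3);
  SketchView<const double> cx(a, 3), cy(b, 3);
  const double total = dot(x, y) + dot(x, cy) + dot(cx, y) + dot(cx, cy);
  assert(total == 4 * 32.0);
  return Impl::dot_instantiations();
}

}  // namespace sketch
```

All four calls land in `Impl::Dot<SketchView<const double>, SketchView<const double>>`, so only one instantiation of the inner layer is compiled.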
Furthermore, this function should also do static asserts on things which are not allowed (for example, the wrong rank of a view) in order to give users an early exit in a function which they can directly associate with the code they have written.
Here is an example for foo:
// Include the specialization layer, which defines the Impl::Foo struct
#include<impl/KokkosBlas1_foo_spec.hpp>
namespace KokkosBlas1 {
  // User facing function accepts any ViewType
  template<class ViewType>
  void foo(const ViewType& a) {
    // Static assert on prohibited types
    static_assert(ViewType::rank==1, "Trying to call foo with View of rank other than 1");
    // Convert ViewType to internal ViewType to reduce instantiations.
    // Without this, whether you explicitly specify a Layout or not would be
    // two different instantiations, since Views have variadic template parameters.
    // Furthermore, this is the place to add missing const etc.
    typedef Kokkos::View<typename ViewType::data_type,
                         typename ViewType::array_layout,
                         typename ViewType::device_type>
            ViewTypeInternal;
    // Call the actual implementation
    Impl::Foo<ViewTypeInternal>::foo(a);
  }
}
The specialization layer not only serves as the focal point for the unified instantiation of the things the public layer requires; it is also the layer which allows for specialization for third-party libraries (such as MKL and CUBLAS) and explicit template instantiation (ETI).
Generally this layer is again very thin and basically just passes arguments through.
The basic mechanism for ETI is the extern template mechanism of C++11. Unfortunately that thing has some funky semantics with respect to classes. In particular, it looks like the compiler can still choose to inline the implementation of the class if it is visible in the same compilation unit, instead of calling the externally available instantiation. This might also be compiler dependent.
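For readers unfamiliar with the syntax, here is a minimal single-file sketch of an explicit instantiation declaration and definition. The struct `Foo` and its member `twice` are hypothetical; in a real library the two marked lines would live in a header and in one .cpp file respectively.

```cpp
#include <cassert>

namespace sketch {

template <class T>
struct Foo {
  static T twice(T x) { return x + x; }
};

// Explicit instantiation declaration: tells the compiler that Foo<double>
// is instantiated elsewhere, so it need not be instantiated implicitly here.
// In a library this line would sit in a header.
extern template struct Foo<double>;

// Explicit instantiation definition: emits the code for Foo<double>.
// In a library this line would sit in exactly one .cpp file; it follows the
// declaration here only to keep the sketch in a single file.
template struct Foo<double>;

// Usage: Foo<double> is covered by the explicit instantiation, while
// Foo<int> is instantiated implicitly as usual.
inline double example() {
  assert(Foo<int>::twice(3) == 6);
  return Foo<double>::twice(2.0);
}

}  // namespace sketch
```

Because `twice` is defined inside the class body, a compiler is still free to inline it at the call site, which is the behavior the paragraph above warns about.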
To enable both TPL specialization and ETI specialization additional bool template parameters are added to the specialization layer which are defaulted to values based on whether said specializations are available:
From src/blas/impl/KokkosBlas1_foo_spec.hpp:
template<class ViewType>
struct foo_eti_spec_avail {
  enum : bool { value = false };
};
template<class ViewType, bool tpl_spec_avail = foo_tpl_spec_avail<ViewType>::value,
                         bool eti_spec_avail = foo_eti_spec_avail<ViewType>::value>
struct Foo {
  static void foo(const ViewType& a);
};
In order to declare a specialization available, a full specialization of foo_tpl_spec_avail or foo_eti_spec_avail must be provided. Those specializations live in src/impl/tpls/KokkosBlas1_foo_tpl_spec_avail.hpp and src/impl/generated_specializations_hpp/KokkosBlas1_foo_eti_spec_avail.hpp respectively, with the latter auto-generated. We come back to those files in a bit.
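The selection mechanism can be tried out in isolation. In this sketch the tag types `ViewA`, `ViewB`, `ViewC` and the `Path` enum are hypothetical, and the ETI case is modelled as an ordinary partial specialization purely for illustration (the real ETI path relies on extern template declarations).

```cpp
#include <cassert>

namespace sketch {

// Hypothetical tag types standing in for concrete View types.
struct ViewA {};
struct ViewB {};
struct ViewC {};

// Primary templates: nothing is available by default.
template <class ViewType>
struct foo_tpl_spec_avail { enum : bool { value = false }; };
template <class ViewType>
struct foo_eti_spec_avail { enum : bool { value = false }; };

// Full specializations declare availability for particular types.
template <>
struct foo_tpl_spec_avail<ViewA> { enum : bool { value = true }; };
template <>
struct foo_eti_spec_avail<ViewB> { enum : bool { value = true }; };

enum class Path { NonEti, Eti, Tpl };

// The defaulted bool parameters pick the variant for each type.
template <class ViewType,
          bool tpl_spec_avail = foo_tpl_spec_avail<ViewType>::value,
          bool eti_spec_avail = foo_eti_spec_avail<ViewType>::value>
struct Foo {
  static Path foo() { return Path::NonEti; }
};

template <class ViewType>
struct Foo<ViewType, true, false> {
  static Path foo() { return Path::Tpl; }
};

template <class ViewType>
struct Foo<ViewType, false, true> {
  static Path foo() { return Path::Eti; }
};

// Usage: callers write Foo<T> only; the traits fill in the bools.
inline void example() {
  assert(Foo<ViewA>::foo() == Path::Tpl);
  assert(Foo<ViewB>::foo() == Path::Eti);
  assert(Foo<ViewC>::foo() == Path::NonEti);
}

}  // namespace sketch
```

Note that each full specialization of a trait has to be visible before the first use of `Foo` for that type, otherwise the default arguments are evaluated against the primary template.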
The next part of the specialization layer is the definition used when no TPL is available. It calls the actual implementation provided in src/blas/impl/KokkosBlas1_foo_impl.hpp. Note that the TPL bool is set to false, while the other one is set to KOKKOSKERNELS_IMPL_COMPILE_LIBRARY. The latter is only going to be true while compiling the KokkosKernels library with its explicit template instantiations.
template<class ViewType>
struct Foo<ViewType,false,KOKKOSKERNELS_IMPL_COMPILE_LIBRARY> {
  static void foo(const ViewType& a) {
    execute_foo(a);
  }
};
In this file we also need to define the macros which are later used in the auto generated files:
// Availability Macro
#define KOKKOSBLAS1_IMPL_FOO_ETI_SPEC_AVAIL( SCALAR, LAYOUT, EXECSPACE, MEMSPACE ) \
  template<> \
  struct foo_eti_spec_avail<Kokkos::View<SCALAR*,LAYOUT,Kokkos::Device<EXECSPACE,MEMSPACE> > > { \
    enum : bool { value = true }; \
  };
// Declaration Macro
#define KOKKOSBLAS1_IMPL_FOO_ETI_SPEC_DECL( SCALAR, LAYOUT, EXECSPACE, MEMSPACE ) \
  extern template struct Foo<Kokkos::View<SCALAR*,LAYOUT,Kokkos::Device<EXECSPACE,MEMSPACE>>,false,true>;
// Instantiation Macro
#define KOKKOSBLAS1_IMPL_FOO_ETI_SPEC_INST( SCALAR, LAYOUT, EXECSPACE, MEMSPACE ) \
  template struct Foo<Kokkos::View<SCALAR*,LAYOUT,Kokkos::Device<EXECSPACE,MEMSPACE>>,false,true>;
// Include the actual declarations for TPLs and ETI
#if !KOKKOSKERNELS_IMPL_COMPILE_LIBRARY
#include<impl/tpls/KokkosBlas1_foo_tpl_spec_decl.hpp>
#include<impl/generated_specializations_hpp/KokkosBlas1_foo_eti_spec_decl.hpp>
#endif
Note how the actual declarations of those classes are only included when we are NOT compiling the library.
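To see how the three macros cooperate, here is a self-contained sketch that collapses everything into one file. All `SKETCH_` macros, `SketchView`, the layout tags, and the typedefs are hypothetical stand-ins; the sketch also uses a single bool parameter instead of two and only two macro arguments to stay short.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical stand-ins for layout tags and a View-like class template.
struct LayoutLeft {};
struct LayoutRight {};
template <class DataType, class Layout>
struct SketchView {};

template <class ViewType>
struct foo_eti_spec_avail { enum : bool { value = false }; };

template <class ViewType,
          bool eti_spec_avail = foo_eti_spec_avail<ViewType>::value>
struct Foo {
  // Reports whether the call went through the ETI variant.
  static bool foo(const ViewType&) { return eti_spec_avail; }
};

// Availability macro: full specialization of the trait.
#define SKETCH_FOO_ETI_SPEC_AVAIL(SCALAR, LAYOUT) \
  template <> \
  struct foo_eti_spec_avail<SketchView<SCALAR*, LAYOUT> > { \
    enum : bool { value = true }; \
  };

// Declaration macro: what user code would see through the headers.
#define SKETCH_FOO_ETI_SPEC_DECL(SCALAR, LAYOUT) \
  extern template struct Foo<SketchView<SCALAR*, LAYOUT>, true>;

// Instantiation macro: what one library .cpp file would contain.
#define SKETCH_FOO_ETI_SPEC_INST(SCALAR, LAYOUT) \
  template struct Foo<SketchView<SCALAR*, LAYOUT>, true>;

// Roughly what the generated files would hold: one line per enabled
// type combination.
SKETCH_FOO_ETI_SPEC_AVAIL(double, LayoutLeft)
SKETCH_FOO_ETI_SPEC_AVAIL(float, LayoutRight)
SKETCH_FOO_ETI_SPEC_DECL(double, LayoutLeft)
SKETCH_FOO_ETI_SPEC_DECL(float, LayoutRight)
SKETCH_FOO_ETI_SPEC_INST(double, LayoutLeft)
SKETCH_FOO_ETI_SPEC_INST(float, LayoutRight)

typedef SketchView<double*, LayoutLeft> EtiView;     // enabled above
typedef SketchView<double*, LayoutRight> OtherView;  // not in the list

// Usage: only the listed combination resolves to the ETI variant.
inline void example() {
  assert(Foo<EtiView>::foo(EtiView()));
  assert(!Foo<OtherView>::foo(OtherView()));
}

}  // namespace sketch
```

The number of macro arguments is what has to stay in sync with the type lists handed to the generator.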
The implementation layer in src/blas/impl/KokkosBlas1_foo_impl.hpp is pretty much whatever we need it to be. In this case it's just a simple function:
template<class ViewType>
void execute_foo(const ViewType& a) {
  Kokkos::parallel_for("KokkosBlas1::foo", a.extent(0), KOKKOS_LAMBDA (const int& i) {
    a(i) = i;
  });
}
If we want to distinguish between multivectors and single vectors, the implementation layer may be one of the places to do so.
The TPL layer consists of two files: the one which declares the availability of a specialization and the one which provides the specialization. The first one is src/impl/tpls/KokkosBlas1_foo_tpl_spec_avail.hpp:
template<class ViewType>
struct foo_tpl_spec_avail {
  enum : bool { value = false };
};
#ifdef KOKKOSKERNELS_ENABLE_MKL
template<>
struct foo_tpl_spec_avail<Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>>> {
  enum : bool { value = true };
};
#endif
Basically for every new TPL which we want to support we drop another full specialization of this stuff in.
The implementation is the counterpart to it. Note that we can use the implementation to decide, based on input parameters, whether to call our own code or the TPL code. We also need two full specializations here, depending on whether ETI for the same type combination is available or not.
#ifdef KOKKOSKERNELS_ENABLE_MKL
#include<cstdio>  // printf
#include<mkl_foo.hpp>
namespace KokkosBlas1 {
namespace Impl {
  // Only a TPL specialization is available
  template<>
  struct Foo<Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>>,true,false> {
    typedef Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>> ViewType;
    static void foo(const ViewType& a) {
      #if (KOKKOSKERNELS_ENABLE_CHECK_SPECIALIZATION)
      printf("Calling MKL Specialization\n");
      #endif
      mkl_foo(a.data(),a.extent(0));
    }
  };
  // Both a TPL specialization and an ETI instantiation are available
  template<>
  struct Foo<Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>>,true,true> {
    typedef Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>> ViewType;
    static void foo(const ViewType& a) {
      // Our code is better for a large number of entries, so only use the TPL for small lengths
      if(a.extent(0) < 100000)
        Foo<ViewType,true,false>::foo(a);
      else
        Foo<ViewType,false,true>::foo(a);
    }
  };
}
}
#endif
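The run-time choice between the TPL and the native code path can be exercised without MKL or Kokkos. This is a sketch under assumptions: `std::vector` replaces the View, both variants do the same trivial work, and the `Path` enum and `tpl_max_length` constant are hypothetical names that mirror the 100000 threshold used above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

enum class Path { Native, Tpl };

// Hypothetical crossover length; a real value would have to be measured.
constexpr std::size_t tpl_max_length = 100000;

template <class Scalar, bool tpl_spec_avail, bool eti_spec_avail>
struct Foo;

// Only the TPL variant: stands in for a call into a vendor library.
template <class Scalar>
struct Foo<Scalar, true, false> {
  static Path foo(std::vector<Scalar>& a) {
    for (std::size_t i = 0; i < a.size(); ++i) a[i] = static_cast<Scalar>(i);
    return Path::Tpl;
  }
};

// Only the native variant: stands in for the ETI-instantiated kernel.
template <class Scalar>
struct Foo<Scalar, false, true> {
  static Path foo(std::vector<Scalar>& a) {
    for (std::size_t i = 0; i < a.size(); ++i) a[i] = static_cast<Scalar>(i);
    return Path::Native;
  }
};

// Both available: decide per call, based on the input size.
template <class Scalar>
struct Foo<Scalar, true, true> {
  static Path foo(std::vector<Scalar>& a) {
    if (a.size() < tpl_max_length) return Foo<Scalar, true, false>::foo(a);
    return Foo<Scalar, false, true>::foo(a);
  }
};

// Usage: short inputs go to the TPL stand-in, long ones to the native one.
inline void example() {
  typedef Foo<double, true, true> Both;
  std::vector<double> small_input(10);
  std::vector<double> large_input(tpl_max_length);
  assert(Both::foo(small_input) == Path::Tpl);
  assert(Both::foo(large_input) == Path::Native);
}

}  // namespace sketch
```

The dispatching variant reuses the other two by naming their bool combinations explicitly, which is why both of them must exist for the same type.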
Last but not least, there are three auto-generated files which are kind of like the TPL files: they declare an ETI specialization available, provide the extern template declaration of those ETI specializations, and instantiate them in cpp files. Those simply use the previously defined macros with the right type combinations.
There is one more detail using two additional macros:
- KOKKOSKERNELS_ENABLE_ETI_ONLY: is used to prevent instantiations of Non-ETI or Non-TPL types. This is used to hide the actual definition of the specialization layer when not compiling the library cpp files.
- KOKKOSKERNELS_ENABLE_CHECK_SPECIALIZATION: this is more of a debug option which enables print statements stating which specialization (ETI, Non-ETI, TPL) was called. This is useful to make sure we don't instantiate stuff in cases where we can't turn on full ETI_ONLY.
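One way to picture the effect of an ETI-only switch is the following sketch. It is an assumption-laden illustration, not the KokkosKernels mechanism: `SKETCH_ENABLE_ETI_ONLY`, `foo_is_defined`, and the returned strings are hypothetical, and hiding the definition is shown here as an incomplete type detectable at compile time, whereas a real build may instead surface it as a compile or link error.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical switch for this sketch; set to 0 to allow non-ETI types.
#define SKETCH_ENABLE_ETI_ONLY 1

namespace sketch {

template <class T>
struct foo_eti_spec_avail : std::false_type {};
template <>
struct foo_eti_spec_avail<double> : std::true_type {};  // the only "ETI type"

// Declared but not defined: unless one of the definitions below matches,
// Foo<T> stays an incomplete type.
template <class T, bool eti_spec_avail = foo_eti_spec_avail<T>::value>
struct Foo;

// ETI variant: always visible. The string mimics what a
// check-specialization print statement would report.
template <class T>
struct Foo<T, true> {
  static std::string foo() { return "ETI"; }
};

#if !SKETCH_ENABLE_ETI_ONLY
// Generic variant: hidden when the ETI-only switch is on.
template <class T>
struct Foo<T, false> {
  static std::string foo() { return "Non-ETI"; }
};
#endif

// Detects at compile time whether Foo<T> has a visible definition.
template <class T, class = void>
struct foo_is_defined : std::false_type {};
template <class T>
struct foo_is_defined<T, decltype(void(sizeof(Foo<T>)))> : std::true_type {};

// Usage: the ETI type works, the non-ETI type is rejected.
inline std::string example() {
  static_assert(foo_is_defined<double>::value, "ETI type must be usable");
#if SKETCH_ENABLE_ETI_ONLY
  static_assert(!foo_is_defined<float>::value,
                "non-ETI type is hidden in ETI-only mode");
#endif
  const std::string path = Foo<double>::foo();
  assert(path == "ETI");
  return path;
}

}  // namespace sketch
```

With the switch set to 0, `Foo<float>` becomes usable again and would report "Non-ETI", which is the kind of information the check-specialization option is meant to expose.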
Also, one more word on KOKKOSKERNELS_IMPL_COMPILE_LIBRARY. This macro is always defined as false, except inside the auto-generated ETI cpp files.
TODO: Generation scripts for blas, sparse and graph:
- Blas: depends only on scalar_t
- Sparse: depends on scalar_t, ordinal_t, offset_t
- Graph: depends on ordinal_t, offset_t