This repository was archived by the owner on Apr 15, 2024. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathKokkos_PG_Initialization.tex
160 lines (140 loc) · 7.77 KB
/
Kokkos_PG_Initialization.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
\chapter{Initialization}\label{C:init}
In order to use Kokkos an initialization call is required.
That call is responsible for aquiring hardware resources such as threads.
Typically this call should be placed right at the start of a program.
If you use both MPI and Kokkos, your program should initialize Kokkos right after calling \lstinline!MPI_Init!.
That way, if MPI sets up process binding masks,
Kokkos will get that information and use it for best performance.
Your program must also \emph{finalize} Kokkos when done using it,
in order to free hardware resources.
\section{Initialization by command-line arguments}\label{S:init:cmdline}
The simplest way to initialize Kokkos is by calling the following function:
\begin{lstlisting}
Kokkos::initialize(int& argc, char* argv[]);
\end{lstlisting}
Just like \lstinline!MPI_Init!, this function interprets command-line arguments to determine the requested settings.
Also like \lstinline!MPI_Init!, it reserves the right to remove command-line arguments from the input list.
This is why it takes \lstinline!argc! by reference, rather than by value; it may change the value on output.
This function will initialize the default execution space
\begin{lstlisting}
Kokkos::DefaultExecutionSpace;
\end{lstlisting}
and its default host execution space
\begin{lstlisting}
Kokkos::DefaultHostExecutionSpace;
\end{lstlisting}
if applicable.
These defaults depend on the Kokkos configuration.
Kokkos chooses the two spaces using the following list, ordered from low to high:
\begin{enumerate}
\item \lstinline|Kokkos::Serial|
\item \lstinline|Kokkos::Threads|
\item \lstinline|Kokkos::OpenMP|
\item \lstinline|Kokkos::Cuda|
\end{enumerate}
The highest execution space in the list which is actually enabled is Kokkos' default execution space,
and the highest enabled host execution space is Kokkos' default host execution space.
(Currently, the only non-host execution space is \lstinline!Cuda!.)
For example, if \lstinline|Kokkos::Cuda|, \lstinline|Kokkos::OpenMP|, and
\lstinline|Kokkos::Serial| are enabled, then \lstinline|Kokkos::Cuda| is the
default execution space and \lstinline|Kokkos::OpenMP| is the default host execution
space.\footnote{This is the preferred set of defaults when CUDA and OpenMP are enabled.
If you use a thread-parallel host execution space, we prefer Kokkos' OpenMP back-end,
as this ensures compatibility of Kokkos' threads with the application's direct use of OpenMP threads.
Kokkos cannot promise that its Threads back-end will not conflict with the application's direct use of operating system threads.}
Command-line arguments come in ``prefixed'' and ``non-prefixed'' versions.
Prefixed versions start with the string \verb!--kokkos-!.
\lstinline!Kokkos::initialize! will remove prefixed options from the input list,
but will preserve non-prefixed options.
Argument options are given with an equals (\verb!=!) sign.
If the same argument occurs more than once, the last one counts.
Furthermore, prefixed versions of the command line arguments take precedence over the non-prefixed ones.
For example, the arguments
\begin{verbatim}
--kokkos-threads=4 --threads=2
\end{verbatim}
set the number of threads to 4, while
\begin{verbatim}
--kokkos-threads=4 --threads=2 --kokkos-threads=3
\end{verbatim}
set the number of threads to 3.
Table \ref{TBL:CommandLineOptions} gives a full list of command-line options.
\begin{table}
\caption{Command-line options for \lstinline|Kokkos::initialize|}
\label{TBL:CommandLineOptions}
\begin{small}
\begin{tabular}[t]{lp{0.5\textwidth}}
\hline
Argument & Description \\\hline
\lstinline|--kokkos-help| & print this message \\
\lstinline|--kokkos-threads=INT| &
specify total number of threads or number of threads per NUMA region if used in conjunction with '--numa' option. \\
\lstinline|--kokkos-numa=INT| & specify number of NUMA regions used by process. \\
\lstinline|--kokkos-device=INT| & specify device id to be used by Kokkos. \\
\lstinline|--kokkos-ndevices=INT[,INT]| & used when running MPI jobs. Specify number of
devices per node to be used. Process to device
mapping happens by obtaining the local MPI rank
and assigning devices round-robin. The optional
second argument allows for an existing device
to be ignored. This is most useful on workstations
with multiple GPUs, of which one is used to drive
screen output.\\
\hline
\end{tabular}
\end{small}
\end{table}
\section{Initialization by struct}\label{S:init:struct}
Instead of giving \lstinline|Kokkos::initialize()| command-line arguments,
one may directly pass in initialization parameters, using the following struct:
\begin{lstlisting}
struct Kokkos::InitArguments {
int num_threads;
int num_numa;
int device_id;
// ... the struct may have more members ...
};
\end{lstlisting}
The \lstinline|num_threads| field corresponds to the \verb!--kokkos-threads! command-line argument,
\lstinline|num_numa| to \verb!--kokkos-numa!, and \lstinline|device_id| to \verb!--kokkos-device!.
(See Table \ref{TBL:CommandLineOptions} for details.)
Not all parameters are observed by all execution spaces, and the struct might expand in the future if needed.
If you set \lstinline|num_threads| or \lstinline|num_numa| to zero or less,
Kokkos will try to determine default values if possible or otherwise set them to 1.
In particular Kokkos can use the hwloc library to determine default settings, using the assumption that the process binding mask is unique, i.e. that this process does not share any cores with another process.
Note that the default value of each parameter is -1.
Here is an example of how to use the struct.
\begin{lstlisting}
Kokkos::InitArguments args;
// 8 (CPU) threads per NUMA region
args.num_threads = 8;
// 2 (CPU) NUMA regions per process
args.num_numa = 2;
// If Kokkos was built with CUDA enabled,
// use the GPU with device ID 1.
args.device_id = 1;
Kokkos::initialize(args);
\end{lstlisting}
\section{Initializing non-default execution spaces}\label{S:init:specific}
Instead of calling the generic initialization, one can call initialization for each execution space on its own.
If the associated host execution space of an execution space is not identical to the latter, it has to be initialized first.
For example when compiling with support for pthreads and Cuda, \lstinline|Kokkos::Threads| has to be initialized before \lstinline|Kokkos::Cuda|.
The initialization calls take different arguments for each of the execution spaces.
If you want to initialize an execution space other than those that Kokkos initializes by default,
you \emph{must} initialize it explicitly in code, by calling its \lstinline!initialize()! method and passing it the appropriate arguments.
Furthermore, if the associated host execution space of an execution space is not identical to the latter, the host execution space must be initialized first.
For example, if you have set the default execution space to \lstinline!Kokkos::Serial!,
but want to use \lstinline!Kokkos::Cuda! and \lstinline!Kokkos::OpenMP!,
you must initialize \lstinline!Kokkos::OpenMP! before \lstinline!Kokkos::Cuda!.
The initialization calls take different arguments for each of the execution spaces.
We do \emph{not} recommend initializing multiple host execution spaces,
because the different spaces may fight over threads or other hardware resources.
This may have a big effect on performance.
The only exception is the \lstinline!Serial! execution space.
You may safely initialize \lstinline!Serial! along with another host execution space.
\section{Finalization}\label{S:init:finalize}
At the end of each program Kokkos needs to be shut down in order to free resources.
Do this by calling \lstinline!Kokkos::finalize()!.
You may wish to set this to be called automatically at program exit,
either by setting an \lstinline!atexit! hook
or by attaching the function to \lstinline!MPI_COMM_SELF!,
so that it is called automatically at \lstinline!MPI_Finalize!.