
Add documentation for SequentialHostInit (#607)
* Add documentation for SequentialHostInit

* Update docs/source/API/core/view/view_alloc.rst

* Add restrictions for SequentialHostInit and ProgrammingGuide update

* Update docs/source/API/core/view/view_alloc.rst

Co-authored-by: Maarten Arnst <[email protected]>

* Update docs/source/API/core/view/view_alloc.rst

Co-authored-by: Maarten Arnst <[email protected]>

---------

Co-authored-by: Thomas Padioleau <[email protected]>
Co-authored-by: Maarten Arnst <[email protected]>
3 people authored Dec 2, 2024
1 parent 132820a commit c7595a9
Showing 3 changed files with 92 additions and 16 deletions.
4 changes: 4 additions & 0 deletions docs/source/API/alphabetical.rst
@@ -200,6 +200,8 @@ Core
+--------------------------------------------------------------------------------------+---------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| `Serial <core/execution_spaces.html#kokkos-serial>`_ | `Core <core-index.html>`_ | `Spaces <core/Spaces.html>`_ | Execution space using serial execution on the CPU. |
+--------------------------------------------------------------------------------------+---------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| `SequentialHostInit <core/view/view_alloc.html>`_ | `Core <core-index.html>`_ | `View and related <core/View.html>`_ | An option used with `view_alloc <core/view/view_alloc.html>`_ |
+--------------------------------------------------------------------------------------+---------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| `ScopeGuard <core/initialize_finalize/ScopeGuard.html>`_ | `Core <core-index.html>`_ | `Initialization and Finalization <core/Initialize-and-Finalize.html>`_ | class to aggregate initializing and finalizing Kokkos |
+--------------------------------------------------------------------------------------+---------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| `SpaceAccessibility <core/SpaceAccessibility.html>`_ | `Core <core-index.html>`_ | `Spaces <core/Spaces.html>`_ | Facility to query accessibility rules between execution and memory spaces. |
@@ -232,3 +234,5 @@ Core
+--------------------------------------------------------------------------------------+---------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| `View-like Type Concept <core/view/view_like.html>`_ | `Core <core-index.html>`_ | `View and related <core/View.html>`_ | A set of class templates that act like a View |
+--------------------------------------------------------------------------------------+---------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| `WithoutInitializing <core/view/view_alloc.html>`_ | `Core <core-index.html>`_ | `View and related <core/View.html>`_ | An option used with `view_alloc <core/view/view_alloc.html>`_ |
+--------------------------------------------------------------------------------------+---------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
30 changes: 29 additions & 1 deletion docs/source/API/core/view/view_alloc.rst
@@ -22,7 +22,9 @@ Create View allocation parameter bundle from argument list. Valid argument list

* execution space instance able to access ``View::memory_space``

* ``Kokkos::WithoutInitializing`` to bypass initialization
* ``Kokkos::WithoutInitializing`` to bypass element initialization and destruction

* ``Kokkos::SequentialHostInit`` to perform element initialization and destruction serially on host (since 4.4.01)

* ``Kokkos::AllowPadding`` to allow allocation to pad dimensions for memory alignment

@@ -44,8 +46,34 @@ Description

``args`` : Can only be a pointer to memory.


.. cppkokkos:type:: ALLOC_PROP
:cppkokkos:type:`ALLOC_PROP` is a special, unspellable implementation-defined type that is returned by :cppkokkos:func:`view_alloc`
and :cppkokkos:func:`view_wrap`. It represents a bundle of allocator parameters, including the View label, the memory space instance,
the execution space instance, whether to initialize the memory, whether to allow padding, and the raw pointer value (for wrapped unmanaged views).

.. cppkokkos:type:: WithoutInitializing
:cppkokkos:type:`WithoutInitializing` is intended to be used in situations where default construction of `View` elements in its
associated execution space is not needed or not viable. In particular, it may not be viable in situations such as the construction of objects with virtual functions,
or for `Views` of elements without a default constructor. In such situations, this option is often used in conjunction with manual in-place `new`
construction of objects and manual destruction of elements.
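The pattern that `WithoutInitializing` is meant to be combined with can be sketched in plain C++, without Kokkos: obtain storage in which no element constructor has run, build each element with placement `new`, and later call each destructor explicitly before the storage is released. In this sketch `std::allocator` stands in for the `View` allocation, and the `Shape`/`Square` types and the `sum_of_square_areas` helper are hypothetical; it illustrates the idea under those assumptions and is not how Kokkos implements `view_alloc`.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical polymorphic element type (has virtual functions).
struct Shape {
  virtual ~Shape() = default;
  virtual double area() const = 0;
};

// No default constructor, so elements cannot be default-constructed in bulk.
struct Square final : Shape {
  explicit Square(double s) : side(s) {}
  double area() const override { return side * side; }
  double side;
};

// Builds n squares with sides 1, 2, ..., n in uninitialized storage and
// returns the sum of their areas.
double sum_of_square_areas(std::size_t n) {
  std::allocator<Square> alloc;

  // 1. Raw storage: no Square constructor has run yet (the analog of a
  //    View allocated with view_alloc(label, WithoutInitializing)).
  Square* elems = alloc.allocate(n);

  // 2. Manual in-place construction with placement new.
  for (std::size_t i = 0; i < n; ++i) {
    ::new (static_cast<void*>(elems + i)) Square(static_cast<double>(i + 1));
  }

  // Use the elements through their virtual interface.
  double total = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    const Shape& shape = elems[i];
    total += shape.area();
  }

  // 3. Manual destruction of every element, then release the storage.
  for (std::size_t i = 0; i < n; ++i) {
    elems[i].~Square();
  }
  alloc.deallocate(elems, n);
  return total;
}
```

With a host-accessible `View`, step 2 takes the form `new (&v(i)) T(...)` and step 3 the form `v(i).~T()`, as in the pre-4.4.01 View-of-Views example in the Programming Guide.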

.. cppkokkos:type:: SequentialHostInit
:cppkokkos:type:`SequentialHostInit` is intended to be used to initialize elements that do not have a default constructor or destructor that
can be called inside a Kokkos parallel region. In particular, this includes constructors and destructors that:

* allocate or deallocate memory
* create or destroy managed `Kokkos::View` objects
* call Kokkos parallel operations

When this allocation option is used, the `View` constructor and destructor create and destroy the elements in a serial loop on the host.
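As a rough mental model of the behavior described above, the following plain C++ sketch uses a hypothetical owning array, `SerialInitArray`, whose constructor and destructor visit the elements one at a time in an ordinary host loop. The element type `Inner` is also hypothetical: its default constructor allocates heap memory, which is exactly the kind of constructor listed above. This is an analogy under those assumptions and does not reproduce Kokkos internals (for example, it ignores memory spaces entirely).

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hypothetical element whose default constructor allocates heap memory.
struct Inner {
  static inline int live = 0;  // number of currently constructed Inner objects
  std::vector<double> values;
  Inner() : values(4, 0.0) { ++live; }
  ~Inner() { --live; }
  Inner(const Inner&) = delete;
  Inner& operator=(const Inner&) = delete;
};

// Hypothetical owning array: construction and destruction of the elements
// happen in a plain serial loop on the calling (host) thread.
template <class T>
class SerialInitArray {
 public:
  explicit SerialInitArray(std::size_t n) : n_(n), data_(alloc_.allocate(n)) {
    std::size_t i = 0;
    try {
      for (; i < n_; ++i) {
        ::new (static_cast<void*>(data_ + i)) T();  // serial construction
      }
    } catch (...) {
      // Destroy the elements that were built, then free the storage.
      while (i > 0) {
        data_[--i].~T();
      }
      alloc_.deallocate(data_, n_);
      throw;
    }
  }

  ~SerialInitArray() {
    for (std::size_t i = 0; i < n_; ++i) {
      data_[i].~T();  // serial destruction
    }
    alloc_.deallocate(data_, n_);
  }

  SerialInitArray(const SerialInitArray&) = delete;
  SerialInitArray& operator=(const SerialInitArray&) = delete;

  T& operator()(std::size_t i) { return data_[i]; }
  std::size_t extent() const { return n_; }

 private:
  std::allocator<T> alloc_;
  std::size_t n_;
  T* data_;
};
```

Because every constructor and destructor runs on a single host thread, the elements are free to allocate memory or create other containers, which is the freedom `SequentialHostInit` provides for `View` elements.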

.. warning::

`SequentialHostInit` can only be used when creating host-accessible `Views`, such as `Views` with `HostSpace`, `SharedSpace`,
or `SharedHostPinnedSpace` as their memory space.

.. versionadded:: 4.4.01
74 changes: 59 additions & 15 deletions docs/source/ProgrammingGuide/View.rst
@@ -145,7 +145,57 @@ Another issue is that View construction in a Kokkos parallel region does not upd

Here is how to create a View of Views, where each inner View has a separate owning allocation:

1. The outer View must have a memory space that is both host and device accessible, such as `CudaUVMSpace`.
1. The outer View must have a memory space that is both host and device accessible, such as :cppkokkos:type:`SharedSpace`.
2. Create the outer View using the :cppkokkos:type:`SequentialHostInit` property.
3. Create inner Views in a sequential host loop. (Prefer creating the inner Views uninitialized. Creating the inner Views initialized launches one device kernel per inner View. This is likely much slower than just initializing them all yourself from a single kernel over the outer View.)
4. At this point, you may access the outer and inner Views on device.
5. Get rid of the outer View as you normally would.

Here is an example:

.. code-block:: c++

    using Kokkos::SharedSpace;
    using Kokkos::View;
    using Kokkos::view_alloc;
    using Kokkos::SequentialHostInit;
    using Kokkos::WithoutInitializing;

    using inner_view_type = View<double*>;
    using outer_view_type = View<inner_view_type*, SharedSpace>;

    const int numOuter = 5;
    const int numInner = 4;
    outer_view_type outer (view_alloc (std::string ("Outer"), SequentialHostInit), numOuter);

    // Create inner Views on host, outside of a parallel region, uninitialized
    for (int k = 0; k < numOuter; ++k) {
      const std::string label = std::string ("Inner ") + std::to_string (k);
      outer(k) = inner_view_type (view_alloc (label, WithoutInitializing), numInner);
    }

    // Outer and inner views are now ready for use on device

    Kokkos::RangePolicy<> range (0, numOuter);
    Kokkos::parallel_for ("my kernel label", range,
      KOKKOS_LAMBDA (const int i) {
        for (int j = 0; j < numInner; ++j) {
          outer(i)(j) = 10.0 * double (i) + double (j);
        }
      });
    Kokkos::fence();

    // Destroy the View of Views: this calls the element destructors sequentially on the host.
    outer = outer_view_type ();

Another approach is to create the inner Views as nonowning, from a single pool of memory. This makes it unnecessary to invoke their destructors.
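The pool approach can be sketched in plain C++: a single owning buffer holds every inner array, and each inner "view" is only a pointer plus an extent into that buffer, so destroying it frees nothing and no per-view destructor call is required. The `InnerView` type and `fill_from_pool` helper are hypothetical names for this illustration. In Kokkos, one way to express such non-owning inner Views is as unmanaged Views, e.g. `View<double*, Kokkos::MemoryTraits<Kokkos::Unmanaged>>` constructed from a pointer into the pool and an extent; for device use the pool would also have to live in a device-accessible memory space.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical non-owning 1-D view: a pointer into memory owned elsewhere
// plus an extent. It frees nothing, so it needs no destructor call.
struct InnerView {
  double* data = nullptr;
  std::size_t extent = 0;
  double& operator()(std::size_t j) const { return data[j]; }
};

// Carves numOuter inner views of length numInner out of one pool, fills them
// with 10*i + j (the same values as the kernel in the example above), and
// returns the sum of all entries.
double fill_from_pool(std::size_t numOuter, std::size_t numInner) {
  std::vector<double> pool(numOuter * numInner, 0.0);  // the only owning allocation
  std::vector<InnerView> outer(numOuter);

  for (std::size_t k = 0; k < numOuter; ++k) {
    outer[k] = InnerView{pool.data() + k * numInner, numInner};
  }

  for (std::size_t i = 0; i < numOuter; ++i) {
    for (std::size_t j = 0; j < outer[i].extent; ++j) {
      outer[i](j) = 10.0 * double(i) + double(j);
    }
  }

  double sum = 0.0;
  for (double x : pool) {
    sum += x;
  }
  // When pool goes out of scope it releases the storage of every inner view
  // at once; the InnerView objects own nothing and need no cleanup.
  return sum;
}
```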

.. warning::

`SequentialHostInit` was added in version 4.4.01. Prior to that the process was more involved.

1. The outer View must have a memory space that is both host and device accessible, such as `SharedSpace`.
2. Create the outer View without initializing it.
3. Create inner Views using placement new, in a sequential host loop. (Prefer creating the inner Views uninitialized. Creating the inner Views initialized launches one device kernel per inner View. This is likely much slower than just initializing them all yourself from a single kernel over the outer View.)
4. At this point, you may access the outer and inner Views on device.
@@ -157,15 +207,13 @@ Here is an example:

.. code-block:: c++

using Kokkos::Cuda;
using Kokkos::CudaSpace;
using Kokkos::CudaUVMSpace;
using Kokkos::SharedSpace;
using Kokkos::View;
using Kokkos::view_alloc;
using Kokkos::WithoutInitializing;

using inner_view_type = View<double*, CudaSpace>;
using outer_view_type = View<inner_view_type*, CudaUVMSpace>;
using inner_view_type = View<double*>;
using outer_view_type = View<inner_view_type*, SharedSpace>;

const int numOuter = 5;
const int numInner = 4;
@@ -174,36 +222,32 @@ Here is an example:
// Create inner Views on host, outside of a parallel region, uninitialized
for (int k = 0; k < numOuter; ++k) {
const std::string label = std::string ("Inner ") + std::to_string (k);
new (&outer[k]) inner_view_type (view_alloc (label, WithoutInitializing), numInner);
new (&outer(k)) inner_view_type (view_alloc (label, WithoutInitializing), numInner);
}

// Outer and inner views are now ready for use on device

Kokkos::RangePolicy<Cuda, int> range (0, numOuter);
Kokkos::RangePolicy<> range (0, numOuter);
Kokkos::parallel_for ("my kernel label", range,
KOKKOS_LAMBDA (const int i) {
for (int j = 0; j < numInner; ++j) {
device_outer[i][j] = 10.0 * double (i) + double (j);
outer(i)(j) = 10.0 * double (i) + double (j);
}
});

// Fence before deallocation on host, to make sure
// that the device kernel is done first.
// Note the new fence syntax that requires an instance.
// This will work with other CUDA streams, etc.
Cuda ().fence ();
Kokkos::fence ();

// Destroy inner Views, again on host, outside of a parallel region.
for (int k = 0; k < numOuter; ++k) {
outer[k].~inner_view_type ();
outer(k).~inner_view_type ();
}

// You're better off disposing of outer immediately.
outer = outer_view_type ();

Another approach is to create the inner Views as nonowning, from a single pool of memory. This makes it unnecessary to invoke their destructors.

6.2.4 Const Views
~~~~~~~~~~~~~~~~~

