Commit

WIP
matyas-streamhpc committed Dec 2, 2024
1 parent 950d61c commit b497a18
Showing 2 changed files with 43 additions and 11 deletions.
52 changes: 41 additions & 11 deletions docs/how-to/hip_runtime_api/asynchronous.rst
@@ -29,17 +29,33 @@ Managing streams
-------------------------------------------------------------------------------

Streams enable the overlap of computation and data transfer, ensuring
continuous GPU activity. To create a stream, the :cpp:func:`hipStreamCreate`,
:cpp:func:`hipStreamCreateWithFlags` and :cpp:func:`hipStreamCreateWithPriority`
functions are used, returning a handle to the newly created stream. Assigning
different operations to different streams allows multiple tasks to run
simultaneously, improving overall performance.

The :cpp:func:`hipStreamSynchronize` function in the HIP API is used to block
the calling host thread until all previously submitted tasks in a specified HIP
stream have completed. It ensures that all operations in the given stream, such
as kernel executions or memory transfers, are finished before the host thread
proceeds.
continuous GPU activity.

To create a stream, the following functions are used, each returning a handle
to the newly created stream:

- :cpp:func:`hipStreamCreate`: Creates a stream with default settings.
- :cpp:func:`hipStreamCreateWithFlags`: Creates a stream with one of the
  following flags, giving more control over stream behavior:

  - ``hipStreamDefault``: Creates a blocking stream, which synchronizes
    implicitly with the default (null) stream. It is suitable for most
    operations.
  - ``hipStreamNonBlocking``: Creates a non-blocking stream, which does not
    synchronize implicitly with the default (null) stream. Its work can run
    concurrently with work submitted to the default stream, which can improve
    overall performance.

- :cpp:func:`hipStreamCreateWithPriority`: Creates a stream with a specified
  priority, enabling prioritization of certain tasks.
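
The three creation functions listed above can be sketched as follows. This is
a minimal illustration, not part of the original guide: the helper name
``create_streams_example`` is hypothetical, and the priority passed to
:cpp:func:`hipStreamCreateWithPriority` is assumed to come from
:cpp:func:`hipDeviceGetStreamPriorityRange`.

.. code-block:: cpp

   #include <hip/hip_runtime.h>

   // Hypothetical helper: creates one stream with each creation function and
   // destroys it again. Returns true if every HIP call succeeded.
   bool create_streams_example()
   {
       hipStream_t streams[3] = {nullptr, nullptr, nullptr};

       // Query the valid priority range. Numerically lower values mean higher
       // priority, so greatestPriority is the lowest number in the range.
       int leastPriority = 0;
       int greatestPriority = 0;
       bool ok = hipDeviceGetStreamPriorityRange(&leastPriority,
                                                 &greatestPriority) == hipSuccess;

       // Stream with default settings.
       ok = ok && hipStreamCreate(&streams[0]) == hipSuccess;
       // Stream without implicit synchronization with the default stream.
       ok = ok && hipStreamCreateWithFlags(&streams[1],
                                           hipStreamNonBlocking) == hipSuccess;
       // Blocking stream with the highest priority the device supports.
       ok = ok && hipStreamCreateWithPriority(&streams[2], hipStreamDefault,
                                              greatestPriority) == hipSuccess;

       // Destroy every stream that was actually created.
       for (hipStream_t stream : streams)
       {
           if (stream != nullptr)
           {
               ok = (hipStreamDestroy(stream) == hipSuccess) && ok;
           }
       }
       return ok;
   }

Calling ``create_streams_example()`` from ``main`` and checking the returned
value is enough to try the sketch on a machine with a HIP-capable device.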

Assigning different operations to different streams allows multiple tasks to
run simultaneously, improving overall
:ref:`performance<how_to_performance_guidelines>`.

The :cpp:func:`hipStreamSynchronize` function is used to block the calling host
thread until all previously submitted tasks in a specified HIP stream have
completed. It ensures that all operations in the given stream, such as kernel
executions or memory transfers, are finished before the host thread proceeds.
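
As a hedged sketch of this behavior, the example below enqueues a memory set
and a device-to-host copy on one stream and reads the host buffer only after
:cpp:func:`hipStreamSynchronize` returns. The helper name
``stream_synchronize_example``, the buffer size and the fill value are
assumptions made for illustration.

.. code-block:: cpp

   #include <hip/hip_runtime.h>
   #include <cstddef>
   #include <vector>

   // Hypothetical helper: returns true if the host sees the expected bytes
   // after synchronizing with the stream.
   bool stream_synchronize_example()
   {
       constexpr std::size_t count = 1 << 20; // arbitrary buffer size in bytes
       std::vector<unsigned char> host(count, 0);

       hipStream_t stream = nullptr;
       void* device = nullptr;
       bool ok = hipStreamCreate(&stream) == hipSuccess;
       ok = ok && hipMalloc(&device, count) == hipSuccess;

       // These calls enqueue work on the stream and may return before the
       // work has finished.
       ok = ok && hipMemsetAsync(device, 0x2A, count, stream) == hipSuccess;
       ok = ok && hipMemcpyAsync(host.data(), device, count,
                                 hipMemcpyDeviceToHost, stream) == hipSuccess;

       // Block the host thread until everything submitted to the stream is
       // complete.
       ok = ok && hipStreamSynchronize(stream) == hipSuccess;

       // The host buffer is safe to read only after the synchronization.
       for (std::size_t i = 0; ok && i < count; ++i)
       {
           ok = host[i] == 0x2A;
       }

       if (device != nullptr)
       {
           ok = (hipFree(device) == hipSuccess) && ok;
       }
       if (stream != nullptr)
       {
           ok = (hipStreamDestroy(stream) == hipSuccess) && ok;
       }
       return ok;
   }

Because both operations are submitted to the same stream, they execute in
submission order, so a single synchronization point covers both of them.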

.. note::

@@ -74,13 +90,27 @@ can be misleading at concurrent kernel runs, that's why during optimization
it's better to check the trace files to see whether one kernel is blocking
another kernel while they are running in parallel.

When kernels run in parallel, the execution time of each individual kernel can
increase, because the kernels contend for shared GPU resources such as compute
units and memory bandwidth.

Asynchronous kernel execution is beneficial only under specific conditions.
It is most effective when the kernels do not fully utilize the GPU's resources.
In such cases, overlapping kernel execution can improve overall throughput and
efficiency by keeping the GPU busy without exceeding its capacity.
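
The following sketch illustrates the case described above, assuming two small,
independent workloads that are each too small to occupy the whole GPU. The
kernel ``scale``, the helper ``concurrent_kernels_example`` and all sizes are
hypothetical. Whether the two kernels actually overlap depends on the device
and its current load, so the code only demonstrates how to submit them.

.. code-block:: cpp

   #include <hip/hip_runtime.h>
   #include <cstddef>
   #include <vector>

   // Hypothetical kernel: multiplies every element by a factor.
   __global__ void scale(float* data, float factor, std::size_t n)
   {
       const std::size_t i =
           static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
       if (i < n)
       {
           data[i] *= factor;
       }
   }

   // Hypothetical helper: runs one small kernel per stream and verifies the
   // results. Returns true if all HIP calls and checks succeeded.
   bool concurrent_kernels_example()
   {
       constexpr std::size_t n = 1024; // deliberately small workload
       constexpr std::size_t bytes = n * sizeof(float);
       constexpr int numStreams = 2;
       constexpr unsigned int threads = 256;
       constexpr unsigned int blocks = (n + threads - 1) / threads;

       hipStream_t streams[numStreams] = {};
       float* device[numStreams] = {};
       std::vector<float> input(n, 1.0f);
       std::vector<float> output[numStreams];
       bool ok = true;

       for (int s = 0; ok && s < numStreams; ++s)
       {
           output[s].assign(n, 0.0f);
           ok = hipStreamCreateWithFlags(&streams[s],
                                         hipStreamNonBlocking) == hipSuccess;
           ok = ok && hipMalloc(reinterpret_cast<void**>(&device[s]),
                                bytes) == hipSuccess;
       }

       // Copy, kernel and copy-back are ordered within each stream, while the
       // two streams are independent of each other and may overlap.
       for (int s = 0; ok && s < numStreams; ++s)
       {
           ok = hipMemcpyAsync(device[s], input.data(), bytes,
                               hipMemcpyHostToDevice, streams[s]) == hipSuccess;
           if (ok)
           {
               scale<<<blocks, threads, 0, streams[s]>>>(device[s],
                                                         2.0f * (s + 1), n);
               ok = hipGetLastError() == hipSuccess;
           }
           ok = ok && hipMemcpyAsync(output[s].data(), device[s], bytes,
                                     hipMemcpyDeviceToHost,
                                     streams[s]) == hipSuccess;
       }

       for (int s = 0; s < numStreams; ++s)
       {
           // Always wait for pending work before reading or freeing buffers.
           if (streams[s] != nullptr)
           {
               ok = (hipStreamSynchronize(streams[s]) == hipSuccess) && ok;
           }
           for (std::size_t i = 0; ok && i < n; ++i)
           {
               ok = output[s][i] == 2.0f * (s + 1);
           }
           if (device[s] != nullptr)
           {
               ok = (hipFree(device[s]) == hipSuccess) && ok;
           }
           if (streams[s] != nullptr)
           {
               ok = (hipStreamDestroy(streams[s]) == hipSuccess) && ok;
           }
       }
       return ok;
   }

With a workload large enough to fill the GPU on its own, the same code would
still be correct, but the kernels would mostly run one after the other.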

Overlap of data transfer and kernel execution
===============================================================================

One of the primary benefits of asynchronous operations is the ability to
overlap data transfer with kernel execution, leading to better resource
utilization and improved performance.

Asynchronous execution is particularly advantageous in iterative processes. For
instance, while the kernel of one iteration is running, the input data for the
next iteration can be prepared and transferred at the same time, provided that
this preparation does not depend on the result of the running kernel.
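
A minimal sketch of such an iterative pipeline is shown below. It assumes the
data can be split into independent chunks and uses pinned host memory so that
copies can proceed asynchronously. The kernel ``add_one``, the helper
``overlap_example``, the chunk count and the chunk size are hypothetical.

.. code-block:: cpp

   #include <hip/hip_runtime.h>
   #include <cstddef>

   // Hypothetical kernel: adds one to every element of a chunk.
   __global__ void add_one(float* data, std::size_t n)
   {
       const std::size_t i =
           static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
       if (i < n)
       {
           data[i] += 1.0f;
       }
   }

   // Hypothetical helper: processes chunks alternately on two streams so
   // that the transfer of one chunk can overlap the kernel of another.
   bool overlap_example()
   {
       constexpr std::size_t chunkSize = 1 << 20; // elements per chunk
       constexpr std::size_t chunkBytes = chunkSize * sizeof(float);
       constexpr int numChunks = 4;
       constexpr int numStreams = 2;
       constexpr unsigned int threads = 256;
       constexpr unsigned int blocks = (chunkSize + threads - 1) / threads;

       float* host = nullptr; // pinned host memory
       float* device[numStreams] = {};
       hipStream_t streams[numStreams] = {};

       bool ok = hipHostMalloc(reinterpret_cast<void**>(&host),
                               numChunks * chunkBytes,
                               hipHostMallocDefault) == hipSuccess;
       for (int s = 0; ok && s < numStreams; ++s)
       {
           ok = hipStreamCreate(&streams[s]) == hipSuccess;
           ok = ok && hipMalloc(reinterpret_cast<void**>(&device[s]),
                                chunkBytes) == hipSuccess;
       }
       for (std::size_t i = 0; ok && i < numChunks * chunkSize; ++i)
       {
           host[i] = 0.0f;
       }

       // Each chunk is copied in, processed and copied out on its stream.
       // Work within a stream runs in order, so reusing the device buffer of
       // that stream for a later chunk is safe.
       for (int c = 0; ok && c < numChunks; ++c)
       {
           const int s = c % numStreams;
           float* hostChunk = host + c * chunkSize;
           ok = hipMemcpyAsync(device[s], hostChunk, chunkBytes,
                               hipMemcpyHostToDevice, streams[s]) == hipSuccess;
           if (ok)
           {
               add_one<<<blocks, threads, 0, streams[s]>>>(device[s], chunkSize);
               ok = hipGetLastError() == hipSuccess;
           }
           ok = ok && hipMemcpyAsync(hostChunk, device[s], chunkBytes,
                                     hipMemcpyDeviceToHost,
                                     streams[s]) == hipSuccess;
       }

       // Wait for all pending work before touching or freeing any buffer.
       for (int s = 0; s < numStreams; ++s)
       {
           if (streams[s] != nullptr)
           {
               ok = (hipStreamSynchronize(streams[s]) == hipSuccess) && ok;
           }
       }
       for (std::size_t i = 0; ok && i < numChunks * chunkSize; ++i)
       {
           ok = host[i] == 1.0f;
       }

       for (int s = 0; s < numStreams; ++s)
       {
           if (device[s] != nullptr)
           {
               ok = (hipFree(device[s]) == hipSuccess) && ok;
           }
           if (streams[s] != nullptr)
           {
               ok = (hipStreamDestroy(streams[s]) == hipSuccess) && ok;
           }
       }
       if (host != nullptr)
       {
           ok = (hipHostFree(host) == hipSuccess) && ok;
       }
       return ok;
   }

Two streams are used here as a simple starting point. The number of streams
and the chunk size that give the best overlap depend on the device and the
workload, so they are worth checking in the trace files.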

Querying device capabilities
-------------------------------------------------------------------------------

2 changes: 2 additions & 0 deletions docs/how-to/performance_guidelines.rst
@@ -3,6 +3,8 @@
developers optimize the performance of HIP-capable GPU architectures.
:keywords: AMD, ROCm, HIP, CUDA, performance, guidelines

.. _how_to_performance_guidelines:

*******************************************************************************
Performance guidelines
*******************************************************************************