diff --git a/docs/how-to/hip_runtime_api/asynchronous.rst b/docs/how-to/hip_runtime_api/asynchronous.rst
index 4e6b8f5678..f55b0adca0 100644
--- a/docs/how-to/hip_runtime_api/asynchronous.rst
+++ b/docs/how-to/hip_runtime_api/asynchronous.rst
@@ -29,17 +29,33 @@ Managing streams
 -------------------------------------------------------------------------------
 
 Streams enable the overlap of computation and data transfer, ensuring
-continuous GPU activity. To create a stream, the :cpp:func:`hipStreamCreate`,
-:cpp:func:`hipStreamCreateWithFlags` and :cpp:func:`hipStreamCreateWithPriority`
-functions are used, returning a handle to the newly created stream. Assigning
-different operations to different streams allows multiple tasks to run
-simultaneously, improving overall performance.
-
-The :cpp:func:`hipStreamSynchronize` function in the HIP API is used to block
-the calling host thread until all previously submitted tasks in a specified HIP
-stream have completed. It ensures that all operations in the given stream, such
-as kernel executions or memory transfers, are finished before the host thread
-proceeds.
+continuous GPU activity.
+
+To create a stream, use one of the following functions. Each returns a handle
+to the newly created stream:
+
+- :cpp:func:`hipStreamCreate`: Creates a stream with default settings.
+- :cpp:func:`hipStreamCreateWithFlags`: Creates a stream with specific flags,
+  listed below, that give more control over stream behavior:
+
+  - ``hipStreamDefault``: creates a default stream suitable for most
+    operations. The stream synchronizes implicitly with the null stream.
+  - ``hipStreamNonBlocking``: creates a non-blocking stream, which does not
+    synchronize implicitly with the null stream. Its tasks can run concurrently
+    with work in the null stream instead of waiting for it to complete, which
+    can improve overall performance.
+
+- :cpp:func:`hipStreamCreateWithPriority`: Creates a stream with a
+  specified priority, enabling prioritization of certain tasks.
+
+Assigning different operations to different streams allows multiple tasks to
+run simultaneously, improving overall
+:ref:`performance <how_to_performance_guidelines>`.
+
+The :cpp:func:`hipStreamSynchronize` function is used to block the calling host
+thread until all previously submitted tasks in a specified HIP stream have
+completed. It ensures that all operations in the given stream, such as kernel
+executions or memory transfers, are finished before the host thread proceeds.
 
 .. note::
 
@@ -74,6 +90,15 @@ can be misleading at concurrent kernel runs, that's why during optimization it's
 better to check the trace files, to see if a kernel is blocking another kernel,
 while they are running parallel.
 
+When running kernels in parallel, the execution time can increase due to
+contention for shared resources. This is because multiple kernels may attempt
+to access the same GPU resources simultaneously, leading to delays.
+
+Asynchronous kernel execution is beneficial only under specific conditions.
+It is most effective when the kernels do not fully utilize the GPU's resources.
+In such cases, overlapping kernel execution can improve overall throughput and
+efficiency by keeping the GPU busy without exceeding its capacity.
+
 Overlap of data transfer and kernel execution
 ===============================================================================
 
@@ -81,6 +106,11 @@ One of the primary benefits of asynchronous operations is the ability to overlap
 data transfer with kernel execution, leading to better resource utilization and
 improved performance.
 
+Asynchronous execution is particularly advantageous in iterative processes. For
+instance, while the kernel of one iteration runs, the input data of the next
+iteration can be prepared simultaneously, provided that this preparation does
+not depend on the kernel's execution.
+
 Querying device capabilities
 -------------------------------------------------------------------------------
 
diff --git a/docs/how-to/performance_guidelines.rst b/docs/how-to/performance_guidelines.rst
index bf74b63d16..9b62e4854e 100644
--- a/docs/how-to/performance_guidelines.rst
+++ b/docs/how-to/performance_guidelines.rst
@@ -3,6 +3,8 @@
    developers optimize the performance of HIP-capable GPU architectures.
    :keywords: AMD, ROCm, HIP, CUDA, performance, guidelines
 
+.. _how_to_performance_guidelines:
+
 *******************************************************************************
 Performance guidelines
 *******************************************************************************