Add asynchronous concurrent execution #3687
base: docs/develop
Conversation
Force-pushed from 1484d67 to f81588d.
WIP WIP WIP WIP WIP
Force-pushed from a8fd499 to fd5af51.
Left comments. Looks good overall.
> Asynchronous concurrent execution
> *******************************************************************************
>
> Asynchronous concurrent execution important for efficient parallelism and
Suggested change:
- Asynchronous concurrent execution important for efficient parallelism and
+ Asynchronous concurrent execution is important for efficient parallelism and
> Asynchronous concurrent execution important for efficient parallelism and
> resource utilization, with techniques such as overlapping computation and data
> transfer, managing concurrent kernel execution with streams on single or
> multiple devices or using HIP graphs.
Suggested change:
- multiple devices or using HIP graphs.
+ multiple devices, or using HIP graphs.
> data allocation/freeing all happen in the context of device streams.
>
> Streams are FIFO buffers of commands to execute in order on a given device.
> Commands which enqueue tasks on a stream all return promptly and the command is
Suggested change:
- Commands which enqueue tasks on a stream all return promptly and the command is
+ Commands which enqueue tasks on a stream all return promptly and the task is
> Streams are FIFO buffers of commands to execute in order on a given device.
> Commands which enqueue tasks on a stream all return promptly and the command is
> executed asynchronously. Multiple streams may point to the same device and
Suggested change:
- executed asynchronously. Multiple streams may point to the same device and
+ executed asynchronously. Multiple streams can point to the same device and
> Commands which enqueue tasks on a stream all return promptly and the command is
> executed asynchronously. Multiple streams may point to the same device and
> those streams may be fed from multiple concurrent host-side threads. Execution
> on multiple streams may be concurrent but isn't required to be.
Suggested change:
- on multiple streams may be concurrent but isn't required to be.
+ on multiple streams might be concurrent but isn't required to be.
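As an illustrative aside (not part of the PR's documentation text): the stream behavior the quoted passage describes, namely prompt-returning enqueue, per-stream FIFO ordering, and optional concurrency across streams, could be sketched as below. This assumes a HIP toolchain (hipcc) and a visible GPU; the kernel name `scale` is made up for the example.

```cpp
#include <hip/hip_runtime.h>

// Trivial kernel used only to illustrate stream-ordered, asynchronous launches.
__global__ void scale(float* data, float factor, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    constexpr size_t n = 1 << 20;
    float *a, *b;
    hipMalloc(&a, n * sizeof(float));
    hipMalloc(&b, n * sizeof(float));

    // Two streams on the same device; work placed in different streams
    // may run concurrently, but is not guaranteed to.
    hipStream_t s0, s1;
    hipStreamCreate(&s0);
    hipStreamCreate(&s1);

    dim3 block(256), grid((n + block.x - 1) / block.x);
    // Both launches return promptly; the kernels execute asynchronously,
    // in FIFO order within each stream.
    scale<<<grid, block, 0, s0>>>(a, 2.0f, n);
    scale<<<grid, block, 0, s1>>>(b, 3.0f, n);

    // Block the host until each stream has drained its queue of commands.
    hipStreamSynchronize(s0);
    hipStreamSynchronize(s1);

    hipStreamDestroy(s0);
    hipStreamDestroy(s1);
    hipFree(a);
    hipFree(b);
    return 0;
}
```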
> contention for shared resources. This is because multiple kernels may attempt
> to access the same GPU resources simultaneously, leading to delays.
>
> Asynchronous kernel execution is beneficial only under specific conditions It
Suggested change:
- Asynchronous kernel execution is beneficial only under specific conditions It
+ Asynchronous kernel execution is beneficial only under specific conditions. It
> or from the GPU concurrently with kernel execution. Applications can query this
> capability by checking the ``asyncEngineCount`` device property. Devices with
> an ``asyncEngineCount`` greater than zero support concurrent data transfers.
> Additionally, if host memory is involved in the copy, it should be page-locked
Is there a reference we can provide such as Memory Management or something?
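To make the quoted ``asyncEngineCount`` passage concrete, a hedged sketch (again, not part of the PR) of querying the relevant device properties and issuing an asynchronous copy from page-locked host memory might look like this; it assumes hipcc and device 0:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    hipDeviceProp_t prop;
    hipGetDeviceProperties(&prop, 0);
    // asyncEngineCount > 0: data transfers can overlap kernel execution.
    // concurrentKernels != 0: kernels from different streams can overlap.
    std::printf("asyncEngineCount: %d, concurrentKernels: %d\n",
                prop.asyncEngineCount, prop.concurrentKernels);

    constexpr size_t bytes = 1 << 20;
    // Page-locked (pinned) host memory is needed for a truly asynchronous
    // host-to-device copy; pageable memory would force a synchronous path.
    float* hostBuf;
    hipHostMalloc(&hostBuf, bytes);
    float* devBuf;
    hipMalloc(&devBuf, bytes);

    hipStream_t stream;
    hipStreamCreate(&stream);
    // Returns promptly; the copy proceeds asynchronously on the stream.
    hipMemcpyAsync(devBuf, hostBuf, bytes, hipMemcpyHostToDevice, stream);
    hipStreamSynchronize(stream);

    hipStreamDestroy(stream);
    hipFree(devBuf);
    hipHostFree(hostBuf);
    return 0;
}
```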
> It is also possible to perform intra-device copies simultaneously with kernel
> execution on devices that support the ``concurrentKernels`` device property
> and/or with copies to or from the device (for devices that support the
Suggested change:
- and/or with copies to or from the device (for devices that support the
+ and/or with copies to or from the device (for devices that support the
Are copies to or from the device intra-device copies?
> called, control is not returned to the host thread before the device has
> completed the requested task. The behavior of the host thread—whether to yield,
> block, or spin—can be specified using :cpp:func:`hipSetDeviceFlags` with
> specific flags. Understanding when to use synchronous calls is important for
Suggested change:
- specific flags. Understanding when to use synchronous calls is important for
+ appropriate flags. Understanding when to use synchronous calls is important for
> By creating an event with :cpp:func:`hipEventCreate` and recording it with
> :cpp:func:`hipEventRecord`, developers can synchronize operations across
> streams, ensuring correct task execution order. :cpp:func:`hipEventSynchronize`
> allows waiting for an event to complete before proceeding with the next
Suggested change:
- allows waiting for an event to complete before proceeding with the next
+ lets the application wait for an event to complete before proceeding with the next
> sequences of kernels and memory operations as a single graph, they simplify
> complex workflows and enhance performance, particularly for applications with
> intricate dependencies and multiple execution stages.
No description provided.
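Finally, the HIP graphs idea the last quoted passage mentions, capturing a sequence of stream work once and relaunching it cheaply, could be sketched like this (illustrative only, assuming hipcc and a GPU; the kernel name `step` is invented):

```cpp
#include <hip/hip_runtime.h>

__global__ void step(float* d) { d[threadIdx.x] += 1.0f; }

int main() {
    float* d;
    hipMalloc(&d, 64 * sizeof(float));
    hipStream_t stream;
    hipStreamCreate(&stream);

    // Capture a short sequence of stream operations into a graph.
    hipGraph_t graph;
    hipStreamBeginCapture(stream, hipStreamCaptureModeGlobal);
    step<<<1, 64, 0, stream>>>(d);
    step<<<1, 64, 0, stream>>>(d);
    hipStreamEndCapture(stream, &graph);

    // Instantiate once, then relaunch the whole sequence with a single call,
    // avoiding per-kernel launch overhead on each iteration.
    hipGraphExec_t exec;
    hipGraphInstantiate(&exec, graph, nullptr, nullptr, 0);
    for (int i = 0; i < 10; ++i) {
        hipGraphLaunch(exec, stream);
    }
    hipStreamSynchronize(stream);

    hipGraphExecDestroy(exec);
    hipGraphDestroy(graph);
    hipStreamDestroy(stream);
    hipFree(d);
    return 0;
}
```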