[SYCL][Graph] Add support for host task #344

Closed
58 changes: 58 additions & 0 deletions sycl/doc/design/CommandGraph.md
@@ -132,6 +132,64 @@ in a command-graph, it will perform a blocking wait on the dependencies of the
command-group first. The user will experience this wait as part of graph
finalization.

## Graph Partitioning

To handle dependencies from other devices, the graph can be partitioned during
the finalization process. A partition is a set of one or more nodes intended
to run on the same device. Each partition instantiates a command-buffer
(or equivalent) which contains all the commands to be executed on the device.
Therefore, the partitioning only impacts graphs in the executable state and
occurs during finalization. Synchronization between partitions is managed
by the runtime, unlike dependencies within a partition, which are handled
directly by the backend.

Since runtime synchronization and multiple command-buffers involve
extra latency, the implementation aims to minimize the number of partitions.
Currently, the creation of a new partition is triggered by a node containing
a host-task.
When a host-task is encountered, the predecessors of this host-task node
are assigned to one partition, the host-task is assigned to another partition,
and the successors are assigned to a third partition as shown below:

![Graph partition illustration.](images/SYCL-Graph-partitions.jpg)
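
The rule above can be sketched in standard C++ as a labeling pass over the
graph. This is a simplified illustration under stated assumptions, not the
runtime's implementation: `Node`, `annotatePartitions`, and the even/odd
labeling convention are hypothetical names and choices made for the example,
and the graph is assumed to be acyclic. Each node receives the smallest label
that is no lower than the labels of its predecessors, increased by one whenever
an edge crosses between a host-task node and a non-host-task node.

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical node type for illustration; not the runtime's node class.
struct Node {
  bool isHostTask = false;
  std::vector<std::size_t> successors; // indices of dependent nodes
};

// Returns one partition label per node. Non-host-task nodes receive even
// labels and host-task nodes odd labels, so the two kinds never share a
// partition. Assumes the graph is acyclic.
std::vector<int> annotatePartitions(const std::vector<Node> &graph) {
  const std::size_t n = graph.size();
  std::vector<int> label(n);
  std::vector<std::size_t> pending(n, 0); // unvisited predecessor count
  for (const Node &node : graph)
    for (std::size_t s : node.successors)
      ++pending[s];

  std::queue<std::size_t> ready;
  for (std::size_t i = 0; i < n; ++i) {
    label[i] = graph[i].isHostTask ? 1 : 0; // lowest label for this kind
    if (pending[i] == 0)
      ready.push(i);
  }

  // Kahn's algorithm: visit each node after all of its predecessors.
  while (!ready.empty()) {
    const std::size_t p = ready.front();
    ready.pop();
    for (std::size_t s : graph[p].successors) {
      // Crossing a host-task boundary moves the successor to a later label.
      const int step = graph[p].isHostTask != graph[s].isHostTask ? 1 : 0;
      label[s] = std::max(label[s], label[p] + step);
      if (--pending[s] == 0)
        ready.push(s);
    }
  }
  return label;
}
```

For a graph `A -> HT -> B` with an additional edge `A -> C`, this labeling
yields 0 for `A` and `C`, 1 for `HT`, and 2 for `B`, so `C` does not need a
partition of its own.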

Partition numbers are allocated in order. Hence, the runtime must ensure that
Partition `n` completes before execution of Partition `n+1` starts.

Note that partitioning can only happen during the finalization stage due to
potential backward dependencies that could be created using
the `make_edge` function.

### Example
The partitioning process is achieved in two main stages:

1. Nodes are assigned to a temporary group/partition.

2. Once all the nodes have been annotated with a group number,
actual partitions are created based on these annotations.
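
The second stage can be sketched as a grouping step over the annotations. This
is a minimal illustration, assuming that each node already carries an integer
group number and that lower group numbers must execute first; `Partition` and
`buildPartitions` are hypothetical names, not part of the runtime.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical partition type: the indices of the nodes it contains.
using Partition = std::vector<std::size_t>;

// Builds partitions from per-node group annotations. std::map iterates its
// keys in ascending order, so the returned partitions are ordered by group
// number and unused group numbers leave no empty partition behind.
std::vector<Partition> buildPartitions(const std::vector<int> &groupOf) {
  std::map<int, Partition> byGroup;
  for (std::size_t node = 0; node < groupOf.size(); ++node)
    byGroup[groupOf[node]].push_back(node);

  std::vector<Partition> partitions;
  partitions.reserve(byGroup.size());
  for (auto &entry : byGroup)
    partitions.push_back(std::move(entry.second));
  return partitions;
}
```

With annotations `{0, 1, 2, 0}`, nodes 0 and 3 end up in the first partition,
node 1 in the second, and node 2 in the third.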

The following diagrams show the annotation process:

![Graph partition illustration step 1.](images/SYCL-Graph-partitions_step1.jpg)
![Graph partition illustration step 2.](images/SYCL-Graph-partitions_step2.jpg)
![Graph partition illustration step 3.](images/SYCL-Graph-partitions_step3.jpg)
![Graph partition illustration step 4.](images/SYCL-Graph-partitions_step4.jpg)
![Graph partition illustration step 5.](images/SYCL-Graph-partitions_step5.jpg)
![Graph partition illustration step 6.](images/SYCL-Graph-partitions_step6.jpg)

Now consider a slightly different graph, in which the `make_edge` function is
used to create a dependency between Node E and Node HT1. The first 5 steps are
identical. From step 6 onward, however, the process changes and a group merge
is needed, as illustrated in the following diagrams:

![Graph partition illustration step 6b.](images/SYCL-Graph-partitions_step7.jpg)
![Graph partition illustration step 7b.](images/SYCL-Graph-partitions_step8.jpg)
![Graph partition illustration step 8b.](images/SYCL-Graph-partitions_step9.jpg)
![Graph partition illustration step 9b.](images/SYCL-Graph-partitions_step10.jpg)
![Graph partition illustration step 10b.](images/SYCL-Graph-partitions_step11.jpg)
![Graph partition illustration step 11b.](images/SYCL-Graph-partitions_step12.jpg)

## Memory handling: Buffer and Accessor

There is no extra support for graph-specific USM allocations in the current
Binary file added sycl/doc/design/images/SYCL-Graph-partitions.jpg
40 changes: 15 additions & 25 deletions sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
@@ -1111,10 +1111,21 @@ modifiable graph will perform this action, useful in RAII pattern usage.

:host-task: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#subsec:interfaces.hosttasks

It is not yet supported to have a host task inside a `command_graph`, and an
exception will be thrown if used by application code. Support will be added
subsequently as detailed in the <<future-host-tasks, host tasks>> part from the
<<future-direction, future direction>> section of this specification.
A {host-task}[host task] is a native C++ callable, scheduled according to SYCL
dependency rules. It is valid to record a host task as part of a graph, though it
may lead to sub-optimal graph performance because a host task node may prevent
the SYCL runtime from submitting the entire executable `command_graph` to the
device at once.

[source,c++]
----
auto node = graph.add([&](sycl::handler& cgh){
  // Host code here is evaluated during the call to add()
  cgh.host_task([=](){
    // Code here is evaluated as part of executing the command graph node
  });
});
----

=== Queue Behavior In Recording Mode

@@ -1570,27 +1581,6 @@ if all the commands accessing this buffer use `access_mode::write` or the
Note, however, that these cases require the application to disable copy-back
as described in <<buffer-limitations, Buffer Limitations>>.

==== Host Tasks [[future-host-tasks]]

A {host-task}[host task] is a native C++ callable, scheduled according to SYCL
dependency rules. It is valid to record a host task as part of graph, though it
may lead to sub-optimal graph performance because a host task node may prevent
the SYCL runtime from submitting the entire executable `command_graph` to the
device at once.

Host tasks can be updated as part of <<executable-graph-update, executable graph update>>
by replacing the whole node with the new callable.

[source,c++]
----
auto node = graph.add([&](sycl::handler& cgh){
  // Host code here is evaluated during the call to add()
  cgh.host_task([=](){
    // Code here is evaluated as part of executing the command graph node
  });
});
----

==== Executable Graph Update

A graph in the executable state can have each node's inputs & outputs updated