[SYCL][Graph] Add support for host task
Adds graph partitioning process to handle host task dependencies.
Adds e2e tests.
Updates design doc.
mfrancepillois committed Dec 4, 2023
1 parent 88f1d0a commit dc5a2b8
Showing 41 changed files with 1,383 additions and 171 deletions.
58 changes: 58 additions & 0 deletions sycl/doc/design/CommandGraph.md
@@ -132,6 +132,64 @@ in a command-graph, it will perform a blocking wait on the dependencies of the
command-group first. The user will experience this wait as part of graph
finalization.

## Graph Partitioning

To handle dependencies from other devices, the graph can be partitioned during
the finalization process. A partition is a set of one or more nodes intended
to run on the same device. Each partition instantiates a command-buffer
(or equivalent) which contains all the commands to be executed on the device.
Therefore, the partitioning only impacts graphs in the executable state and
occurs during finalization. Synchronization between partitions is managed
by the runtime, unlike dependencies within a partition, which are handled
directly by the backend.

Since runtime synchronization and multiple command-buffers involve
extra latency, the implementation aims to minimize the number of partitions.
Currently, the creation of a new partition is triggered by a node containing
a host-task.
When a host-task is encountered, the predecessors of this host-task node
are assigned to one partition, the host-task is assigned to another partition,
and the successors are assigned to a third partition as shown below:

![Graph partition illustration.](images/SYCL-Graph-partitions.jpg)
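The three-way split described above can be sketched in plain C++. This is a minimal illustration and not the DPC++ runtime's code: the `Node` struct, `assignPartitions`, and `exampleGraph` are hypothetical names, the nodes are assumed to be stored in topological order, and the numbering rule (increment the index whenever an edge touches a host-task) is a simplification of the annotation process documented in this section.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node type for illustration only.
struct Node {
  bool IsHostTask;
  std::vector<std::size_t> Preds; // indices of predecessor nodes
};

// Returns one partition index per node. Assumes Nodes is topologically
// ordered, i.e. every predecessor index is smaller than the node's own index.
std::vector<int> assignPartitions(const std::vector<Node> &Nodes) {
  std::vector<int> Part(Nodes.size(), 0);
  for (std::size_t I = 0; I < Nodes.size(); ++I) {
    for (std::size_t P : Nodes[I].Preds) {
      assert(P < I && "nodes must be topologically ordered");
      // An edge into or out of a host-task crosses a partition boundary.
      const bool Boundary = Nodes[P].IsHostTask || Nodes[I].IsHostTask;
      Part[I] = std::max(Part[I], Part[P] + (Boundary ? 1 : 0));
    }
  }
  return Part;
}

// Example graph with edges A -> B, A -> HT, B -> HT, HT -> C.
std::vector<Node> exampleGraph() {
  return {
      {false, {}},     // 0: A  (kernel)
      {false, {0}},    // 1: B  (kernel)
      {true, {0, 1}},  // 2: HT (host-task)
      {false, {2}},    // 3: C  (kernel)
  };
}
```

For the example graph, the predecessors A and B share one partition, the host-task gets the next one, and the successor C gets a third. The sketch does not separate a root host-task from root kernels, which a real implementation would need to do.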

Partition numbers are allocated in order. Hence, the runtime must ensure that
Partition `n` completes before starting execution of Partition `n+1`.

Note that partitioning can only happen during the finalization stage due to
potential backward dependencies that could be created using
the `make_edge` function.

### Example
The partitioning process is achieved in two main stages:

1 - Nodes are assigned to a temporary group/partition.

2 - Once all the nodes have been annotated with a group number,
actual partitions are created based on these annotations.
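The second stage can be sketched as follows, assuming the first stage has already produced one group number per node. The function name `buildPartitions` and the representation of a partition as a list of node indices are assumptions made for illustration; the runtime's actual data structures may differ.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Turns per-node group annotations into partitions. Group[I] is the
// temporary group number assigned to node I during the annotation stage.
// Partitions are returned in increasing group order, so gaps in the group
// numbering (for example after a merge) do not produce empty partitions.
std::vector<std::vector<std::size_t>>
buildPartitions(const std::vector<int> &Group) {
  std::map<int, std::vector<std::size_t>> ByGroup; // sorted by group number
  for (std::size_t I = 0; I < Group.size(); ++I)
    ByGroup[Group[I]].push_back(I);

  std::vector<std::vector<std::size_t>> Partitions;
  for (const auto &Entry : ByGroup)
    Partitions.push_back(Entry.second);
  return Partitions;
}
```

Keeping annotation and partition creation separate means group numbers can still be revised after a node is first visited, and partitions are only materialized once the annotations are final.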

The following diagrams show the annotation process:

![Graph partition illustration step 1.](images/SYCL-Graph-partitions_step1.jpg)
![Graph partition illustration step 2.](images/SYCL-Graph-partitions_step2.jpg)
![Graph partition illustration step 3.](images/SYCL-Graph-partitions_step3.jpg)
![Graph partition illustration step 4.](images/SYCL-Graph-partitions_step4.jpg)
![Graph partition illustration step 5.](images/SYCL-Graph-partitions_step5.jpg)
![Graph partition illustration step 6.](images/SYCL-Graph-partitions_step6.jpg)

Now consider a slightly different graph, in which the `make_edge` function is
used to create a dependency between Node E and Node HT1. The first 5 steps are
identical. However, from step 6 onwards the process changes and a group merge
is needed, as illustrated in the following diagrams:

![Graph partition illustration step 6b.](images/SYCL-Graph-partitions_step7.jpg)
![Graph partition illustration step 7b.](images/SYCL-Graph-partitions_step8.jpg)
![Graph partition illustration step 8b.](images/SYCL-Graph-partitions_step9.jpg)
![Graph partition illustration step 9b.](images/SYCL-Graph-partitions_step10.jpg)
![Graph partition illustration step 10b.](images/SYCL-Graph-partitions_step11.jpg)
![Graph partition illustration step 11b.](images/SYCL-Graph-partitions_step12.jpg)
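One simple way to express a group merge on the annotations is to relabel every node of one group so that it joins another. The helper below is a hypothetical sketch under that assumption; it does not decide which groups must be merged, which is what the diagrams above describe, and it is not taken from the runtime.

```cpp
#include <vector>

// Relabels every node annotated with group From so that it joins group Into.
// Group[I] is the temporary group number of node I.
void mergeGroups(std::vector<int> &Group, int From, int Into) {
  for (int &G : Group)
    if (G == From)
      G = Into;
}
```

After a merge the group numbering may contain gaps, so the later partition-creation stage should rely on the order of the group numbers and not on them being contiguous.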

## Memory handling: Buffer and Accessor

There is no extra support for graph-specific USM allocations in the current
Binary file added sycl/doc/design/images/SYCL-Graph-partitions.jpg
40 changes: 15 additions & 25 deletions sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
@@ -1111,10 +1111,21 @@ modifiable graph will perform this action, useful in RAII pattern usage.

:host-task: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#subsec:interfaces.hosttasks

It is not yet supported to have a host task inside a `command_graph`, and an
exception will be thrown if used by application code. Support will be added
subsequently as detailed in the <<future-host-tasks, host tasks>> part from the
<<future-direction, future direction>> section of this specification.
A {host-task}[host task] is a native C++ callable, scheduled according to SYCL
dependency rules. It is valid to record a host task as part of a graph, though it
may lead to sub-optimal graph performance because a host task node may prevent
the SYCL runtime from submitting the entire executable `command_graph` to the
device at once.

[source,c++]
----
auto node = graph.add([&](sycl::handler& cgh){
  // Host code here is evaluated during the call to add()
  cgh.host_task([=](){
    // Code here is evaluated as part of executing the command graph node
  });
});
----

=== Queue Behavior In Recording Mode

@@ -1570,27 +1581,6 @@ if all the commands accessing this buffer use `access_mode::write` or the
Note, however, that these cases require the application to disable copy-back
as described in <<buffer-limitations, Buffer Limitations>>.

==== Host Tasks [[future-host-tasks]]

A {host-task}[host task] is a native C++ callable, scheduled according to SYCL
dependency rules. It is valid to record a host task as part of graph, though it
may lead to sub-optimal graph performance because a host task node may prevent
the SYCL runtime from submitting the entire executable `command_graph` to the
device at once.

Host tasks can be updated as part of <<executable-graph-update, executable graph update>>
by replacing the whole node with the new callable.

[source,c++]
----
auto node = graph.add([&](sycl::handler& cgh){
  // Host code here is evaluated during the call to add()
  cgh.host_task([=](){
    // Code here is evaluated as part of executing the command graph node
  });
});
----

==== Executable Graph Update

A graph in the executable state can have each node's inputs & outputs updated