Set enable-profiling on nodes rather than graph_exec (finalize)
mfrancepillois committed Mar 8, 2024
1 parent 0318696 commit 2127b90
Showing 7 changed files with 105 additions and 69 deletions.
66 changes: 43 additions & 23 deletions sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
@@ -342,10 +342,6 @@ class assume_buffer_outlives_graph {
assume_buffer_outlives_graph() = default;
};
class enable_profiling {
public:
enable_profiling() = default;
};
} // namespace graph
namespace node {
@@ -361,6 +357,11 @@ class depends_on_all_leaves {
depends_on_all_leaves() = default;
};
class enable_profiling {
public:
enable_profiling() = default;
};
} // namespace node
} // namespace property
@@ -585,6 +586,31 @@ class depends_on_all_leaves {
}
----

===== Enable-Profiling Property [[enable-profiling]]

The `property::node::enable_profiling` property can be passed to a
`command_graph::add()` function and enables profiling support
for the node in the `command_graph<graph_state::executable>`.
Passing this property disables certain optimizations, which is why
profiling is disabled by default on graphs. Users must request it explicitly,
either with the `property::node::enable_profiling` property when building the
graph explicitly, or with `property::queue::enable_profiling` on
the recorded queue (Record & Replay API).
As a result, the execution time of a graph finalized with profiling enabled
is longer than that of a graph without profiling capability.
An error will be thrown when attempting to profile an event
from a graph submission that was created without this property.

[source,c++]
----
namespace sycl::ext::oneapi::experimental::property::node {
class enable_profiling {
public:
enable_profiling() = default;
};
}
----
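
The opt-in behaviour described in this section can be illustrated with a small standalone model. This is only a sketch: `ToyGraph`, `ToyEvent`, `queryThrows` and `toyProfilingDemo` are hypothetical names that do not exist in SYCL, the timestamp is a placeholder, and the "at least one node" rule mirrors the temporary behaviour described in the implementation comment of this commit rather than a guarantee of the extension.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toy model only: these types are hypothetical and are NOT the SYCL API.
struct ToyEvent {
  bool ProfilingEnabled = false;
  std::uint64_t StartNs = 0;

  // Mirrors the documented rule: querying profiling data on an event from a
  // graph built without the property is an error.
  std::uint64_t commandStart() const {
    if (!ProfilingEnabled)
      throw std::runtime_error("profiling was not enabled for this graph");
    return StartNs;
  }
};

class ToyGraph {
public:
  // Each node opts in to profiling individually when it is added.
  void add(bool EnableProfiling = false) {
    NodeFlags.push_back(EnableProfiling);
  }

  // A submission yields an event that supports profiling only if at least
  // one node requested it (assumption taken from the implementation comment).
  ToyEvent submit() const {
    ToyEvent E;
    for (bool Flag : NodeFlags)
      E.ProfilingEnabled = E.ProfilingEnabled || Flag;
    E.StartNs = 42; // placeholder value, not a real timestamp
    return E;
  }

private:
  std::vector<bool> NodeFlags;
};

// Returns true if a profiling query on a submission of G throws.
inline bool queryThrows(const ToyGraph &G) {
  try {
    G.submit().commandStart();
    return false;
  } catch (const std::runtime_error &) {
    return true;
  }
}

// Small usage example for the model above.
inline void toyProfilingDemo() {
  ToyGraph Plain;
  Plain.add();
  assert(queryThrows(Plain)); // no node opted in, so the query fails

  ToyGraph Profiled;
  Profiled.add(/*EnableProfiling=*/true);
  assert(!queryThrows(Profiled)); // the query succeeds
}
```

The point of the model is that profiling is paid for only when a node asks for it, and that a query on a graph with no such node fails loudly instead of returning meaningless timestamps.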

=== Graph

This extension adds a new `command_graph` object which follows the
@@ -651,18 +677,6 @@ which is used in a graph will be kept alive on the host for the lifetime of the
graph. Destroying that buffer during the lifetime of a `command_graph`
constructed with this property results in undefined behavior.

===== Enable-Profiling Property [[enable-profiling]]

The `property::graph::enable_profiling` property can be passed to the
`command_graph::finalize()` function and enables profiling support
for the returned `command_graph<graph_state::executable>`.
Passing this property to the finalize function implies disabling
certain optimizations to enable graph profiling.
As a result, the execution time of a graph finalized with profiling enabled
is longer than that of a graph without profiling capability.
An error will be thrown when attempting to profile an event
from a graph submission that was created without this property.

==== Graph Member Functions

Table {counter: tableNumber}. Constructor of the `command_graph` class.
@@ -761,6 +775,8 @@ Parameters:
* `propList` - Zero or more properties can be provided to the constructed node
via an instance of `property_list`. The `property::node::depends_on` property
can be passed here with a list of nodes to create dependency edges on.
The `enable_profiling` property enables the profiling of this node.
See <<enable-profiling, Enable-Profiling>> for more details.


Returns: The empty node which has been added to the graph.
@@ -798,6 +814,9 @@ Parameters:
* `propList` - Zero or more properties can be provided to the constructed node
via an instance of `property_list`. The `property::node::depends_on` property
can be passed here with a list of nodes to create dependency edges on.
The `enable_profiling` property enables the profiling of this node.
See <<enable-profiling, Enable-Profiling>> for more details.


Returns: The command-group function object node which has been added to the graph.

@@ -866,9 +885,8 @@ Preconditions:

Parameters:

* `propList` - Optional parameter for passing properties. The only defined
property is `enable_profiling`. See <<enable-profiling, Enable-Profiling>>
for more details.
* `propList` - Optional parameter for passing properties. No finalization
properties are defined by this extension.

Returns: A new executable graph object which can be submitted to a queue.

@@ -1091,10 +1109,12 @@ ways:
an implicit dependency before and after the graph execution, as if the graph
execution is one command-group submitted to the in-order queue.

2. `property::queue::enable_profiling` - This property has no effect on graph
recording. When set on the queue a graph is submitted to however, it allows
profiling information to be obtained from the event returned by a graph
submission. As it is not defined how a submitted graph will be split up for
2. `property::queue::enable_profiling` - This property must be set on the queue
in recording mode if users want to profile the commands recorded to
the graph. This property must also be set on the queue a graph is
submitted to. It allows profiling information to be obtained from the event
returned by a graph submission.
As it is not defined how a submitted graph will be split up for
scheduling at runtime, the `uint64_t` timestamp reported from a profiling
query on a graph execution event has the following semantics, which may be
pessimistic about execution time on device.
41 changes: 23 additions & 18 deletions sycl/include/sycl/ext/oneapi/experimental/graph.hpp
@@ -146,15 +146,6 @@ class assume_buffer_outlives_graph
public:
assume_buffer_outlives_graph() = default;
};

/// Property used to enable graph profiling.
/// Passing this property to the `command_graph::finalize()` function
/// ensures that profiling can be used on the generated graph.
class enable_profiling : public ::sycl::detail::DataLessProperty<
::sycl::detail::GraphEnableProfiling> {
public:
enable_profiling() = default;
};
} // namespace graph

namespace node {
@@ -183,6 +174,15 @@ class depends_on_all_leaves : public ::sycl::detail::DataLessProperty<
depends_on_all_leaves() = default;
};

/// Property used to enable node profiling.
/// Passing this property to the `command_graph::add()` function
/// ensures that profiling can be queried on this node.
class enable_profiling : public ::sycl::detail::DataLessProperty<
::sycl::detail::GraphEnableProfiling> {
public:
enable_profiling() = default;
};

} // namespace node
} // namespace property

@@ -209,15 +209,17 @@ class __SYCL_EXPORT modifiable_command_graph {
/// @param PropList Property list used to pass [0..n] predecessor nodes.
/// @return Constructed empty node which has been added to the graph.
node add(const property_list &PropList = {}) {
bool EnableProfiling =
PropList.has_property<property::node::enable_profiling>();
if (PropList.has_property<property::node::depends_on>()) {
auto Deps = PropList.get_property<property::node::depends_on>();
node Node = addImpl(Deps.get_dependencies());
node Node = addImpl(Deps.get_dependencies(), EnableProfiling);
if (PropList.has_property<property::node::depends_on_all_leaves>()) {
addGraphLeafDependencies(Node);
}
return Node;
}
node Node = addImpl({});
node Node = addImpl({}, EnableProfiling);
if (PropList.has_property<property::node::depends_on_all_leaves>()) {
addGraphLeafDependencies(Node);
}
@@ -229,15 +231,17 @@ class __SYCL_EXPORT modifiable_command_graph {
/// @param PropList Property list used to pass [0..n] predecessor nodes.
/// @return Constructed node which has been added to the graph.
template <typename T> node add(T CGF, const property_list &PropList = {}) {
bool EnableProfiling =
PropList.has_property<property::node::enable_profiling>();
if (PropList.has_property<property::node::depends_on>()) {
auto Deps = PropList.get_property<property::node::depends_on>();
node Node = addImpl(CGF, Deps.get_dependencies());
node Node = addImpl(CGF, Deps.get_dependencies(), EnableProfiling);
if (PropList.has_property<property::node::depends_on_all_leaves>()) {
addGraphLeafDependencies(Node);
}
return Node;
}
node Node = addImpl(CGF, {});
node Node = addImpl(CGF, {}, EnableProfiling);
if (PropList.has_property<property::node::depends_on_all_leaves>()) {
addGraphLeafDependencies(Node);
}
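
Both `add()` overloads above follow the same pattern: the presence of the property is read once from the property list, then forwarded to `addImpl` on every code path. A minimal standalone sketch of that pattern follows. All names in it (`ToyPropertyList`, `ToyNode`, `toyAdd`, `toyAddImpl`, and the tag types) are hypothetical stand-ins, and the real `sycl::property_list` is implemented differently.

```cpp
#include <cassert>
#include <typeindex>
#include <typeinfo>
#include <unordered_set>

// Hypothetical tag types standing in for the node properties.
struct enable_profiling_tag {};
struct depends_on_tag {};

// Hypothetical, simplified property list: it only remembers which property
// types were supplied, which is all a data-less property needs.
class ToyPropertyList {
public:
  template <typename Prop> ToyPropertyList &with() {
    Present.insert(std::type_index(typeid(Prop)));
    return *this;
  }

  template <typename Prop> bool has_property() const {
    return Present.count(std::type_index(typeid(Prop))) != 0;
  }

private:
  std::unordered_set<std::type_index> Present;
};

struct ToyNode {
  bool HasDependencies = false;
  bool ProfilingEnabled = false;
};

// Stand-in for addImpl(): the flag arrives as a plain bool parameter.
inline ToyNode toyAddImpl(bool HasDependencies, bool EnableProfiling) {
  return ToyNode{HasDependencies, EnableProfiling};
}

// Stand-in for add(): read the flag once so that both branches forward the
// same value and neither path can forget it.
inline ToyNode toyAdd(const ToyPropertyList &Props) {
  const bool EnableProfiling = Props.has_property<enable_profiling_tag>();
  if (Props.has_property<depends_on_tag>())
    return toyAddImpl(/*HasDependencies=*/true, EnableProfiling);
  return toyAddImpl(/*HasDependencies=*/false, EnableProfiling);
}

// Small usage example for the model above.
inline void toyAddDemo() {
  ToyPropertyList Props;
  Props.with<enable_profiling_tag>();
  assert(toyAdd(Props).ProfilingEnabled);
  assert(!toyAdd(ToyPropertyList{}).ProfilingEnabled);
}
```

Passing the flag as an explicit parameter keeps the non-template implementation function independent of the property list type, at the cost of one more argument on each overload.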
@@ -312,14 +316,16 @@ class __SYCL_EXPORT modifiable_command_graph {
/// Template-less implementation of add() for CGF nodes.
/// @param CGF Command-group function to add.
/// @param Dep List of predecessor nodes.
/// @param EnableProfiling Enable node profiling.
/// @return Node added to the graph.
node addImpl(std::function<void(handler &)> CGF,
const std::vector<node> &Dep);
node addImpl(std::function<void(handler &)> CGF, const std::vector<node> &Dep,
const bool EnableProfiling);

/// Template-less implementation of add() for empty nodes.
/// @param Dep List of predecessor nodes.
/// @param EnableProfiling Enable node profiling.
/// @return Node added to the graph.
node addImpl(const std::vector<node> &Dep);
node addImpl(const std::vector<node> &Dep, const bool EnableProfiling);

/// Adds all graph leaves as dependencies
/// @param Node Destination node to which the leaves of the graph will be
@@ -352,8 +358,7 @@ class __SYCL_EXPORT executable_command_graph {
/// @param PropList Optional list of properties to pass.
executable_command_graph(const std::shared_ptr<detail::graph_impl> &Graph,
const sycl::context &Ctx,
const property_list &PropList = {});

const property_list &PropList = {});

template <class Obj>
friend decltype(Obj::impl)
19 changes: 14 additions & 5 deletions sycl/source/detail/graph_impl.cpp
@@ -192,6 +192,13 @@ void exec_graph_impl::makePartitions() {
if (Node->MCGType == sycl::detail::CG::CodeplayHostTask) {
HostTaskList.push_back(Node);
}
// Next line is supposed to be temporary.
// Nodes are not profiled individually, but the profiling of the whole graph
// is enabled if at least one node has profiling enabled. This should be
// changed once the PR https://github.com/intel/llvm/pull/12592 on node
// profiling is merged. This also will involve updating all the UR enqueue
// cmd functions to add a new parameter containing the profiling status.
MEnableProfiling |= Node->MProfilingEnabled;
}

// Annotate nodes
@@ -946,9 +953,7 @@ exec_graph_impl::enqueue(const std::shared_ptr<sycl::detail::queue_impl> &Queue,
NewEvent->attachEventToComplete(Elem.second);
}
}
if (!MEnableProfiling) {
NewEvent->setProfilingEnabled(false);
}
NewEvent->setProfilingEnabled(MEnableProfiling);
sycl::event QueueEvent =
sycl::detail::createSyclObjFromImpl<sycl::event>(NewEvent);
return QueueEvent;
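
The hunk above replaces a conditional "switch off" with an unconditional assignment of the graph's flag. The difference can be seen in a standalone sketch. `ToyEventImpl`, `applyOld` and `applyNew` are hypothetical names, and the initial state of a real event is not shown in this diff, so the starting values below are assumptions chosen to exercise both forms.

```cpp
#include <cassert>

// Hypothetical event type with a single profiling flag.
struct ToyEventImpl {
  bool ProfilingEnabled;
  void setProfilingEnabled(bool Value) { ProfilingEnabled = Value; }
};

// Earlier form: only ever switches profiling off, so the result still
// depends on whatever state the event started in.
inline void applyOld(ToyEventImpl &Event, bool GraphProfiling) {
  if (!GraphProfiling)
    Event.setProfilingEnabled(false);
}

// Form after this commit: always writes the graph's flag, so the result is
// independent of the event's initial state.
inline void applyNew(ToyEventImpl &Event, bool GraphProfiling) {
  Event.setProfilingEnabled(GraphProfiling);
}

// Small usage example for the model above.
inline void toyEnqueueDemo() {
  ToyEventImpl Old{false};
  applyOld(Old, /*GraphProfiling=*/true);
  assert(!Old.ProfilingEnabled); // stays off: the old form never turns it on

  ToyEventImpl New{false};
  applyNew(New, /*GraphProfiling=*/true);
  assert(New.ProfilingEnabled); // follows the graph's flag
}
```

The two forms agree whenever the graph flag is false. They can differ only when it is true and the event did not already have profiling enabled.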
@@ -1109,7 +1114,8 @@ modifiable_command_graph::modifiable_command_graph(
: impl(std::make_shared<detail::graph_impl>(
SyclQueue.get_context(), SyclQueue.get_device(), PropList)) {}

node modifiable_command_graph::addImpl(const std::vector<node> &Deps) {
node modifiable_command_graph::addImpl(const std::vector<node> &Deps,
const bool EnableProfiling) {
impl->throwIfGraphRecordingQueue("Explicit API \"Add()\" function");
std::vector<std::shared_ptr<detail::node_impl>> DepImpls;
for (auto &D : Deps) {
@@ -1118,11 +1124,13 @@ node modifiable_command_graph::addImpl(const std::vector<node> &Deps) {

graph_impl::WriteLock Lock(impl->MMutex);
std::shared_ptr<detail::node_impl> NodeImpl = impl->add(impl, DepImpls);
NodeImpl->MProfilingEnabled = EnableProfiling;
return sycl::detail::createSyclObjFromImpl<node>(NodeImpl);
}

node modifiable_command_graph::addImpl(std::function<void(handler &)> CGF,
const std::vector<node> &Deps) {
const std::vector<node> &Deps,
const bool EnableProfiling) {
impl->throwIfGraphRecordingQueue("Explicit API \"Add()\" function");
std::vector<std::shared_ptr<detail::node_impl>> DepImpls;
for (auto &D : Deps) {
@@ -1132,6 +1140,7 @@ node modifiable_command_graph::addImpl(std::function<void(handler &)> CGF,
graph_impl::WriteLock Lock(impl->MMutex);
std::shared_ptr<detail::node_impl> NodeImpl =
impl->add(impl, CGF, {}, DepImpls);
NodeImpl->MProfilingEnabled = EnableProfiling;
return sycl::detail::createSyclObjFromImpl<node>(NodeImpl);
}

13 changes: 9 additions & 4 deletions sycl/source/detail/graph_impl.hpp
@@ -94,6 +94,9 @@ class node_impl {
/// Used for tracking visited status during cycle checks.
bool MVisited = false;

/// If true, the graph profiling is enabled for this node.
bool MProfilingEnabled = false;

/// Partition number needed to assign a Node to a partition.
/// Note: This number is only used during the partitioning process and
/// cannot be used to find out the partition of a node outside of this process.
@@ -152,7 +155,8 @@ class node_impl {
node_impl(node_impl &Other)
: MSuccessors(Other.MSuccessors), MPredecessors(Other.MPredecessors),
MCGType(Other.MCGType), MNodeType(Other.MNodeType),
MCommandGroup(Other.getCGCopy()), MSubGraphImpl(Other.MSubGraphImpl) {}
MCommandGroup(Other.getCGCopy()), MSubGraphImpl(Other.MSubGraphImpl),
MProfilingEnabled(Other.MProfilingEnabled) {}

/// Checks if this node has a given requirement.
/// @param Requirement Requirement to lookup.
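
The copy-constructor change above matters because the executable graph works on copies of the nodes. A hand-written copy constructor that omits a member leaves it at its default initializer. The following standalone sketch shows the effect; `ToyNodeImpl` and `ForgetfulNodeImpl` are hypothetical types, not the real `node_impl`.

```cpp
#include <cassert>

// Hypothetical node type whose hand-written copy constructor lists every
// member, including the profiling flag.
struct ToyNodeImpl {
  int Kind = 0;
  bool ProfilingEnabled = false;

  ToyNodeImpl() = default;
  ToyNodeImpl(const ToyNodeImpl &Other)
      : Kind(Other.Kind), ProfilingEnabled(Other.ProfilingEnabled) {}
};

// Same layout, but the copy constructor omits the flag. The member then
// falls back to its default member initializer (false).
struct ForgetfulNodeImpl {
  int Kind = 0;
  bool ProfilingEnabled = false;

  ForgetfulNodeImpl() = default;
  ForgetfulNodeImpl(const ForgetfulNodeImpl &Other) : Kind(Other.Kind) {}
};

// Small usage example for the model above.
inline void toyCopyDemo() {
  ToyNodeImpl Source;
  Source.ProfilingEnabled = true;
  ToyNodeImpl Copy(Source);
  assert(Copy.ProfilingEnabled); // flag survives the copy

  ForgetfulNodeImpl Lossy;
  Lossy.ProfilingEnabled = true;
  ForgetfulNodeImpl LossyCopy(Lossy);
  assert(!LossyCopy.ProfilingEnabled); // flag is silently reset
}
```

In the forgetful variant the request for profiling would be lost without any diagnostic, which is why the new member has to be added to the initializer list whenever the copy constructor is written by hand.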
@@ -541,6 +545,9 @@ class node_impl {
Stream << "Other \\n";
break;
}
if (MProfilingEnabled) {
Stream << "Profiling Enabled \\n";
}
Stream << "\"];" << std::endl;
}

@@ -1066,9 +1073,7 @@ class exec_graph_impl {
const std::shared_ptr<graph_impl> &GraphImpl,
const property_list &PropList)
: MSchedule(), MGraphImpl(GraphImpl), MPiSyncPoints(), MContext(Context),
MRequirements(), MExecutionEvents(),
MEnableProfiling(
PropList.has_property<property::graph::enable_profiling>()) {
MRequirements(), MExecutionEvents() {
// Copy nodes from GraphImpl and merge any subgraph nodes into this graph.
duplicateNodes();
}
2 changes: 1 addition & 1 deletion sycl/source/handler.cpp
@@ -543,7 +543,7 @@ event handler::finalize() {
} else {
NodeImpl = GraphImpl->add(NodeType, std::move(CommandGroup));
}

NodeImpl->MProfilingEnabled = MQueue->MIsProfilingEnabled;
// Associate an event with this new node and return the event.
GraphImpl->addEventForNode(GraphImpl, EventImpl, NodeImpl);

29 changes: 13 additions & 16 deletions sycl/test-e2e/Graph/event_profiling_info.cpp
@@ -81,6 +81,8 @@ bool compareProfiling(event Event1, event Event2) {
// event to complete execution.
int main() {
device Dev;
// The queue on which the graph is recorded must have the `enable_profiling`
// property set to enable graph profiling.
queue Queue{Dev, {sycl::property::queue::enable_profiling()}};

const size_t Size = 100000;
@@ -107,17 +109,16 @@ int main() {
Queue.get_context(),
Queue.get_device(),
{exp_ext::property::graph::assume_buffer_outlives_graph{}}};
CopyGraph.begin_recording(Queue);

Queue.submit([&](sycl::handler &Cgh) {
accessor<int, 1, access::mode::read, access::target::device> AccessorFrom(
BufferFrom, Cgh, range<1>(Size));
accessor<int, 1, access::mode::write, access::target::device> AccessorTo(
BufferTo, Cgh, range<1>(Size));
Cgh.copy(AccessorFrom, AccessorTo);
});

CopyGraph.end_recording(Queue);
CopyGraph.add(
([&](sycl::handler &Cgh) {
accessor<int, 1, access::mode::read, access::target::device>
AccessorFrom(BufferFrom, Cgh, range<1>(Size));
accessor<int, 1, access::mode::write, access::target::device>
AccessorTo(BufferTo, Cgh, range<1>(Size));
Cgh.copy(AccessorFrom, AccessorTo);
}),
{exp_ext::property::node::enable_profiling{}});

// kernel launch
exp_ext::command_graph KernelGraph{
@@ -130,12 +131,8 @@ int main() {

KernelGraph.end_recording(Queue);

// The `enable_profiling` property must be passed to the finalize function
// in order to query profiling information.
auto CopyGraphExec =
CopyGraph.finalize(exp_ext::property::graph::enable_profiling{});
auto KernelGraphExec =
KernelGraph.finalize(exp_ext::property::graph::enable_profiling{});
auto CopyGraphExec = CopyGraph.finalize();
auto KernelGraphExec = KernelGraph.finalize();

event CopyEvent, KernelEvent1, KernelEvent2;
// Run graphs
4 changes: 2 additions & 2 deletions sycl/unittests/Extensions/CommandGraph.cpp
@@ -2271,8 +2271,8 @@ TEST_F(CommandGraphTest, ProfilingExceptionProperty) {
[&](sycl::handler &cgh) { cgh.single_task<TestKernel<>>([]() {}); });
Graph.end_recording(Queue);

// Checks exception thrown if profiling is requested while the
// enable_profiling property has not been passed to `finalize()`.
// Checks that an exception is thrown if profiling is requested while
// profiling has not been enabled during graph building.
auto GraphExecInOrder = Graph.finalize();
queue QueueProfile{Dev, {sycl::property::queue::enable_profiling()}};
auto EventInOrder = QueueProfile.submit(