-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Graph] Enable L0 optimizations (no profiling mode) #358
Conversation
Analyze the graph and apply enable the use of in-order command-list for linear graph. Add a property to finalize function to disable this optimization which is not compatible with profiling. Update the specification.
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
Co-authored-by: Pablo Reble <[email protected]>
…is not passed to finalize(). + Update spec.
|
||
/// @return True if the partition contains a host task | ||
bool isHostTask() const { | ||
return (MRoots.size() && ((*MRoots.begin()).lock()->MCGType == | ||
sycl::detail::CG::CGTYPE::CodeplayHostTask)); | ||
} | ||
|
||
/// Checks if the graph is single path, i.e. each node has a single successor. | ||
/// If so, the MIsInOrderGraph flag is set. | ||
void checkIfGraphIsSinglePath() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the overhead of adding this routine in comparison to the potential in-order optimization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its difficult to give an approximation of the overhead for this routine since it depends of the graph typology.
That said, if in-order graph is found we win on both sides: finalization delay (we do not need to create events) and execution time (we do not have to execute events nor synchronization on them).
On my setup (12th Gen Intel(R) Core(TM) i9-12900K, Intel(R) Level-Zero, Intel(R) UHD Graphics 770 1.3 [1.3.28454]), the finalization delay for an 2000 nodes in-order graph is reduced by ~40%. The execution time is reduced by ~15% and the second execution by ~20% compared to execution with event profiling capability disabled (and respectively 20% and 30% with the current implementation (i.e. event profiling enabled)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this sounds promising. I think we should run some microbenchmarks with and without these changes to better understand the overhead for nonlinear graphs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On my setup, the checkIfGraphIsSinglePath
function takes less than 0.01% of the total runtime of finalize
for checking 2000 nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im not sure if just looking at the single function call is a fair metric, because the Schedule
of the Graph is already available. That obviously is an implementation detail and won't necessarily take away the concerns about the complexity of this check. Another aspect that might be relevant: What if the check fails, it might be still beneficial to execute Schedule
as-is on an in-order CommandList, or interleaving on multiple in-order CommandLists. That brings up the question if a scheduling hint that we'd pass as a property on graph finalization is a better option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extending running things in-order to be user controlled with a property or something definitely seems like it could be useful.
But as to the other point of discussion here, do we really need to care that much about small optimizations of finalize()
? It is expensive by design and barring any outlandishly slow performance it seems to me it doesn't really matter much at all how it performs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the idea of adding a hint/property to enable in-order command lists in more situation, it seems that probably requires some more in-depth discussion and is probably better done as a separate PR to avoid delaying this one too much.
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
Co-authored-by: Ewan Crawford <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🙂
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
Upstream PR here (draft until UR changes merge): intel#13088 |
Analyze the graph and apply enable the use of in-order command-list for linear graph.