Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Graph] Enable L0 optimizations (no profiling mode) #358

Closed
wants to merge 12 commits into from

Conversation

mfrancepillois
Copy link
Collaborator

@mfrancepillois mfrancepillois commented Feb 23, 2024

  • Enable in-order cmd-list
    Analyze the graph and apply enable the use of in-order command-list for linear graph.
  • Add a property to finalize function to enable graph profiling.
  • Update the specification.

Analyze the graph and apply enable the use of in-order command-list for linear graph.
Add a property to finalize function to disable this optimization which is not compatible with profiling.
Update the specification.
@mfrancepillois mfrancepillois added the Graph Implementation Related to DPC++ implementation and testing label Feb 23, 2024
sycl/source/detail/graph_impl.cpp Outdated Show resolved Hide resolved
sycl/source/detail/graph_impl.cpp Outdated Show resolved Hide resolved
sycl/source/detail/graph_impl.hpp Outdated Show resolved Hide resolved
sycl/source/detail/graph_impl.hpp Outdated Show resolved Hide resolved
@mfrancepillois mfrancepillois changed the title [SYCL][Graph] Enable in-order cmd-list [SYCL][Graph] Enable L0 optimizations (no profiling mode) Feb 27, 2024
sycl/include/sycl/ext/oneapi/experimental/graph.hpp Outdated Show resolved Hide resolved
sycl/source/detail/graph_impl.hpp Outdated Show resolved Hide resolved
sycl/test-e2e/Graph/event_profiling_info.cpp Outdated Show resolved Hide resolved
sycl/source/detail/graph_impl.hpp Outdated Show resolved Hide resolved
sycl/source/detail/graph_impl.hpp Outdated Show resolved Hide resolved
sycl/source/detail/graph_impl.hpp Outdated Show resolved Hide resolved
sycl/source/detail/graph_impl.hpp Outdated Show resolved Hide resolved

/// @return True if the partition contains a host task
bool isHostTask() const {
return (MRoots.size() && ((*MRoots.begin()).lock()->MCGType ==
sycl::detail::CG::CGTYPE::CodeplayHostTask));
}

/// Checks if the graph is single path, i.e. each node has a single successor.
/// If so, the MIsInOrderGraph flag is set.
void checkIfGraphIsSinglePath() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the overhead of adding this routine in comparison to the potential in-order optimization?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its difficult to give an approximation of the overhead for this routine since it depends of the graph typology.
That said, if in-order graph is found we win on both sides: finalization delay (we do not need to create events) and execution time (we do not have to execute events nor synchronization on them).
On my setup (12th Gen Intel(R) Core(TM) i9-12900K, Intel(R) Level-Zero, Intel(R) UHD Graphics 770 1.3 [1.3.28454]), the finalization delay for an 2000 nodes in-order graph is reduced by ~40%. The execution time is reduced by ~15% and the second execution by ~20% compared to execution with event profiling capability disabled (and respectively 20% and 30% with the current implementation (i.e. event profiling enabled)).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this sounds promising. I think we should run some microbenchmarks with and without these changes to better understand the overhead for nonlinear graphs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my setup, the checkIfGraphIsSinglePath function takes less than 0.01% of the total runtime of finalize for checking 2000 nodes.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not sure if just looking at the single function call is a fair metric, because the Schedule of the Graph is already available. That obviously is an implementation detail and won't necessarily take away the concerns about the complexity of this check. Another aspect that might be relevant: What if the check fails, it might be still beneficial to execute Schedule as-is on an in-order CommandList, or interleaving on multiple in-order CommandLists. That brings up the question if a scheduling hint that we'd pass as a property on graph finalization is a better option?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extending running things in-order to be user controlled with a property or something definitely seems like it could be useful.

But as to the other point of discussion here, do we really need to care that much about small optimizations of finalize()? It is expensive by design and barring any outlandishly slow performance it seems to me it doesn't really matter much at all how it performs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the idea of adding a hint/property to enable in-order command lists in more situation, it seems that probably requires some more in-depth discussion and is probably better done as a separate PR to avoid delaying this one too much.

sycl/test-e2e/Graph/event_profiling_info.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@EwanC EwanC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🙂

sycl/source/detail/event_impl.hpp Show resolved Hide resolved
sycl/include/sycl/ext/oneapi/experimental/graph.hpp Outdated Show resolved Hide resolved
sycl/source/detail/graph_impl.cpp Outdated Show resolved Hide resolved
sycl/source/detail/graph_impl.hpp Outdated Show resolved Hide resolved
sycl/test-e2e/Graph/event_profiling_info.cpp Outdated Show resolved Hide resolved
@mfrancepillois mfrancepillois requested a review from EwanC March 8, 2024 11:07
@mfrancepillois mfrancepillois requested a review from Bensuo March 8, 2024 11:07
@Bensuo
Copy link
Collaborator

Bensuo commented Mar 21, 2024

Upstream PR here (draft until UR changes merge): intel#13088

@Bensuo Bensuo closed this Mar 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Graph Implementation Related to DPC++ implementation and testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants