Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Graph] Enable L0 optimizations (no profiling mode) #358
[SYCL][Graph] Enable L0 optimizations (no profiling mode) #358
Changes from 2 commits
8c5dea5
f671539
15f02c0
0527464
b55a301
d885aa0
b922a77
c084e19
49630d8
e4ee57e
0318696
2127b90
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the overhead of adding this routine in comparison to the potential in-order optimization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its difficult to give an approximation of the overhead for this routine since it depends of the graph typology.
That said, if in-order graph is found we win on both sides: finalization delay (we do not need to create events) and execution time (we do not have to execute events nor synchronization on them).
On my setup (12th Gen Intel(R) Core(TM) i9-12900K, Intel(R) Level-Zero, Intel(R) UHD Graphics 770 1.3 [1.3.28454]), the finalization delay for an 2000 nodes in-order graph is reduced by ~40%. The execution time is reduced by ~15% and the second execution by ~20% compared to execution with event profiling capability disabled (and respectively 20% and 30% with the current implementation (i.e. event profiling enabled)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this sounds promising. I think we should run some microbenchmarks with and without these changes to better understand the overhead for nonlinear graphs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On my setup, the
checkIfGraphIsSinglePath
function takes less than 0.01% of the total runtime offinalize
for checking 2000 nodes.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im not sure if just looking at the single function call is a fair metric, because the
Schedule
of the Graph is already available. That obviously is an implementation detail and won't necessarily take away the concerns about the complexity of this check. Another aspect that might be relevant: What if the check fails, it might be still beneficial to executeSchedule
as-is on an in-order CommandList, or interleaving on multiple in-order CommandLists. That brings up the question if a scheduling hint that we'd pass as a property on graph finalization is a better option?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extending running things in-order to be user controlled with a property or something definitely seems like it could be useful.
But as to the other point of discussion here, do we really need to care that much about small optimizations of
finalize()
? It is expensive by design and barring any outlandishly slow performance it seems to me it doesn't really matter much at all how it performs.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the idea of adding a hint/property to enable in-order command lists in more situation, it seems that probably requires some more in-depth discussion and is probably better done as a separate PR to avoid delaying this one too much.