-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Graph] Add External Events APIs #355
Conversation
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
If a graph containing one or more external events is run multiple times, | ||
the external events are automatically reset when the graph is resubmitted to | ||
the queue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reading through this, I'm think about whether we want to overload SYCL events with "external events" at all, as resetting an event isn't a concept in SYCL events and i'm not sure if we'll run into issues in the DPC++ runtime code because it assumes this can't happen. It's also not very intuitive to a user that they now need to maintain the knowledge of if a sycl::event
variable is an external event, a graph-limited event, or a normal event.
An example of an alternative would be to define our own synchronization primitive that can be reset, let's call it
a "semaphore" for the sake of argument. This is what OpenCL have done in cl_khr_semaphore, then if we defined an API for it like:
handler.wait_semaphore()
handler.signal_semaphore()
handler.reset_semaphore() // might not need this, not in cl_khr_semaphore
It could be used in CGF functions of both graph nodes and queue submission and would provide a simpler API for achieving the same goals. Need to check if it does indeed solve the same usecases, be implementable across backends, and it could probably be an extension in itself that we are compatible with (which is more work).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So currently SYCL events don't have a concept of being reset, but I'm not sure this is necessarily a concern as prior to SYCL graphs a DAG could only be traversed once, so it only makes sense for events to be set once. But if a DAG (in the form of a command_graph
) can be traversed multiple times I think it's a reasonable extension of the SYCL execution model that events within that graph are reset each time the graph is executed.
I do agree that it's a bit odd from the perspective of non-graph SYCL code for an event to be reset, though I would be wary of creating a new type as this could create duplicate mechanisms for doing the same thing in non-graph code. Perhaps we need a new concept to represent this to the existing SYCL execution model, such as a multi-shot event which can be complete and reset, and a way to query for this.
I think it could also be useful to have an API which can be used to query whether an event is external or graph-restricted, so that code which is simply passed an event
object can create an "external event" if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's for the comment Gordon, it's good to know there's not an immediate objection to adding reset semantics to an event.
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
The use-case of sharing events between graphs seems pretty easy to deal with and fairly problem free, but I feel like the use-case of waiting on graph events externally from normal user code presents a lot of issues that this PR has to solve. This includes things like:
The use case for this kind of event seems to be signalling to user code that some portion of the graph has finished so that the application can do something with this information. However, I feel like rather than supporting this directly in the API this could simply be accomplished with a // Callback function which could signal some other waiting part of the application
void GraphCallback();
// Add a kernel node
auto KernelNode = Graph.add(...);
// Add a host_task node which depends on that node which signals a callback for example
Graph.add([&](sycl::handler& CGH){
CGH.host_task([](){
// Callback on the host now that KernelNode has finished executing
GraphCallback();
});
}, property::depends_on{KernelNode}); |
Although this solution fulfills the requested feature in a relatively simple way (from our point of view at least), it involves resynchronizing the device flow with the host to execute this callback function which: 1) degrades the performance, 2) increases the workload of the host. The use of events makes it possible to directly synchronize two device flows without host intervention. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like a great feature that will be a really powerful addition to SYCL Graphs and I think the proposed design looks good, though I think we need to think about how best to represent the capability for a "regular" event to be complete and then reset again.
|
||
* `updatable` - Flag indicating that users will update the external event | ||
after the graph finalization. Please, note that setting this flag may degrade | ||
graph execution performance for level-zero backend, as updatable events |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does enabling the updatable
flag only cause a potential performance degradation on the Level Zero backend, or could this also impact other backends as well or can those support external events within their native graph APIs?
I'm assuming if external events are natively supported, these events can be signalled to with little or no interuption to the execution of the graph, but if this would require splitting the graph up into sub-graphs around the external event?
If this is the case, perhaps it would be useful to have an spect which can be queried for whether device supports native external events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
External events are natively supported by CUDA and HIP backend (I don't know for OpenCL @EwanC ?), and can be updated.
As for level-zero, the current API does not allow us to update events once the command-buffer is created. So, in this case, we have to split the graph and handle these updatable external events on the host. (Hopefully, mutable command-list will help to improve that.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying, I think it would be useful to have an aspect for querying whether external events are supported natively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Synchronization outside a command-buffer isn't supported in OpenCL at all, so that backend won't be able to support any of this proposal not just update.
We've got pushback before when trying to add extra aspects as it makes the code less portable, i'd be inclined to leave it out and throw an exception for things that aren't supported natively (documented in spec as a limitation)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying, so the OpenCL backend as of now wouldn't be able to support this extension at all, is there interest in adding this capability to the OpenCL command buffer extension as well?
I would probably argue that if we make this an optional feature and add an aspect for it this would improve portability because then users can query for the capability to avoid it failing on devices which don't support it. Saying that I think it would be okay to leave it out for the extension, but if we were to progress this to a KHR I think we would need it then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably argue that if we make this an optional feature and add an aspect for it this would improve portability because then users can query for the capability to avoid it failing on devices which don't support it. Saying that I think it would be okay to leave it out for the extension, but if we were to progress this to a KHR I think we would need it then.
See intel#12486 (comment) (and intel#12486 (comment)) for more general discussion about the role of aspects in SYCL-Graph. It might be worth commenting there if you have strong feelings about aspects for optional features.
Thanks for clarifying, so the OpenCL backend as of now wouldn't be able to support this extension at all, is there interest in adding this capability to the OpenCL command buffer extension as well?
There's definitely interest, it was an early feature request KhronosGroup/OpenCL-Docs#793
The only thing is that the timelines for adding things to the command-buffer spec aren't as quick as a vendor extension we can control, but we could start nudging this in the OpenCL side.
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
If a graph containing one or more external events is run multiple times, | ||
the external events are automatically reset when the graph is resubmitted to | ||
the queue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So currently SYCL events don't have a concept of being reset, but I'm not sure this is necessarily a concern as prior to SYCL graphs a DAG could only be traversed once, so it only makes sense for events to be set once. But if a DAG (in the form of a command_graph
) can be traversed multiple times I think it's a reasonable extension of the SYCL execution model that events within that graph are reset each time the graph is executed.
I do agree that it's a bit odd from the perspective of non-graph SYCL code for an event to be reset, though I would be wary of creating a new type as this could create duplicate mechanisms for doing the same thing in non-graph code. Perhaps we need a new concept to represent this to the existing SYCL execution model, such as a multi-shot event which can be complete and reset, and a way to query for this.
I think it could also be useful to have an API which can be used to query whether an event is external or graph-restricted, so that code which is simply passed an event
object can create an "external event" if necessary.
graphA.end_recording(); | ||
|
||
event externalEventA; | ||
queueA.ext_oneapi_external_event(EventA, externalEventA, flase /* non-updatable */); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine another way to do this would be to pass a property to the submit
call which requires the event to always be external, even when produced by a recording queue. Though I guess the reason is to avoid this kind of design is that it would require modifying existing SYCL code to be used in record/replay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also because some backends don't record events by default (typically CUDA and HIP). So when users request an external event, we have to add a new graph node to record this event.
In the current implementation we have a 1:1 map between events and nodes, so if we only transform the previous graph-limited event into an external event we will lose this.
Moreover, the complexity of implementation would be considerably higher.
e07eb71
to
3d3350e
Compare
I've pushed some changes as a separate commit which can be viewed here: 3d3350e To summarise the changes:
Open questions/discussions:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of minor points from a first read over the new changes. It's probably worth updating the PR description to describe the problem we are trying to solve rather than enumerating the API that's changing frequently.
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
graph to finish. | ||
|
||
External events are considered complete when all their dependent graph nodes | ||
have finished execution, and their execution status is automatically reset when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noting the point we discussed verbally about where the reset should be done by the user rather than automatically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there is some other discussion going on around update which may affect these I'll leave this unresolved for now until we have more clarity on this aspect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also thought this language about being automatically reset was too vague. If you read this literally, then I think your example using the explicit API is not well defined:
// Add an external event node to Graph A
auto ExtEventNode = GraphA.add_external_event();
auto ExternalEventA = ExtEventNode.get_external_event();
// Add a wait external event node in GraphB
auto ExtWaitNode = GraphB.add_wait_external_event({ExternalEventA});
queueA.submit([&](handler& cgh) {
cgh.ext_oneapi_graph(execGraphA);
});
// Is "ExternalEventA" really reset at this point?
// If so, then is the use of "ExternalEventA" in GraphB below ever signalled?
queueB.submit([&](handler& cgh) {
cgh.ext_oneapi_graph(execGraphB);
});
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
event get_external_event(); | ||
---- | ||
|Gets the external event associated with this node. Can only be called for nodes | ||
with type `node_type::external_event`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about other member functions if the node type is node_type::external_event
shall we add something elsewhere for spec completeness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other member functions are valid for node_type::external_event
unless otherwise noted, for example the update_range/nd_range()
member functions do say this under exceptions:
Throws with error code `invalid` if the type of the node is not a kernel execution.
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
|
||
queueB.submit([&](handler& cgh) { | ||
cgh.ext_oneapi_graph(execGraphB); | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what will happen if we first submit execGraphB and then submit execGraphA, endless wait within execGraphB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since they are separate graphs with no implicit dependencies from accessors between them, then no I believe that execGraphA could still execute after execGraphB. It may endlessly wait if there were implicit dependencies though I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not quite understand that 'It may endlessly wait if there were implicit dependencies'.
I agree that execGraphA could run successfully since it does not depend on others.
While for execGraphB, there's a clear explicit dependency that it depends on externalEventA, not "if there were implicit dependencies", so my point is that it always endlessly waits if launch first, not "may endlessly wait".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the external events should be implemented as a device-side dependency, so execGraphB
will not block on the host waiting for the external event because there is no host synchronization there. Since execGraphB
does not block on host that allows the application to proceed and execute execGraphA
, and once it reaches the point where the external event is signaled then execGraphB
(on the device) can also proceed to finish execution. Does that make sense?
My point about implicit dependencies is more of a runtime implementation detail I suppose, but in SYCL using accessors creates implicit dependencies between commands (and we also extend this to graphs), so if you were using accessors which created a dependency this may cause a blocking host wait on attempting to execute execGraphA
which would deadlock in this case.
This could potentially be mitigated by specifying exactly when/how the external event status changes. For example, if we consider the external event to begin in the complete state and be reset upon the owning graph being executed then that would prevent this scenario from being a deadlock, at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the external events should be implemented as a device-side dependency, so
execGraphB
will not block on the host waiting for the external event because there is no host synchronization there. SinceexecGraphB
does not block on host that allows the application to proceed and executeexecGraphA
, and once it reaches the point where the external event is signaled thenexecGraphB
(on the device) can also proceed to finish execution. Does that make sense?
yes, for this case, execGraphB
waits on the device side until there is a host function changes the event status to complete.
My point about implicit dependencies is more of a runtime implementation detail I suppose, but in SYCL using accessors creates implicit dependencies between commands (and we also extend this to graphs), so if you were using accessors which created a dependency this may cause a blocking host wait on attempting to execute
execGraphA
which would deadlock in this case.
just wonder, for most popular cases, the buffer is always accessed by the device, and there is no internal D2H or H2D copy, so, my understanding is that the wait also happens at the device side, why the host side?
This could potentially be mitigated by specifying exactly when/how the external event status changes. For example, if we consider the external event to begin in the complete state and be reset upon the owning graph being executed then that would prevent this scenario from being a deadlock, at least.
yes, good point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just wonder, for most popular cases, the buffer is always accessed by the device, and there is no internal D2H or H2D copy, so, my understanding is that the wait also happens at the device side, why the host side?
You are right actually this should be a device side wait in all cases, I was remembering old workarounds in the backend.
To overcome the limitations of "graph-limited" events, users can add external | ||
event nodes to a graph. These represent a point of execution within the graph | ||
and provide a SYCL event object which can be used to both manage synchronization | ||
between different graphs as well as between the graph and regular operations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regarding "between the graph and regular operations",
For the case that,
a) warmup stage: some regular operations generate an event (name it as eventA).
b) recording stage: it depends on eventA (it is not allowed)
and at the time of warmup stage, the implementation does not know if there will be recording stage.
Any suggestion on how external event can resolve the above issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could allow this behaviour you suggest, and I have gone back and forth on it a few times. Probably allowing normal SYCL events to be passed to command_graph::add_wait_external_event()
and allowing them as dependencies for recorded queue submissions would be fine.
There may be a potential for some undesirable overhead though: the event dependency in the backend could not be removed from subsequent graph executions once it has completed, so there could be a small overhead there particularly if a lot of these types of events were used (assuming there is still some small overhead from checking completed events in the backend/driver).
The user could avoid this by waiting on those one time operations before executing the graph, but this is problematic in the library record case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the restrictions on this in the spec for now, and added some wording explicitly allowing it.
@gmlueck Can't seem to request a review on this one but would appreciate some feedback on this proposal. |
Updates specification with new APIs for handling external events in Record&Replay and Explicit mode. Adds 3 functions to `graph`: - add_barrier() - make_external_event() - update_external_event() Adds 1 function to `queue`: - ext_oneapi_external_event() Adds examples to show how to use these functions. update spec Fix issue in the second code example.
- Remove updating and profiling capabilities - Simplify and align interfaces for getting external events - Restrict external event use within the graph (prevents needing graph-limited event from queue shortcut)
- Restore forbidding outside graph deps that are not external events
- Rename add_barrier to add_wait_external_event - Clarify wording about edges and ext events - Minor formatting and wording changes
- Useful for warmups etc.
6b908b5
to
35788b5
Compare
create an edge, it must be of a specific type (see <<event-class-modifications, | ||
the "Event Class Modifications" section>> for more information). | ||
`handler::depends_on()` can be used to express edges when a user is working with | ||
USM memory rather than SYCL buffers. For a graph recorded with an in-order |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
USM memory rather than SYCL buffers. For a graph recorded with an in-order | |
USM memory rather than SYCL buffers. Thirdly, for a graph recorded with an in-order |
since there's "three ways" at line 278.
- Using the event outside of the recording scope will throw synchronously with | ||
error code `invalid`. | ||
|
||
==== External Events |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
==== External Events | |
==== Graph-External Events |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some initial comments after a first read through.
@@ -632,6 +654,10 @@ public: | |||
|
|||
void make_edge(node& src, node& dest); | |||
|
|||
node add_wait_external_event(const std::vector<event> eventList = {}, const property_list& propList = {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
node add_wait_external_event(const std::vector<event> eventList = {}, const property_list& propList = {}); | |
node add_wait_external_event(const std::vector<event>& eventList = {}, const property_list& propList = {}); |
| | ||
[source,c++] | ||
---- | ||
node add_wait_external_event(const std::vector<event> eventList, const property_list& propList = {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
node add_wait_external_event(const std::vector<event> eventList, const property_list& propList = {}); | |
node add_wait_external_event(const std::vector<event>& eventList, const property_list& propList = {}); |
|
||
[source, c++] | ||
---- | ||
namespace sycl::info { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New types should be added to the namespace sycl::ext::oneapi::experimental::info
.
[source, c++] | ||
---- | ||
namespace sycl::info { | ||
enum class ext_oneapi_event_t{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not prefix new types with "ext_oneapi" because they are already in a descriptive namespace. Do not indent declarations within a namespace.
Why not name the enum event_type
? That seems more descriptive and more in line with existing enums (e.g. sycl::info::device_type
).
|Return Type | ||
|Description | ||
|
||
|`info::event::ext_oneapi_event_type` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be named sycl::ext::oneapi::experimental::event::event_type
. This is consistent with the way sycl::info::device::device_type
and sycl::info::device_type
are named.
graph. | ||
`handler::depends_on()` will also create a graph edge when passed certain kinds | ||
of events, for more information see <<event-class-modifications, the "Event | ||
Class Modifications" section>>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This link looks a little awkward in the HTML render. I think it would look better by rewording to:
... see the section <<event-class-modifications, Event Class Modifications>>.
Exceptions: | ||
|
||
* Throws synchronously with error code `invalid` if the queue is not | ||
in recording mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these two member functions are intended to be used when recording a graph. How realistic is it that the caller will know the depEvents
? I thought the usage model is that the caller starts recording a queue, and then calls some non-graph-aware function, which puts commands on the queue. Since the called function is not aware it is being recorded, it can't call this new graph-specific API. However, the depEvents
are probably internal details of the function, so the caller isn't aware of them.
I wonder if it would be better to have a new API that returns an external event which represent a dependency on all the commands that have been recorded to the queue so far. This would allow a caller to record a dependency point inside a recorded graph without knowing the specific depEvents
.
I also think the ext_oneapi_external_event
does not work well when recording an in-order queue. In this case, there typically is not any depEvents
. Will the external event returned by this function have an implicit dependency on all previous commands in this case?
graph to finish. | ||
|
||
External events are considered complete when all their dependent graph nodes | ||
have finished execution, and their execution status is automatically reset when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also thought this language about being automatically reset was too vague. If you read this literally, then I think your example using the explicit API is not well defined:
// Add an external event node to Graph A
auto ExtEventNode = GraphA.add_external_event();
auto ExternalEventA = ExtEventNode.get_external_event();
// Add a wait external event node in GraphB
auto ExtWaitNode = GraphB.add_wait_external_event({ExternalEventA});
queueA.submit([&](handler& cgh) {
cgh.ext_oneapi_graph(execGraphA);
});
// Is "ExternalEventA" really reset at this point?
// If so, then is the use of "ExternalEventA" in GraphB below ever signalled?
queueB.submit([&](handler& cgh) {
cgh.ext_oneapi_graph(execGraphB);
});
Closing this in favour of #372 which presents a simpler API for accomplishing the same general goals as this. |
Updates specification with new APIs for handling external events in Record&Replay and Explicit mode.
Motivations:
Major changes:
queue::ext_oneapi_external_event()
used to obtain an external event in the Record & Replay API.