@@ -224,59 +224,107 @@ there are no parameters to take a wait-list, and the only sync primitive
224
224
returned is blocking on host.
225
225
226
226
In order to achieve the expected UR command-buffer enqueue semantics with Level
227
- Zero, the adapter implementation adds extra commands to the Level Zero
228
- command-list representing a UR command-buffer.
229
-
230
- * Prefix - Commands added to the start of the L0 command-list by L0 adapter.
231
- * Suffix - Commands added to the end of the L0 command-list by L0 adapter.
232
-
233
- These extra commands operate on L0 event synchronisation primitives, used by the
234
- command-list to interact with the external UR wait-list and UR return event
235
- required for the enqueue interface.
236
-
237
- The ` ur_exp_command_buffer_handle_t ` class for this adapter contains a
238
- * SignalEvent* which signals the completion of the command-list in the suffix,
239
- and is reset in the prefix. This signal is detected by a new UR return event
240
- created on UR command-buffer enqueue.
241
-
242
- There is also a * WaitEvent* used by the ` ur_exp_command_buffer_handle_t ` class
243
- in the prefix to wait on any dependencies passed in the enqueue wait-list.
244
- This WaitEvent is reset in the suffix.
245
-
246
- A command-buffer is expected to be submitted multiple times. Consequently,
227
+ Zero, the adapter implementation needs extra commands.
228
+
229
+ * Prefix - Commands added ** before** the graph workload.
230
+ * Suffix - Commands added ** after** the graph workload.
231
+
232
+ These extra commands operate on L0 event synchronisation primitives,
233
+ used by the command-list to interact with the external UR wait-list
234
+ and UR return event required for the enqueue interface.
235
+ Unlike the graph workload (i.e. commands needed to perform the graph workload)
236
+ the external UR wait-list and UR return event are submission dependent,
237
+ which means they can change from one submission to the next.
238
+
239
+ For performance reasons, the command-list that will execute the graph
240
+ workload is made only once (during the command-buffer finalization stage).
241
+ This allows the adapter to save time when submitting the command-buffer,
242
+ by executing only this command-list (i.e. without enqueuing any commands
243
+ of the graph workload).
244
+
245
+ #### Prefix
246
+
247
+ The prefix's commands aim to:
248
+ 1 . Handle the list of events to wait on, which is passed by the runtime
249
+ when the UR command-buffer enqueue function is called.
250
+ As mentioned above, this list of events changes from one submission
251
+ to the next.
252
+ Consequently, managing this mutable dependency in the graph-workload
253
+ command-list implies rebuilding the command-list for each submission
254
+ (note that this can change with mutable command-list).
255
+ To avoid the significant time penalty of rebuilding this potentially large
256
+ command-list each time, we prefer to add an extra command handling the
257
+ wait list into another command-list (* wait command-list* ).
258
+ This command-list consists of a single L0 command: a barrier that waits for
259
+ dependencies passed by the wait-list and signals an event
260
+ called * WaitEvent* when the barrier is complete.
261
+ This * WaitEvent* is defined in the ` ur_exp_command_buffer_handle_t ` class.
262
+ In the front of the graph workload command list, an extra barrier command
263
+ waiting for this event is added (when the command-buffer is created).
264
+ This ensures that the graph workload does not start running before
265
+ the dependencies have completed.
266
+ The * WaitEvent* event is reset in the suffix.
267
+
268
+
269
+ 2 . Reset events associated with the command-buffer except the
270
+ * WaitEvent* event.
271
+ Indeed, L0 events need to be explicitly reset by an API call
272
+ (L0 command in our case).
273
+ Since a command-buffer is expected to be submitted multiple times,
247
274
we need to ensure that L0 events associated with graph commands have not
248
275
been signaled by a previous execution. These events are therefore reset to the
249
- non-signaled state before running the actual graph associated commands . Note
276
+ non-signaled state before running the graph-workload command-list . Note
250
277
that this reset is performed in the prefix and not in the suffix to avoid
251
278
additional synchronization w.r.t profiling data extraction.
252
-
253
- If a command-buffer is about to be submitted to a queue with the profiling
254
- property enabled, an extra command that copies timestamps of L0 events
255
- associated with graph commands into a dedicated memory which is attached to the
256
- returned UR event. This memory stores the profiling information that
257
- corresponds to the current submission of the command-buffer.
258
-
259
- ![ L0 command-buffer diagram] ( images/L0_UR_command-buffer-v3.jpg )
279
+ We use a new command list (* reset command-list* ) for performance reasons.
280
+ Indeed:
281
+ * This allows the * WaitEvent* to be signaled directly on the host if
282
+ the wait-list is empty, thus avoiding the need to submit a command list.
283
+ * Enqueuing a reset L0 command for all events in the command-buffer is time
284
+ consuming, especially for large graphs.
285
+ However, this task is not needed for every submission, but only once, when the
286
+ command-buffer is fixed, i.e. when the command-buffer is finalized. The
287
+ decoupling of the reset command-list from the wait command-list allows us to
288
+ create and enqueue the reset commands when finalizing the command-buffer,
289
+ and only create the wait command-list at submission.
290
+
291
+ This command list consists of a reset command for each of the graph commands
292
+ and another reset command for resetting the signal we use to signal the completion
293
+ of the graph workload. This signal is called * SignalEvent* and is defined in
294
+ the ` ur_exp_command_buffer_handle_t ` class.
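
The need for the reset command-list can be illustrated with a toy model. The following is a minimal Python sketch, not the adapter's code: `Event`, `barrier_would_block` and `run_graph` are hypothetical names, and the only behaviour assumed is the one stated above, namely that an L0 event stays signaled until it is explicitly reset.

```python
class Event:
    """Toy stand-in for an L0 event (hypothetical, not the Level Zero API)."""

    def __init__(self, name):
        self.name = name
        self.signaled = False

    def signal(self):
        self.signaled = True

    def reset(self):
        # Events never return to the non-signaled state on their own.
        self.signaled = False


def barrier_would_block(events):
    """A barrier has to wait as long as any of its events is not signaled."""
    return any(not e.signaled for e in events)


def run_graph(events):
    """Model one execution of the graph workload: every command signals its event."""
    for e in events:
        e.signal()


graph_events = [Event("kernelA"), Event("kernelB")]

# First submission: events are fresh, so dependent commands really wait.
first = barrier_would_block(graph_events)
run_graph(graph_events)

# Second submission without a reset: stale signals let the barrier pass at once.
second_without_reset = barrier_would_block(graph_events)

# Second submission after the reset command-list has run: waiting works again.
for e in graph_events:
    e.reset()
second_with_reset = barrier_would_block(graph_events)
```

In this model the second submission only synchronises correctly once every event has been reset, which is the job the reset command-list performs before the graph workload runs.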
295
+
296
+ #### Suffix
297
+
298
+ The suffix's commands aim to:
299
+ 1 ) Handle the completion of the graph workload and signal
300
+ a UR return event.
301
+ Thus, at the end of the graph workload command-list a command, which
302
+ signals the * SignalEvent* , is added (when the command-buffer is finalized).
303
+ In an additional command-list (* signal command-list* ), a barrier waiting for
304
+ this event is also added.
305
+ This barrier signals, in turn, the UR return event that has been defined by
306
+ the runtime layer when calling the ` urCommandBufferEnqueueExp ` function.
307
+
308
+ 2 ) Manage the profiling. If a command-buffer is about to be submitted to
309
+ a queue with the profiling property enabled, an extra command is added that copies
310
+ timestamps of L0 events associated with graph commands into a dedicated
311
+ memory which is attached to the returned UR event.
312
+ This memory stores the profiling information that corresponds to
313
+ the current submission of the command-buffer.
314
+
315
+ ![ L0 command-buffer diagram] ( images/L0_UR_command-buffer-v5.jpg )
260
316
261
317
For a call to ` urCommandBufferEnqueueExp ` with an ` event_list ` * EL* ,
262
- command-buffer * CB* , and return event * RE* our implementation has to submit two
263
- new command-lists for the above approach to work. One before
318
+ command-buffer * CB* , and return event * RE* our implementation has to submit
319
+ three new command-lists for the above approach to work. Two before
264
320
the command-list with extra commands associated with * CB* , and the other
265
- after * CB* . These two new command-lists are retrieved from the UR queue, which
321
+ after * CB* . These new command-lists are retrieved from the UR queue, which
266
322
will likely reuse existing command-lists and only create a new one in the worst
267
323
case.
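
The submission flow described above can be sketched as a toy model. This is a minimal Python sketch under stated assumptions, not the adapter's implementation: `CommandList`, `finalize`, `enqueue` and `names` are hypothetical, commands are plain strings, and both the order of the reset and wait command-lists and the order of the two trailing commands in the graph command-list are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CommandList:
    """Toy stand-in for an L0 command-list (hypothetical, not the Level Zero API)."""
    name: str
    commands: list = field(default_factory=list)


def finalize(graph_commands):
    """Build the command-lists that are created once, at finalization."""
    # Reset command-list: one reset per graph command event, plus SignalEvent.
    reset = CommandList("reset", [f"reset event of {c}" for c in graph_commands])
    reset.commands.append("reset SignalEvent")

    # Graph-workload command-list with its fixed extra commands.
    # The relative order of the two trailing commands is an assumption.
    graph = CommandList("graph", ["barrier: wait WaitEvent"])
    graph.commands.extend(graph_commands)
    graph.commands.extend(["reset WaitEvent", "signal SignalEvent"])
    return reset, graph


def enqueue(reset, graph, event_list, return_event):
    """Return, in submission order, the work issued for one enqueue of CB."""
    # Assumed order: reset command-list first, then the wait command-list.
    steps = [reset]
    if event_list:
        deps = ", ".join(event_list)
        steps.append(CommandList("wait", [f"barrier: wait {deps}, signal WaitEvent"]))
    else:
        # Empty wait-list: WaitEvent is signaled from the host, no command-list needed.
        steps.append("host-signal WaitEvent")
    steps.append(graph)
    steps.append(CommandList("signal", [f"barrier: wait SignalEvent, signal {return_event}"]))
    return steps


def names(steps):
    """Summarise a submission as a list of step names."""
    return [s.name if isinstance(s, CommandList) else s for s in steps]


reset_cl, graph_cl = finalize(["kernelA", "kernelB"])
with_deps = enqueue(reset_cl, graph_cl, ["e1", "e2"], "RE")
without_deps = enqueue(reset_cl, graph_cl, [], "RE")
```

Both calls to `enqueue` reuse the same `graph_cl` object, which mirrors the point that the graph-workload command-list is built once and only the small wait and signal command-lists depend on *EL* and *RE*.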
268
324
269
- The L0 command-list created on ` urCommandBufferEnqueueExp ` to execute ** before**
270
- * CB* contains a single command. This command is a barrier on * EL* that signals
271
- * CB* 's * WaitEvent* when completed.
272
-
273
- The L0 command-list created on ` urCommandBufferEnqueueExp ` to execute ** after**
274
- * CB* also contains a single command. This command is a barrier on * CB* 's
275
- * SignalEvent* that signals * RE* when completed.
276
-
277
325
#### Drawbacks
278
326
279
- There are two drawbacks of this approach to implementing UR command-buffers for
327
+ There are three drawbacks of this approach to implementing UR command-buffers for
280
328
Level Zero:
281
329
282
330
1 . 3x the command-list resources are used, if there are many UR command-buffers in