This exercise demonstrates asynchronous data transfers and computation. Three different asynchronous cases are created, and their timings are printed out. The timings are recorded with hipEvent calls.
In the exercise, the following HIP functions are needed:
- `hipStreamCreate()`
- `hipMemcpyAsync()`
- `hipEventRecord()`
- `hipEventSynchronize()`
- `hipEventElapsedTime()`
- `hipStreamDestroy()`

- Create and destroy `n_stream` streams in the main function in the locations marked by `#error`
- The function `case_0()` is already complete and can be used as a reference
- In the `case_1()` function, create a loop over `n_stream` and split the work done by the kernel call of Case 0 into multiple kernel calls (one kernel call per stream with an even workload per stream)
- Record events using the `start_event` and `stop_event` arrays for each stream before and after the kernel call
- Create a loop in the `case_2()` function
- In the loop: Split the data copy from host to device into `n_stream` asynchronous memcopies, one for each stream (make sure the memcopies are split evenly for each stream)
- In the loop: Launch the kernel for each stream similarly to Case 1
- In the loop: Split the data copy from device to host into `n_stream` asynchronous memcopies, one for each stream (make sure the memcopies are split evenly for each stream)
- Record total timing of the loop, use `start_event[n_stream]` and `stop_event[n_stream]` array positions
- Additionally, record events for each stream using `start_event` and `stop_event` arrays before H-to-D memcopy and after D-to-H memcopy, respectively
- Synchronize host with each `stop_event[i]`
- Get timings between each corresponding `start_event[i]` and `stop_event[i]`
- For the third case, start from a copy of your Case 2 code
- Instead of doing the asynchronous memcopies and the kernel in the same loop as in Case 2, create a separate loop for each (3 loops in total)
- Make sure you record events in appropriate locations to get correct timings
- You can try setting `USE_PINNED_HOST_MEM` to `0` at line #6, to see how the timings change if we do not use pinned host memory.