unordered map creation freezes async processes #350
Comments
runBuldKernel << < block_size_x, thread_size_x, 0, build_stream >> > (ng, object_size_ui);
printf("1\n");
//stdgpu::unordered_map<uint32_t, uint32_t> map = stdgpu::unordered_map<uint32_t, uint32_t>::createDeviceObject(8);
Pointer* p;
cudaMalloc(&p, 1 * sizeof(Pointer));
printf("2\n");
This, in contrast, works asynchronously. Allocation happens without waiting for |
This is a known limitation. Although the required parallel algorithms from thrust used in stdgpu as well as the intermediate interface in stdgpu all support arbitrary I think adding explicit support for asynchronous streams would be a good enhancement. Until this feature lands in stdgpu, as a workaround you could possibly 1. move the creation of the map to an earlier stage if this is possible, or 2. enable "per-thread" behavior for the default stream which can be set with the |
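As background for the second workaround: CUDA's legacy default stream implicitly synchronizes with every blocking stream in the same context, while a per-thread default stream behaves like an ordinary stream. The following is a minimal sketch of that difference, not stdgpu code. `busy_kernel` and `tiny_kernel` are hypothetical stand-ins (the former for a long-running kernel such as `runBuldKernel`), and the sketch assumes it is built twice with nvcc, once without extra options and once with `--default-stream per-thread`, to compare the reported wait.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for a long-running kernel: spins for roughly
// the requested number of GPU clock cycles.
__global__ void busy_kernel(long long cycles)
{
    const long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Hypothetical trivial kernel that is launched on the default stream.
__global__ void tiny_kernel() { }

int main()
{
    cudaFree(0); // force context creation so it is not part of the timing

    cudaStream_t build_stream;
    cudaStreamCreate(&build_stream); // a "blocking" stream with respect to the legacy default stream

    busy_kernel<<<1, 1, 0, build_stream>>>(500000000LL);
    tiny_kernel<<<1, 1>>>(); // default stream (legacy or per-thread, depending on the build)

    // Measure how long the host waits until the default stream is drained.
    const auto t0 = std::chrono::steady_clock::now();
    cudaStreamSynchronize(0);
    const auto t1 = std::chrono::steady_clock::now();

    std::printf("default stream drained after %.1f ms\n",
                std::chrono::duration<double, std::milli>(t1 - t0).count());

    cudaDeviceSynchronize();
    cudaStreamDestroy(build_stream);
    return 0;
}
```

With the legacy default stream, `tiny_kernel` is ordered after the work in `build_stream`, so the measured wait should roughly include the duration of `busy_kernel`. A stream created with `cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking)` opts out of this implicit synchronization as well. The sketch covers stream ordering only; it says nothing about whether a given library call blocks the host.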
Option 1 is not possible. And I am not sure what option 2 does; I need to read about it first, so that it doesn't break something else. |
For reference, #351 tracks all affected functions which currently do not have proper support for custom |
@stotko It doesn't seem like the default stream is the issue. The code below works asynchronously:
runBuldKernel << < block_size_x, thread_size_x >> > (ng, object_size_ui);
printf("1\n");
//stdgpu::unordered_map<uint32_t, uint32_t> map = stdgpu::unordered_map<uint32_t, uint32_t>::createDeviceObject(8);
Pointer* p;
cudaMalloc(&p, 1 * sizeof(Pointer));
printf("2\n");
When I uncomment the map part, it's blocked, no matter what comes after it. |
Thanks for further testing. I still believe that the issue is related to the default stream. Just to make sure, could you try calling another kernel on the default stream (could be anything), while In contrast to a pure |
runBuldKernel << < block_size_x, thread_size_x, 0, build_stream >> > (ng, object_size_ui);
printf("1\n");
k_2 << <1, 1 >> > ();
printf("2\n");
stdgpu::unordered_map<uint32_t, uint32_t> map = stdgpu::unordered_map<uint32_t, uint32_t>::createDeviceObject(8);
printf("3\n");
k_2 is executed without waiting. And then it blocks in map creation, i.e. 2 is printed |
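A `printf` after a kernel launch only shows that the launch call returned to the host, not that the kernel has run. To tell a call that blocks the CPU apart from one that merely queues work, each host call can be timed individually. The following is a rough sketch under stated assumptions: `host_ms`, `busy_kernel`, and `tiny_kernel` are hypothetical helpers introduced here for illustration, and `cudaMalloc` is used as the probed call because it appears in the snippets above; any other host call, such as the map creation, could be wrapped the same way.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for a long-running kernel such as runBuldKernel.
__global__ void busy_kernel(long long cycles)
{
    const long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Hypothetical trivial kernel, comparable to k_2 in the thread.
__global__ void tiny_kernel() { }

// Hypothetical helper: returns the host-side wall-clock time of one call in milliseconds.
template <typename F>
double host_ms(F&& call)
{
    const auto t0 = std::chrono::steady_clock::now();
    call();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main()
{
    cudaFree(0); // force context creation before timing anything

    cudaStream_t build_stream;
    cudaStreamCreate(&build_stream);

    busy_kernel<<<1, 1, 0, build_stream>>>(500000000LL);

    // Time each host call separately while busy_kernel is still running.
    void* p = nullptr;
    const double launch_ms = host_ms([] { tiny_kernel<<<1, 1>>>(); });
    const double malloc_ms = host_ms([&] { cudaMalloc(&p, 1024); });
    const double sync_ms   = host_ms([] { cudaDeviceSynchronize(); });

    std::printf("launch: %.2f ms, cudaMalloc: %.2f ms, device sync: %.2f ms\n",
                launch_ms, malloc_ms, sync_ms);

    cudaFree(p);
    cudaStreamDestroy(build_stream);
    return 0;
}
```

A call whose host time is close to the duration of `busy_kernel` has waited for the GPU; a call that returns within a fraction of a millisecond has not, even if the work it queued is still ordered behind the running kernel.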
I have reproduced your observations. In fact, there are two issues:
In that sense, you are right that my initial explanation was not sufficient. Fortunately, adding support for custom |
So there is currently no solution to make this happen in async? |
If you are only concerned about the CPU blocking part and the stream ordering behavior is acceptable, then a workaround could be to create the |
Describe the bug
unordered map creation freezes async processes
Steps to reproduce
Expected behavior
The map creation and memory allocation should complete right away, without waiting for runBuldKernel to complete
Actual behavior
The map creation and memory allocation complete only after runBuldKernel is done
System (please complete the following information):