Do not synchronize in caching_allocator::{de}allocate#7792
miscco merged 2 commits into NVIDIA:main.

Conversation
davebayer
left a comment
I don't believe this is correct. We should `.sync()` in `.deallocate()`: if you only enqueue the deallocation on the stream but are not yet done with the buffer, the next allocation could pick up this buffer and overwrite your data.
Yeah, you are right. It's not only the pointer; it could also be that the operation has not yet finished.
davebayer
left a comment
What we could do is add synchronization using `cuda::event`. In the `.deallocate(...)` method, we would record an event as a replacement for the `cudaFreeAsync` in the queue. Then, in the `.allocate(...)` method, we would enqueue a wait on that event before returning the pointer, replacing the `cudaMallocAsync` in the queue.
This way we can be sure there are no race conditions.
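A minimal sketch of that event-based scheme (names like `cached_block`, `deallocate`, and `allocate` are hypothetical, error handling is omitted, and this requires a CUDA device to run), assuming the allocator keeps one `cudaEvent_t` per cached block:

```cuda
#include <cuda_runtime.h>

// Hypothetical per-block bookkeeping kept by the caching allocator.
struct cached_block
{
  void* ptr;
  size_t bytes;
  cudaEvent_t ready; // recorded at deallocation time
};

// deallocate: instead of synchronizing, record an event on the stream.
// The event stands in for the cudaFreeAsync that would otherwise be enqueued.
void deallocate(cached_block& block, cudaStream_t stream)
{
  cudaEventRecord(block.ready, stream);
  // ... then push the block onto the allocator's free list ...
}

// allocate: before handing a cached pointer out, make the requesting
// stream wait for the pending work, replacing cudaMallocAsync's ordering.
void* allocate(cached_block& block, cudaStream_t stream)
{
  cudaStreamWaitEvent(stream, block.ready, 0);
  return block.ptr;
}
```

Since `cudaStreamWaitEvent` only orders work on the stream, neither path blocks the host, which is the point of removing the synchronization.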
```cpp
}
else
{
  const cudaError_t status = cudaMallocAsync(&result, num_bytes, __stream.get());
```
I am not sure if you can allocate memory with `cudaMallocAsync` and then free it with `cudaFree`.
You can: https://docs.nvidia.com/cuda/cuda-programming-guide/04-special-topics/stream-ordered-memory-allocation.html#freeing-memory
"Likewise, memory allocated with cudaMallocAsync can be freed with cudaFree()."
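As the linked documentation states, mixing the two APIs in that direction is valid; a minimal illustration (error handling omitted, requires a CUDA device):

```cuda
#include <cuda_runtime.h>

int main()
{
  cudaStream_t stream;
  cudaStreamCreate(&stream);

  void* ptr = nullptr;
  cudaMallocAsync(&ptr, 1024, stream); // stream-ordered allocation

  // Per the docs, memory from cudaMallocAsync may be freed with the
  // synchronous cudaFree() instead of cudaFreeAsync().
  cudaFree(ptr);

  cudaStreamDestroy(stream);
  return 0;
}
```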
🥳 CI Workflow Results: 🟩 Finished in 1h 22m. Pass: 100%/33 | Total: 9h 00m | Max: 1h 05m | Hits: 85%/19433
davebayer
left a comment
But we should rework the allocator in the future; we should not need this additional synchronization.