How to change single copy via xpmem execution to the sender process #10019
Comments
Currently, the rkey_ptr protocol always does the memcpy on the receiver. Doing the memcpy on the sender would require implementing a new variant of this protocol (with an extra control message).
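To make the idea of an "extra control message" concrete, here is a minimal single-process sketch of one possible message flow: the receiver advertises its buffer instead of copying, the sender performs the memcpy, and a completion message flows back. Every name here (`ctrl_msg_t`, `receiver_post`, `sender_handle`, `receiver_complete`) is hypothetical and does not exist in UCX. The sketch also assumes the sender can already address the receiver's buffer, which a real implementation would have to arrange through xpmem mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical control messages; none of these names exist in UCX. */
typedef enum { MSG_NONE, MSG_RECV_READY, MSG_COPY_DONE } msg_type_t;

typedef struct {
    msg_type_t type;
    void      *recv_buf; /* receiver buffer, assumed addressable by the sender */
    size_t     length;   /* buffer capacity (RECV_READY) or bytes copied (COPY_DONE) */
} ctrl_msg_t;

/* Receiver side: instead of copying, advertise the receive buffer. */
static ctrl_msg_t receiver_post(void *buf, size_t length)
{
    ctrl_msg_t msg = { MSG_RECV_READY, buf, length };
    return msg;
}

/* Sender side: on RECV_READY, copy the payload and answer with COPY_DONE. */
static ctrl_msg_t sender_handle(const ctrl_msg_t *ready, const void *src,
                                size_t src_len)
{
    ctrl_msg_t done = { MSG_NONE, NULL, 0 };
    size_t n;

    assert(ready != NULL && src != NULL);
    if (ready->type != MSG_RECV_READY) {
        return done; /* unexpected message: nothing is copied */
    }

    /* Truncate to the receiver's capacity. */
    n = (src_len < ready->length) ? src_len : ready->length;
    memcpy(ready->recv_buf, src, n); /* the copy now runs on the sender */

    done.type     = MSG_COPY_DONE;
    done.recv_buf = ready->recv_buf;
    done.length   = n;
    return done;
}

/* Receiver side: complete the receive; returns bytes received, 0 on failure. */
static size_t receiver_complete(const ctrl_msg_t *done)
{
    return (done->type == MSG_COPY_DONE) ? done->length : 0;
}
```

Compared with a receiver-side copy, this flow adds one message in each direction, so any bandwidth gain would have to outweigh the added latency. That trade-off is untested here.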
@arun-chandran-edarath, in case you want more details: without having thought about it much, and unsure about the perf result, it might be possible to implement it as a PoC either at:
Thank you for your responses. I would like to clarify whether the two suggestions provided are identical: a) Implementing a new variant of the rkey_ptr protocol (with an extra control message) to perform memcpy on the sender. Could you please provide more specific details or elaborate on these suggestions? Additionally, it would be helpful if you could point me towards the relevant source code files or any examples that I could refer to. --Arun
Hi Everyone,
@yosefe @tvegas1
I am currently examining the execution of MPI_Send (blocking send) with UCX in an intra-node scenario. At present, the memory transfer (ucs_memcpy_relaxed()) is executed in the receiver process (rank or processor), as depicted below.
By executing the same in the sender process, as shown below, we could significantly reduce cache-to-cache data transfers and conserve memory bandwidth.
However, I am struggling to find a runtime configuration that would allow me to execute this transfer in the sender process with the hint UCS_ARCH_MEMCPY_NT_DEST and benchmark it. Could anyone provide some guidance or suggestions on this matter?
Thank you in advance for your assistance.
--Arun
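For readers unfamiliar with what a non-temporal destination hint is meant to achieve, the following is a rough standalone sketch, not UCX code: a copy that writes the destination with streaming stores so the written lines bypass the copying core's cache. The function name `copy_nt_dest` is made up for illustration, and the sketch assumes an x86 compiler with SSE2, falling back to plain `memcpy` elsewhere. How UCX actually acts on `UCS_ARCH_MEMCPY_NT_DEST` is not shown in this thread and is not claimed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: copy len bytes, using non-temporal (streaming)
 * stores for the destination where SSE2 is available. */
static void copy_nt_dest(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
#if defined(__SSE2__)
    unsigned char       *d = dst;
    const unsigned char *s = src;

    /* Streaming stores need a 16-byte aligned destination, so copy the
     * unaligned head with a regular memcpy first. */
    size_t head = (16 - ((uintptr_t)d & 15)) & 15;
    if (head > len) {
        head = len;
    }
    memcpy(d, s, head);
    d   += head;
    s   += head;
    len -= head;

    /* Main loop: unaligned load from the source, streaming store to dst. */
    while (len >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_stream_si128((__m128i *)d, v);
        d   += 16;
        s   += 16;
        len -= 16;
    }

    /* Order the streaming stores before any later stores. */
    _mm_sfence();

    /* Copy the remaining tail bytes normally. */
    memcpy(d, s, len);
#else
    memcpy(dst, src, len); /* portable fallback without streaming stores */
#endif
}
```

A simple benchmark could time this against plain `memcpy` for large buffers in a two-process shared-memory setup. Results will depend heavily on buffer size and cache topology.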