Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Host IR: make stream synchronization non blocking #3608

Merged

Conversation

samnordmann
Copy link
Collaborator

@samnordmann samnordmann commented Dec 18, 2024

What

Make stream synchronization non-blocking from the CPU point of view

Why

Needed for achieving overlap in

before this patch:
Screenshot 2024-12-18 at 12 08 25
after this patch
Screenshot 2024-12-18 at 12 08 05

How

Before this patch, the host IR Synchronize would call c10::synchronize() on the cuda stream, which makes the CPU blocks until stream completion. With this patch, we synchronize the current stream with a given stream through a cudaEvent and the API cudaStreamWaitEvent.

@@ -196,6 +197,8 @@ Communicator::Communicator(
return;
}

NVFUSER_CUDA_RT_SAFE_CALL(cudaSetDevice(local_rank_));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a long time I suspected this was going to be required at some point. Anyway, this is a recommended (if not required) practice. Without it, cudaEventRecord throws cudaErrorInvalidResourceHandle in a multi-GPU scenario.

@samnordmann samnordmann requested a review from nsarka December 18, 2024 11:38
@samnordmann
Copy link
Collaborator Author

!test

@samnordmann
Copy link
Collaborator Author

!test

@samnordmann
Copy link
Collaborator Author

!test

csrc/host_ir/executor.cpp Show resolved Hide resolved
csrc/host_ir/executor.cpp Show resolved Hide resolved
@samnordmann samnordmann merged commit cd2b3eb into NVIDIA:main Dec 23, 2024
47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants