Adds distributed row gatherer #1589
base: neighborhood-communicator
One issue that I have is the constructor. It takes a
If I can't come up with anything better, I guess I will use that.
Do we need to have the
Really nice work! LGTM!
    int is_inactive;
    MPI_Status status;
    GKO_ASSERT_NO_MPI_ERRORS(
        MPI_Request_get_status(req_listener_, &is_inactive, &status));
Can we maybe move this MPI function into `mpi.hpp` and create a wrapper for it?
That doesn't really work here, since this function would be a member function of `request`, but I'm using a bare `MPI_Request` (and can't use `request`, because it will try to free the request in the destructor), so it would not be applicable.
    mutable array<char> send_workspace_;

    mutable MPI_Request req_listener_{MPI_REQUEST_NULL};
Can this be of type `mpi::request`?
No, because the destructor of `mpi::request` will try to free the request. But `req_listener_` doesn't own any requests, so the program would crash.
template <typename LocalIndexType>
void RowGatherer<LocalIndexType>::apply_impl(const LinOp* alpha, const LinOp* b,
                                             const LinOp* beta, LinOp* x) const
    GKO_NOT_IMPLEMENTED;
I think you can also implement the advanced apply by replacing `b_local->row_gather(idxs, buffer)` by `b_local->row_gather(alpha, idxs, beta, buffer)`?
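To make the suggestion concrete, here is a small sketch of what a scaled row gather computes, under the assumption that the advanced variant follows the usual `out = alpha * gathered + beta * out` convention. The `dense` struct and `row_gather_scaled` are hypothetical toy code written for this illustration, not Ginkgo's `matrix::Dense`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy row-major dense matrix (an assumption, not gko::matrix::Dense).
struct dense {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> values;

    double& at(std::size_t r, std::size_t c) { return values[r * cols + c]; }
    double at(std::size_t r, std::size_t c) const
    {
        return values[r * cols + c];
    }
};

// Computes out(i, :) = alpha * b(idxs[i], :) + beta * out(i, :).
void row_gather_scaled(double alpha, const dense& b,
                       const std::vector<std::size_t>& idxs, double beta,
                       dense& out)
{
    assert(out.rows == idxs.size() && out.cols == b.cols);
    for (std::size_t i = 0; i < idxs.size(); ++i) {
        for (std::size_t j = 0; j < b.cols; ++j) {
            out.at(i, j) = alpha * b.at(idxs[i], j) + beta * out.at(i, j);
        }
    }
}

// Gathers rows 2 and 0 of a 3x2 matrix with alpha = 2 and beta = 0.5.
dense demo_row_gather()
{
    dense b{3, 2, {1, 2, 3, 4, 5, 6}};
    dense out{2, 2, {10, 10, 10, 10}};
    row_gather_scaled(2.0, b, {2, 0}, 0.5, out);
    return out;
}
```

In the distributed setting the gathered rows still have to be communicated, so whether the scaling can be applied at this point depends on where `buffer` ends up; the sketch only covers the local arithmetic.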
            send_sizes.data(), send_offsets.data(), type, recv_ptr,
            recv_sizes.data(), recv_offsets.data(), type);
        coll_comm
            ->i_all_to_all_v(use_host_buffer ? exec->get_master() : exec, send_ptr,
Is there any difference between using `all_to_all_v` and `i_all_to_all_v`? I assume `all_to_all_v` also updates the interface.
`all_to_all_v` is a blocking call, while `i_all_to_all_v` is non-blocking. Right now the `collective_communicator` only provides the non-blocking interface, since it is more general.
 * auto x = matrix::Dense<double>::create(...);
 *
 * auto future = rg->apply_async(b, x);
 * // do some computation that doesn't modify b, or access x
I think it accesses `x`, but it is unclear when `x` will be accessed before the wait.
I guess this is just meant to say that you can't expect any meaningful data when accessing `x` before the `wait` has completed.
I think I got it wrong.
Is the comment here meant to describe what the user can safely do after the call, or the behavior of `apply_async`?
My comment assumed that it describes the behavior of `apply_async`, because `apply_async` definitely accesses `x`.
If it describes what the user may do between the async call and the wait, then it is correct.
    workspace.set_executor(mpi_exec);
    if (send_size_in_bytes > workspace.get_size()) {
        workspace.resize_and_reset(sizeof(ValueType) *
                                   send_size[0] * send_size[1]);
    }
Could these be combined to assign the workspace directly?
Combine how? Do you mean like
workspace = array<char>(mpi_exec, sizeof(ValueType) * send_size[0] * send_size[1]);
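The trade-off between the two forms can be sketched with a toy buffer: a grow-only check reallocates only when the requested size exceeds the current one, while direct assignment replaces the buffer on every call. `toy_workspace` below is a hypothetical stand-in built on `std::vector<char>`, not Ginkgo's `array<char>`, and the allocation counter exists only for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy workspace that counts how often its buffer is replaced.
struct toy_workspace {
    std::vector<char> buffer;
    int allocations = 0;

    // Grow-only: replace the buffer only if it is too small.
    void ensure_size(std::size_t bytes)
    {
        if (bytes > buffer.size()) {
            buffer = std::vector<char>(bytes);
            ++allocations;
        }
    }

    // Direct assignment: always replace the buffer.
    void assign_new(std::size_t bytes)
    {
        buffer = std::vector<char>(bytes);
        ++allocations;
    }
};

// Requests a sequence of sizes and reports the number of replacements.
int count_allocations(bool grow_only)
{
    const std::vector<std::size_t> sizes{64, 32, 64, 128};
    toy_workspace ws;
    for (std::size_t bytes : sizes) {
        if (grow_only) {
            ws.ensure_size(bytes);
        } else {
            ws.assign_new(bytes);
        }
    }
    return ws.allocations;
}
```

For repeated applies with similar sizes, the grow-only form avoids most reallocations, which matches the "only allocate if necessary" item in the later commit message.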
        req = coll_comm_->i_all_to_all_v(
            mpi_exec, send_ptr, type.get(), recv_ptr, type.get());
`send_buffer` might be on the host, but `recv_ptr` (`x_local`) might be on the device.
I have a check above to ensure that the memory space of the recv buffer is accessible from the MPI executor. So if GPU-aware MPI is used, it should work (even if the send buffer is on the host and the recv buffer is on the device, or vice versa). Otherwise an exception will be thrown.
- only allocate if necessary
- synchronize correct executor
Co-authored-by: Pratik Nayak

- split tests into core and backend part
- fix formatting
- fix openmpi pre 4.1.x macro
Co-authored-by: Pratik Nayak
Co-authored-by: Yu-Hsiang M. Tsai
This PR adds a distributed row gatherer. This operator essentially provides the communication required in our matrix apply.
Besides the normal apply (which is blocking), it also provides two asynchronous calls. One version takes an additional `workspace` parameter, which is used as the send buffer. This version can be called multiple times without restrictions, as long as a different workspace is used for each call. The other version has no workspace parameter and uses an internal buffer instead. As a consequence, it can only be called a second time once the request of the previous call has been waited on; otherwise, it will throw.
This is the second part of splitting up #1546.
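The calling rules for the two asynchronous variants can be sketched with a toy class. Everything here is an assumption made for illustration: `toy_row_gatherer` and `toy_request` are hypothetical, the "communication" is a plain copy, and completion is modeled by calling `wait()` instead of by a real MPI request.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy request: marks the shared in-flight flag as done on wait().
class toy_request {
public:
    explicit toy_request(std::shared_ptr<bool> in_flight)
        : in_flight_{std::move(in_flight)}
    {}
    void wait()
    {
        if (in_flight_) {
            *in_flight_ = false;
        }
    }

private:
    std::shared_ptr<bool> in_flight_;
};

class toy_row_gatherer {
public:
    // Variant with a caller-provided workspace: calls are independent as
    // long as each call uses its own workspace.
    toy_request apply_async(const std::vector<double>& b,
                            std::vector<double>& x,
                            std::vector<double>& workspace) const
    {
        workspace = b;  // pack the send buffer
        x = workspace;  // stand-in for the communication
        return toy_request{nullptr};
    }

    // Variant with the internal buffer: throws while a previous request
    // has not been waited on.
    toy_request apply_async(const std::vector<double>& b,
                            std::vector<double>& x) const
    {
        if (*in_flight_) {
            throw std::runtime_error("previous request not waited on");
        }
        *in_flight_ = true;
        send_workspace_ = b;
        x = send_workspace_;
        return toy_request{in_flight_};
    }

    // Blocking apply expressed as async call plus wait.
    void apply(const std::vector<double>& b, std::vector<double>& x) const
    {
        apply_async(b, x).wait();
    }

private:
    mutable std::vector<double> send_workspace_;
    std::shared_ptr<bool> in_flight_ = std::make_shared<bool>(false);
};

// Returns true if a second internal-buffer call throws; `wait_first`
// selects whether the first request is waited on before the second call.
bool second_call_throws(bool wait_first)
{
    toy_row_gatherer rg;
    std::vector<double> b{1.0, 2.0, 3.0}, x1, x2;
    auto req = rg.apply_async(b, x1);
    if (wait_first) {
        req.wait();
    }
    try {
        rg.apply_async(b, x2);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

The sketch also shows why a blocking apply can be layered on top of the non-blocking call, while the reverse is not possible.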
It also introduces some intermediate changes, which could be extracted out beforehand:
- a type-erased `DenseCache`
- making `detail::run` easier to use (now part of Use index_map in distributed::matrix #1544)

PR Stack: