-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC - Expose send params driver bypass #75
base: expose_send_params
Are you sure you want to change the base?
RFC - Expose send params driver bypass #75
Conversation
@@ -39,6 +39,8 @@ | |||
( ((((v) & 0xffff0000U) >> 16) == GDS_API_MAJOR_VERSION) && \ | |||
((((v) & 0x0000ffffU) >> 0 ) >= GDS_API_MINOR_VERSION) ) | |||
|
|||
#define IBV_EXP_SEND_GET_INFO (1 << 28) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the code still keep this flag in send_flags? I think it would be better to use a separate field so that future send flags in libibverbs won't conflict with this definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the problem here is that gds_send_wr is simply ibv_exp_send_wr.
maybe we want to have a new flags arg for gds_prepare_send().
@@ -159,8 +166,23 @@ typedef enum gds_update_send_info_type { | |||
* Represents a posted send operation on a particular QP | |||
*/ | |||
|
|||
#define GDS_SEND_MAX_SGE 16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to enforce this limitation? Maybe check it in gds_create_qp
?
* Notes: | ||
* - TODO. | ||
*/ | ||
int gds_report_post(struct gds_qp *gqp /*, struct gds_send_wr* wr*/); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function documentation is still missing. I guess this function advances the tracking in the gds_qp struct of the current producer index with the given send wr size?
{ | ||
gds_dbg("[%s] wr_id=%lx, num_sge=%d\n", func_name, swr_info.wr_id, swr_info.num_sge); | ||
|
||
for(int j=0; j < swr_info.num_sge; j++) | ||
{ | ||
gds_dbg("[%s] SGE=%d, Size ptr=0x%08x, Size=%d (0x%08x), +offset=%d\n", | ||
gds_dbg("[%s] SGE=%d, Size ptr=00x%lx, Size=%d (0x%08x), +offset=%d\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a typo here (00x), and you might want to put debugging print changes in a separate patch, to make the review easier.
gds_info->sge_list[i].ptr_to_size = (uintptr_t) &(sge->byte_count); | ||
gds_info->sge_list[i].ptr_to_lkey = (uintptr_t) &(sge->key); | ||
gds_info->sge_list[i].ptr_to_addr = (uintptr_t) &(sge->addr); | ||
gds_info->sge_list[i].offset = 0; //why is that here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does the offset field stand for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@e-ago do you remember why you had offset in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did you find this? Considering file https://github.com/gpudirect/libmlx5/blob/expose_send_params/src/qp.c the offset is modified here
qp->swr_info[qp->cur_swr].sge[qp->swr_info[qp->cur_swr].cur_sge].offset = offset;
and here
swr_info->sge_list[j].offset = qp->swr_info[i].sge[j].offset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the offset field affecting the data written into the wqe in some way?
Co-Authored-By: yanivbl6 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a general preexisting remark, using mlx5dv is narrowing the scope of the whole library both to a single vendor and family of adapters.
Either in this change or in a later change in the expose_send_params branch, I think we need to:
- detect mlx5dv and define a macro
- implement the new APIs if the macro is defined, and return a clean error otherwise.
Makefile.am
Outdated
|
||
endif | ||
|
||
SUFFIXES= .cu | ||
|
||
.cu.o: | ||
$(NVCC) $(CPPFLAGS) $(AM_CPPFLAGS) $(NVCCFLAGS) $(GENCODE_FLAGS) -c -o $@ $< | ||
$(NVCC) $(CPPFLAGS) $(AM_LDFLAGS) $(AM_CPPFLAGS) $(NVCCFLAGS) $(GENCODE_FLAGS) -c -o $@ $< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-lmlx5 should not be needed here
@@ -39,6 +39,8 @@ | |||
( ((((v) & 0xffff0000U) >> 16) == GDS_API_MAJOR_VERSION) && \ | |||
((((v) & 0x0000ffffU) >> 0 ) >= GDS_API_MINOR_VERSION) ) | |||
|
|||
#define IBV_EXP_SEND_GET_INFO (1 << 28) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the problem here is that gds_send_wr is simply ibv_exp_send_wr.
maybe we want to have a new flags arg for gds_prepare_send().
gds_info->sge_list[i].ptr_to_size = (uintptr_t) &(sge->byte_count); | ||
gds_info->sge_list[i].ptr_to_lkey = (uintptr_t) &(sge->key); | ||
gds_info->sge_list[i].ptr_to_addr = (uintptr_t) &(sge->addr); | ||
gds_info->sge_list[i].offset = 0; //why is that here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@e-ago do you remember why you had offset in the first place?
@yanivbl6 May I ask you to run again the |
I have reproduced the error, and will be looking into it. Edit: |
I fixed a critical bug (using wrong structure for send wqe), but I still get a validity error on the next iterations. |
I ran again the tests. When running
while I there is no error when running |
I experienced errors with GPU-Memory as well. |
I've managed to pass the tests successfully by adding a MPI_Barrier(MPI_COMM_WORLD) in the validate section. I guess there is some race condition but not sure where it is coming from. |
Which version of CUDA and NVIDIA driver are you using? |
I was using CUDA9.0. I think the driver was 384.90, but not sure. I don't have access to GPU servers at the moment, I will try it when it is available again. I tried understanding why the MPI-Barrier may help but it coudln't. I didn't check the kernel though- is there a way to the dump the cqes polled the GPU? |
I deleted the previous comment. I noticed two errors, one related to the original
The validation error may be related to the fact that all the receive requests posted by As a reminder, I've tested everything using:
|
This code should be compared with the upstream/expose_send_params branch
The goal of those commits is to allow the changes in the "Expose send params" branch to work without the required changes to the mlx5 driver. This is done by using direct verbs to extract driver info.
The method to bypass the driver is:
I haven't yet managed to run all the tests in gda successfully (with or without this patch), but I was able to run:
Running gds_kernel_latency, peersync, descriptors, RC
Running gds_kernel_latency, peersync, descriptors, GMEM buffers, RC
without errors/hangs, after changing the hard coded number of batches in the test to 1.
@haggaie @bureddy @drossetti @e-ago