Open MPI MPI_Comm_split crashes with Portals4 #82

Open
tkordenbrock opened this issue Dec 6, 2019 · 0 comments

From open-mpi/ompi#7217:

Details of the problem

Calling MPI_Comm_split causes an immediate crash with the assertion:

../../../src/ib/ptl_ct.c:567: ct_check: Assertion `buf->type == BUF_TRIGGERED' failed.
reduce: ../../../src/ib/ptl_ct.c:567: ct_check: Assertion `buf->type == BUF_TRIGGERED' failed.
[bold-node013:14014] *** Process received signal ***
[bold-node013:14014] Signal: Aborted (6)
[bold-node013:14014] Signal code:  (-6)
[bold-node013:14014] [ 1] /home/pt2/openmpi-4.0.1/_build/../_install/lib/libopen-pal.so.40(opal_progress+0x2c)[0x7fbc4ba705ac]
[bold-node013:14014] [ 2] /home/pt2/openmpi-4.0.1/_install/lib/libmpi.so.40(+0x3105d)[0x7fbc4c80205d]
[bold-node013:14014] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x2e266)[0x7fb7c5f82266]
[bold-node013:14014] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x2e312)[0x7fb7c5f82312]
[bold-node013:14014] [ 5] /home/pt2/portals4/_build/../_install/lib/libportals.so.4(+0x84ab)[0x7fb7b9f854ab]
[bold-node013:14014] [ 6] /home/pt2/portals4/_build/../_install/lib/libportals.so.4(PtlCTFree+0xd7)[0x7fb7b9f84cbb]
[bold-node013:14014] [ 7] /home/pt2/openmpi-4.0.1/_build/../_install/lib/openmpi/mca_coll_portals4.so(ompi_coll_portals4_iallreduce_intra_fini+0x15b)[0x7fb7b239c9db]
[bold-node013:14014] [ 8] /home/pt2/openmpi-4.0.1/_build/../_install/lib/openmpi/mca_coll_portals4.so(+0x40c5)[0x7fb7b239d0c5]
[bold-node013:14014] [ 9] /home/pt2/openmpi-4.0.1/_build/../_install/lib/libopen-pal.so.40(opal_progress+0x2c)[0x7fb7c57bb5ac]
[bold-node013:14014] [10] /home/pt2/openmpi-4.0.1/_install/lib/libmpi.so.40(+0x3105d)[0x7fb7c654d05d]
[bold-node013:14014] [11] /home/pt2/openmpi-4.0.1/_install/lib/libmpi.so.40(ompi_comm_nextcid+0x29)[0x7fb7c654ebc9]
[bold-node013:14014] [12] /home/pt2/openmpi-4.0.1/_install/lib/libmpi.so.40(ompi_comm_split+0x3ea)[0x7fb7c654aaca]
[bold-node013:14014] [13] /home/pt2/openmpi-4.0.1/_install/lib/libmpi.so.40(MPI_Comm_split+0xa8)[0x7fb7c65854d8]
[bold-node013:14014] [14] ./reduce[0x40089c]
[bold-node013:14014] [15] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fb7c5f75b45]
[bold-node013:14014] [16] ./reduce[0x400769]
[bold-node013:14014] *** End of error message ***

Since the stack trace points at the nonblocking allreduce driven by ompi_comm_nextcid (the crash occurs in ompi_coll_portals4_iallreduce_intra_fini), I tried running this sample program:

#include <mpi.h>
int main(int argc, char** argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
#if IALLREDUCE
        /* nonblocking allreduce: crashes */
        MPI_Request rq;
        int send = rank, recv;
        MPI_Iallreduce(&send, &recv, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD, &rq);
        MPI_Wait(&rq, MPI_STATUS_IGNORE);
#elif ALLREDUCE
        /* blocking allreduce: works */
        int send = rank, recv;
        MPI_Allreduce(&send, &recv, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
#else
        /* communicator split: crashes in ompi_comm_nextcid */
        MPI_Comm out;
        MPI_Comm_split(MPI_COMM_WORLD, rank/2, rank, &out);
#endif
        MPI_Finalize();
        return 0;
}

MPI_Allreduce works properly, but MPI_Iallreduce and MPI_Comm_split fail.
MPI_Iallreduce crashes with a stack trace similar to the one above.
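
For reference, a minimal sketch of how the three variants of the reproducer can be built and run (these generic mpicc/mpirun invocations are my own, not taken from the original report; the macro names come from the program's #if/#elif guards):

    mpicc reduce.c -o reduce                 # default branch: MPI_Comm_split
    mpicc -DIALLREDUCE reduce.c -o reduce    # MPI_Iallreduce branch
    mpicc -DALLREDUCE reduce.c -o reduce     # MPI_Allreduce branch
    mpirun -np 2 ./reduce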
