From open-mpi/ompi#7217:
Calling MPI_Comm_split causes an immediate crash with the assertion:
../../../src/ib/ptl_ct.c:567: ct_check: Assertion `buf->type == BUF_TRIGGERED' failed.
reduce: ../../../src/ib/ptl_ct.c:567: ct_check: Assertion `buf->type == BUF_TRIGGERED' failed.
[bold-node013:14014] *** Process received signal ***
[bold-node013:14014] Signal: Aborted (6)
[bold-node013:14014] Signal code: (-6)
[bold-node013:14014] [ 1] /home/pt2/openmpi-4.0.1/_build/../_install/lib/libopen-pal.so.40(opal_progress+0x2c)[0x7fbc4ba705ac]
[bold-node013:14014] [ 2] /home/pt2/openmpi-4.0.1/_install/lib/libmpi.so.40(+0x3105d)[0x7fbc4c80205d]
[bold-node013:14014] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x2e266)[0x7fb7c5f82266]
[bold-node013:14014] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x2e312)[0x7fb7c5f82312]
[bold-node013:14014] [ 5] /home/pt2/portals4/_build/../_install/lib/libportals.so.4(+0x84ab)[0x7fb7b9f854ab]
[bold-node013:14014] [ 6] /home/pt2/portals4/_build/../_install/lib/libportals.so.4(PtlCTFree+0xd7)[0x7fb7b9f84cbb]
[bold-node013:14014] [ 7] /home/pt2/openmpi-4.0.1/_build/../_install/lib/openmpi/mca_coll_portals4.so(ompi_coll_portals4_iallreduce_intra_fini+0x15b)[0x7fb7b239c9db]
[bold-node013:14014] [ 8] /home/pt2/openmpi-4.0.1/_build/../_install/lib/openmpi/mca_coll_portals4.so(+0x40c5)[0x7fb7b239d0c5]
[bold-node013:14014] [ 9] /home/pt2/openmpi-4.0.1/_build/../_install/lib/libopen-pal.so.40(opal_progress+0x2c)[0x7fb7c57bb5ac]
[bold-node013:14014] [10] /home/pt2/openmpi-4.0.1/_install/lib/libmpi.so.40(+0x3105d)[0x7fb7c654d05d]
[bold-node013:14014] [11] /home/pt2/openmpi-4.0.1/_install/lib/libmpi.so.40(ompi_comm_nextcid+0x29)[0x7fb7c654ebc9]
[bold-node013:14014] [12] /home/pt2/openmpi-4.0.1/_install/lib/libmpi.so.40(ompi_comm_split+0x3ea)[0x7fb7c654aaca]
[bold-node013:14014] [13] /home/pt2/openmpi-4.0.1/_install/lib/libmpi.so.40(MPI_Comm_split+0xa8)[0x7fb7c65854d8]
[bold-node013:14014] [14] ./reduce[0x40089c]
[bold-node013:14014] [15] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fb7c5f75b45]
[bold-node013:14014] [16] ./reduce[0x400769]
[bold-node013:14014] *** End of error message ***
The trace shows MPI_Comm_split reaching ompi_comm_nextcid, which drives a non-blocking allreduce through the Portals4 collective component, so the problem appears to be in MPI_Iallreduce. To check, I ran this sample program:
#include <mpi.h>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

#if IALLREDUCE
    /* Non-blocking allreduce: crashes like the trace above. */
    MPI_Request rq;
    int send = rank, recv = 0;
    MPI_Iallreduce(&send, &recv, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD, &rq);
    MPI_Wait(&rq, MPI_STATUS_IGNORE);
#elif ALLREDUCE
    /* Blocking allreduce: works. */
    int send = rank, recv = 0;
    MPI_Allreduce(&send, &recv, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
#else
    /* Communicator split: crashes via its internal non-blocking allreduce. */
    MPI_Comm out;
    MPI_Comm_split(MPI_COMM_WORLD, rank / 2, rank, &out);
#endif

    MPI_Finalize();
    return 0;
}
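For reference, the three variants can be built and run roughly as follows (assuming the file is saved as reduce.c and mpicc/mpirun come from the Open MPI 4.0.1 install shown in the trace; these exact commands are not part of the original report):

mpicc reduce.c -o reduce                 # default branch: MPI_Comm_split
mpirun -np 4 ./reduce
mpicc -DIALLREDUCE reduce.c -o reduce    # MPI_Iallreduce branch
mpirun -np 4 ./reduce
mpicc -DALLREDUCE reduce.c -o reduce     # blocking MPI_Allreduce branch
mpirun -np 4 ./reduce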
MPI_Allreduce works properly, but MPI_Iallreduce and MPI_Comm_split both fail; MPI_Iallreduce crashes with a stack trace similar to the one above.
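A possible workaround, not verified in the original report: since every failing path goes through the coll/portals4 component, excluding that component with Open MPI's MCA selection syntax should let another collective implementation (e.g. libnbc) service the non-blocking allreduce while the PtlCTFree assertion is investigated:

# Exclude the Portals4 collective component; other coll components take over.
mpirun --mca coll ^portals4 -np 4 ./reduce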