In my stencil benchmark I use the halo wrapper as in ex.02.matrix.halo.heat_equation. When I build with the feat-halo branch, my code deadlocks at halo.update_async() when run on more than one node.
If I use the development branch, or run on a single node with more than one DASH unit, this doesn't happen.
This also seems to be why dash-test-mpi could not finish within 8 hours on multiple nodes in issue #682. The end of the test output posted in that issue shows that the run was at mHaloTest.HaloMatrixWrapperNonCyclic2D when it was cancelled due to the time limit.
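To find out which test hangs without waiting for the full time limit, the run could be wrapped in a small watchdog. The following is only a sketch: `run_with_watchdog` is a hypothetical helper name, and the launcher command in the comment is an assumption about how the tests are started, not something taken from the DASH build system.

```python
import subprocess
import sys


def run_with_watchdog(cmd, timeout_s):
    """Run cmd; return (timed_out, last_nonempty_output_line).

    Hypothetical helper: kills the direct child after timeout_s seconds
    and reports the last line it printed before that point.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,
        )
        timed_out, raw = False, proc.stdout
    except subprocess.TimeoutExpired as exc:
        # exc.stdout holds whatever was captured before the timeout (may be None).
        timed_out, raw = True, exc.stdout

    if raw is None:
        raw = b""
    if isinstance(raw, bytes):
        raw = raw.decode(errors="replace")
    lines = [line for line in raw.splitlines() if line.strip()]
    return timed_out, (lines[-1] if lines else "")


if __name__ == "__main__":
    # Assumed invocation, e.g.:
    #   python watchdog.py 600 mpirun -n 4 ./dash-test-mpi
    timeout = float(sys.argv[1])
    hung, last = run_with_watchdog(sys.argv[2:], timeout)
    print("timed out" if hung else "finished", "- last output line:", last)
```

Note that only the directly started process is killed on timeout; processes spawned by an MPI launcher on other nodes may need to be cleaned up separately, and a test that buffers its output may not show its most recent lines.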
pauleonix changed the title from "halo.update_async(); Deadlock in Multi Node Case" to "halo.update_async(); Deadlock in Multi Node Case on feat-halo" on Feb 1, 2020
@Spielix can you please name the flags you used? Did you enable DYNAMIC_WINDOWS or SHARED_WINDOWS? I can't reproduce the error. The only thing that happened to me was an MPI_Win_detach error. That error is located in OpenMPI, and we have a workaround provided by @devreal.