halo.update_async(); Deadlock in Multi Node Case on feat-halo #686

Open
pauleonix opened this issue Feb 1, 2020 · 2 comments

pauleonix commented Feb 1, 2020

In my stencil benchmark I use the halo wrapper as in ex.02.matrix.halo.heat_equation. When built against the feat-halo branch, my code deadlocks at halo.update_async(); as soon as it runs on more than one node.
With the development branch, or on a single node with more than one DASH unit, this doesn't happen.

This also seems to be why dash-test-mpi could not finish within 8 hours on multiple nodes in issue #682. The end of the test output posted in that issue shows that the run was at mHaloTest.HaloMatrixWrapperNonCyclic2D when it was cancelled due to the time limit.

pauleonix changed the title from "halo.update_async(); Deadlock in Multi Node Case" to "halo.update_async(); Deadlock in Multi Node Case on feat-halo" Feb 1, 2020
devreal added the bug label Feb 1, 2020
Member

dhinf commented Feb 7, 2020

I will look into this next week.

Member

dhinf commented Feb 18, 2020

@Spielix, can you please name the flags you used? Did you enable DYNAMIC_WINDOWS or SHARED_WINDOWS? I can't reproduce the deadlock. The only problem I ran into was an MPI_Win_detach error, which is located in OpenMPI. We have a workaround for it, provided by @devreal.
