part: multithreading deadlocks fixes and safety checks #12935

Open · wants to merge 6 commits into main

Conversation

@Keluaa commented on Nov 26, 2024

This PR fixes multiple deadlocks and other issues encountered with partitioned communication.

The first deadlocks occur when one thread is in opal_progress while others are working on the same partitioned request:

  • MPI_Pready could change the req->flags of a partition while the progress thread was testing it, leading to an edge case where req->done_count could become greater than the number of partitions
  • The changes made by MPI_Pready could be overwritten by the progress thread

Both were fixed by adding the req->part_ready array, in which MPI_Pready marks partitions as ready to be sent.
This prevents the progress engine from touching the state of a partition until it is ready.
Since no atomic operations were added, this should have little to no impact on performance.
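
A minimal sketch of that split (all names except part_ready, flags and done_count are illustrative, not the actual mca_part_persist code): MPI_Pready only ever writes the per-partition ready markers, while only the progress thread writes the partition flags and the done count, so no field has two writers.

```c
/* Illustrative sketch only; not the real module structures. */
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t  partitions;
    bool   *part_ready;   /* written only by user threads (MPI_Pready) */
    bool   *flags;        /* written only by the progress thread       */
    size_t  done_count;   /* owned by the progress thread              */
} part_req_sketch_t;

/* Called from any user thread. */
static void pready_sketch(part_req_sketch_t *req, size_t partition)
{
    req->part_ready[partition] = true;   /* never touches req->flags */
}

/* Called from the progress thread only. */
static void progress_sketch(part_req_sketch_t *req)
{
    for (size_t i = 0; i < req->partitions; i++) {
        if (!req->part_ready[i] || req->flags[i]) {
            continue;   /* not ready yet, or already handled */
        }
        /* ... start the transfer of partition i ... */
        req->flags[i] = true;
        req->done_count++;   /* can no longer exceed req->partitions */
    }
}
```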

I fixed another deadlock that I rarely encountered at the initialization of the part module: two ompi_comm_idup calls are needed, both are started from the progress engine, and partitioned requests cannot progress until they are done.
Sometimes the second ompi_comm_idup would be marked as completed on one rank but not on the other, leading to a deadlock.
This was fixed by doing one ompi_comm_idup at a time, with the side effect of slowing down the initialization (the first request must complete before the next ompi_comm_idup is started).
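
At the MPI level, the resulting ordering looks like the sketch below: the first duplication completes before the second one is started. (The module uses the internal ompi_comm_idup and its own progress loop rather than MPI_Comm_idup/MPI_Wait; this user-level analogue only illustrates the ordering.)

```c
#include <mpi.h>

/* Illustrative only: serialize the two non-blocking duplications. */
static void dup_two_comms_one_at_a_time(MPI_Comm base,
                                        MPI_Comm *first, MPI_Comm *second)
{
    MPI_Request req;

    MPI_Comm_idup(base, first, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);   /* the first dup must complete...  */

    MPI_Comm_idup(base, second, &req);   /* ...before the second is started */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}
```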

I also fixed a very rare segfault caused by calling mca_part_persist_free_req without holding ompi_part_persist.lock.
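
The calling discipline is roughly the following (OPAL_THREAD_LOCK/OPAL_THREAD_UNLOCK are the existing Open MPI macros; the surrounding lines are only a sketch of the usage, not the actual call site):

```c
/* Sketch: the request is only freed while ompi_part_persist.lock is held. */
OPAL_THREAD_LOCK(&ompi_part_persist.lock);
mca_part_persist_free_req(req);
OPAL_THREAD_UNLOCK(&ompi_part_persist.lock);
```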

I added many error checks to mca_part_persist_progress, ensuring no new deadlock can occur when an internal function fails.
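
The checks follow the pattern sketched below (all names here are hypothetical; only the pattern is meant): when an internal call fails, the error is recorded on the request and the request is allowed to complete, so a waiter returns with an error instead of hanging forever.

```c
/* Hypothetical names; illustrates the error-check pattern only. */
#include <stdbool.h>

typedef struct {
    bool complete;
    int  error;        /* 0 on success */
} req_sketch_t;

/* May fail; stands in for the internal calls made by the progress loop. */
static int start_internal_transfer(req_sketch_t *req, int partition);

static void progress_with_checks(req_sketch_t *req, int partitions)
{
    for (int p = 0; p < partitions; p++) {
        int rc = start_internal_transfer(req, p);
        if (rc != 0) {
            req->error    = rc;     /* surface the failure to the waiter */
            req->complete = true;   /* so waiting on the request returns */
            return;
        }
    }
}
```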

For testing, I used a two-way ring exchange with a fixed number of partitions distributed among multiple OpenMP threads. This is the original use case in which I encountered all of these issues, since every thread needs to call MPI_Pready and MPI_Parrived.
Running on up to 128 cores with varying numbers of processes and threads showed no new deadlocks or communication issues.
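
Roughly, the test loop has the following shape (sizes, datatypes and ring arithmetic are illustrative, and only one of the two exchange directions is shown). MPI must be initialized with MPI_THREAD_MULTIPLE since MPI_Pready and MPI_Parrived are called from all OpenMP threads.

```c
#include <mpi.h>
#include <omp.h>

#define PARTITIONS 64
#define COUNT      1024   /* elements per partition (illustrative) */

/* One direction of the ring exchange; the real test also exchanges the
 * other way. Buffers hold PARTITIONS * COUNT doubles. */
void ring_step(MPI_Comm comm, double *sendbuf, double *recvbuf)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    int right = (rank + 1) % size;
    int left  = (rank + size - 1) % size;

    MPI_Request sreq, rreq;
    MPI_Psend_init(sendbuf, PARTITIONS, COUNT, MPI_DOUBLE, right, 0,
                   comm, MPI_INFO_NULL, &sreq);
    MPI_Precv_init(recvbuf, PARTITIONS, COUNT, MPI_DOUBLE, left, 0,
                   comm, MPI_INFO_NULL, &rreq);
    MPI_Start(&sreq);
    MPI_Start(&rreq);

    #pragma omp parallel for
    for (int p = 0; p < PARTITIONS; p++) {
        /* ... fill sendbuf[p * COUNT .. (p + 1) * COUNT - 1] ... */
        MPI_Pready(p, sreq);                  /* called by many threads */

        int arrived = 0;
        while (!arrived) {
            MPI_Parrived(rreq, p, &arrived);  /* also called concurrently */
        }
    }

    MPI_Wait(&sreq, MPI_STATUS_IGNORE);
    MPI_Wait(&rreq, MPI_STATUS_IGNORE);
    MPI_Request_free(&sreq);
    MPI_Request_free(&rreq);
}
```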

Partition flags weren't thread safe, as they could be modified from `MPI_Pready` by one thread while the progress engine was running in another.
The new `part_ready` array prevents the progress engine from using flags which may be modified in `MPI_Pready`, and vice versa.
This fix only adds one array and doesn't add any atomic operations: performance shouldn't be impacted.
The `done_count` value was moved outside `mca_part_persist_request_t` to fully prevent the edge case where it could go above the number of partitions in the request.

Signed-off-by: Keluaa <[email protected]>
Deadlocks were observed during initialization when multiple `ompi_comm_idup` calls were in flight at once and multiple threads could call `opal_progress`.
This may be due to a request ordering mismatch or some other error: the precise cause was not identified.
Doing the `ompi_comm_idup` calls one at a time avoids the issue.

Signed-off-by: Keluaa <[email protected]>
The previous usage of `MPI_UNDEFINED` would cause greater issues downstream otherwise.
It also removes some compiler warnings with GCC 11.

Signed-off-by: Keluaa <[email protected]>