Wait for processes to exit instead of STOP result #19

yvanoers · 2023-08-14T20:36:59Z

The current implementation collects results until it has received one "STOP" signal from each spawned worker.
This causes the main loop to wait forever if any of the subprocesses is killed, e.g. by the OOM killer, because that worker never sends its STOP signal. We ran into this problem.
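The failure mode can be reproduced with a small sketch. It assumes Python's `multiprocessing`. The worker, its squaring workload, and the `kill_first` flag (which makes one worker SIGKILL itself to stand in for the OOM killer) are all hypothetical, and the fork start method is assumed, so this is POSIX-only. The `timeout` exists only so the demo can report the hang: a plain `results.get()` would never return.

```python
import multiprocessing as mp
import os
import queue
import signal

STOP = "STOP"  # result-side marker, one per worker


def worker(chunk, results, die_early):
    if die_early:
        # Hypothetical stand-in for the OOM killer: SIGKILL allows no cleanup, so STOP is never sent
        os.kill(os.getpid(), signal.SIGKILL)
    for item in chunk:
        results.put(item * item)
    results.put(STOP)


def collect_until_stop(chunks, kill_first=False, timeout=1.0):
    ctx = mp.get_context("fork")  # assumption: POSIX, fork start method
    results = ctx.Queue()
    procs = [
        ctx.Process(target=worker, args=(chunk, results, kill_first and i == 0))
        for i, chunk in enumerate(chunks)
    ]
    for p in procs:
        p.start()

    collected, stops, hung = [], 0, False
    # Old-style loop: keep reading until one STOP per worker has arrived
    while stops < len(procs):
        try:
            item = results.get(timeout=timeout)
        except queue.Empty:
            hung = True  # without the timeout, this get() would block forever
            break
        if item == STOP:
            stops += 1
        else:
            collected.append(item)
    for p in procs:
        p.join()
    return None if hung else sorted(collected)
```

With every worker alive, `collect_until_stop([[1, 2], [3]])` returns the squares. With `kill_first=True`, the second STOP never arrives and the call reports the hang by returning `None`.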

This change does away with the STOP marker. Instead, the main process waits on the worker processes themselves and collects results while they are running.
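A minimal sketch of waiting on the processes instead of on STOP markers, assuming Python's `multiprocessing`; this is not the PR's code. The `worker` function, its squaring workload, and the fixed per-worker chunks are hypothetical, and the fork start method is assumed to keep the sketch short (POSIX-only).

```python
import multiprocessing as mp
import queue


def worker(chunk, results):
    # Hypothetical workload: square each item; no STOP marker is sent
    for item in chunk:
        results.put(item * item)


def collect_while_running(chunks, poll=0.05):
    ctx = mp.get_context("fork")  # assumption: POSIX, fork start method
    results = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(chunk, results)) for chunk in chunks]
    for p in procs:
        p.start()

    collected = []
    # Keep reading results for as long as any worker process is alive
    while any(p.is_alive() for p in procs):
        try:
            collected.append(results.get(timeout=poll))
        except queue.Empty:
            pass
    # A worker that exits normally first flushes its buffered puts to the pipe,
    # so whatever is left can now be drained without blocking
    while True:
        try:
            collected.append(results.get_nowait())
        except queue.Empty:
            break
    for p in procs:
        p.join()
    return sorted(collected)
```

The loop condition depends only on process liveness, so a worker that disappears without saying goodbye can no longer keep the main loop waiting.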

If a worker is killed, the remaining work is picked up by the other workers.
However, once all workers have exited, an exception is raised if any one of them was killed, to signal that not all computations may have been done.
This behavior may need to change to either
-- only raise an exception if it is certain that some work was not done, or
-- terminate the other workers early, because an exception will be raised when they are done anyway.
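The current behavior (survivors finish the shared queue, then the run raises) might look roughly like this sketch. Everything here is an assumption: the `POISON` task that makes a worker SIGKILL itself as a stand-in for the OOM killer, the hypothetical `WorkerKilledError`, the `None` end-of-work marker on the *task* queue, and the fork start method (POSIX-only).

```python
import multiprocessing as mp
import os
import signal
import time

POISON = -1  # hypothetical task value: the worker that receives it SIGKILLs itself


class WorkerKilledError(RuntimeError):
    """Hypothetical error type for a run in which a worker died from a signal."""


def worker(tasks, results):
    # None on the *task* queue means "no more work"; results carry no STOP marker
    for item in iter(tasks.get, None):
        if item == POISON:
            os.kill(os.getpid(), signal.SIGKILL)  # stand-in for the OOM killer
        results.put(item * item)


def collect(items, n_workers=2):
    ctx = mp.get_context("fork")  # assumption: POSIX, fork start method
    tasks, results = ctx.Queue(), ctx.SimpleQueue()
    procs = [ctx.Process(target=worker, args=(tasks, results)) for _ in range(n_workers)]
    for p in procs:
        p.start()
    for item in items:
        tasks.put(item)  # shared queue: survivors pick up what a killed worker leaves
    for _ in procs:
        tasks.put(None)

    collected = []
    # Wait on the processes themselves, reading results while any are alive
    while any(p.is_alive() for p in procs) or not results.empty():
        if results.empty():
            time.sleep(0.01)
        else:
            collected.append(results.get())
    for p in procs:
        p.join()
    return sorted(collected), [p.exitcode for p in procs]


def run(items, n_workers=2):
    collected, exit_codes = collect(items, n_workers)
    # A negative exit code -N means the process was terminated by signal N
    if any(code < 0 for code in exit_codes):
        raise WorkerKilledError(f"worker exit codes {exit_codes}; some work may be missing")
    return collected
```

Results go through `SimpleQueue`, whose `put` writes from the calling thread. With `multiprocessing.Queue`, a worker SIGKILLed while its background feeder thread holds the queue's write lock could leave that lock held and stall the remaining workers. A real OOM kill can still land in the middle of a write, which this sketch does not handle.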

wouterbles · 2023-08-16T15:12:13Z

Thanks for your contribution! Personally, I think option 2 makes the most sense, i.e. terminating all workers when one is terminated. What do you think?

yvanoers · 2023-08-17T20:01:42Z

Ideally, if a worker is killed, the other workers would finish the remaining work and redo whatever the killed worker was working on.
That behavior would require some redesign, though, and #14 may lead to changes in this area anyway.

So I agree terminating the other workers early is the better option at this time.

yvanoers · 2023-08-26T20:07:41Z

@wouterbles
I added the implementation for option 2.
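Option 2 could be sketched by checking exit codes inside the collection loop and terminating the remaining workers as soon as one shows a signal death. As before, this is a hypothetical illustration rather than the merged code: `POISON`, `WorkerKilledError`, the `None` task marker, the `time.sleep` placeholder workload, and the fork start method (POSIX-only) are all assumptions.

```python
import multiprocessing as mp
import os
import signal
import time

POISON = -1  # hypothetical task value: the worker that receives it SIGKILLs itself


class WorkerKilledError(RuntimeError):
    """Hypothetical error type for a run in which a worker died from a signal."""


def worker(tasks, results):
    # None on the *task* queue means "no more work"; results carry no STOP marker
    for item in iter(tasks.get, None):
        if item == POISON:
            os.kill(os.getpid(), signal.SIGKILL)  # stand-in for the OOM killer
        time.sleep(0.01)  # placeholder for real work
        results.put(item * item)


def run(items, n_workers=2):
    ctx = mp.get_context("fork")  # assumption: POSIX, fork start method
    tasks, results = ctx.Queue(), ctx.SimpleQueue()
    procs = [ctx.Process(target=worker, args=(tasks, results)) for _ in range(n_workers)]
    for p in procs:
        p.start()
    for item in items:
        tasks.put(item)
    for _ in procs:
        tasks.put(None)

    collected = []
    while True:
        # Sample liveness first: if nothing is running, every exit code below is final
        running = any(p.is_alive() for p in procs)
        codes = [p.exitcode for p in procs]
        if any(code is not None and code < 0 for code in codes):
            # Option 2: the run fails anyway, so stop the remaining workers right away
            for p in procs:
                if p.is_alive():
                    p.terminate()
            for p in procs:
                p.join()
            tasks.cancel_join_thread()  # unread tasks must not block interpreter exit
            raise WorkerKilledError(f"a worker was killed (exit codes {codes}); aborting")
        if not results.empty():
            collected.append(results.get())
        elif running:
            time.sleep(0.01)
        else:
            return sorted(collected)
```

Terminating a worker that is in the middle of a queue operation can leave that queue unusable, which is acceptable here only because the run is aborted immediately afterwards.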

yvanoers added 3 commits August 14, 2023 22:01

f036869 Wait for processes to exit instead of STOP result
dfe97ec Add logging on process finish
1857e5d Raise on killing of any worker instead of all
0cb924a Terminate workers early if one got killed

wouterbles merged commit b4675e1 into wouterbles:main Aug 29, 2023
1 check passed
