Wait for processes to exit instead of STOP result #19
The current implementation collects results until it has received one "STOP" marker from each spawned worker.
This causes the main loop to wait forever if any of the subprocesses gets killed, e.g. by the OOM killer, because that worker never sends its STOP marker. We ran into this problem.
This change does away with the STOP marker. Instead, the main loop waits on the worker processes themselves and collects results while they run.
If a worker gets killed, the other workers pick up the remaining work.
However, if any worker was killed, an exception is raised once all workers have exited, to signal that some computations may not have been done.
This behavior may need to change to either
-- only raise an exception if it is certain that some work was not done, or
-- terminate the other workers early, because an exception will be raised when they are done anyway.
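
For reviewers, here is a minimal sketch of the approach described above. It is not the code in this PR. It assumes Python's `multiprocessing` with two queues, and the names `worker`, `abnormal_exits` and `run` are hypothetical. It also assumes the POSIX "fork" start method so that it runs as a plain script.

```python
import multiprocessing as mp
import queue

# Assumption: POSIX "fork" start method, so the sketch runs as a plain script.
ctx = mp.get_context("fork")


def worker(tasks, results):
    """Hypothetical worker: square numbers until the task queue is empty."""
    while True:
        try:
            item = tasks.get(timeout=0.2)
        except queue.Empty:
            return  # normal exit, exit code 0; no STOP marker is sent
        results.put(item * item)


def abnormal_exits(exitcodes):
    """Return the exit codes that indicate a worker did not exit cleanly.

    multiprocessing reports death by signal N as exit code -N,
    e.g. -9 for SIGKILL, which is what the OOM killer sends.
    """
    return [code for code in exitcodes if code != 0]


def run(items, n_workers=2):
    tasks, results = ctx.Queue(), ctx.Queue()
    for item in items:
        tasks.put(item)
    procs = [ctx.Process(target=worker, args=(tasks, results))
             for _ in range(n_workers)]
    for p in procs:
        p.start()

    collected = []
    # Collect results for as long as any worker process is still running.
    while any(p.is_alive() for p in procs):
        try:
            collected.append(results.get(timeout=0.1))
        except queue.Empty:
            pass
    # All workers have exited; drain whatever is still in the result queue.
    while True:
        try:
            collected.append(results.get(timeout=0.1))
        except queue.Empty:
            break
    for p in procs:
        p.join()

    bad = abnormal_exits(p.exitcode for p in procs)
    if bad:
        # Some work may be missing, so signal it rather than return silently.
        raise RuntimeError(f"workers exited abnormally: {bad}")
    return collected
```

Because the loop condition depends on `is_alive()` rather than on counting markers, a worker that dies without sending anything no longer blocks the main loop. Results may arrive in any order, so callers should not rely on ordering.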