Add async_local backend and allow using an existing dview for local and async_local backends #311
+389 −294
This is a feature I added for myself, and I think it may be useful to others, so I'm offering it here.
Problem: mesmerize is currently inflexible with regard to how parallel processing is set up. Running an item always opens a new multiprocessing pool; you can control the number of processes through the MESMERIZE_N_PROCESSES environment variable, but that's it. I wanted the ability to a) pass an existing cluster into multiple runs to save overhead, and/or b) use a different type of cluster (e.g. an ipyparallel cluster spanning multiple nodes, or even a load-balanced view).
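For example, this is the kind of object I'd like to be able to pass in (just an illustration, not code from this PR):

```python
# Illustrative only: the kind of "dview" one might want to reuse across runs.
# Assumes an ipyparallel cluster has already been started (e.g. via `ipcluster start`).
import ipyparallel as ipp

rc = ipp.Client()                  # connect to the running cluster
dview = rc[:]                      # a DirectView over all engines
lbview = rc.load_balanced_view()   # or a load-balanced view
```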
Solution: Passing a pool or dview into the run function clearly won't work with a subprocess, so the subprocess and slurm backends are out. The local backend calls the function directly, but it has the disadvantage that it blocks, so only one gridsearch run can be done at a time. However, we can get around that by spawning a thread (again, not a subprocess; the cluster objects can't be pickled).
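Roughly, the idea looks like this (a minimal sketch, not the code in the PR; `run_algo`, `batch_path`, and `item_uuid` are placeholder names):

```python
# Sketch of the async_local idea: run the same code path as the "local"
# backend, but on a background thread so the call returns a Future instead
# of blocking. Threads (not subprocesses) are used so an unpicklable
# cluster object such as an ipyparallel view can be passed straight through.
from concurrent.futures import Future, ThreadPoolExecutor

_executor = ThreadPoolExecutor()

def run_async_local(run_algo, batch_path, item_uuid, dview=None) -> Future:
    return _executor.submit(run_algo, batch_path, item_uuid, dview=dview)
```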
I added a new backend called "async_local", which just launches a local run in a thread using the `concurrent.futures` module from the standard library. I also made it possible to pass a dview into both the local and async_local backends. I factored some shared boilerplate out of the 3 algorithm files into a `_utils.py` file that launches a new multiprocessing pool if no dview was passed in (and closes it when finished), and otherwise just forwards what was passed in. Finally, I added a test that compares results from the 3 non-SLURM backends.
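The helper is essentially a context manager along these lines (a sketch with assumed names, not the exact code from the PR):

```python
# Sketch of the helper factored into _utils.py: open a fresh multiprocessing
# pool only if no dview was passed in (and clean it up afterwards);
# otherwise forward the caller's dview untouched.
# The name `ensure_parallel_backend` is an assumption for illustration.
import contextlib
import multiprocessing
import os

@contextlib.contextmanager
def ensure_parallel_backend(dview=None):
    if dview is not None:
        # Caller-provided pool/dview: use it, leave its lifetime to the caller.
        yield dview
    else:
        n_processes = int(os.environ.get("MESMERIZE_N_PROCESSES", os.cpu_count()))
        pool = multiprocessing.Pool(n_processes)
        try:
            yield pool
        finally:
            pool.close()
            pool.join()
```

Each algorithm file then just wraps its body in this context manager, so the pool/dview handling lives in one place.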
The diff is not really as big as GitHub says; there are a lot of whitespace changes, since I added a context manager in the 3 algorithm files, but checking the box to hide whitespace changes shows something more reasonable.