Deploy a Dask.distributed cluster on top of a cluster running the SLURM workload manager.
Inspired by the Dask-DRMAA project, from which some code is borrowed.
Launch a cluster from Python code and run a simple calculation:
from slurmified import Cluster
slurm_kwargs = {
    'mem-per-cpu': '100',
    'time': '1-00:00:00'
}
cluster = Cluster(slurm_kwargs)
cluster.start_workers(10)
from distributed import Client
client = Client(cluster)
future = client.submit(lambda x: x + 1, 10)
future.result() # returns 11
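The keys in `slurm_kwargs` match the long option names of SLURM's `sbatch` command: `--mem-per-cpu` is the memory per allocated CPU (in megabytes by default) and `--time` is the wall-clock limit (`1-00:00:00` is one day). The sketch below shows how such a dictionary could be rendered as command-line flags. The helper `to_sbatch_args` is hypothetical and written only for illustration; how slurmified actually consumes the dictionary is an assumption here, not something the source confirms.

```python
def to_sbatch_args(slurm_kwargs):
    """Render a dict of long option names as sbatch-style flags.

    Hypothetical helper for illustration; not part of slurmified.
    """
    return ["--{}={}".format(key, value) for key, value in slurm_kwargs.items()]


slurm_kwargs = {
    'mem-per-cpu': '100',
    'time': '1-00:00:00',
}

args = to_sbatch_args(slurm_kwargs)
print(" ".join(args))  # prints --mem-per-cpu=100 --time=1-00:00:00
```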
To have the cluster terminate automatically once the calculation has finished, use it as a context manager:
from slurmified import Cluster
from distributed import Client
slurm_kwargs = {
    'mem-per-cpu': '100',
    'time': '1-00:00:00'
}
inputs = list(range(0, 100))
with Cluster(slurm_kwargs).start_workers(10) as cluster:
    with Client(cluster) as client:
        incremented = client.map(lambda x: x+1, inputs)
        inverted = client.map(lambda x: -x, incremented)
        outputs = client.gather(inverted)
print(outputs) # prints [-1, .. , -100]
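Before requesting SLURM resources, it can be handy to dry-run the same two-stage pipeline locally. The sketch below uses only the standard library's `concurrent.futures` as a stand-in for the Dask client. That substitution is an assumption made for illustration and is not part of slurmified. Unlike `client.map`, `Executor.map` yields results directly rather than futures, so no `gather` step is needed.

```python
from concurrent.futures import ThreadPoolExecutor


def increment(x):
    return x + 1


def negate(x):
    return -x


inputs = list(range(0, 100))

# Local stand-in for the cluster: a small thread pool instead of SLURM workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    incremented = pool.map(increment, inputs)
    # Results come back in input order, like client.gather on a list of futures.
    outputs = list(pool.map(negate, incremented))

print(outputs[:3], outputs[-1])  # prints [-1, -2, -3] -100
```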
Quickly map a function over a list of arguments:
from slurmified import map_interactive
slurm_kwargs = {
    'mem-per-cpu': '100',
    'time': '1-00:00:00'
}
inputs = list(range(0, 100))
map_func = map_interactive(10, slurm_kwargs=slurm_kwargs)
outputs = map_func(lambda x: x+1, inputs)
print(outputs) # prints [1, .. , 100]
Install via pip:
pip install git+https://gitlab.com/slavoutich/slurmified.git