This is a minimal example of a targets workflow that can run on the HiperGator HPC cluster using the clustermq backend for parallelization.
It is intended to be run on HiperGator through a command-line interface. For an alternative workflow that uses HiperGator but is run from a local RStudio session, see this repo.
To run this example workflow, SSH into HiperGator and navigate to your blue or orange directory.
Then clone this repository by running:
git clone https://github.com/BrunaLab/hipergator-targets.git
Edit submit_pipeline.sbatch and slurm_clustermq.tmpl in a text editor to include your email address and any other SLURM settings you wish. Don't touch the wildcards in double curly braces in slurm_clustermq.tmpl; clustermq fills them in when it submits each worker job.
Then, start the workflow by running sbatch submit_pipeline.sbatch.
This will start a job on HiperGator that will in turn spawn other jobs using the clustermq package and the slurm_clustermq.tmpl file.
The project is set up to use renv to manage a package library, so as long as the lockfile is up to date, all necessary packages should be installed automatically when the pipeline runs.
- submit_pipeline.sbatch tells SLURM to load R and then run pipeline.R, and that's it. This starts a top-level job running on HiperGator.
- pipeline.R installs dependencies with renv::restore(), then runs the targets workflow with tar_make_clustermq().
- Using tar_make_clustermq() spawns worker jobs on HiperGator using the slurm_clustermq.tmpl file as a template for the SLURM submission scripts for each worker.
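The driver script described above can be quite short. The following is a minimal sketch, not the repository's actual pipeline.R: the run_pipeline() wrapper, the SLURM_JOB_ID guard, and the default of 2 workers are assumptions added for illustration, while renv::restore() and tar_make_clustermq() are the calls named above.

```r
# Sketch of a driver script like pipeline.R (illustrative, not the repo's file).
# run_pipeline() and the default of 2 workers are assumptions.
run_pipeline <- function(workers = 2L) {
  # Install the package versions recorded in renv.lock without prompting.
  renv::restore(prompt = FALSE)
  # Run the pipeline, asking clustermq for `workers` SLURM worker jobs.
  targets::tar_make_clustermq(workers = workers)
}

# SLURM sets SLURM_JOB_ID inside a job, so the pipeline only starts when
# this script is executed by sbatch, not when it is sourced elsewhere.
if (nzchar(Sys.getenv("SLURM_JOB_ID"))) {
  run_pipeline()
}
```

The guard is only there so the file can be sourced on a laptop without trying to submit jobs; a script that calls the two functions directly would behave the same way on the cluster.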
The parallelization happens at the level of targets.
In this example, a list of numeric vectors is stored as many_vects.
Then, independently, means and standard deviations are calculated for each vector in the list.
These two targets (the means and the standard deviations) depend only on many_vects and not on each other, so they should be able to run on separate workers in parallel if things are set up correctly.
Parallelizing code within a target (e.g. a function that does parallel computation) will require more setup.
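The target-level parallelism described above can be sketched as a small _targets.R file. This is an illustration under assumptions rather than the file shipped in the repository: only many_vects is a name taken from the text, while make_vects(), vect_means(), vect_sds(), the target names means and sds, and the placement of the clustermq options are hypothetical.

```r
# _targets.R (sketch). Everything except many_vects is an assumed name.
library(targets)

# Assumption: clustermq is pointed at SLURM and at the template file here.
options(
  clustermq.scheduler = "slurm",
  clustermq.template = "slurm_clustermq.tmpl"
)

# Hypothetical helper: build a small list of numeric vectors.
make_vects <- function() {
  list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
}

# Hypothetical helpers: one summary value per vector in the list.
vect_means <- function(vects) vapply(vects, mean, numeric(1))
vect_sds <- function(vects) vapply(vects, sd, numeric(1))

# means and sds both depend only on many_vects, not on each other,
# so targets can send them to different workers at the same time.
list(
  tar_target(many_vects, make_vects()),
  tar_target(means, vect_means(many_vects)),
  tar_target(sds, vect_sds(many_vects))
)
```

Because targets infers dependencies from the code of each command, no extra configuration is needed to mark means and sds as independent; the shared reference to many_vects is enough.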
You may also want to test that this workflow runs locally on your computer.
You can clone this repository locally and run targets::tar_make() in the console to run it.
See the documentation for the targets package for more information.
Shouts to @diazrenata for taking the time to help me figure this out.