From 4519fc26ce08e4899148811854cee17fde60550a Mon Sep 17 00:00:00 2001
From: Laxmana Yetukuri
Date: Fri, 20 Dec 2024 08:51:50 +0200
Subject: [PATCH 01/29] nextflow docs update

---
 docs/apps/nextflow.md                    |   9 +-
 docs/support/tutorials/nextflow-puhti.md | 209 +++++++++++++++++------
 2 files changed, 164 insertions(+), 54 deletions(-)

diff --git a/docs/apps/nextflow.md b/docs/apps/nextflow.md
index 73994fd1df..6a31e31e5a 100644
--- a/docs/apps/nextflow.md
+++ b/docs/apps/nextflow.md
@@ -6,7 +6,8 @@ tags:
 # Nextflow

 Nextflow is a scientific workflow management system for creating scalable,
-portable, and reproducible workflows.
+portable, and reproducible workflows. It is a groovy-based language for expressing the entire workflow in a single script and also supports running scripts (via script/run/shell directive of Snakemake rule) from other languages such as R, bash and Python.
+

 [TOC]

@@ -14,8 +15,8 @@ portable, and reproducible workflows.

 Versions available on CSC's servers

-* Puhti: 21.10.6, 22.04.5, 22.10.1, 23.04.3
-* Mahti: 22.05.0-edge
+* Puhti: 21.10.6, 22.04.5, 22.10.1, 23.04.3, 24.01.0-edge.5903, 24.10.0
+* Mahti: 22.05.0-edge, 24.04.4
 * LUMI: 22.10.4

 !!! info "Pay attention to usage of Nextflow version"
@@ -69,7 +70,7 @@ computational workflows. Nat. Biotechnol. 35, 316–319 (2017).
## More information -* [Nextflow documentation](https://www.nextflow.io/docs/latest/index.html) +* [Nextflow official documentation](https://www.nextflow.io/docs/latest/index.html) * [Running Nextflow on Puhti](../support/tutorials/nextflow-puhti.md) * [High-throughput Nextflow workflow using HyperQueue](../support/tutorials/nextflow-hq.md) * [Contact CSC Service Desk for technical support](../support/contact.md) diff --git a/docs/support/tutorials/nextflow-puhti.md b/docs/support/tutorials/nextflow-puhti.md index 847f5c56fe..afb0d5bdec 100755 --- a/docs/support/tutorials/nextflow-puhti.md +++ b/docs/support/tutorials/nextflow-puhti.md @@ -1,28 +1,41 @@ # Running Nextflow pipelines on Puhti -[Nextflow](https://www.nextflow.io/) is a scalable and reproducible scientific -workflow management system that interacts with containerized applications to -perform compute-intensive tasks. Nextflow provides built-in support for +[Nextflow](https://www.nextflow.io/) is one of scientific wokrflow managers written in groovy. Nextflow provides built-in support for HPC-friendly containers such as Apptainer and Singularity. Although Nextflow pipelines allow us to choose Docker engine as an executor for running pipelines, please note that Docker containers can't be used on Puhti due to the lack of administrative privileges for regular users. -## Strengths of Nextflow +Please refer to [High-throughput computing and workflows page](../../computing/running/throughput.md) to get an overview of different tools and help you choose the right tool. -* Easy installation -* Supports implicit parallelism -* Can handle complex dependencies and conditional execution -* Handles error recovery +## Installation -## Disadvantages of Nextflow +The installation of Nextflow is easy as the nextflow is java-based Nextflow is available as a module in Puhti supercomputer. 
-* Has limited MPI support -* Creates a lot of job steps and excessive I/O load -* Does not efficiently integrate with Slurm scheduler -## Use Apptainer/Singularity containers with Nextflow +## Nextflow module +Nextflow is available as a module on Puhti. There multiple versions of +nextflow are available. One can choose the version of depending on the requirement of a pipelines.Please note that the nextflow version starting from 23.04.3 can only be +used for pipelines built with DSL2. You can downgrade to lower versions for DSL1-compliant pipelines. + +Nextflow can be loaded for example as +below: + +```bash +module load nextflow/22.10.1 +``` + +!!! note + Please make sure to specify the correct version of the Nextflow module as + some pipelines require a specific version of Nextflow. + + +### Installation of tools used in the the workflow + +1. By default, nextflow expects that tools are installed locally. Tools available in other [Puhti modules](../../apps/by_discipline.md) or [own custom module](../../computing/modules.md#using-your-own-module-files). + +2. Own custom installations as Apptainer containers: Containers can be smoothly integrated with Nextflow pipelines. No additional modifications to Nextflow scripts are needed except enabling the Singularity/Apptainer engine (instead of Docker) in the Nextflow configuration @@ -33,26 +46,6 @@ prefixing the image name with `shub://` or `docker://`. It is also possible to specify a different Singularity image for each process definition in the Nextflow pipeline script. -Here is a generic recipe for running a Nextflow pipeline on Puhti: - -* [1. Login to Puhti supercomputer](#1-login-to-puhti-supercomputer) -* [2. Prepare your Apptainer/Singularity images](#2-prepare-your-apptainersingularity-images) -* [3. Load Nextflow module on Puhti](#3-load-nextflow-module-on-puhti) -* [4. Set-up your Nextflow pipeline environment](#4-set-up-your-nextflow-pipeline-environment) -* [5. 
Run your Nextflow pipeline as a batch job](#5-run-your-nextflow-pipeline-as-a-batch-job) -* [6. Demonstration of nf-core Nextflow pipeline using HyperQueue executor (optional)](#6-demonstration-of-nf-core-nextflow-pipeline-using-hyperqueue-executor-optional) - -## 1. Login to Puhti supercomputer - -SSH to the login node of Puhti supercomputer -([more instructions here](../../computing/index.md#connecting-to-the-supercomputers)). - -```bash -ssh @puhti.csc.fi # replace with your CSC username -``` - -## 2. Prepare your Apptainer/Singularity images - Most Nextflow pipelines pull the needed container images on the fly. However, when there are multiple images involved, it is a good idea to prepare the images locally first before launching your Nextflow pipeline. @@ -71,21 +64,9 @@ More information on these different options can be found in our !!! note Singularity/Apptainer is installed on login and compute nodes and does not require loading a separate module on either Puhti, Mahti or LUMI. + * Apptainer container can be downloaded from some repository or built locally. For building custom Apptainer containers, see [Creating containers page](../../computing/containers/creating.md). -## 3. Load Nextflow module on Puhti - -Nextflow is available as a module on Puhti and can be loaded for example as -below: - -```bash -module load nextflow/22.10.1 -``` - -!!! note - Please make sure to specify the correct version of the Nextflow module as - some pipelines require a specific version of Nextflow. - -## 4. Set-up your Nextflow pipeline environment + * For binding folders or using other Apptainer flags, use [--apptainer-args option] or delcare in the nextflow.config files. 
Running Nextflow pipelines can sometimes be quite compute-intensive and may require downloading large volumes of data such as databases and container @@ -104,7 +85,78 @@ pipeline: * Clone the GitHub repository of your pipeline to your scratch directory and then [run your pipeline](#5-run-your-nextflow-pipeline-as-a-batch-job). -## 5. Run your Nextflow pipeline as a batch job + +## Usage +Snakemake can be run in 4 different ways in supercomputers: + +1. [In interactive mode](../../computing/running/interactive-usage.md) with local executor, with limited resources. Useful mainly for debugging or very small workflows. +2. With batch job and local executor. Resource usage limited to one full node. Useful for small and medium size workflows, simpler than next options, start with this, if unsure. +3. With batch job and SLURM executor. Can use multiple nodes and different SLURM partitions (CPU and GPU), but may create significant overhead, if many small jobs. Could be used, if each job step for each file takes at least 30 min. +4. With batch job and HyperQueue as a sub-job scheduler. Can use multiple nodes in the same batch job allocation, most complex set up. Suits well for cases, when workflow includes a lot of small job steps with many input files (high-troughput computing). + +!!! info "Note" + Please do not launch heavy Snakemake workflows on **login nodes**. + +### Running Snakemake workflow with local executor interactively +Lanuch an [interactive session](https://docs.csc.fi/computing/running/interactive-usage/) on Puhti as below: +``` +sinteractive -c 2 -m 4G -d 250 -A project_2xxxx # replace actual project number here +module load nextflow/23.04.3 # Load nextflow module +``` +‼️ Please note that one has to load a module (in this case nextflow) with a version. Otherwise, the latest version of stable module installed at that point is used. For the reproducibility point of view, make sure to load versions of all tools including the nextflow module. 
+ +## Tutorial 1: Hello-world example + +This "Hello-world" minimalist example demonstrates the basic syntax of Nextflow for a process. In this tutorial, you will learn how to run a Nextflow script as well as understand the default location of resulting output files. Copy the below script to a file named, hello-world.nf + +```nextflow +#!/usr/bin/env nextflow + +greets = Channel.fromList(["Moi", "Ciao", "Hello", "Hola","Bonjour"]) + +/* + * Use echo to print 'Hello !' in different languages to a file + */ + +process sayHello { + + input: + val greet + + output: + path "${greet}.txt" + + script: + """ + echo ${greet} > ${greet}.txt + """ +} + +workflow { + + // Print a greeting + sayHello(greets) +} + +``` + + Execute the script by entering the following command on your interactive Puhti terminal: + +```nextflow +nextflow run hello-world.nf +``` +This script defines one process named `sayHello`. This process takes a set of greetings from different languages and then writes each one to a separate file. + +The resulting terminal output would look similar to the text shown below: + +```nextflow +N E X T F L O W ~ version 23.04.3 +Launching `hello-world.nf` [intergalactic_panini] DSL2 - revision: 880a4a2dfd +executor > local (5) +[a0/bdf83f] process > sayHello (5) [100%] 5 of 5 ✔ +``` + +### Running Snakemake workflow with local executor and batch job Please follow our [instructions for writing a batch job script for Puhti](../../computing/running/example-job-scripts-puhti.md). @@ -124,10 +176,11 @@ pipeline on Puhti: #SBATCH --mem-per-cpu=1G # Increase as needed # Load Nextflow module -module load nextflow/22.10.1 +module load nextflow/23.04.3 # Actual Nextflow command here nextflow run workflow.nf +# nf-core pipeline example: nextflow run nf-core/scrnaseq -profile test,singularity -resume --outdir . ``` !!! note @@ -148,7 +201,63 @@ nextflow run workflow.nf adding `#SBATCH --gres=nvme:`. 
For example, add `#SBATCH --gres=nvme:100` to request 100 GB of space on `$LOCAL_SCRATCH`. -## 6. Demonstration of nf-core Nextflow pipeline using HyperQueue executor (optional) + +Finally, submit your job as below: + +``` +sbatch scrna_nfcore.sh +``` + +Monitor the status of submitted Slurm job + +``` + squeue -j + # or + squeue --me + # or + squeue -u $USER +``` + +### Running Snakemake workflow with SLURM executor (Currently NOT recommended on Puhti but good to know to realise the power of nextflow) + +One of the advantages of nextflow is that the actual pipeline functional logic is separated from the execution environment. The same script can therefore be executed in different environment by changing the execution environment without touching actual pipeline code. Nextflow uses `executor` information to decide where the job should be run. Once executor is configured, Nextflow submits each process to the specified job scheduler on your behalf (=you don't need to write sbatch script, nextflow writes on the fly for you, instead). + +Default executor is `local` where process is run in your computer/localhost where Nextflow is launched. Other executors include: + +- PBS/Torque +- SLURM +- Amazon (AWS Batch) +- SGE (Sun Grid Engine) + +To enable the SLURM executor on Puhti, simply set `process.executor` property to slurm value in the `nextflow.config` file as shown below: + +``` +profiles { + + + standard { + process.executor = 'local' + } + + puhti { + process.clusterOptions = '--account=project_xxxx --ntasks-per-node=1 --cpus-per-task=4 --ntasks=1 --time=00:00:05' + process.executor = 'slurm' + process.queue = 'small' + process.memory = '10GB' + } + +} +``` + +In this case, you can run a nextflow script as below: + +``` +nextflow run -profile puhti +``` +This will submit each process of your job to Puhti cluster. 
+ + +### Running Snakemake with HyperQueue executor In this example, let's use the [HyperQueue meta-scheduler](../../apps/hyperqueue.md) for executing a Nextflow From d6caa22e239d37fb31ffb4dd131baea07d73c040 Mon Sep 17 00:00:00 2001 From: Laxmana Yetukuri Date: Fri, 20 Dec 2024 09:17:17 +0200 Subject: [PATCH 02/29] small fixes --- docs/support/tutorials/nextflow-puhti.md | 34 ++++++++++++++---------- 1 file changed, 20 insertions(+), 14 deletions(-) diff --git a/docs/support/tutorials/nextflow-puhti.md b/docs/support/tutorials/nextflow-puhti.md index afb0d5bdec..0ab7347697 100755 --- a/docs/support/tutorials/nextflow-puhti.md +++ b/docs/support/tutorials/nextflow-puhti.md @@ -1,21 +1,27 @@ # Running Nextflow pipelines on Puhti -[Nextflow](https://www.nextflow.io/) is one of scientific wokrflow managers written in groovy. Nextflow provides built-in support for +[Nextflow](https://www.nextflow.io/) is one of scientific wokrflow managers written in groovy and provides built-in support for HPC-friendly containers such as Apptainer and Singularity. Although Nextflow -pipelines allow us to choose Docker engine as an executor for running +pipelines allows choosing Docker engine for running pipelines, please note that Docker containers can't be used on Puhti due to the -lack of administrative privileges for regular users. +lack of administrative privileges for users. -Please refer to [High-throughput computing and workflows page](../../computing/running/throughput.md) to get an overview of different tools and help you choose the right tool. +There are many high-throughput tools and workflow managers. Please refer to our [high-throughput computing and workflows page](../../computing/running/throughput.md) to get an overview of different tools and may help you choose the right tool for your needs. ## Installation -The installation of Nextflow is easy as the nextflow is java-based Nextflow is available as a module in Puhti supercomputer. 
+The installation of Nextflow is easy as it is java-based tool. You can download the latest version of nextflow to your home folder as below: +```bash + +module load java +curl -s https://get.nextflow.io | bash && mv nextflow ~/bin +chmod +x ~/bin/nextflow + +``` ## Nextflow module -Nextflow is available as a module on Puhti. There multiple versions of -nextflow are available. One can choose the version of depending on the requirement of a pipelines.Please note that the nextflow version starting from 23.04.3 can only be +Nextflow is also available as a module on Puhti. One can choose the version of the nextflow depending on the requirement of your own pipeline. Please note that the Nextflow version starting from 23.04.3 can only be used for pipelines built with DSL2. You can downgrade to lower versions for DSL1-compliant pipelines. @@ -33,7 +39,7 @@ module load nextflow/22.10.1 ### Installation of tools used in the the workflow -1. By default, nextflow expects that tools are installed locally. Tools available in other [Puhti modules](../../apps/by_discipline.md) or [own custom module](../../computing/modules.md#using-your-own-module-files). +1. By default, Nextflow expects that tools are installed locally. Tools available in other [Puhti modules](../../apps/by_discipline.md) or [own custom module](../../computing/modules.md#using-your-own-module-files). 2. Own custom installations as Apptainer containers: Containers can be smoothly integrated with Nextflow pipelines. No additional @@ -103,7 +109,7 @@ Lanuch an [interactive session](https://docs.csc.fi/computing/running/interactiv sinteractive -c 2 -m 4G -d 250 -A project_2xxxx # replace actual project number here module load nextflow/23.04.3 # Load nextflow module ``` -‼️ Please note that one has to load a module (in this case nextflow) with a version. Otherwise, the latest version of stable module installed at that point is used. 
For the reproducibility point of view, make sure to load versions of all tools including the nextflow module. +‼️ Please note that one has to load a module (in this case Nextflow) with a version. Otherwise, the latest version of stable module installed at that point is used. For the reproducibility point of view, make sure to load versions of all tools including the Nextflow module. ## Tutorial 1: Hello-world example @@ -218,9 +224,9 @@ Monitor the status of submitted Slurm job squeue -u $USER ``` -### Running Snakemake workflow with SLURM executor (Currently NOT recommended on Puhti but good to know to realise the power of nextflow) +### Running Nextflow with slurm executor (Currently NOT recommended on Puhti when you have multiple small jobs) -One of the advantages of nextflow is that the actual pipeline functional logic is separated from the execution environment. The same script can therefore be executed in different environment by changing the execution environment without touching actual pipeline code. Nextflow uses `executor` information to decide where the job should be run. Once executor is configured, Nextflow submits each process to the specified job scheduler on your behalf (=you don't need to write sbatch script, nextflow writes on the fly for you, instead). +One of the advantages of Nextflow is that the actual pipeline functional logic is separated from the execution environment. The same script can therefore be executed in different environment by changing the execution environment without touching actual pipeline code. Nextflow uses `executor` information to decide where the job should be run. Once executor is configured, Nextflow submits each process to the specified job scheduler on your behalf (=you don't need to write sbatch script, Nextflow writes on the fly for you, instead). Default executor is `local` where process is run in your computer/localhost where Nextflow is launched. 
Other executors include: @@ -249,7 +255,7 @@ profiles { } ``` -In this case, you can run a nextflow script as below: +In this case, you can run a Nextflow script as below: ``` nextflow run -profile puhti @@ -257,7 +263,7 @@ nextflow run -profile puhti This will submit each process of your job to Puhti cluster. -### Running Snakemake with HyperQueue executor +### Running Nextflow with HyperQueue executor In this example, let's use the [HyperQueue meta-scheduler](../../apps/hyperqueue.md) for executing a Nextflow @@ -323,7 +329,7 @@ hq server stop ``` !!! note - Please make sure that your nextflow configuration file (`nextflow.config`) + Please make sure that your Nextflow configuration file (`nextflow.config`) has the correct executor name when using the HypeQueue executor. Also, when multiple nodes are used, ensure that the executor knows how many jobs it can submit using the parameter `queueSize` under the `executor` block. From 5c7fca3699c1b77782010d39f870b09b1bc9692e Mon Sep 17 00:00:00 2001 From: Laxmana Yetukuri Date: Fri, 20 Dec 2024 09:48:26 +0200 Subject: [PATCH 03/29] small fixes --- docs/support/tutorials/nextflow-puhti.md | 66 ++++++++++-------------- 1 file changed, 27 insertions(+), 39 deletions(-) diff --git a/docs/support/tutorials/nextflow-puhti.md b/docs/support/tutorials/nextflow-puhti.md index 0ab7347697..d2b6439fc8 100755 --- a/docs/support/tutorials/nextflow-puhti.md +++ b/docs/support/tutorials/nextflow-puhti.md @@ -1,20 +1,20 @@ # Running Nextflow pipelines on Puhti -[Nextflow](https://www.nextflow.io/) is one of scientific wokrflow managers written in groovy and provides built-in support for +[Nextflow](https://www.nextflow.io/) is one of the scientific wokrflow managers and provides built-in support for HPC-friendly containers such as Apptainer and Singularity. 
Although Nextflow -pipelines allows choosing Docker engine for running +allows choosing Docker engine for running pipelines, please note that Docker containers can't be used on Puhti due to the -lack of administrative privileges for users. +lack of administrative privileges for normal users. -There are many high-throughput tools and workflow managers. Please refer to our [high-throughput computing and workflows page](../../computing/running/throughput.md) to get an overview of different tools and may help you choose the right tool for your needs. +There are many other high-throughput tools and workflow managers exist and please refer to our [high-throughput computing and workflows page](../../computing/running/throughput.md) to get an overview of different tools and may help you choose the right tool for your needs. ## Installation - -The installation of Nextflow is easy as it is java-based tool. You can download the latest version of nextflow to your home folder as below: + +The installation of Nextflow is easy as it is java-based tool. You can download the latest version of Nextflow binary to your /home directory on Puhti as below: ```bash -module load java +module load biojava/21 curl -s https://get.nextflow.io | bash && mv nextflow ~/bin chmod +x ~/bin/nextflow @@ -22,7 +22,7 @@ chmod +x ~/bin/nextflow ## Nextflow module Nextflow is also available as a module on Puhti. One can choose the version of the nextflow depending on the requirement of your own pipeline. Please note that the Nextflow version starting from 23.04.3 can only be -used for pipelines built with DSL2. You can downgrade to lower versions for DSL1-compliant pipelines. +used for pipelines built with DSL2 syntax. You can downgrade to lower versions for DSL1-compliant pipelines. Nextflow can be loaded for example as @@ -37,16 +37,16 @@ module load nextflow/22.10.1 some pipelines require a specific version of Nextflow. 
-### Installation of tools used in the the workflow +### Installation of tools used in Nextflow -1. By default, Nextflow expects that tools are installed locally. Tools available in other [Puhti modules](../../apps/by_discipline.md) or [own custom module](../../computing/modules.md#using-your-own-module-files). +1. Local installations: By default, Nextflow expects that the analysis tools are available locally. Tools can be activated from existing [Puhti modules](../../apps/by_discipline.md) or [own custom module installations](../../computing/modules.md#using-your-own-module-files). -2. Own custom installations as Apptainer containers: +2. Container instalaltions: Containers can be smoothly integrated with Nextflow pipelines. No additional modifications to Nextflow scripts are needed except enabling the -Singularity/Apptainer engine (instead of Docker) in the Nextflow configuration -file in the HPC environment. Nextflow is able to pull remote container images -stored in Singularity or Docker Hub registry. The remote container images are +Singularity/Apptainer engine in the Nextflow configuration +file in the HPC environment. Nextflow can pull remote container images as Singularity/Apptainer +from container registries on the fly. The remote container images are usually specified in the Nextflow script or configuration file by simply prefixing the image name with `shub://` or `docker://`. It is also possible to specify a different Singularity image for each process definition in the @@ -79,31 +79,19 @@ require downloading large volumes of data such as databases and container images. This can take a while and may not even work successfully for the first time when downloading multiple Apptainer/Singularity images or databases. -You can do the following basic preparation steps before running your Nextflow -pipeline: - -* Copy Apptainer images from your local workstation to your project folder on - Puhti. Pay attention to the Apptainer cache directory (i.e. 
- `$APPTAINER_CACHEDIR`) which is usually `$HOME/.apptainer/cache`. Note that - `$HOME` directory quota is only 10 GB on Puhti, so it may fill up quickly. -* Move all your raw data to your project directory (`/scratch/`) - on Puhti. -* Clone the GitHub repository of your pipeline to your scratch directory and - then [run your pipeline](#5-run-your-nextflow-pipeline-as-a-batch-job). - ## Usage -Snakemake can be run in 4 different ways in supercomputers: +Nextflow pipelines can be run in different ways in supercomputering environment: -1. [In interactive mode](../../computing/running/interactive-usage.md) with local executor, with limited resources. Useful mainly for debugging or very small workflows. -2. With batch job and local executor. Resource usage limited to one full node. Useful for small and medium size workflows, simpler than next options, start with this, if unsure. -3. With batch job and SLURM executor. Can use multiple nodes and different SLURM partitions (CPU and GPU), but may create significant overhead, if many small jobs. Could be used, if each job step for each file takes at least 30 min. +1. [In interactive mode](../../computing/running/interactive-usage.md) with local executor, with limited resources. Useful mainly for debugging or testing very small workflows. +2. With batch job and local executor. Useful for small and medium size workflows +3. With batch job and SLURM executor. This can use multiple nodes and different SLURM partitions (CPU and GPU), but may create significant overhead, if many small jobs. Could be used, if each job step for each file takes at least 30 min. 4. With batch job and HyperQueue as a sub-job scheduler. Can use multiple nodes in the same batch job allocation, most complex set up. Suits well for cases, when workflow includes a lot of small job steps with many input files (high-troughput computing). !!! info "Note" - Please do not launch heavy Snakemake workflows on **login nodes**. 
+ Please do not launch heavy Nextflow workflows on **login nodes**. -### Running Snakemake workflow with local executor interactively +### Running Nextflow pipeline with local executor interactively Lanuch an [interactive session](https://docs.csc.fi/computing/running/interactive-usage/) on Puhti as below: ``` sinteractive -c 2 -m 4G -d 250 -A project_2xxxx # replace actual project number here @@ -151,7 +139,7 @@ workflow { ```nextflow nextflow run hello-world.nf ``` -This script defines one process named `sayHello`. This process takes a set of greetings from different languages and then writes each one to a separate file. +This script defines one process named `sayHello`. This process takes a set of greetings from different languages and then writes each one to a separate file in a random order. The resulting terminal output would look similar to the text shown below: @@ -162,12 +150,12 @@ executor > local (5) [a0/bdf83f] process > sayHello (5) [100%] 5 of 5 ✔ ``` -### Running Snakemake workflow with local executor and batch job +### Running Nextflow with local executor in a batch job Please follow our [instructions for writing a batch job script for Puhti](../../computing/running/example-job-scripts-puhti.md). -Although Nextflow comes with native Slurm support, one has to avoid launching +Although Nextflow supports SLURM natively, one has to avoid launching large amounts of very short jobs using it. Instead, one can launch a Nextflow job as a regular batch job that co-executes all job tasks in the same job allocation. 
Below is a minimal example to get started with your Nextflow @@ -224,9 +212,9 @@ Monitor the status of submitted Slurm job squeue -u $USER ``` -### Running Nextflow with slurm executor (Currently NOT recommended on Puhti when you have multiple small jobs) +### Running Nextflow with slurm executor (Currently NOT recommended on Puhti when you have several small jobs) -One of the advantages of Nextflow is that the actual pipeline functional logic is separated from the execution environment. The same script can therefore be executed in different environment by changing the execution environment without touching actual pipeline code. Nextflow uses `executor` information to decide where the job should be run. Once executor is configured, Nextflow submits each process to the specified job scheduler on your behalf (=you don't need to write sbatch script, Nextflow writes on the fly for you, instead). +One of the advantages of Nextflow is that the actual pipeline functional logic is separated from the execution environment. The same script can therefore be executed in different environments by changing the execution environment without touching actual pipeline code. Nextflow uses `executor` information to decide where the job should be run. Once executor is configured, Nextflow submits each process to the specified job scheduler on your behalf (=you don't need to write sbatch script, Nextflow writes on the fly for you, instead). Default executor is `local` where process is run in your computer/localhost where Nextflow is launched. Other executors include: @@ -260,7 +248,7 @@ In this case, you can run a Nextflow script as below: ``` nextflow run -profile puhti ``` -This will submit each process of your job to Puhti cluster. +This will submit each process of your job as a batch job to Puhti cluster. ### Running Nextflow with HyperQueue executor @@ -268,7 +256,7 @@ This will submit each process of your job to Puhti cluster. 
In this example, let's use the [HyperQueue meta-scheduler](../../apps/hyperqueue.md) for executing a Nextflow pipeline. This executor can be used to scale up analysis across multiple nodes -when needed. +when needed. However, the executor settings can be complex depending on the pipeline. Here is a batch script for running a [nf-core pipeline](https://nf-co.re/pipelines) on Puhti: From 8b7c4011da08300d068deb15e0c8fb0097d9bccc Mon Sep 17 00:00:00 2001 From: Laxmana Yetukuri Date: Fri, 20 Dec 2024 09:55:18 +0200 Subject: [PATCH 04/29] small fixes --- docs/support/tutorials/nextflow-puhti.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/support/tutorials/nextflow-puhti.md b/docs/support/tutorials/nextflow-puhti.md index d2b6439fc8..cd318be2e3 100755 --- a/docs/support/tutorials/nextflow-puhti.md +++ b/docs/support/tutorials/nextflow-puhti.md @@ -6,7 +6,7 @@ allows choosing Docker engine for running pipelines, please note that Docker containers can't be used on Puhti due to the lack of administrative privileges for normal users. -There are many other high-throughput tools and workflow managers exist and please refer to our [high-throughput computing and workflows page](../../computing/running/throughput.md) to get an overview of different tools and may help you choose the right tool for your needs. +There are many other high-throughput tools and workflow managers exist for scientific computing and selecting the right tool can sometimes be challenging. Please refer to our [high-throughput computing and workflows page](../../computing/running/throughput.md) to get an overview from the selected list of relevant tools. 
## Installation From 2f0156c311abc9fad1939e3a79e51025ab2eff71 Mon Sep 17 00:00:00 2001 From: Laxmana Yetukuri Date: Fri, 20 Dec 2024 10:09:04 +0200 Subject: [PATCH 05/29] small fixes --- docs/support/tutorials/nextflow-puhti.md | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/docs/support/tutorials/nextflow-puhti.md b/docs/support/tutorials/nextflow-puhti.md index cd318be2e3..74ceb48c8a 100755 --- a/docs/support/tutorials/nextflow-puhti.md +++ b/docs/support/tutorials/nextflow-puhti.md @@ -25,11 +25,11 @@ Nextflow is also available as a module on Puhti. One can choose the version of t used for pipelines built with DSL2 syntax. You can downgrade to lower versions for DSL1-compliant pipelines. -Nextflow can be loaded for example as +Nextflow can be loaded as below: ```bash -module load nextflow/22.10.1 +module load nextflow/ # e.g., module load nextflow/22.10.1 ``` !!! note @@ -53,7 +53,7 @@ specify a different Singularity image for each process definition in the Nextflow pipeline script. Most Nextflow pipelines pull the needed container images on the fly. However, -when there are multiple images involved, it is a good idea to prepare the +when multiple images are needed in a pipeline, it is a good idea to prepare the images locally first before launching your Nextflow pipeline. Here are some options for preparing your Apptainer/Singularity image: @@ -97,7 +97,7 @@ Lanuch an [interactive session](https://docs.csc.fi/computing/running/interactiv sinteractive -c 2 -m 4G -d 250 -A project_2xxxx # replace actual project number here module load nextflow/23.04.3 # Load nextflow module ``` -‼️ Please note that one has to load a module (in this case Nextflow) with a version. Otherwise, the latest version of stable module installed at that point is used. For the reproducibility point of view, make sure to load versions of all tools including the Nextflow module. +‼️ Please note that one has to load Nextflow module with a version. 
Otherwise, the latest version of stable module installed at that point is used. For the reproducibility point of view, make sure to load versions of all tools including the Nextflow module. ## Tutorial 1: Hello-world example @@ -137,6 +137,7 @@ workflow { Execute the script by entering the following command on your interactive Puhti terminal: ```nextflow +module load nextflow/23.04.3 nextflow run hello-world.nf ``` This script defines one process named `sayHello`. This process takes a set of greetings from different languages and then writes each one to a separate file in a random order. @@ -196,10 +197,10 @@ nextflow run workflow.nf `#SBATCH --gres=nvme:100` to request 100 GB of space on `$LOCAL_SCRATCH`. -Finally, submit your job as below: +Finally, copy above script to a fle (e.g., nextflow_script.sh) and submit the job to cluster as below: ``` -sbatch scrna_nfcore.sh +sbatch nextflow_script.sh ``` Monitor the status of submitted Slurm job @@ -212,9 +213,9 @@ Monitor the status of submitted Slurm job squeue -u $USER ``` -### Running Nextflow with slurm executor (Currently NOT recommended on Puhti when you have several small jobs) +### Running Nextflow with SLURM executor (Currently NOT recommended on Puhti when you have several small jobs) -One of the advantages of Nextflow is that the actual pipeline functional logic is separated from the execution environment. The same script can therefore be executed in different environments by changing the execution environment without touching actual pipeline code. Nextflow uses `executor` information to decide where the job should be run. Once executor is configured, Nextflow submits each process to the specified job scheduler on your behalf (=you don't need to write sbatch script, Nextflow writes on the fly for you, instead). +One of the advantages of Nextflow is that the actual pipeline functional logic is separated from the execution environment. 
The same script can therefore be executed in different environments by changing the execution environment without touching actual pipeline code. Nextflow uses `executor` information to decide where the job should be run. Once executor is configured, Nextflow submits each process to the specified job scheduler on your behalf (=you don't need to write sbatch script, Nextflow writes on the fly for you, instead ). Default executor is `local` where process is run in your computer/localhost where Nextflow is launched. Other executors include: @@ -248,7 +249,7 @@ In this case, you can run a Nextflow script as below: ``` nextflow run -profile puhti ``` -This will submit each process of your job as a batch job to Puhti cluster. +This will submit each process of your job as a separate batch job to Puhti cluster. ### Running Nextflow with HyperQueue executor From cf708855d6cf884be78f3143128b8960139b42a4 Mon Sep 17 00:00:00 2001 From: Laxmana Yetukuri Date: Fri, 20 Dec 2024 10:12:44 +0200 Subject: [PATCH 06/29] small fixes --- docs/support/tutorials/nextflow-puhti.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/support/tutorials/nextflow-puhti.md b/docs/support/tutorials/nextflow-puhti.md index 74ceb48c8a..15a7ab50e9 100755 --- a/docs/support/tutorials/nextflow-puhti.md +++ b/docs/support/tutorials/nextflow-puhti.md @@ -89,7 +89,7 @@ Nextflow pipelines can be run in different ways in supercomputering environment: 4. With batch job and HyperQueue as a sub-job scheduler. Can use multiple nodes in the same batch job allocation, most complex set up. Suits well for cases, when workflow includes a lot of small job steps with many input files (high-troughput computing). !!! info "Note" - Please do not launch heavy Nextflow workflows on **login nodes**. + Please do not launch heavy Nextflow workflows on login nodes. 
### Running Nextflow pipeline with local executor interactively Lanuch an [interactive session](https://docs.csc.fi/computing/running/interactive-usage/) on Puhti as below: From 17fad7d9c19955fc8b1c47800f968905f37612e1 Mon Sep 17 00:00:00 2001 From: Laxmana Yetukuri Date: Fri, 20 Dec 2024 10:19:08 +0200 Subject: [PATCH 07/29] small fixes --- docs/support/tutorials/nextflow-puhti.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/docs/support/tutorials/nextflow-puhti.md b/docs/support/tutorials/nextflow-puhti.md index 15a7ab50e9..78e4b082e1 100755 --- a/docs/support/tutorials/nextflow-puhti.md +++ b/docs/support/tutorials/nextflow-puhti.md @@ -97,7 +97,8 @@ Lanuch an [interactive session](https://docs.csc.fi/computing/running/interactiv sinteractive -c 2 -m 4G -d 250 -A project_2xxxx # replace actual project number here module load nextflow/23.04.3 # Load nextflow module ``` -‼️ Please note that one has to load Nextflow module with a version. Otherwise, the latest version of stable module installed at that point is used. For the reproducibility point of view, make sure to load versions of all tools including the Nextflow module. + +Please note that one has to load Nextflow module with a version. Otherwise, the latest version of stable module installed at that point is used. For the reproducibility point of view, make sure to load versions of all tools including the Nextflow module. ## Tutorial 1: Hello-world example @@ -134,7 +135,7 @@ workflow { ``` - Execute the script by entering the following command on your interactive Puhti terminal: + Execute the script by entering the following command on Puhti interactive terminal: ```nextflow module load nextflow/23.04.3 @@ -156,8 +157,8 @@ executor > local (5) Please follow our [instructions for writing a batch job script for Puhti](../../computing/running/example-job-scripts-puhti.md). 
-Although Nextflow supports SLURM natively, one has to avoid launching -large amounts of very short jobs using it. Instead, one can launch a Nextflow +Although Nextflow supports SLURM natively, avoid launching +large amounts of very short jobs using SLURM. Instead, one can launch a Nextflow job as a regular batch job that co-executes all job tasks in the same job allocation. Below is a minimal example to get started with your Nextflow pipeline on Puhti: From 2823a54cea453fd88dce7d7cf4ccb0a4182016de Mon Sep 17 00:00:00 2001 From: Laxmana Yetukuri Date: Fri, 20 Dec 2024 10:21:46 +0200 Subject: [PATCH 08/29] small fixes --- docs/support/tutorials/nextflow-puhti.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/support/tutorials/nextflow-puhti.md b/docs/support/tutorials/nextflow-puhti.md index 78e4b082e1..9e388e0b40 100755 --- a/docs/support/tutorials/nextflow-puhti.md +++ b/docs/support/tutorials/nextflow-puhti.md @@ -10,7 +10,8 @@ There are many other high-throughput tools and workflow managers exist for scien ## Installation -The installation of Nextflow is easy as it is java-based tool. You can download the latest version of Nextflow binary to your /home directory on Puhti as below: +### Custom installations +The installation of Nextflow is easy as it is java-based tool. You can for example download the latest version of Nextflow binary to your /home directory on Puhti as below: ```bash @@ -20,7 +21,7 @@ chmod +x ~/bin/nextflow ``` -## Nextflow module +### Nextflow module Nextflow is also available as a module on Puhti. One can choose the version of the nextflow depending on the requirement of your own pipeline. Please note that the Nextflow version starting from 23.04.3 can only be used for pipelines built with DSL2 syntax. You can downgrade to lower versions for DSL1-compliant pipelines. @@ -100,7 +101,7 @@ module load nextflow/23.04.3 # Load nextflow module Please note that one has to load Nextflow module with a version. 
Otherwise, the latest version of stable module installed at that point is used. For the reproducibility point of view, make sure to load versions of all tools including the Nextflow module. -## Tutorial 1: Hello-world example +### Tutorial 1: Hello-world example This "Hello-world" minimalist example demonstrates the basic syntax of Nextflow for a process. In this tutorial, you will learn how to run a Nextflow script as well as understand the default location of resulting output files. Copy the below script to a file named, hello-world.nf From e82d070a173d29b13a7c3f54818e933c9fb6a8d3 Mon Sep 17 00:00:00 2001 From: Laxmana Yetukuri Date: Fri, 20 Dec 2024 10:23:14 +0200 Subject: [PATCH 09/29] small fixes --- docs/support/tutorials/nextflow-puhti.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/support/tutorials/nextflow-puhti.md b/docs/support/tutorials/nextflow-puhti.md index 9e388e0b40..1377c54587 100755 --- a/docs/support/tutorials/nextflow-puhti.md +++ b/docs/support/tutorials/nextflow-puhti.md @@ -101,7 +101,7 @@ module load nextflow/23.04.3 # Load nextflow module Please note that one has to load Nextflow module with a version. Otherwise, the latest version of stable module installed at that point is used. For the reproducibility point of view, make sure to load versions of all tools including the Nextflow module. -### Tutorial 1: Hello-world example +### Running hello-world example This "Hello-world" minimalist example demonstrates the basic syntax of Nextflow for a process. In this tutorial, you will learn how to run a Nextflow script as well as understand the default location of resulting output files. 
Copy the below script to a file named, hello-world.nf From 030e9c7532cdd4b5a7309e4051d538092bf32c5c Mon Sep 17 00:00:00 2001 From: Laxmana Yetukuri Date: Fri, 20 Dec 2024 10:39:06 +0200 Subject: [PATCH 10/29] small fixes --- docs/support/tutorials/nextflow-puhti.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/support/tutorials/nextflow-puhti.md b/docs/support/tutorials/nextflow-puhti.md index 1377c54587..c61e4e9ec8 100755 --- a/docs/support/tutorials/nextflow-puhti.md +++ b/docs/support/tutorials/nextflow-puhti.md @@ -33,7 +33,7 @@ below: module load nextflow/ # e.g., module load nextflow/22.10.1 ``` -!!! note +!!! info "Note" Please make sure to specify the correct version of the Nextflow module as some pipelines require a specific version of Nextflow. @@ -68,7 +68,7 @@ Here are some options for preparing your Apptainer/Singularity image: More information on these different options can be found in our [documentation on creating containers](../../computing/containers/creating.md). -!!! note +!!! info "Note" Singularity/Apptainer is installed on login and compute nodes and does not require loading a separate module on either Puhti, Mahti or LUMI. * Apptainer container can be downloaded from some repository or built locally. For building custom Apptainer containers, see [Creating containers page](../../computing/containers/creating.md). @@ -180,7 +180,7 @@ nextflow run workflow.nf # nf-core pipeline example: nextflow run nf-core/scrnaseq -profile test,singularity -resume --outdir . ``` -!!! note +!!! info "Note" If you are directly pulling multiple images on the fly, please set `$APPTAINER_TMPDIR` and `$APPTAINER_CACHEDIR` to either local scratch (i.e. `$LOCAL_SCRATCH`) or to your scratch folder (`/scratch/`) @@ -319,7 +319,7 @@ hq worker stop all hq server stop ``` -!!! note +!!! info "Note" Please make sure that your Nextflow configuration file (`nextflow.config`) has the correct executor name when using the HypeQueue executor. 
Also, when multiple nodes are used, ensure that the executor knows how many jobs From 5d691e3b492f461e0141d6772ca48a5c314c4fef Mon Sep 17 00:00:00 2001 From: Laxmana Yetukuri Date: Fri, 20 Dec 2024 10:52:26 +0200 Subject: [PATCH 11/29] small fixes --- docs/support/tutorials/nextflow-puhti.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/support/tutorials/nextflow-puhti.md b/docs/support/tutorials/nextflow-puhti.md index c61e4e9ec8..af606d3f55 100755 --- a/docs/support/tutorials/nextflow-puhti.md +++ b/docs/support/tutorials/nextflow-puhti.md @@ -101,9 +101,8 @@ module load nextflow/23.04.3 # Load nextflow module Please note that one has to load Nextflow module with a version. Otherwise, the latest version of stable module installed at that point is used. For the reproducibility point of view, make sure to load versions of all tools including the Nextflow module. -### Running hello-world example -This "Hello-world" minimalist example demonstrates the basic syntax of Nextflow for a process. In this tutorial, you will learn how to run a Nextflow script as well as understand the default location of resulting output files. Copy the below script to a file named, hello-world.nf +The following "Hello-world" minimalist example demonstrates the basic syntax of Nextflow. Copy the below script to a file named, hello-world.nf. 
```nextflow #!/usr/bin/env nextflow From 3637d1aa6bd0e1f05e7ccd3df22a2d00eab48cc1 Mon Sep 17 00:00:00 2001 From: Laxmana Yetukuri Date: Fri, 20 Dec 2024 13:01:56 +0200 Subject: [PATCH 12/29] nextflow docs update --- docs/support/tutorials/nextflow-puhti.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/support/tutorials/nextflow-puhti.md b/docs/support/tutorials/nextflow-puhti.md index af606d3f55..2f3c410bbc 100755 --- a/docs/support/tutorials/nextflow-puhti.md +++ b/docs/support/tutorials/nextflow-puhti.md @@ -10,7 +10,7 @@ There are many other high-throughput tools and workflow managers exist for scien ## Installation -### Custom installations +### Custom Nextflow installations The installation of Nextflow is easy as it is java-based tool. You can for example download the latest version of Nextflow binary to your /home directory on Puhti as below: ```bash @@ -21,7 +21,7 @@ chmod +x ~/bin/nextflow ``` -### Nextflow module +### Nextflow as modules Nextflow is also available as a module on Puhti. One can choose the version of the nextflow depending on the requirement of your own pipeline. Please note that the Nextflow version starting from 23.04.3 can only be used for pipelines built with DSL2 syntax. You can downgrade to lower versions for DSL1-compliant pipelines. 
From d55202089ca23cd9b3f2c3bcfa00ea271016833a Mon Sep 17 00:00:00 2001 From: Laxmana Yetukuri Date: Fri, 20 Dec 2024 13:32:57 +0200 Subject: [PATCH 13/29] nextflow docs update --- docs/support/tutorials/nextflow-puhti.md | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/docs/support/tutorials/nextflow-puhti.md b/docs/support/tutorials/nextflow-puhti.md index 2f3c410bbc..977f0048cd 100755 --- a/docs/support/tutorials/nextflow-puhti.md +++ b/docs/support/tutorials/nextflow-puhti.md @@ -10,16 +10,6 @@ There are many other high-throughput tools and workflow managers exist for scien ## Installation -### Custom Nextflow installations -The installation of Nextflow is easy as it is java-based tool. You can for example download the latest version of Nextflow binary to your /home directory on Puhti as below: - -```bash - -module load biojava/21 -curl -s https://get.nextflow.io | bash && mv nextflow ~/bin -chmod +x ~/bin/nextflow - -``` ### Nextflow as modules Nextflow is also available as a module on Puhti. One can choose the version of the nextflow depending on the requirement of your own pipeline. Please note that the Nextflow version starting from 23.04.3 can only be @@ -38,6 +28,17 @@ module load nextflow/ # e.g., module load nextflow/22.10.1 some pipelines require a specific version of Nextflow. +### Custom Nextflow installations +The installation of Nextflow is easy as it is java-based tool. You can for example download the latest version of Nextflow binary to your /home directory on Puhti as below: + +```bash + +module load biojava/21 +curl -s https://get.nextflow.io | bash && mv nextflow ~/bin +chmod +x ~/bin/nextflow + +``` + ### Installation of tools used in Nextflow 1. Local installations: By default, Nextflow expects that the analysis tools are available locally. 
Tools can be activated from existing [Puhti modules](../../apps/by_discipline.md) or [own custom module installations](../../computing/modules.md#using-your-own-module-files). @@ -214,7 +215,7 @@ Monitor the status of submitted Slurm job squeue -u $USER ``` -### Running Nextflow with SLURM executor (Currently NOT recommended on Puhti when you have several small jobs) +### Running Nextflow with SLURM executor One of the advantages of Nextflow is that the actual pipeline functional logic is separated from the execution environment. The same script can therefore be executed in different environments by changing the execution environment without touching actual pipeline code. Nextflow uses `executor` information to decide where the job should be run. Once executor is configured, Nextflow submits each process to the specified job scheduler on your behalf (=you don't need to write sbatch script, Nextflow writes on the fly for you, instead ). From cf0877c23cfba65ea48668aaad92ff067f9ff0d5 Mon Sep 17 00:00:00 2001 From: Laxmana Yetukuri Date: Fri, 20 Dec 2024 13:35:27 +0200 Subject: [PATCH 14/29] nextflow docs update --- docs/support/tutorials/nextflow-puhti.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/docs/support/tutorials/nextflow-puhti.md b/docs/support/tutorials/nextflow-puhti.md index 977f0048cd..13da193b7f 100755 --- a/docs/support/tutorials/nextflow-puhti.md +++ b/docs/support/tutorials/nextflow-puhti.md @@ -11,8 +11,9 @@ There are many other high-throughput tools and workflow managers exist for scien ## Installation -### Nextflow as modules -Nextflow is also available as a module on Puhti. One can choose the version of the nextflow depending on the requirement of your own pipeline. Please note that the Nextflow version starting from 23.04.3 can only be +### Nextflow as a module + +Nextflow is available as a module on Puhti. One can choose the version of the nextflow depending on the requirement of your own pipeline. 
Please note that the Nextflow version starting from 23.04.3 can only be used for pipelines built with DSL2 syntax. You can downgrade to lower versions for DSL1-compliant pipelines. @@ -29,6 +30,7 @@ module load nextflow/ # e.g., module load nextflow/22.10.1 ### Custom Nextflow installations + The installation of Nextflow is easy as it is java-based tool. You can for example download the latest version of Nextflow binary to your /home directory on Puhti as below: ```bash @@ -83,6 +85,7 @@ time when downloading multiple Apptainer/Singularity images or databases. ## Usage + Nextflow pipelines can be run in different ways in supercomputering environment: 1. [In interactive mode](../../computing/running/interactive-usage.md) with local executor, with limited resources. Useful mainly for debugging or testing very small workflows. From 567877eb4b32b6af72a7753ef08a0bad4a90b97e Mon Sep 17 00:00:00 2001 From: Kylli Ek Date: Fri, 20 Dec 2024 14:32:13 +0200 Subject: [PATCH 15/29] wording, add Mahti --- docs/support/tutorials/nextflow-puhti.md | 46 ++++++++++++------------ 1 file changed, 23 insertions(+), 23 deletions(-) diff --git a/docs/support/tutorials/nextflow-puhti.md b/docs/support/tutorials/nextflow-puhti.md index 13da193b7f..0af2eb94b7 100755 --- a/docs/support/tutorials/nextflow-puhti.md +++ b/docs/support/tutorials/nextflow-puhti.md @@ -1,24 +1,21 @@ -# Running Nextflow pipelines on Puhti +# Running Nextflow pipelines on Puhti and Mahti [Nextflow](https://www.nextflow.io/) is one of the scientific wokrflow managers and provides built-in support for -HPC-friendly containers such as Apptainer and Singularity. Although Nextflow +HPC-friendly containers such as Apptainer (Singularity). Although Nextflow allows choosing Docker engine for running -pipelines, please note that Docker containers can't be used on Puhti due to the +pipelines, please note that Docker containers can't be used on supercomputers due to the lack of administrative privileges for normal users. 
There are many other high-throughput tools and workflow managers exist for scientific computing and selecting the right tool can sometimes be challenging. Please refer to our [high-throughput computing and workflows page](../../computing/running/throughput.md) to get an overview from the selected list of relevant tools. ## Installation - ### Nextflow as a module -Nextflow is available as a module on Puhti. One can choose the version of the nextflow depending on the requirement of your own pipeline. Please note that the Nextflow version starting from 23.04.3 can only be -used for pipelines built with DSL2 syntax. You can downgrade to lower versions for DSL1-compliant pipelines. - +Nextflow is available as a module on Puhti and Mahti. One can choose the version of the nextflow depending on the requirement of your own pipeline. Please note that the Nextflow version starting from 23.04.3 can only be +used for pipelines built with DSL2 syntax. You can select a lower version for DSL1-compliant pipelines. -Nextflow can be loaded as -below: +Nextflow can be loaded as below: ```bash module load nextflow/ # e.g., module load nextflow/22.10.1 @@ -31,19 +28,21 @@ module load nextflow/ # e.g., module load nextflow/22.10.1 ### Custom Nextflow installations -The installation of Nextflow is easy as it is java-based tool. You can for example download the latest version of Nextflow binary to your /home directory on Puhti as below: +If you need some very specific verion or for some other reason, you can install Nextflow also yourself. It is easy as it is java-based tool. You can for example download the latest version of Nextflow binary to your /home directory as below: ```bash - -module load biojava/21 -curl -s https://get.nextflow.io | bash && mv nextflow ~/bin +cd ~/bin +curl -s https://get.nextflow.io chmod +x ~/bin/nextflow +``` +``` +module load biojava/21 ``` ### Installation of tools used in Nextflow -1. 
Local installations: By default, Nextflow expects that the analysis tools are available locally. Tools can be activated from existing [Puhti modules](../../apps/by_discipline.md) or [own custom module installations](../../computing/modules.md#using-your-own-module-files). +1. Local installations: By default, Nextflow expects that the analysis tools are available locally. Tools can be activated from existing [modules](../../apps/by_discipline.md) or [own custom module installations](../../computing/modules.md#using-your-own-module-files). 2. Container instalaltions: Containers can be smoothly integrated with Nextflow pipelines. No additional @@ -62,11 +61,11 @@ images locally first before launching your Nextflow pipeline. Here are some options for preparing your Apptainer/Singularity image: -* Build a Singularity/Apptainer image on Puhti without `sudo` access using +* Build a Singularity/Apptainer image without `sudo` access using `--fakeroot` flag. * Convert a Docker image to Apptainer on your local system and then copy it - to Puhti. -* Convert a Docker image from a container registry on Puhti. + to the supercomputer. +* Convert a Docker image from a container registry on the supercomputer. More information on these different options can be found in our [documentation on creating containers](../../computing/containers/creating.md). @@ -97,7 +96,7 @@ Nextflow pipelines can be run in different ways in supercomputering environment: Please do not launch heavy Nextflow workflows on login nodes. 
### Running Nextflow pipeline with local executor interactively -Lanuch an [interactive session](https://docs.csc.fi/computing/running/interactive-usage/) on Puhti as below: +Launch an [interactive session](https://docs.csc.fi/computing/running/interactive-usage/) as below: ``` sinteractive -c 2 -m 4G -d 250 -A project_2xxxx # replace actual project number here module load nextflow/23.04.3 # Load nextflow module @@ -139,7 +138,7 @@ workflow { ``` - Execute the script by entering the following command on Puhti interactive terminal: + Execute the script by entering the following command in the interactive terminal: ```nextflow module load nextflow/23.04.3 @@ -165,7 +164,7 @@ Although Nextflow supports SLURM natively, avoid launching large amounts of very short jobs using SLURM. Instead, one can launch a Nextflow job as a regular batch job that co-executes all job tasks in the same job allocation. Below is a minimal example to get started with your Nextflow -pipeline on Puhti: +pipeline: ```bash #!/bin/bash @@ -229,7 +228,7 @@ Default executor is `local` where process is run in your computer/localhost wher - Amazon (AWS Batch) - SGE (Sun Grid Engine) -To enable the SLURM executor on Puhti, simply set `process.executor` property to slurm value in the `nextflow.config` file as shown below: +To enable the SLURM executor, simply set `process.executor` property to slurm value in the `nextflow.config` file as shown below: ``` profiles { @@ -254,7 +253,7 @@ In this case, you can run a Nextflow script as below: ``` nextflow run -profile puhti ``` -This will submit each process of your job as a separate batch job to Puhti cluster. +This will submit each process of your job as a separate batch job to Puhti supercomputer. ### Running Nextflow with HyperQueue executor @@ -265,7 +264,7 @@ pipeline. This executor can be used to scale up analysis across multiple nodes when needed. However, the executor settings can be complex depending on the pipeline. 
Here is a batch script for running a -[nf-core pipeline](https://nf-co.re/pipelines) on Puhti: +[nf-core pipeline](https://nf-co.re/pipelines): ```bash #!/bin/bash @@ -343,3 +342,4 @@ hq server stop * [Official Nextflow documentation](https://www.nextflow.io/docs/latest/index.html) * [CSC's Nextflow documentation](../../apps/nextflow.md) * [High-throughput Nextflow workflow using HyperQueue](nextflow-hq.md) +* [Master thesis by Antoni Gołoś comparing automated workflow approaches on supercomputers](https://urn.fi/URN:NBN:fi:aalto-202406164397) From 2b2711e23062ad83a2ae3559c8aebc7801f50594 Mon Sep 17 00:00:00 2001 From: Kylli Ek Date: Fri, 20 Dec 2024 17:04:19 +0200 Subject: [PATCH 16/29] some re-wording --- docs/support/tutorials/nextflow-puhti.md | 233 ++++++++++------------- 1 file changed, 101 insertions(+), 132 deletions(-) diff --git a/docs/support/tutorials/nextflow-puhti.md b/docs/support/tutorials/nextflow-puhti.md index 0af2eb94b7..e780b04cc4 100755 --- a/docs/support/tutorials/nextflow-puhti.md +++ b/docs/support/tutorials/nextflow-puhti.md @@ -1,18 +1,22 @@ # Running Nextflow pipelines on Puhti and Mahti [Nextflow](https://www.nextflow.io/) is one of the scientific wokrflow managers and provides built-in support for -HPC-friendly containers such as Apptainer (Singularity). Although Nextflow -allows choosing Docker engine for running -pipelines, please note that Docker containers can't be used on supercomputers due to the -lack of administrative privileges for normal users. +HPC-friendly containers such as Apptainer (= Singularity). One of the advantages of Nextflow is that the actual pipeline functional logic is separated from the execution environment. The same script can therefore be executed in different environments by changing the execution environment without touching actual pipeline code. Nextflow uses `executor` information to decide where the job should be run. 
Once executor is configured, Nextflow submits each process to the specified job scheduler on your behalf (=you don't need to write sbatch script, Nextflow writes on the fly for you, instead ). + +Default executor is `local` where process is run in your computer/localhost where Nextflow is launched. Other executors include: + +- SLURM +- PBS/Torque +- Amazon (AWS Batch) +- SGE (Sun Grid Engine) There are many other high-throughput tools and workflow managers exist for scientific computing and selecting the right tool can sometimes be challenging. Please refer to our [high-throughput computing and workflows page](../../computing/running/throughput.md) to get an overview from the selected list of relevant tools. ## Installation -### Nextflow as a module +### Nextflow -Nextflow is available as a module on Puhti and Mahti. One can choose the version of the nextflow depending on the requirement of your own pipeline. Please note that the Nextflow version starting from 23.04.3 can only be +Nextflow itself is available as a module on Puhti and Mahti. One can choose the version of the nextflow depending on the requirement of your own pipeline. Please note that the Nextflow version starting from 23.04.3 can only be used for pipelines built with DSL2 syntax. You can select a lower version for DSL1-compliant pipelines. Nextflow can be loaded as below: @@ -21,25 +25,13 @@ Nextflow can be loaded as below: module load nextflow/ # e.g., module load nextflow/22.10.1 ``` +Please note that one has to load Nextflow module with a version. Otherwise, the latest version of stable module installed at that point is used. For the reproducibility point of view, make sure to load versions of all tools including the Nextflow module. + !!! info "Note" Please make sure to specify the correct version of the Nextflow module as some pipelines require a specific version of Nextflow. 
-### Custom Nextflow installations - -If you need some very specific verion or for some other reason, you can install Nextflow also yourself. It is easy as it is java-based tool. You can for example download the latest version of Nextflow binary to your /home directory as below: - -```bash -cd ~/bin -curl -s https://get.nextflow.io -chmod +x ~/bin/nextflow - -``` -``` -module load biojava/21 -``` - ### Installation of tools used in Nextflow 1. Local installations: By default, Nextflow expects that the analysis tools are available locally. Tools can be activated from existing [modules](../../apps/by_discipline.md) or [own custom module installations](../../computing/modules.md#using-your-own-module-files). @@ -47,40 +39,53 @@ module load biojava/21 2. Container instalaltions: Containers can be smoothly integrated with Nextflow pipelines. No additional modifications to Nextflow scripts are needed except enabling the -Singularity/Apptainer engine in the Nextflow configuration -file in the HPC environment. Nextflow can pull remote container images as Singularity/Apptainer +Apptainer engine in the Nextflow configuration +file in the HPC environment. Nextflow can pull remote container images as Apptainer from container registries on the fly. The remote container images are usually specified in the Nextflow script or configuration file by simply prefixing the image name with `shub://` or `docker://`. It is also possible to -specify a different Singularity image for each process definition in the +specify a different Apptainer image for each process definition in the Nextflow pipeline script. +Although Nextflow +allows choosing Docker engine for running +pipelines, please note that Docker containers can't be used on supercomputers due to the +lack of administrative privileges for normal users. + Most Nextflow pipelines pull the needed container images on the fly. 
However, when multiple images are needed in a pipeline, it is a good idea to prepare the -images locally first before launching your Nextflow pipeline. - -Here are some options for preparing your Apptainer/Singularity image: +images locally first before launching your Nextflow pipeline. More information about [creating containers](../../computing/containers/creating.md). -* Build a Singularity/Apptainer image without `sudo` access using - `--fakeroot` flag. -* Convert a Docker image to Apptainer on your local system and then copy it - to the supercomputer. -* Convert a Docker image from a container registry on the supercomputer. - -More information on these different options can be found in our -[documentation on creating containers](../../computing/containers/creating.md). !!! info "Note" - Singularity/Apptainer is installed on login and compute nodes and does + Apptainer is installed on login and compute nodes and does not require loading a separate module on either Puhti, Mahti or LUMI. - * Apptainer container can be downloaded from some repository or built locally. For building custom Apptainer containers, see [Creating containers page](../../computing/containers/creating.md). * For binding folders or using other Apptainer flags, use [--apptainer-args option] or delcare in the nextflow.config files. +!!! info "Note" + If you are directly pulling multiple images on the fly, please set + `$APPTAINER_TMPDIR` and `$APPTAINER_CACHEDIR` to either local scratch + (i.e. `$LOCAL_SCRATCH`) or to your scratch folder (`/scratch/`) + in the batch script. Otherwise `$HOME` directory, the size of which is + only 10 GB, will be used. To avoid any disk quota errors while pulling + images, set `$APPTAINER_TMPDIR` and `$APPTAINER_CACHEDIR` in your batch + script as below: + + ```bash + export APPTAINER_TMPDIR=$LOCAL_SCRATCH + export APPTAINER_CACHEDIR=$LOCAL_SCRATCH + ``` + + Note that this also requires requesting NVMe disk in the batch script by + adding `#SBATCH --gres=nvme:`. 
For example, add + `#SBATCH --gres=nvme:100` to request 100 GB of space on `$LOCAL_SCRATCH`. + Running Nextflow pipelines can sometimes be quite compute-intensive and may require downloading large volumes of data such as databases and container images. This can take a while and may not even work successfully for the first -time when downloading multiple Apptainer/Singularity images or databases. +time when downloading multiple Apptainer images or databases. + ## Usage @@ -89,25 +94,20 @@ Nextflow pipelines can be run in different ways in supercomputering environment: 1. [In interactive mode](../../computing/running/interactive-usage.md) with local executor, with limited resources. Useful mainly for debugging or testing very small workflows. 2. With batch job and local executor. Useful for small and medium size workflows -3. With batch job and SLURM executor. This can use multiple nodes and different SLURM partitions (CPU and GPU), but may create significant overhead, if many small jobs. Could be used, if each job step for each file takes at least 30 min. +3. With batch job and SLURM executor. This can use multiple nodes and different SLURM partitions (CPU and GPU), but may create significant overhead, with many small jobs. Could be used, if each job step for each file takes at least 30 min. 4. With batch job and HyperQueue as a sub-job scheduler. Can use multiple nodes in the same batch job allocation, most complex set up. Suits well for cases, when workflow includes a lot of small job steps with many input files (high-troughput computing). !!! info "Note" Please do not launch heavy Nextflow workflows on login nodes. 
-### Running Nextflow pipeline with local executor interactively -Launch an [interactive session](https://docs.csc.fi/computing/running/interactive-usage/) as below: -``` -sinteractive -c 2 -m 4G -d 250 -A project_2xxxx # replace actual project number here -module load nextflow/23.04.3 # Load nextflow module -``` - -Please note that one has to load Nextflow module with a version. Otherwise, the latest version of stable module installed at that point is used. For the reproducibility point of view, make sure to load versions of all tools including the Nextflow module. +Please follow our +[instructions for writing a batch job script for Puhti](../../computing/running/example-job-scripts-puhti.md). +### Nextflow script -The following "Hello-world" minimalist example demonstrates the basic syntax of Nextflow. Copy the below script to a file named, hello-world.nf. +The following minimalist example demonstrates the basic syntax of Nextflow. -```nextflow +```nextflow title="workflow.nf" #!/usr/bin/env nextflow greets = Channel.fromList(["Moi", "Ciao", "Hello", "Hola","Bonjour"]) @@ -136,37 +136,32 @@ workflow { sayHello(greets) } -``` - - Execute the script by entering the following command in the interactive terminal: - -```nextflow -module load nextflow/23.04.3 -nextflow run hello-world.nf ``` This script defines one process named `sayHello`. This process takes a set of greetings from different languages and then writes each one to a separate file in a random order. 
The resulting terminal output would look similar to the text shown below: -```nextflow +```bash N E X T F L O W ~ version 23.04.3 Launching `hello-world.nf` [intergalactic_panini] DSL2 - revision: 880a4a2dfd executor > local (5) [a0/bdf83f] process > sayHello (5) [100%] 5 of 5 ✔ ``` -### Running Nextflow with local executor in a batch job +### Running Nextflow pipeline with local executor interactively +To run Nextflow in [interactive session](https://docs.csc.fi/computing/running/interactive-usage/): +``` +sinteractive -c 2 -m 4G -d 250 -A project_2xxxx # replace actual project number here +module load nextflow/23.04.3 # Load nextflow module +nextflow run workflow.nf +``` -Please follow our -[instructions for writing a batch job script for Puhti](../../computing/running/example-job-scripts-puhti.md). +### Running Nextflow with local executor in a batch job -Although Nextflow supports SLURM natively, avoid launching -large amounts of very short jobs using SLURM. Instead, one can launch a Nextflow -job as a regular batch job that co-executes all job tasks in the same job -allocation. Below is a minimal example to get started with your Nextflow -pipeline: +To launch a Nextflow job as a regular batch job that executes all job tasks in the same job +allocation, create the batch job file: -```bash +```bash title="nextflow_local_batch_job.sh" #!/bin/bash #SBATCH --time=00:15:00 # Change your runtime settings #SBATCH --partition=test # Change partition as needed @@ -179,58 +174,26 @@ module load nextflow/23.04.3 # Actual Nextflow command here nextflow run workflow.nf -# nf-core pipeline example: nextflow run nf-core/scrnaseq -profile test,singularity -resume --outdir . -``` - -!!! info "Note" - If you are directly pulling multiple images on the fly, please set - `$APPTAINER_TMPDIR` and `$APPTAINER_CACHEDIR` to either local scratch - (i.e. `$LOCAL_SCRATCH`) or to your scratch folder (`/scratch/`) - in the batch script. 
Otherwise `$HOME` directory, the size of which is - only 10 GB, will be used. To avoid any disk quota errors while pulling - images, set `$APPTAINER_TMPDIR` and `$APPTAINER_CACHEDIR` in your batch - script as below: - - ```bash - export APPTAINER_TMPDIR=$LOCAL_SCRATCH - export APPTAINER_CACHEDIR=$LOCAL_SCRATCH - ``` - - Note that this also requires requesting NVMe disk in the batch script by - adding `#SBATCH --gres=nvme:`. For example, add - `#SBATCH --gres=nvme:100` to request 100 GB of space on `$LOCAL_SCRATCH`. - - -Finally, copy above script to a fle (e.g., nextflow_script.sh) and submit the job to cluster as below: - -``` -sbatch nextflow_script.sh +# nf-core pipeline example: +# nextflow run nf-core/scrnaseq -profile test,singularity -resume --outdir . ``` -Monitor the status of submitted Slurm job +Finally, submit the job to the supercomputer: ``` - squeue -j - # or - squeue --me - # or - squeue -u $USER +sbatch nextflow_local_batch_job.sh ``` -### Running Nextflow with SLURM executor +### Running Nextflow with SLURM executor -One of the advantages of Nextflow is that the actual pipeline functional logic is separated from the execution environment. The same script can therefore be executed in different environments by changing the execution environment without touching actual pipeline code. Nextflow uses `executor` information to decide where the job should be run. Once executor is configured, Nextflow submits each process to the specified job scheduler on your behalf (=you don't need to write sbatch script, Nextflow writes on the fly for you, instead ). - -Default executor is `local` where process is run in your computer/localhost where Nextflow is launched. Other executors include: +The first batch job file reserves resources only for Nextflow itself. Nextflow then creates further SLURM jobs for workflow's processes. 
The SLURM jobs created by Nextflow may be distributed to several nodes of a supercomputer and also to use different partitions for different workflow rules, for example CPU and GPU. SLURM executor should be used only, if the job steps are at least 20-30 minutes long, otherwise the it could overload SLURM. -- PBS/Torque -- SLURM -- Amazon (AWS Batch) -- SGE (Sun Grid Engine) +!!! warning + Please do not use SLURM executor, if your workflow includes a lot of short processes. It would overload SLURM. -To enable the SLURM executor, simply set `process.executor` property to slurm value in the `nextflow.config` file as shown below: +To enable the SLURM executor, set the `process.xx` settings in [nextflow.config file](https://www.nextflow.io/docs/latest/config.html). The settings are similar to [batch job files](../../computing/running/example-job-scripts-puhti.md). -``` +```bash title="nextflow.config" profiles { @@ -248,25 +211,40 @@ profiles { } ``` -In this case, you can run a Nextflow script as below: +Create the batch job file, note the usage of a profile. + +```bash title="nextflow_slurm_batch_job.sh" +#!/bin/bash +#SBATCH --time=00:15:00 # Change your runtime settings +#SBATCH --partition=test # Change partition as needed +#SBATCH --account= # Add your project name here +#SBATCH --cpus-per-task=1 # Change as needed +#SBATCH --mem-per-cpu=1G # Increase as needed + +# Load Nextflow module +module load nextflow/23.04.3 + +# Actual Nextflow command here +nextflow run workflow.nf -profile puhti +``` + +Finally, submit the job to the supercomputer: ``` -nextflow run -profile puhti +sbatch nextflow_slurm_batch_job.sh ``` -This will submit each process of your job as a separate batch job to Puhti supercomputer. + +This will submit each process of your workflow as a separate batch job to Puhti supercomputer. 
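The body of the `puhti` profile is elided by the hunk boundary above, so the following is only a minimal sketch of what such a SLURM profile could look like, written to `nextflow.config` from the shell in the same way the HyperQueue example appends its `executor` block. The partition name `small`, the account `project_2xxxx`, and the resource values are placeholders assumed for illustration; they are not taken from the original file. `executor`, `queue`, `clusterOptions`, `cpus`, `memory` and `time` are standard Nextflow process settings.

```bash
#!/bin/bash
# Sketch only: write a minimal SLURM profile for Nextflow.
# ACCOUNT and PARTITION are placeholder assumptions; replace them with
# your own project and a partition suited to your processes.
ACCOUNT="${ACCOUNT:-project_2xxxx}"
PARTITION="${PARTITION:-small}"

# Unquoted heredoc delimiter, so the shell variables are expanded
# before the text is written to the config file.
cat > nextflow.config <<EOF
profiles {
    puhti {
        process {
            executor = 'slurm'
            queue = '${PARTITION}'
            clusterOptions = '--account=${ACCOUNT}'
            cpus = 1
            memory = '2 GB'
            time = '1h'
        }
    }
}
EOF
```

With a file like this in the launch directory, `nextflow run workflow.nf -profile puhti` would submit each process as its own SLURM job using these defaults; individual processes can still override them in the workflow script.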
### Running Nextflow with HyperQueue executor -In this example, let's use the -[HyperQueue meta-scheduler](../../apps/hyperqueue.md) for executing a Nextflow -pipeline. This executor can be used to scale up analysis across multiple nodes -when needed. However, the executor settings can be complex depending on the pipeline. +[HyperQueue meta-scheduler](../../apps/hyperqueue.md) executer is suitable, if your workflow includes a lot of short processes and you need several nodes for the computation. However, the executor settings can be complex depending on the pipeline. Here is a batch script for running a [nf-core pipeline](https://nf-co.re/pipelines): -```bash +```bash title="nextflow_hyperqueue_batch_job.sh" #!/bin/bash #SBATCH --job-name=nextflowjob #SBATCH --account= @@ -306,7 +284,8 @@ hq worker wait "${SLURM_NTASKS}" git clone https://github.com/nf-core/rnaseq.git -b 3.10 cd rnaseq -# Ensure Nextflow uses the right executor and knows how much it can submit +# Ensure Nextflow uses the right executor and knows how many jobs it can submit +# The `queueSize` can be limited as needed. echo "executor { queueSize = $(( 40*SLURM_NNODES )) name = 'hq' @@ -321,25 +300,15 @@ hq worker stop all hq server stop ``` -!!! info "Note" - Please make sure that your Nextflow configuration file (`nextflow.config`) - has the correct executor name when using the HypeQueue executor. Also, - when multiple nodes are used, ensure that the executor knows how many jobs - it can submit using the parameter `queueSize` under the `executor` block. - The `queueSize` can be limited as needed. 
Here is an example snippet that - you can use and modify as needed in your `nextflow.config` file: - - ```text - executor { - queueSize = 40*SLURM_NNODES - name = 'hq' - cpus = 40*SLURM_NNODES - } - ``` +Finally, submit the job to the supercomputer: + +``` +sbatch nextflow_hyperqueue_batch_job.sh +``` ## More information * [Official Nextflow documentation](https://www.nextflow.io/docs/latest/index.html) * [CSC's Nextflow documentation](../../apps/nextflow.md) -* [High-throughput Nextflow workflow using HyperQueue](nextflow-hq.md) * [Master thesis by Antoni Gołoś comparing automated workflow approaches on supercomputers](https://urn.fi/URN:NBN:fi:aalto-202406164397) + * [Full code Nextflow example from Antoni Gołoś with 3 different executors for Puhti](https://github.com/antonigoo/LIPHE-processing/tree/nextflow/workflow) From 59114a19d90c14c2c5d2cf8e38d5f2bb416d8296 Mon Sep 17 00:00:00 2001 From: Kylli Ek Date: Fri, 20 Dec 2024 17:38:11 +0200 Subject: [PATCH 17/29] Update nextflow-puhti.md --- docs/support/tutorials/nextflow-puhti.md | 77 +++++++----------------- 1 file changed, 23 insertions(+), 54 deletions(-) diff --git a/docs/support/tutorials/nextflow-puhti.md b/docs/support/tutorials/nextflow-puhti.md index e780b04cc4..d439c0ce8f 100755 --- a/docs/support/tutorials/nextflow-puhti.md +++ b/docs/support/tutorials/nextflow-puhti.md @@ -1,14 +1,9 @@ -# Running Nextflow pipelines on Puhti and Mahti +# Running Nextflow pipelines on supercomputers [Nextflow](https://www.nextflow.io/) is one of the scientific wokrflow managers and provides built-in support for -HPC-friendly containers such as Apptainer (= Singularity). One of the advantages of Nextflow is that the actual pipeline functional logic is separated from the execution environment. The same script can therefore be executed in different environments by changing the execution environment without touching actual pipeline code. Nextflow uses `executor` information to decide where the job should be run. 
Once executor is configured, Nextflow submits each process to the specified job scheduler on your behalf (=you don't need to write sbatch script, Nextflow writes on the fly for you, instead ). +HPC-friendly containers such as Apptainer (= Singularity). One of the advantages of Nextflow is that the actual pipeline functional logic is separated from the execution environment. The same script can therefore be executed in different environments by changing the execution environment without touching actual pipeline code. Nextflow uses `executor` information to decide where the job should be run. Once executor is configured, Nextflow submits each process to the specified job scheduler on your behalf. -Default executor is `local` where process is run in your computer/localhost where Nextflow is launched. Other executors include: - -- SLURM -- PBS/Torque -- Amazon (AWS Batch) -- SGE (Sun Grid Engine) +Default executor is `local` where processes are run in the computer where Nextflow is launched. Several other [executors](https://www.nextflow.io/docs/latest/executor.html) are supported, to CSC computing environment, best suit SLURM and HyperQueue executors. There are many other high-throughput tools and workflow managers exist for scientific computing and selecting the right tool can sometimes be challenging. Please refer to our [high-throughput computing and workflows page](../../computing/running/throughput.md) to get an overview from the selected list of relevant tools. @@ -16,8 +11,7 @@ There are many other high-throughput tools and workflow managers exist for scien ### Nextflow -Nextflow itself is available as a module on Puhti and Mahti. One can choose the version of the nextflow depending on the requirement of your own pipeline. Please note that the Nextflow version starting from 23.04.3 can only be -used for pipelines built with DSL2 syntax. You can select a lower version for DSL1-compliant pipelines. 
+Nextflow itself is available as a module on Puhti, Mahti and LUMI. The default version is usually the latest. Choose the version of the Nextflow depending on the requirements of your own pipeline. It is recommended to load Nextflow module with a version, for the reproducibility point of view. Nextflow can be loaded as below: @@ -25,68 +19,44 @@ Nextflow can be loaded as below: module load nextflow/ # e.g., module load nextflow/22.10.1 ``` -Please note that one has to load Nextflow module with a version. Otherwise, the latest version of stable module installed at that point is used. For the reproducibility point of view, make sure to load versions of all tools including the Nextflow module. - -!!! info "Note" - Please make sure to specify the correct version of the Nextflow module as - some pipelines require a specific version of Nextflow. - +Please note that the Nextflow version starting from 23.04.3 can only be used for pipelines built with DSL2 syntax. You can select a older version for DSL1-compliant pipelines. ### Installation of tools used in Nextflow -1. Local installations: By default, Nextflow expects that the analysis tools are available locally. Tools can be activated from existing [modules](../../apps/by_discipline.md) or [own custom module installations](../../computing/modules.md#using-your-own-module-files). +#### Local installations + +By default, Nextflow expects that the analysis tools are available locally. Tools can be activated from existing [modules](../../apps/by_discipline.md) or [own custom module installations](../../computing/modules.md#using-your-own-module-files). -2. Container instalaltions: +#### Container instalaltions Containers can be smoothly integrated with Nextflow pipelines. No additional modifications to Nextflow scripts are needed except enabling the Apptainer engine in the Nextflow configuration -file in the HPC environment. Nextflow can pull remote container images as Apptainer +file. 
Nextflow can pull remote container images as Apptainer from container registries on the fly. The remote container images are usually specified in the Nextflow script or configuration file by simply prefixing the image name with `shub://` or `docker://`. It is also possible to specify a different Apptainer image for each process definition in the Nextflow pipeline script. -Although Nextflow -allows choosing Docker engine for running -pipelines, please note that Docker containers can't be used on supercomputers due to the -lack of administrative privileges for normal users. - Most Nextflow pipelines pull the needed container images on the fly. However, when multiple images are needed in a pipeline, it is a good idea to prepare the images locally first before launching your Nextflow pipeline. More information about [creating containers](../../computing/containers/creating.md). -!!! info "Note" - Apptainer is installed on login and compute nodes and does - not require loading a separate module on either Puhti, Mahti or LUMI. - - * For binding folders or using other Apptainer flags, use [--apptainer-args option] or delcare in the nextflow.config files. +Practical considerations: +* Apptainer is installed on login and compute nodes and does not require loading a separate module on CSC supercomputers. +* For binding folders or using other [Apptainer settings](https://www.nextflow.io/docs/latest/reference/config.html#apptainer), `nextflow.config` file. +* If you are directly pulling multiple Apptainer images on the fly, please use NVMe disk of a compute node for storing the Apptainer images. For that in your batch job file, first request NVMe disk and then set Apptainer tempory folders as environmental variables. -!!! info "Note" - If you are directly pulling multiple images on the fly, please set - `$APPTAINER_TMPDIR` and `$APPTAINER_CACHEDIR` to either local scratch - (i.e. `$LOCAL_SCRATCH`) or to your scratch folder (`/scratch/`) - in the batch script. 
Otherwise `$HOME` directory, the size of which is - only 10 GB, will be used. To avoid any disk quota errors while pulling - images, set `$APPTAINER_TMPDIR` and `$APPTAINER_CACHEDIR` in your batch - script as below: - - ```bash - export APPTAINER_TMPDIR=$LOCAL_SCRATCH - export APPTAINER_CACHEDIR=$LOCAL_SCRATCH - ``` - - Note that this also requires requesting NVMe disk in the batch script by - adding `#SBATCH --gres=nvme:`. For example, add - `#SBATCH --gres=nvme:100` to request 100 GB of space on `$LOCAL_SCRATCH`. - -Running Nextflow pipelines can sometimes be quite compute-intensive and may -require downloading large volumes of data such as databases and container -images. This can take a while and may not even work successfully for the first -time when downloading multiple Apptainer images or databases. +```bash +#SBATCH --gres=nvme:100 # Request 100 GB of space to local disk +export APPTAINER_TMPDIR=$LOCAL_SCRATCH +export APPTAINER_CACHEDIR=$LOCAL_SCRATCH +``` +!!! warning + Although Nextflow supports also Docker containers, these can't be used as such on supercomputers due to the lack of administrative privileges for normal users. ## Usage @@ -100,12 +70,11 @@ Nextflow pipelines can be run in different ways in supercomputering environment: !!! info "Note" Please do not launch heavy Nextflow workflows on login nodes. -Please follow our -[instructions for writing a batch job script for Puhti](../../computing/running/example-job-scripts-puhti.md). +For general introduction to batch jobs, see [example job scripts for Puhti](../../computing/running/example-job-scripts-puhti.md). ### Nextflow script -The following minimalist example demonstrates the basic syntax of Nextflow. +The following minimalist example demonstrates the basic syntax of a Nextflow script. 
```nextflow title="workflow.nf" #!/usr/bin/env nextflow From 934d61245a06806d83a5ebb6c0a16dab8e9ccb22 Mon Sep 17 00:00:00 2001 From: Kylli Ek Date: Fri, 20 Dec 2024 17:49:33 +0200 Subject: [PATCH 18/29] Update nextflow-puhti.md --- docs/support/tutorials/nextflow-puhti.md | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/docs/support/tutorials/nextflow-puhti.md b/docs/support/tutorials/nextflow-puhti.md index d439c0ce8f..317650cd86 100755 --- a/docs/support/tutorials/nextflow-puhti.md +++ b/docs/support/tutorials/nextflow-puhti.md @@ -5,7 +5,7 @@ HPC-friendly containers such as Apptainer (= Singularity). One of the advantages Default executor is `local` where processes are run in the computer where Nextflow is launched. Several other [executors](https://www.nextflow.io/docs/latest/executor.html) are supported, to CSC computing environment, best suit SLURM and HyperQueue executors. -There are many other high-throughput tools and workflow managers exist for scientific computing and selecting the right tool can sometimes be challenging. Please refer to our [high-throughput computing and workflows page](../../computing/running/throughput.md) to get an overview from the selected list of relevant tools. +There are many other high-throughput tools and workflow managers for scientific computing and selecting the right tool can sometimes be challenging. Please refer to our [high-throughput computing and workflows page](../../computing/running/throughput.md) to get an overview of relevant tools. ## Installation @@ -13,21 +13,22 @@ There are many other high-throughput tools and workflow managers exist for scien Nextflow itself is available as a module on Puhti, Mahti and LUMI. The default version is usually the latest. Choose the version of the Nextflow depending on the requirements of your own pipeline. It is recommended to load Nextflow module with a version, for the reproducibility point of view. 
-Nextflow can be loaded as below: +To load Nextflow module: ```bash module load nextflow/ # e.g., module load nextflow/22.10.1 ``` -Please note that the Nextflow version starting from 23.04.3 can only be used for pipelines built with DSL2 syntax. You can select a older version for DSL1-compliant pipelines. +!!! warning + The Nextflow 23.04.3 and newer support only pipelines built with DSL2 syntax. Select an older version for DSL1-compliant pipelines. ### Installation of tools used in Nextflow #### Local installations -By default, Nextflow expects that the analysis tools are available locally. Tools can be activated from existing [modules](../../apps/by_discipline.md) or [own custom module installations](../../computing/modules.md#using-your-own-module-files). +By default, Nextflow expects that the analysis tools are available locally. Tools can be activated from existing [modules](../../apps/by_discipline.md) or [own custom module installations](../../computing/modules.md#using-your-own-module-files). See also how to [create containers](../../computing/containers/creating.md). -#### Container instalaltions +#### On-the-fly Apptainer installations Containers can be smoothly integrated with Nextflow pipelines. No additional modifications to Nextflow scripts are needed except enabling the Apptainer engine in the Nextflow configuration @@ -40,15 +41,15 @@ Nextflow pipeline script. Most Nextflow pipelines pull the needed container images on the fly. However, when multiple images are needed in a pipeline, it is a good idea to prepare the -images locally first before launching your Nextflow pipeline. More information about [creating containers](../../computing/containers/creating.md). - +containers locally before launching the Nextflow pipeline. Practical considerations: + * Apptainer is installed on login and compute nodes and does not require loading a separate module on CSC supercomputers.
-* For binding folders or using other [Apptainer settings](https://www.nextflow.io/docs/latest/reference/config.html#apptainer), `nextflow.config` file. +* For binding folders or using other [Apptainer settings](https://www.nextflow.io/docs/latest/reference/config.html#apptainer), use the `nextflow.config` file. * If you are directly pulling multiple Apptainer images on the fly, please use NVMe disk of a compute node for storing the Apptainer images. For that in your batch job file, first request NVMe disk and then set Apptainer tempory folders as environmental variables. -```bash +```bash title="batch_job.sh" #SBATCH --gres=nvme:100 # Request 100 GB of space to local disk export APPTAINER_TMPDIR=$LOCAL_SCRATCH From f12afc5dd8504edc5e9a167e573eff426ff28b8e Mon Sep 17 00:00:00 2001 From: Kylli Ek Date: Tue, 7 Jan 2025 14:34:25 +0200 Subject: [PATCH 19/29] Update nextflow.md --- docs/apps/nextflow.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/apps/nextflow.md b/docs/apps/nextflow.md index 6a31e31e5a..3f411efe9f 100644 --- a/docs/apps/nextflow.md +++ b/docs/apps/nextflow.md @@ -39,13 +39,13 @@ Nextflow is released under the module use /appl/local/csc/modulefiles ``` -Nextflow is activated by loading `nextflow` module as below: +Nextflow is activated by loading `nextflow` module: ```bash module load nextflow ``` -Example of loading `nextflow` module with a specific version: +The default version is usually the latest. Choose the Nextflow version according to the requirements of your own pipeline. For reproducibility, it is recommended to load the Nextflow module with an explicit version.
To load `nextflow` module with a specific version: ```bash module load nextflow/22.04.5 From f2f65e2fb0397a6b214203c338e662af8e730f61 Mon Sep 17 00:00:00 2001 From: Kylli Ek Date: Tue, 7 Jan 2025 14:41:43 +0200 Subject: [PATCH 20/29] Update nextflow-puhti.md --- docs/support/tutorials/nextflow-puhti.md | 19 +++++-------------- 1 file changed, 5 insertions(+), 14 deletions(-) diff --git a/docs/support/tutorials/nextflow-puhti.md b/docs/support/tutorials/nextflow-puhti.md index 317650cd86..5b139e2482 100755 --- a/docs/support/tutorials/nextflow-puhti.md +++ b/docs/support/tutorials/nextflow-puhti.md @@ -1,6 +1,6 @@ # Running Nextflow pipelines on supercomputers -[Nextflow](https://www.nextflow.io/) is one of the scientific wokrflow managers and provides built-in support for +[Nextflow](https://www.nextflow.io/) is a scientific wokrflow manager and provides built-in support for HPC-friendly containers such as Apptainer (= Singularity). One of the advantages of Nextflow is that the actual pipeline functional logic is separated from the execution environment. The same script can therefore be executed in different environments by changing the execution environment without touching actual pipeline code. Nextflow uses `executor` information to decide where the job should be run. Once executor is configured, Nextflow submits each process to the specified job scheduler on your behalf. Default executor is `local` where processes are run in the computer where Nextflow is launched. Several other [executors](https://www.nextflow.io/docs/latest/executor.html) are supported, to CSC computing environment, best suit SLURM and HyperQueue executors. @@ -11,16 +11,7 @@ There are many other high-throughput tools and workflow managers for scientific ### Nextflow -Nextflow itself is available as a module on Puhti, Mahti and LUMI. The default version is usually the latest. Choose the version of the Nextflow depending on the requirements of your own pipeline. 
It is recommended to load Nextflow module with a version, for the reproducibility point of view. - -To load Nextflow module: - -```bash -module load nextflow/ # e.g., module load nextflow/22.10.1 -``` - -!!! warning - The Nextflow 23.04.3 and newer support only pipelines built with DSL2 syntax. Select an older version for DSL1-compliant pipelines. +Nextflow itself is available as a module on Puhti, Mahti and LUMI. Specific versions available are described on the [Nextflow main page](../../apps/nextflow.md). ### Installation of tools used in Nextflow @@ -57,7 +48,7 @@ export APPTAINER_CACHEDIR=$LOCAL_SCRATCH ``` !!! warning - Although Nextflow supports also Docker containers, these can't be used as such on supercomputers due to the lack of administrative privileges for normal users. + Although Nextflow supports also Docker containers, these can't be used as such on supercomputers due to the lack of administrative privileges for normal users. ## Usage @@ -119,7 +110,7 @@ executor > local (5) ``` ### Running Nextflow pipeline with local executor interactively -To run Nextflow in [interactive session](https://docs.csc.fi/computing/running/interactive-usage/): +To run Nextflow in [interactive session](../../computing/running/interactive-usage/): ``` sinteractive -c 2 -m 4G -d 250 -A project_2xxxx # replace actual project number here module load nextflow/23.04.3 # Load nextflow module @@ -156,7 +147,7 @@ sbatch nextflow_local_batch_job.sh ### Running Nextflow with SLURM executor -The first batch job file reserves resources only for Nextflow itself. Nextflow then creates further SLURM jobs for workflow's processes. The SLURM jobs created by Nextflow may be distributed to several nodes of a supercomputer and also to use different partitions for different workflow rules, for example CPU and GPU. SLURM executor should be used only, if the job steps are at least 20-30 minutes long, otherwise the it could overload SLURM. 
+The first batch job file reserves resources only for Nextflow itself. Nextflow then creates further SLURM jobs for the workflow's processes. The SLURM jobs created by Nextflow may be distributed to several nodes of a supercomputer and may also use different partitions for different workflow processes, for example CPU and GPU. The SLURM executor should be used only if the job steps are at least 20-30 minutes long; otherwise it may overload SLURM. !!! warning Please do not use SLURM executor, if your workflow includes a lot of short processes. It would overload SLURM. From 7f29705b4d6a13e3bb5f6b5c145c7278be60c100 Mon Sep 17 00:00:00 2001 From: Kylli Ek Date: Tue, 7 Jan 2025 14:53:11 +0200 Subject: [PATCH 21/29] rename to nextflow-tutorial --- .../support/tutorials/{nextflow-puhti.md => nextflow-tutorial.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename docs/support/tutorials/{nextflow-puhti.md => nextflow-tutorial.md} (100%) mode change 100755 => 100644 diff --git a/docs/support/tutorials/nextflow-puhti.md b/docs/support/tutorials/nextflow-tutorial.md old mode 100755 new mode 100644 similarity index 100% rename from docs/support/tutorials/nextflow-puhti.md rename to docs/support/tutorials/nextflow-tutorial.md From d363482ef59d2fc2a30dfe80265a8a70baadb196 Mon Sep 17 00:00:00 2001 From: Kylli Ek Date: Tue, 7 Jan 2025 14:55:11 +0200 Subject: [PATCH 22/29] update tutorial link --- docs/apps/nextflow.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/docs/apps/nextflow.md b/docs/apps/nextflow.md index 3f411efe9f..30eb5f8fc0 100644 --- a/docs/apps/nextflow.md +++ b/docs/apps/nextflow.md @@ -58,7 +58,7 @@ nextflow -h ``` More detailed instructions can be found in -[CSC's Nextflow tutorial](../support/tutorials/nextflow-puhti.md). +[CSC's Nextflow tutorial](../support/tutorials/nextflow-tutorial.md). ## References @@ -71,6 +71,4 @@ computational workflows. Nat. Biotechnol. 35, 316–319 (2017).
## More information * [Nextflow official documentation](https://www.nextflow.io/docs/latest/index.html) -* [Running Nextflow on Puhti](../support/tutorials/nextflow-puhti.md) -* [High-throughput Nextflow workflow using HyperQueue](../support/tutorials/nextflow-hq.md) -* [Contact CSC Service Desk for technical support](../support/contact.md) +* [CSC Nextflow tutorial](../support/tutorials/nextflow-tutorial.md) From e396b40c0401e68315f3eccee890df692297529a Mon Sep 17 00:00:00 2001 From: Kylli Ek Date: Tue, 7 Jan 2025 14:56:26 +0200 Subject: [PATCH 23/29] update nextflow tutorial link --- docs/support/tutorials/index.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/docs/support/tutorials/index.md b/docs/support/tutorials/index.md index ae9a18a372..b9d2731c2c 100644 --- a/docs/support/tutorials/index.md +++ b/docs/support/tutorials/index.md @@ -27,9 +27,7 @@ * [HyperQueue meta-scheduler](../../apps/hyperqueue.md) * [FireWorks workflow manager](../../computing/running/fireworks.md) * [How to run many short jobs with GNU Parallel](many.md) -* [Nextflow tutorial for Puhti (basic)](https://yetulaxman.github.io/Biocontainer/tutorials/nextflow_tutorial.html) -* [Running Nextflow pipelines on Puhti (advanced)](nextflow-puhti.md) -* [Running Nextflow workflows using HyperQueue](nextflow-hq.md) +* [Running Nextflow pipelines](nextflow-tutorial.md) * [Running Snakemake pipelines on Puhti](snakemake-puhti.md) ## Allas From e74846e658a34e2fc7a903f167e5b887f17af278 Mon Sep 17 00:00:00 2001 From: Kylli Ek Date: Tue, 7 Jan 2025 14:58:10 +0200 Subject: [PATCH 24/29] update nextflow tutorial link --- docs/computing/running/throughput.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/computing/running/throughput.md b/docs/computing/running/throughput.md index cc34f1f4b3..7221a07857 100644 --- a/docs/computing/running/throughput.md +++ b/docs/computing/running/throughput.md @@ -57,7 +57,7 @@ graph TD C -->|No| E(Single- or multi-node 
subtasks?) E -->|Single| F(Dependencies between subtasks?) E -->|Multi-node| G(FireWorks) - F -->|Yes| J(Snakemake
Nextflow
FireWorks) + F -->|Yes| J(Snakemake
Nextflow
FireWorks) F -->|No| K(GNU Parallel
Array jobs
HyperQueue) ``` From bf658886974f09382b3785dee27cf5cf8701edbc Mon Sep 17 00:00:00 2001 From: Kylli Ek Date: Tue, 7 Jan 2025 15:40:08 +0200 Subject: [PATCH 25/29] Update nextflow tutorial link --- docs/support/tutorials/nextflow-hq.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/support/tutorials/nextflow-hq.md b/docs/support/tutorials/nextflow-hq.md index ccd79cf4a6..b40b058412 100644 --- a/docs/support/tutorials/nextflow-hq.md +++ b/docs/support/tutorials/nextflow-hq.md @@ -82,8 +82,7 @@ main.nf work ## More information * [General guidelines for high-throughput computing in CSC's HPC environment](../../computing/running/throughput.md) -* [Basic](https://yetulaxman.github.io/Biocontainer/tutorials/nextflow_tutorial.html) - and [advanced Nextflow tutorials for Puhti](nextflow-puhti.md) +* [CSC Nextflow tutorial](nextflow-tutorial.md) * [Official Nextflow documentation](https://www.nextflow.io/docs/latest/index.html) * [Official HyperQueue documentation](https://it4innovations.github.io/hyperqueue/stable/) * [More information in CSC's HyperQueue documentation](../../apps/hyperqueue.md) From f2f91faa746ff382209ceb458301777c20abf129 Mon Sep 17 00:00:00 2001 From: Kylli Ek Date: Tue, 7 Jan 2025 15:41:09 +0200 Subject: [PATCH 26/29] fix link --- docs/support/tutorials/nextflow-tutorial.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/support/tutorials/nextflow-tutorial.md b/docs/support/tutorials/nextflow-tutorial.md index 5b139e2482..8f4546c4f3 100644 --- a/docs/support/tutorials/nextflow-tutorial.md +++ b/docs/support/tutorials/nextflow-tutorial.md @@ -110,7 +110,7 @@ executor > local (5) ``` ### Running Nextflow pipeline with local executor interactively -To run Nextflow in [interactive session](../../computing/running/interactive-usage/): +To run Nextflow in [interactive session](../../computing/running/interactive-usage.md): ``` sinteractive -c 2 -m 4G -d 250 -A project_2xxxx # replace actual project number here 
module load nextflow/23.04.3 # Load nextflow module From f283c232f3b0a48fff1f1bd6cf32f71c5ef833b0 Mon Sep 17 00:00:00 2001 From: Kylli Ek Date: Tue, 7 Jan 2025 15:47:49 +0200 Subject: [PATCH 27/29] add HQ links --- docs/support/tutorials/nextflow-tutorial.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/docs/support/tutorials/nextflow-tutorial.md b/docs/support/tutorials/nextflow-tutorial.md index 8f4546c4f3..8c0226933f 100644 --- a/docs/support/tutorials/nextflow-tutorial.md +++ b/docs/support/tutorials/nextflow-tutorial.md @@ -147,6 +147,8 @@ sbatch nextflow_local_batch_job.sh ### Running Nextflow with SLURM executor +If the workflow includes only limited number of individual jobs/job steps [Slurm executor of Nextflow](https://www.nextflow.io/docs/latest/executor.html#slurm) could be considered. + The first batch job file reserves resources only for Nextflow itself. Nextflow then creates further SLURM jobs for workflow's processes. The SLURM jobs created by Nextflow may be distributed to several nodes of a supercomputer and also to use different partitions for different workflow rules, for example CPU and GPU. SLURM executor should be used only, if the job steps are at least 20-30 minutes long, otherwise it may overload SLURM. !!! warning @@ -202,6 +204,10 @@ This will submit each process of your workflow as a separate batch job to Puhti [HyperQueue meta-scheduler](../../apps/hyperqueue.md) executer is suitable, if your workflow includes a lot of short processes and you need several nodes for the computation. However, the executor settings can be complex depending on the pipeline. +!!! Note + Whenever you're unsure how to run your workflow efficiently, don't hesitate + to [contact CSC Service Desk](../contact.md). 
+ Here is a batch script for running a [nf-core pipeline](https://nf-co.re/pipelines): @@ -273,3 +279,7 @@ sbatch nextflow_hyperqueue_batch_job.sh * [CSC's Nextflow documentation](../../apps/nextflow.md) * [Master thesis by Antoni Gołoś comparing automated workflow approaches on supercomputers](https://urn.fi/URN:NBN:fi:aalto-202406164397) * [Full code Nextflow example from Antoni Gołoś with 3 different executors for Puhti](https://github.com/antonigoo/LIPHE-processing/tree/nextflow/workflow) +* [General guidelines for high-throughput computing in CSC's HPC environment](../../computing/running/throughput.md) +* [Official HyperQueue documentation](https://it4innovations.github.io/hyperqueue/stable/) +* [CSC's HyperQueue documentation](../../apps/hyperqueue.md) + From fd12fa57b583753b2800fce2a6cbc59095c9855f Mon Sep 17 00:00:00 2001 From: Kylli Ek Date: Tue, 7 Jan 2025 15:55:25 +0200 Subject: [PATCH 28/29] Update nextflow-tutorial.md --- docs/support/tutorials/nextflow-tutorial.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/support/tutorials/nextflow-tutorial.md b/docs/support/tutorials/nextflow-tutorial.md index 8c0226933f..6cdb319ca1 100644 --- a/docs/support/tutorials/nextflow-tutorial.md +++ b/docs/support/tutorials/nextflow-tutorial.md @@ -59,11 +59,12 @@ Nextflow pipelines can be run in different ways in supercomputering environment: 3. With batch job and SLURM executor. This can use multiple nodes and different SLURM partitions (CPU and GPU), but may create significant overhead, with many small jobs. Could be used, if each job step for each file takes at least 30 min. 4. With batch job and HyperQueue as a sub-job scheduler. Can use multiple nodes in the same batch job allocation, most complex set up. Suits well for cases, when workflow includes a lot of small job steps with many input files (high-troughput computing). -!!! info "Note" - Please do not launch heavy Nextflow workflows on login nodes. 
- For general introduction to batch jobs, see [example job scripts for Puhti](../../computing/running/example-job-scripts-puhti.md). +!!! Note + Whenever you're unsure how to run your workflow efficiently, don't hesitate + to [contact CSC Service Desk](../contact.md). + ### Nextflow script The following minimalist example demonstrates the basic syntax of a Nextflow script. @@ -117,6 +118,9 @@ module load nextflow/23.04.3 # Load nextflow module nextflow run workflow.nf ``` +!!! info "Note" + Please do not launch heavy Nextflow workflows on login nodes. + ### Running Nextflow with local executor in a batch job To launch a Nextflow job as a regular batch job that executes all job tasks in the same job @@ -204,10 +208,6 @@ This will submit each process of your workflow as a separate batch job to Puhti [HyperQueue meta-scheduler](../../apps/hyperqueue.md) executer is suitable, if your workflow includes a lot of short processes and you need several nodes for the computation. However, the executor settings can be complex depending on the pipeline. -!!! Note - Whenever you're unsure how to run your workflow efficiently, don't hesitate - to [contact CSC Service Desk](../contact.md). 
- Here is a batch script for running a [nf-core pipeline](https://nf-co.re/pipelines): From 8907204fcd812ac85c8c596c6c92a9338f4e3638 Mon Sep 17 00:00:00 2001 From: EetuHuuskoCSC <116141296+EetuHuuskoCSC@users.noreply.github.com> Date: Wed, 15 Jan 2025 13:21:01 +0200 Subject: [PATCH 29/29] Update nextflow-tutorial.md --- docs/support/tutorials/nextflow-tutorial.md | 24 +++++++++++---------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/docs/support/tutorials/nextflow-tutorial.md b/docs/support/tutorials/nextflow-tutorial.md index 6cdb319ca1..aade572d4d 100644 --- a/docs/support/tutorials/nextflow-tutorial.md +++ b/docs/support/tutorials/nextflow-tutorial.md @@ -1,9 +1,9 @@ # Running Nextflow pipelines on supercomputers -[Nextflow](https://www.nextflow.io/) is a scientific wokrflow manager and provides built-in support for +[Nextflow](https://www.nextflow.io/) is a scientific workflow manager and provides built-in support for HPC-friendly containers such as Apptainer (= Singularity). One of the advantages of Nextflow is that the actual pipeline functional logic is separated from the execution environment. The same script can therefore be executed in different environments by changing the execution environment without touching actual pipeline code. Nextflow uses `executor` information to decide where the job should be run. Once executor is configured, Nextflow submits each process to the specified job scheduler on your behalf. -Default executor is `local` where processes are run in the computer where Nextflow is launched. Several other [executors](https://www.nextflow.io/docs/latest/executor.html) are supported, to CSC computing environment, best suit SLURM and HyperQueue executors. +Default executor is `local` where processes are run in the computer where Nextflow is launched. Several other [executors](https://www.nextflow.io/docs/latest/executor.html) are supported, the CSC computing environments best suit SLURM and HyperQueue executors. 
There are many other high-throughput tools and workflow managers for scientific computing and selecting the right tool can sometimes be challenging. Please refer to our [high-throughput computing and workflows page](../../computing/running/throughput.md) to get an overview of relevant tools. @@ -17,11 +17,12 @@ Nextflow itself is available as a module on Puhti, Mahti and LUMI. Specific vers #### Local installations -By default, Nextflow expects that the analysis tools are available locally. Tools can be activated from existing [modules](../../apps/by_discipline.md) or [own custom module installations](../../computing/modules.md#using-your-own-module-files). See also how to create [create containers](../../computing/containers/creating.md). +By default, Nextflow expects that the analysis tools are available locally. Tools can be activated from existing [modules](../../apps/by_discipline.md) or [own custom module installations](../../computing/modules.md#using-your-own-module-files). See also how to [create containers](../../computing/containers/creating.md). #### On-the-fly Apptainer installations + Containers can be smoothly integrated with Nextflow pipelines. No additional -modifications to Nextflow scripts are needed except enabling the +modifications to Nextflow scripts are needed except for enabling the Apptainer engine in the Nextflow configuration file. Nextflow can pull remote container images as Apptainer from container registries on the fly. The remote container images are @@ -38,7 +39,7 @@ Practical considerations: * Apptainer is installed on login and compute nodes and does not require loading a separate module on CSC supercomputers. * For binding folders or using other [Apptainer settings](https://www.nextflow.io/docs/latest/reference/config.html#apptainer) use `nextflow.config` file. -* If you are directly pulling multiple Apptainer images on the fly, please use NVMe disk of a compute node for storing the Apptainer images. 
For that in your batch job file, first request NVMe disk and then set Apptainer tempory folders as environmental variables. +* If you are directly pulling multiple Apptainer images on the fly, please use the NVMe disk of a compute node for storing the Apptainer images. For that in your batch job file, first request NVMe disk space and then set Apptainer temporary folders as environmental variables. ```bash title="batch_job.sh" #SBATCH --gres=nvme:100 # Request 100 GB of space to local disk @@ -52,12 +53,12 @@ export APPTAINER_CACHEDIR=$LOCAL_SCRATCH ## Usage -Nextflow pipelines can be run in different ways in supercomputering environment: +Nextflow pipelines can be run in different ways in the supercomputer environment: 1. [In interactive mode](../../computing/running/interactive-usage.md) with local executor, with limited resources. Useful mainly for debugging or testing very small workflows. -2. With batch job and local executor. Useful for small and medium size workflows +2. With batch job and local executor. Useful for small and medium size workflows. 3. With batch job and SLURM executor. This can use multiple nodes and different SLURM partitions (CPU and GPU), but may create significant overhead, with many small jobs. Could be used, if each job step for each file takes at least 30 min. -4. With batch job and HyperQueue as a sub-job scheduler. Can use multiple nodes in the same batch job allocation, most complex set up. Suits well for cases, when workflow includes a lot of small job steps with many input files (high-troughput computing). +4. With batch job and HyperQueue as a sub-job scheduler. Can use multiple nodes in the same batch job allocation, most complex set up. Well-suited for cases, when the workflow includes a lot of small job steps with many input files (high-troughput computing). For general introduction to batch jobs, see [example job scripts for Puhti](../../computing/running/example-job-scripts-puhti.md). 
@@ -94,7 +95,7 @@ process sayHello { workflow { - // Print a greeting + // Print a greeting sayHello(greets) } @@ -111,6 +112,7 @@ executor > local (5) ``` ### Running Nextflow pipeline with local executor interactively + To run Nextflow in [interactive session](../../computing/running/interactive-usage.md): ``` sinteractive -c 2 -m 4G -d 250 -A project_2xxxx # replace actual project number here @@ -151,7 +153,7 @@ sbatch nextflow_local_batch_job.sh ### Running Nextflow with SLURM executor -If the workflow includes only limited number of individual jobs/job steps [Slurm executor of Nextflow](https://www.nextflow.io/docs/latest/executor.html#slurm) could be considered. +If the workflow includes only limited number of individual jobs/job steps [SLURM executor of Nextflow](https://www.nextflow.io/docs/latest/executor.html#slurm) could be considered. The first batch job file reserves resources only for Nextflow itself. Nextflow then creates further SLURM jobs for workflow's processes. The SLURM jobs created by Nextflow may be distributed to several nodes of a supercomputer and also to use different partitions for different workflow rules, for example CPU and GPU. SLURM executor should be used only, if the job steps are at least 20-30 minutes long, otherwise it may overload SLURM. @@ -185,7 +187,7 @@ Create the batch job file, note the usage of a profile. #SBATCH --time=00:15:00 # Change your runtime settings #SBATCH --partition=test # Change partition as needed #SBATCH --account= # Add your project name here -#SBATCH --cpus-per-task=1 # Change as needed +#SBATCH --cpus-per-task=1 # Change as needed #SBATCH --mem-per-cpu=1G # Increase as needed # Load Nextflow module