Pangenome: RNA-seq Pipeline Requirements #1055

Open
jasonwalker80 opened this issue Nov 3, 2021 · 1 comment

@jasonwalker80
Member

Determine the requirements and potential options for the basis of the Pangenome Annotation RNA-seq pipeline.

@chad388
Contributor

chad388 commented Nov 3, 2021

The pipeline for evaluating Illumina RNA-seq data that Xiaoyu expressed interest in is the Nextflow-based nf-core/rnaseq pipeline, available in this repo: https://github.com/nf-core/rnaseq

This software is designed to use Nextflow and to run in a container (Docker in our case) for maximum reproducibility. Conda can also be used to set up the software dependencies, but this approach is discouraged.

Quick Start
- Install Nextflow (>=21.04.0)
- Install any of Docker, Singularity, Podman, Shifter or Charliecloud for full pipeline reproducibility (please only use Conda as a last resort; see docs). Note: This pipeline does not currently support running with Conda on macOS if the --remove_ribo_rna parameter is used because the latest version of the SortMeRNA package is not available for this platform.

Download the pipeline and test it on a minimal dataset with a single command:
nextflow run nf-core/rnaseq -profile test,<docker/singularity/podman/shifter/charliecloud/conda/institute>
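
For our case, the placeholder in the command above would be filled in with the Docker profile. The following is a minimal sketch, assuming Docker is the engine we end up using; the `DRY_RUN` variable is a hypothetical convenience for this example, not a Nextflow option, and it makes the script print the command instead of running it unless it is set to 0.

```shell
set -eu

# Assumption: Docker is the container engine, as stated earlier in this comment.
ENGINE="docker"
CMD="nextflow run nf-core/rnaseq -profile test,${ENGINE}"

# DRY_RUN is a hypothetical switch for this sketch, not a Nextflow option.
# By default the command is only printed; set DRY_RUN=0 to execute it.
if [ "${DRY_RUN:-1}" = "0" ]; then
    $CMD
else
    echo "Would run: $CMD"
fi
```

Swapping `ENGINE` for `singularity` or another supported engine should be the only change needed to test a different container runtime.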

There is a user-friendly GUI available at the URL below for specification of parameters to use when running the pipeline:
https://nf-co.re/launch?id=1635975833_2626a88494eb

A custom configuration file can be created to define generic settings for our compute environment:
https://github.com/nf-core/configs#using-an-existing-config

These custom configuration files can be hosted within the nf-core/configs GitHub repo. Some examples of custom configuration files used by other institutions can be found on this page:
https://github.com/nf-core/configs/tree/master/conf
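
To give a rough idea of what such a file might look like, here is a sketch that writes a small Nextflow config and prints the command that would use it. Everything in the config is a placeholder: the file name `custom.config`, the `lsf` executor, and the queue name `example-queue` are assumptions for illustration, not known settings of our cluster.

```shell
set -eu

# Hypothetical file name for our site-specific settings.
CONFIG_FILE="custom.config"

# Write a placeholder config; replace every value with the real cluster settings.
cat > "$CONFIG_FILE" <<'EOF'
// Run each process inside a Docker container.
docker {
    enabled = true
}

process {
    // Placeholder scheduler and queue names (assumptions, not our real values).
    executor = 'lsf'
    queue    = 'example-queue'
}
EOF

# -c passes the extra config file to the run; printed here rather than executed.
RUN_CMD="nextflow run nf-core/rnaseq -profile test -c ${CONFIG_FILE}"
echo "Would run: $RUN_CMD"
```

Once the settings are validated locally with `-c`, the same content could be proposed as an institutional profile in nf-core/configs, following the examples linked above.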

We discussed setting up this pipeline to run as is using Nextflow, and then potentially converting the entire pipeline, or components of it, into a custom WDL pipeline.

Other RNA-seq pipelines that we could potentially use, either whole or in part, are listed below. These pipelines are already in WDL or CWL format:

The MGI rnaseq pipeline:
https://github.com/genome/analysis-workflows/blob/master/definitions/pipelines/rnaseq.cwl

The ENCODE-DCC/rna-seq-pipeline:
https://github.com/ENCODE-DCC/rna-seq-pipeline/blob/dev/rna-seq-pipeline.wdl

Xiaoyu seems to favor the use of the nf-core/rnaseq pipeline. He prefers the use of Trim Galore for adapter trimming. We could likely package and incorporate Trim Galore into one of these other pipelines as well.

We can discuss the best approach, but we likely need to get started on something soon.

I feel that setting up and testing the nf-core/rnaseq pipeline might be the best initial approach. After that, perhaps we can develop a custom workflow based on some of the components from this pipeline as well as components from the MGI rnaseq and ENCODE-DCC pipelines, but I am open to other ideas to make this easier. I can certainly go back to Xiaoyu and ask him if he is open to using one of these other two pipelines that are already in CWL or WDL format.
