Week 7

RNAseq Part 1

RNAseq Course Setup

For the BFX Workshop, we will not be using AWS Cloud. Instead, we will use a Docker image created from the AWS AMI used in rnabio.org.

Docker Setup

A Docker image is available from the griffithlab DockerHub repository: https://hub.docker.com/layers/griffithlab/rnabio/0.0.1/images/sha256-b13f5e9048941c8be3e83555295c0f4ed21645d5fd9bae4226e6bc4f30f54b52?context=explore

  1. Ensure that Docker Desktop is running.

  2. Pull the rnabio image, tagged 0.0.1, from the griffithlab DockerHub repository to your local Docker client:

docker pull griffithlab/rnabio:0.0.1
  3. Set up a local workspace directory for the RNAseq course. If you change the path or commands used in this step, update the path to the workspace directory accordingly in Step 4. Also create a file, test_my_docker_mount.txt, that we will look for later.
mkdir -p bfx-workshop/rnabio-workspace
echo 'this file helps me test my docker mount' >> bfx-workshop/rnabio-workspace/test_my_docker_mount.txt
  4. Enter the rnabio-workspace directory you just created and start a Docker container from the image pulled above. The -v flag tells Docker to mount the current directory inside the container as /workspace with read-write privileges. In the RNAseq course, /workspace is the base directory for nearly all commands and steps.
cd bfx-workshop/rnabio-workspace
docker run -v $PWD/:/workspace:rw -it griffithlab/rnabio:0.0.1 /bin/bash
  5. Inside the container, use ls to see what is in your current directory, then enter the workspace folder and use ls again to confirm that test_my_docker_mount.txt is there.
ls
cd workspace
ls
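The test file created during workspace setup gives a quick way to confirm that the mount worked. The sketch below is one way to automate that check from inside the container. The function name check_mount is hypothetical and not part of the course materials; it assumes the file name and the /workspace mount point used in the commands above.

```shell
# Hypothetical helper: confirm that the test file is visible under the
# mount point (default: /workspace, as used in the docker run command).
check_mount() {
    mount_dir="${1:-/workspace}"
    if [ -f "$mount_dir/test_my_docker_mount.txt" ]; then
        echo "mount OK: found $mount_dir/test_my_docker_mount.txt"
    else
        echo "mount problem: $mount_dir/test_my_docker_mount.txt not found" >&2
        return 1
    fi
}
```

Calling check_mount with no arguments checks /workspace. If it reports a problem, the most likely cause is that docker run was started from a directory other than rnabio-workspace.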

User Setup

By default, Docker logs you in to the container as the "root" user. We need to run as the ubuntu user to match the RNAseq course tutorials.

  1. Switch to the ubuntu user with su:
su ubuntu
  2. Source the pre-installed .bashrc file to configure your environment to match the RNAseq course:
source ~/.bashrc

NOTE: Docker and the persistent "workspace" volume we attached allow you to start and stop as you wish. Every time you log in to the Docker container, you must switch to the ubuntu user and run source ~/.bashrc.

Environment Setup

Create a working directory and set the RNA_HOME environment variable:

mkdir -p ~/workspace/rnaseq/

export RNA_HOME=~/workspace/rnaseq

Confirm that RNA_HOME is set and points to a valid directory:

echo $RNA_HOME
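Note that echo only prints the value; it does not confirm that the directory exists. A minimal sketch of a stricter check follows, assuming a POSIX-compatible shell. The function name check_rna_home is hypothetical and not part of the course.

```shell
# Hypothetical helper: succeed only if RNA_HOME is set, non-empty,
# and points to an existing directory.
check_rna_home() {
    if [ -z "${RNA_HOME:-}" ]; then
        echo "RNA_HOME is not set" >&2
        return 1
    fi
    if [ ! -d "$RNA_HOME" ]; then
        echo "RNA_HOME=$RNA_HOME is not a directory" >&2
        return 1
    fi
    echo "RNA_HOME=$RNA_HOME"
}
```

If the function reports that RNA_HOME is not set, you have most likely not yet run source ~/.bashrc or the export command in this session.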

Since all the environment variables we set up for the RNA-seq workshop start with ‘RNA’ we can easily view them all by combined use of the env and grep commands as shown below. The env command shows all environment variables currently defined and the grep command identifies string matches.

env | grep RNA
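A plain grep RNA matches the string anywhere on a line, so it would also list an unrelated variable whose value happens to contain "RNA". If that ever adds noise, anchoring the pattern to the start of the line restricts matches to variable names. The wrapper name list_rna_vars below is hypothetical.

```shell
# Hypothetical helper: list only variables whose *name* starts with RNA.
# The ^ anchors the match to the start of each "NAME=value" line.
list_rna_vars() {
    env | grep '^RNA'
}
```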

To view the contents of the .bashrc file that configures this environment, type:

less ~/.bashrc

To exit less, type q.

Known Issues / Discrepancies from the RNAbio Website

  1. When running the check strandedness tool in the RNAseq Data section of Module 1, the docker run command cannot be run from within your griffithlab/rnabio:0.0.1 Docker session. Instead, we suggest that you open a new terminal window, cd into the rnaseq directory you created at the beginning of this assignment, and use the following command:
docker run -v $PWD/:/docker_workspace mgibio/checkstrandedness:latest check_strandedness \
    --gtf /docker_workspace/refs/chr22_with_ERCC92_tidy.gtf \
    --transcripts /docker_workspace/refs/chr22_ERCC92_transcripts.clean.fa \
    --reads_1 /docker_workspace/data/HBR_Rep1_ERCC-Mix2_Build37-ErccTranscripts-chr22.read1.fastq.gz \
    --reads_2 /docker_workspace/data/HBR_Rep1_ERCC-Mix2_Build37-ErccTranscripts-chr22.read2.fastq.gz

This is the same command as on the course webpage, except for the directory mounted with the -v flag. The course mounts /home/ubuntu/workspace/rnaseq, which is where the data is stored for students running the course on an AMI; here you mount your current directory instead. Note also that this is not an interactive session in which we enter the container and run commands inside it. Instead, the command is executed directly by a single docker run invocation.
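Because the container only sees files under the directory mounted with -v, every /docker_workspace/... path in the command must exist at the same relative location under your current directory. The sketch below is a possible pre-flight check to run from the rnaseq directory before the docker run command. The function name check_inputs is hypothetical, and the file list is copied from the command above.

```shell
# Hypothetical helper: verify that the four input files exist under the
# directory that will be mounted as /docker_workspace (default: $PWD).
check_inputs() {
    base="${1:-$PWD}"
    missing=0
    for rel in \
        refs/chr22_with_ERCC92_tidy.gtf \
        refs/chr22_ERCC92_transcripts.clean.fa \
        data/HBR_Rep1_ERCC-Mix2_Build37-ErccTranscripts-chr22.read1.fastq.gz \
        data/HBR_Rep1_ERCC-Mix2_Build37-ErccTranscripts-chr22.read2.fastq.gz
    do
        if [ ! -f "$base/$rel" ]; then
            echo "missing: $base/$rel" >&2
            missing=1
        fi
    done
    # Exit status 0 if every file was found, 1 otherwise.
    return "$missing"
}
```

If any file is reported missing, you are probably in the wrong directory, or the earlier data download steps have not been completed yet.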

  2. In various parts of RNAbio, the tutorial suggests viewing HTML files, plots, etc. by opening a public IPv4 address link in your browser. That is only needed for the AMI. Since you are running everything locally, you can find the files in Finder or File Explorer and open them directly or, even better, use open [your_file.html] on Mac or explorer.exe [your_file.html] on Windows/WSL2 to open the file in your default browser.

  3. In Pre-alignment QC, running fastp is an optional QC analysis. This software is not available in your Docker image, so please skip it (the fastqc and multiqc analyses should still work and can be used). Similarly, you can skip the adapter trim step, as the data provided here does not need to be adapter trimmed (the code is available if you need to do it for your own data).

  4. geneBody_coverage.py in the optional RSeQC section is not correctly in the PATH. Use the full path to the Python script: /home/ubuntu/.local/bin/geneBody_coverage.py
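For the item above about viewing HTML files locally, the right command depends on your operating system. The sketch below shows one way to pick it automatically. The function name pick_opener is hypothetical, and the xdg-open fallback for native Linux is an assumption that this README does not mention.

```shell
# Hypothetical helper: print the command that opens a file in the
# default browser on the current system.
pick_opener() {
    case "$(uname -s)" in
        Darwin)
            echo "open"
            ;;
        Linux)
            # WSL kernels identify themselves as "microsoft" in /proc/version.
            if grep -qi microsoft /proc/version 2>/dev/null; then
                echo "explorer.exe"
            else
                echo "xdg-open"   # assumption: not covered by this README
            fi
            ;;
        *)
            echo "unsupported system: $(uname -s)" >&2
            return 1
            ;;
    esac
}
```

Run this on your own machine, not inside the Docker container, for example as "$(pick_opener)" your_file.html.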

Homework Assignments

  1. Finish Module 1 - Inputs
  2. Complete Module 2 - Alignment

For-credit students: please count the number of lines in the merged UHR.bam and HBR.bam files and send the counts to Jenny, along with an IGV screenshot showing the UHR and HBR merged BAM files at the following location on chromosome 22: chr22:40,363,200-40,367,500.