title: Shell for Bioinformatics
author: Sheldon McKay, Mary Piper, Radhika Khetani, Meeta Mistry, Jihe Liu
date: September 28, 2020

Workshop Schedule

Day 1

| Time | Topic | Instructor |
| --- | --- | --- |
| 9:30 - 10:10 | Workshop introduction | Will |
| 10:10 - 11:40 | Introduction to Shell | Upen |
| 11:40 - 12:00 | Overview of self-learning materials and homework submission | Will |

Before the next class:

I. Please study the contents and work through all the code within the following lessons:

  1. Wildcards and shortcuts in Shell

    Perhaps you are interested in only listing the files that have a .txt extension or you want to navigate to your home directory quickly. There are many shortcuts in Shell that will help you do these types of tasks.

    This lesson will cover:
    - Utilizing wildcards for selecting multiple files
    - Implementing shortcuts for moving around the Shell quickly

  2. Examining and creating files

    Now that you can navigate around the Shell environment, you will likely want to know how to view and edit your files.

    This lesson will cover:
    - Viewing your files
    - Editing your files using vim

  3. Searching and redirection

    You will encounter large files that are impractical to read through by eye, so you need a way to search them for the information you are looking for. You might also want to write the output of that search to a file or use it as the input to another command.

    This lesson will cover:
    - Searching files using grep
    - Writing the output of a command to a file
    - Redirecting the output of a command to an additional command
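
The three lessons listed above revolve around a handful of commands. The following is a minimal sketch of how they fit together, not the lessons' own code. It assumes a throwaway directory and made-up example files (`genes.txt`, `samples.txt` and `notes.md` are hypothetical names, not part of the workshop dataset).

```shell
# Work in a throwaway directory so none of your real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create three small example files (hypothetical names and contents).
printf 'gene1\tchr1\ngene2\tchr2\ngene3\tchr1\n' > genes.txt
printf 'sampleA\nsampleB\n' > samples.txt
printf 'not a text file\n' > notes.md

# Wildcards: * matches any run of characters, so this lists only the .txt files.
ls *.txt

# Viewing: print the first two lines of a file.
head -n 2 genes.txt

# Searching: print the lines that contain "chr1".
grep "chr1" genes.txt

# Redirection: > writes the output of a command to a file instead of the screen.
grep "chr1" genes.txt > chr1_genes.txt

# Pipes: | sends the output of one command to another command as its input.
chr1_count=$(grep "chr1" genes.txt | wc -l)
echo "Lines matching chr1: $chr1_count"

# Shortcuts: ~ expands to your home directory, .. is the parent directory,
# and "cd -" returns to the directory you were in before.
echo ~
cd ..
cd -
```

Because `>` overwrites its target file without asking, it is worth double-checking the file name on the right-hand side before pressing Enter.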

NOTE: To run through the code above, you will need to be logged into O2 and working on a compute node (i.e. your command prompt should have the word compute in it).

  1. Log in using ssh [email protected] and enter your password (replace the "XX" in the username with the number you were assigned in class).
  2. Once you are on the login node, use srun --pty -p interactive -t 0-2:30 --mem 1G /bin/bash to get on a compute node.
  3. Proceed once your command prompt has the word compute in it.
  4. If you log out between lessons (using the exit command twice), please follow points 1. and 2. above to log back in and get on a compute node when you resume the self-learning lessons.
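
If you would rather check from the command line than read the prompt, the sketch below tests the machine's name. It rests on an assumption that is not stated in the steps above: that the word "compute" in the prompt comes from the node's hostname. The variable names `node_name` and `node_type` are made up for illustration.

```shell
# Assumption: the prompt shows the machine's hostname, so a hostname that
# contains "compute" means you are on a compute node.
node_name=$(uname -n)

# Match the hostname against a wildcard pattern.
case "$node_name" in
  *compute*) node_type="compute" ;;
  *)         node_type="other" ;;
esac

echo "You are on: $node_name ($node_type node)"
```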

II. Complete the exercises:

  • Each lesson above contains exercises; please go through each of them.
  • Copy over your solutions into the Google Form the day before the next class.

Questions?

  • If you get stuck due to an error while running code in the lesson, email us.

Day 2

| Time | Topic | Instructor |
| --- | --- | --- |
| 9:30 - 10:10 | Self-learning lessons review | All |
| 10:10 - 10:55 | Shell scripts and variables in Shell | Upen |
| 10:55 - 12:00 | Loops and automation | Will |

Before the next class:

I. Please study the contents and work through all the code within the following lessons:

  1. Permissions and Environment Variables

    When using a multi-user system like the O2 cluster, you may want to limit access to your work. Permissions exist to clearly delineate who has the ability to read, write and execute your files.

    Also, when working in a UNIX system, there is a core set of default variables that control the behavior of your command line. One of the most important of these is the $PATH variable, which tells the system where to look for the commands that you give it.

    This lesson will cover:
    - Interpreting and modifying existing permissions
    - Querying environment variables
    - Reading and appending to the $PATH variable

  2. Introduction to High-performance computing

    Now that you have had a chance to explore the O2 cluster, let's focus on the components of this system, how it differs from your personal computer, and the advantages that it offers in terms of parallelization.

    This lesson will cover:
    - Differentiating a high-performance computing cluster like O2 from your personal computer
    - Discussing the large parallelization advantage that O2 has over a personal computer
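
For the permissions and environment variables lesson listed above, the following is a minimal sketch of the three skills it names. It is not the lesson's own code. It assumes a throwaway directory and a made-up script called `hello.sh`, and that `/bin/bash` exists on your system.

```shell
# Work in a throwaway directory so none of your real files are changed.
workdir=$(mktemp -d)
script="$workdir/hello.sh"

# Create a tiny script (hypothetical name and contents).
printf '#!/bin/bash\necho "hello from a script"\n' > "$script"

# Interpreting permissions: the first column of ls -l shows read (r), write (w)
# and execute (x) for the owner, then the group, then everyone else.
ls -l "$script"

# Modifying permissions: the owner may read, write and execute;
# the group and everyone else get no access at all.
chmod u=rwx,go= "$script"
ls -l "$script"

# Querying an environment variable: print its current value.
echo "$PATH"

# Appending to $PATH: add the directory to the end of the search list.
export PATH="$PATH:$workdir"

# The shell now finds the script by name, without the full path.
greeting=$(hello.sh)
echo "$greeting"
```

The `export` line only affects the current shell session; the change is gone once you log out.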

NOTE: To run through the code above, you will need to be logged into O2 and working on a compute node (i.e. your command prompt should have the word compute in it). For login instructions, please see above.

II. Complete the exercises:

  • Each lesson above contains exercises; please go through each of them.
  • Copy over your solutions into the Google Form the day before the next class.

Questions?

  • If you get stuck due to an error while running code in the lesson, email us.

Day 3

| Time | Topic | Instructor |
| --- | --- | --- |
| 9:30 - 10:00 | Self-learning lessons review | All |
| 10:00 - 11:00 | Introduction to the O2 cluster | Will |
| 11:00 - 11:30 | Exercise (answer key) | Upen |
| 11:30 - 11:45 | Introduction to the O2 cluster - data storage | Will |
| 11:45 - 12:00 | Wrap up | Upen |

Dataset

Introduction to Shell: Dataset

Answer keys

Advanced bash commands

If you are interested in learning some more advanced tools for working on the command line, we encourage you to walk through the materials linked below:

Resources

Cheat sheets:

Online tutorials:

General help:

  • Google it! If you don't know how to do something, try Googling it; other people have probably had the same question.
  • Learn by doing! There's no other real way to learn this than by trying it out.
    • Use vim on your laptop
    • Move around the directory structure on your laptop using the Terminal/Shell
    • Open folders and files using the command open
    • Automate something you don't really need to automate
  • Use man bash to get more information about bash (Bourne-again shell)

These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.