Commit

content: add draft for all session
matinnuhamunada committed Mar 28, 2022
1 parent 191e62c commit 6e624de
Showing 6 changed files with 178 additions and 4 deletions.
4 changes: 2 additions & 2 deletions _episodes/01-introduction.md
@@ -1,7 +1,7 @@
---
title: "Introduction"
-teaching: 15
-exercises: 15
+teaching: 20
+exercises: 30
questions:
- "How do I start working in Galaxy?"
- "What does my raw sequencing data look like?"
4 changes: 2 additions & 2 deletions _episodes/02-filtering_and_assembly.md
@@ -1,7 +1,7 @@
---
title: "Filtering and Assembly"
-teaching: 15
-exercises: 15
+teaching: 20
+exercises: 30
questions:
- "How do I assemble my genome?"
- "How do I assess my assembly result?"
44 changes: 44 additions & 0 deletions _episodes/03-taxonomic_placement.md
@@ -0,0 +1,44 @@
---
title: "Taxonomic Placement"
teaching: 10
exercises: 10
questions:
- "What is the taxonomic classification of my genome?"
objectives:
- "Assign a taxonomic classification to a newly assembled genome using available tools."
keypoints:
- "Taxonomic placement of a newly assembled genome can be achieved by identifying the nearest reference organisms and placing the query genome into an existing reference tree. Examples of such tools are autoMLST and GTDB-Tk."
---
## Downloading the assembled sequence
We now have an assembled genome that we can analyze. Before moving forward, we would like to determine the lineage, or taxonomic classification, of this newly assembled genome. To do so, we first need to download the genome from Galaxy.

1. Find the `medaka consensus pipeline` - `Consensus` result and click the `Download` icon (first from left) to save it to your computer. **Hint:** The file should be in `.fasta` format.
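
Before uploading the file, it can help to confirm that the download really is a FASTA file and to look at a few basic statistics. The Python sketch below is one way to do that; it is not part of the Galaxy or autoMLST workflow. It assumes a plain (uncompressed) FASTA file, and the file name passed on the command line (for example a hypothetical `consensus.fasta`) is up to you.

```python
import sys
from pathlib import Path


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {record_id: sequence}; the ID is the first word after '>'."""
    records: dict[str, str] = {}
    current_id = None
    chunks: list[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        elif current_id is None:
            raise ValueError("sequence found before the first '>' header; is this FASTA?")
        else:
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def summarize(records: dict[str, str]) -> dict[str, float]:
    """Contig count, total length, N50 and GC % (Ns count towards the total length)."""
    lengths = sorted((len(seq) for seq in records.values()), reverse=True)
    total = sum(lengths)
    gc = sum(seq.count("G") + seq.count("C") for seq in records.values())
    # N50: length of the contig at which the cumulative length first reaches half the total
    n50, running = 0, 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break
    gc_percent = round(100 * gc / total, 2) if total else 0.0
    return {"contigs": len(lengths), "total_length": total, "n50": n50, "gc_percent": gc_percent}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python fasta_check.py consensus.fasta   (file name is hypothetical)
    stats = summarize(read_fasta(Path(sys.argv[1]).read_text()))
    for key, value in stats.items():
        print(f"{key}: {value}")
```

A single large contig with a total length in the expected range for your organism is a good sign that the correct file was downloaded.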

## Taxonomic Placement with autoMLST
We will use [autoMLST](https://automlst.ziemertlab.com/analyze) to get an overview of the most similar organisms. Unfortunately, autoMLST is not available on the Galaxy server, so we must run the analysis on the autoMLST webserver.

1. Go to the [autoMLST webserver](https://automlst.ziemertlab.com/analyze)
2. Choose the `Placement (Fast) mode`
3. Upload your genome sequence
4. Enter your email to get your result back
5. Click `Submit job`

## Other tools to consider
### RefSeq_Masher
> Find what NCBI RefSeq genomes match or are contained within your sequence data using Mash MinHash with a Mash sketch database of 54,925 NCBI RefSeq Genomes.
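
Mash compares genomes via MinHash sketches: each genome is reduced to the smallest hash values of its k-mers, the fraction of shared hashes estimates the Jaccard index j, and j is converted to the Mash distance D = -(1/k) ln(2j / (1 + j)). The Python sketch below only illustrates that idea; it is not Mash itself. As assumptions, it uses `hashlib.blake2b` instead of the MurmurHash3 function Mash uses, and the defaults k = 21 and a sketch size of 1000 mirror Mash's usual defaults but should be treated as illustrative.

```python
import hashlib
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> set[str]:
    """All canonical k-mers (lexicographic minimum of a k-mer and its reverse complement)."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip k-mers containing N or other ambiguity codes
        reverse_complement = kmer.translate(COMPLEMENT)[::-1]
        kmers.add(min(kmer, reverse_complement))
    return kmers


def kmer_hash(kmer: str) -> int:
    """64-bit hash of a k-mer (stand-in for the MurmurHash3 used by Mash)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def sketch(seq: str, k: int = 21, size: int = 1000) -> list[int]:
    """Bottom-s MinHash sketch: the `size` smallest k-mer hash values."""
    return sorted(kmer_hash(kmer) for kmer in canonical_kmers(seq, k))[:size]


def jaccard_estimate(sketch_a: list[int], sketch_b: list[int], size: int = 1000) -> float:
    """Estimate the Jaccard index from the bottom-s sketch of the union of both sketches."""
    union_sketch = sorted(set(sketch_a) | set(sketch_b))[:size]
    shared = set(sketch_a) & set(sketch_b)
    if not union_sketch:
        return 0.0
    return sum(1 for h in union_sketch if h in shared) / len(union_sketch)


def mash_distance(j: float, k: int = 21) -> float:
    """Mash distance D = -(1/k) * ln(2j / (1 + j)); set to 1.0 when nothing is shared."""
    if j <= 0:
        return 1.0
    return -math.log(2 * j / (1 + j)) / k
```

Identical sequences give j = 1 and D = 0, while unrelated sequences share almost no hashes, so j is close to 0. Keeping only a fixed number of the smallest hashes is what lets a tool like RefSeq Masher compare a query against tens of thousands of reference genomes quickly.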
### GTDB-Tk
> A toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes.

> ## Discussion 01
>
>
> > ## Solution
> >
> > TBD
> {: .solution}
>
{: .challenge}

{% include links.md %}

66 changes: 66 additions & 0 deletions _episodes/04-gene_calling_annotation.md
@@ -0,0 +1,66 @@
---
title: "Genome Annotation"
teaching: 10
exercises: 30
questions:
- "What genes are contained in my genome?"
objectives:
- "Call genes in the assembled sequence and annotate them using known databases"
keypoints:
- "Genome annotation starts by identifying genes and other functional elements (rRNA, tRNA, etc.) within the nucleotide sequence. This is followed by comparison with databases of interest to predict the functions encoded by the genes."
---
## Importing and editing the annotation workflow
Import the genome annotation workflow:

1. In a new tab, open this link: [https://usegalaxy.eu/u/matinnu/w/03annotation](https://usegalaxy.eu/u/matinnu/w/03annotation){:target="_blank"}
2. In the top-right corner, click the `+` button (import workflow)
3. Click `start` using this workflow
4. Click on the name of the newly imported workflow (`imported: 03_Annotation`) and click `Edit`
5. Explore the workflow and make changes if necessary

When you are content with the workflow, click `Save`. Next, we will run the analysis on the newly assembled genome (sample_02):

1. Click the _"play"_ button on the top-right corner (`Run Workflow`)
2. If the workflow does not show in detailed view, click `Expand to full workflow form`
3. Set `Send results to a new history` to `yes`.
4. Change the history name to _"03_Annotation_2021_sample02"_.
5. For `1: input dataset`, select the `medaka consensus` output as the input.
6. Click `Run Workflow` on the top panel.

## Prokka: rapid prokaryotic genome annotation
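
Prokka uses Prodigal to predict protein-coding genes before annotating them. Prodigal's model is far more sophisticated, but the basic idea of gene calling, finding open reading frames (ORFs) that run from a start codon to an in-frame stop codon on either strand, can be illustrated with the naive Python scan below. This is a teaching sketch, not Prodigal's algorithm: it only considers ATG starts and the standard stop codons, and the `min_codons` cutoff is an arbitrary assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int]]:
    """Naive ORF scan on both strands.

    Returns (strand, start, end) with 0-based, end-exclusive coordinates on the
    forward strand. Each ORF runs from the first ATG to the next in-frame stop
    codon (stop included) and needs at least `min_codons` codons before the stop.
    """
    seq = seq.upper()
    length = len(seq)
    orfs = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, length - 2, 3):
                codon = strand_seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        end = i + 3
                        if strand == "+":
                            orfs.append((strand, start, end))
                        else:
                            # convert reverse-strand coordinates back to the forward strand
                            orfs.append((strand, length - end, length - start))
                    start = None
    return sorted(orfs, key=lambda orf: (orf[1], orf[2]))
```

On a real genome this scan reports many spurious short ORFs and misses alternative start codons, which is exactly why Prokka relies on a trained gene finder instead.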

> ## Discussion 01 - Prokka
>
>
> > ## Solution
> >
> > TBD
> {: .solution}
>
{: .challenge}

## antiSMASH

> ## Discussion 02 - antiSMASH
>
>
> > ## Solution
> >
> > TBD
> {: .solution}
>
{: .challenge}

## ABRicate
ABRicate is a tool for the detection of antimicrobial resistance and virulence genes. It is also available on Galaxy, so you don’t have to download your assembly for this. It can search different databases, for example CARD (Comprehensive Antibiotic Resistance Database) for antimicrobial resistance genes or VFDB for virulence genes. For more information, see https://github.com/tseemann/abricate.
Below is an example of the default parameters with the CARD database selected, an explanation of the output table, and an example output.
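
If you download the ABRicate result table from Galaxy, you can also post-filter it yourself. The Python sketch below reads a tab-separated ABRicate report and keeps hits above identity and coverage cutoffs. It assumes the header row contains the column names `GENE`, `%COVERAGE` and `%IDENTITY`, as in recent ABRicate versions; check the header of your own file. The 90 % cutoffs are arbitrary example values, not recommendations.

```python
import csv
import io
import sys
from pathlib import Path


def read_abricate(tsv_text: str) -> list[dict[str, str]]:
    """Parse a tab-separated ABRicate report; the first line is the header row."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def filter_hits(rows: list[dict[str, str]], min_identity: float = 90.0,
                min_coverage: float = 90.0) -> list[dict[str, str]]:
    """Keep hits whose %IDENTITY and %COVERAGE both reach the (arbitrary) cutoffs."""
    return [
        row for row in rows
        if float(row["%IDENTITY"]) >= min_identity and float(row["%COVERAGE"]) >= min_coverage
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python filter_abricate.py abricate_card.tsv   (file name is hypothetical)
    for hit in filter_hits(read_abricate(Path(sys.argv[1]).read_text())):
        print(hit["GENE"], hit["%IDENTITY"], hit["%COVERAGE"], sep="\t")
```

Reading columns by name rather than by position keeps the sketch working even if your ABRicate version adds or reorders columns.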

> ## Discussion 03 - ABRicate
>
>
> > ## Solution
> >
> > TBD
> {: .solution}
>
{: .challenge}

{% include links.md %}
51 changes: 51 additions & 0 deletions _episodes/05-genome_analysis.md
@@ -0,0 +1,51 @@
---
title: "Tools for analysing genomes"
teaching: 15
exercises: 45
questions:
- "How do I analyse my genome?"
objectives:
- "Call genes in the assembled sequence and annotate them using known databases"
keypoints:
- "Genome annotation starts by identifying genes and other functional elements (rRNA, tRNA, etc.) within the nucleotide sequence. This is followed by comparison with databases of interest to predict the functions encoded by the genes."
---
The next step is to analyse the assembled genome. Depending on your research questions, a range of different tools can help with the analysis. Here is a list of suggestions that might be of interest to you. You don’t have to use them all; choose the ones that are suitable for answering your research questions.

## antiSMASH
antiSMASH (antibiotics & Secondary Metabolite Analysis Shell) is used to detect biosynthetic gene clusters for secondary metabolites. To use antiSMASH, first download your polished genome assembly from Galaxy and then upload it to the antiSMASH website (https://antismash.secondarymetabolites.org/#!/start).

When the analysis job has finished, you will first be presented with a list of all detected secondary metabolite gene clusters, denoted as Region 1, Region 2, etc. (labelled as number 5 in the image). For each gene cluster, antiSMASH also reports the most similar known cluster in the MIBiG database and the Similarity to that cluster. The Similarity is the percentage of genes within the most similar known cluster that have a significant BLAST hit to genes within the current region. So, the Similarity is NOT the same as the percentage identity reported by a BLAST search.

For more information on each Region, click on its number at the top (labelled as 2 in the image).

If you have experimental data, for example on the bioactivity of your strain, try to match it to the potential compounds detected by antiSMASH.
For more information:
- https://docs.antismash.secondarymetabolites.org/understanding_output
- https://docs.antismash.secondarymetabolites.org/
- https://academic.oup.com/nar/article/39/suppl_2/W339/2507123
- https://academic.oup.com/nar/article/47/W1/W81/5481154
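
To make the difference between antiSMASH's Similarity and BLAST's percentage identity concrete, the Python sketch below computes both for made-up inputs. The gene names and aligned sequences are hypothetical; in practice antiSMASH calculates the Similarity for you from its own comparison against MIBiG.

```python
def cluster_similarity(known_cluster_genes: list[str], genes_with_hits: list[str]) -> float:
    """antiSMASH-style Similarity: % of genes in the known cluster with a significant hit in the region."""
    known = set(known_cluster_genes)
    if not known:
        raise ValueError("the known cluster must contain at least one gene")
    return 100 * len(known & set(genes_with_hits)) / len(known)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """BLAST-style percent identity: identical aligned positions / alignment length (gaps included)."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("expected two non-empty aligned sequences of equal length")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100 * identical / len(aligned_a)


# Hypothetical MIBiG cluster with 10 genes, 4 of which have a hit in the antiSMASH region
known_genes = [f"gene{i}" for i in range(1, 11)]
print(cluster_similarity(known_genes, ["gene1", "gene2", "gene3", "gene4"]))  # 40.0
# A single (hypothetical) aligned hit can still be highly identical
print(percent_identity("ATGAAACGTT", "ATGAAACGTA"))  # 90.0
```

In other words, a region can show a low Similarity (few of the known cluster's genes are present) even when each of the genes that are present is nearly identical to its counterpart.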

> ## Discussion 01 - antiSMASH
>
>
> > ## Solution
> >
> > TBD
> {: .solution}
>
{: .challenge}

## ABRicate
ABRicate is a tool for the detection of antimicrobial resistance and virulence genes. It is also available on Galaxy, so you don’t have to download your assembly for this. It can search different databases, for example CARD (Comprehensive Antibiotic Resistance Database) for antimicrobial resistance genes or VFDB for virulence genes. For more information, see https://github.com/tseemann/abricate.
Below is an example of the default parameters with the CARD database selected, an explanation of the output table, and an example output.
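
To see which contigs of your assembly carry the detected genes, you could group the ABRicate hits by contig. The Python sketch below does this for a tab-separated ABRicate report, assuming the header row contains the `SEQUENCE` (contig name) and `GENE` columns; any other columns are ignored. The two-column example report is invented purely to show the call.

```python
import csv
import io
from collections import defaultdict


def genes_per_contig(tsv_text: str) -> dict[str, list[str]]:
    """Group ABRicate hits by contig (SEQUENCE column), keeping the order of the report."""
    by_contig: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        by_contig[row["SEQUENCE"]].append(row["GENE"])
    return dict(by_contig)


# Invented two-column excerpt of a report, just to show the call
example = "SEQUENCE\tGENE\ncontig_1\tgeneA\ncontig_2\tgeneB\ncontig_1\tgeneC\n"
for contig, genes in genes_per_contig(example).items():
    print(contig, ", ".join(genes))
```

Several resistance genes clustered on one small contig can be worth a closer look, for example to check whether that contig might be a plasmid.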

> ## Discussion 02 - ABRicate
>
>
> > ## Solution
> >
> > TBD
> {: .solution}
>
{: .challenge}

## BLAST
Finally, you can also BLAST genes of interest against your assembled genome. First, download the nucleotide or amino acid sequence of your gene of interest from NCBI (https://www.ncbi.nlm.nih.gov/). Then use nucleotide BLAST (blastn) or tblastn, depending on whether you have the nucleotide or the amino acid sequence of your gene of interest. Select the option `Align two or more sequences`. Below is an example of finding a gene of interest in your assembled genome.
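
The choice between nucleotide BLAST and tblastn depends only on whether your query is a nucleotide or an amino acid sequence, because the subject (your assembled genome) is always nucleotide. The Python sketch below encodes that rule with a simple heuristic for guessing the sequence type; the 90 % threshold is an assumption, and for an ambiguous query you should simply check on NCBI which kind of sequence you downloaded.

```python
NUCLEOTIDE_SYMBOLS = set("ACGTUN")


def sequence_from_fasta(text: str) -> str:
    """Join the sequence lines of a single-record FASTA text, dropping the '>' header."""
    return "".join(line.strip() for line in text.splitlines() if not line.startswith(">"))


def guess_molecule(seq: str, threshold: float = 0.9) -> str:
    """Heuristic: 'nucleotide' if at least `threshold` of the letters are A/C/G/T/U/N."""
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("empty sequence")
    fraction = sum(c in NUCLEOTIDE_SYMBOLS for c in letters) / len(letters)
    return "nucleotide" if fraction >= threshold else "protein"


def choose_blast_program(query: str) -> str:
    """The subject is the assembled genome (nucleotide), so a nucleotide query
    uses blastn and a protein query uses tblastn (protein vs. translated nucleotide)."""
    return "blastn" if guess_molecule(query) == "nucleotide" else "tblastn"


print(choose_blast_program(sequence_from_fasta(">my_gene\nATGGCGTTAACC\nGGATAA\n")))  # blastn
print(choose_blast_program(sequence_from_fasta(">my_protein\nMKTLLVAGLLALS\n")))      # tblastn
```

tblastn translates the genome in all six reading frames, so it can still find a protein-coding gene whose DNA sequence differs from your query at synonymous positions.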

{% include links.md %}
13 changes: 13 additions & 0 deletions _episodes/06-Extra.md
@@ -0,0 +1,13 @@
---
title: "Extra"
teaching: 15
exercises: 0
questions:
- "How can I use command-line tools for bioinformatic analysis?"
objectives:
- "Call genes in the assembled sequence and annotate them using known databases"
keypoints:
- "Genome annotation starts by identifying genes and other functional elements (rRNA, tRNA, etc.) within the nucleotide sequence. This is followed by comparison with databases of interest to predict the functions encoded by the genes."
---

{% include links.md %}
