-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathassignment4
20 lines (15 loc) · 1.21 KB
/
assignment4
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
TE annotation is a tedious iterative process. Here is the general workflow:
1. Generate a library of putative TEs (RepeatModeler) --> generates a .fas file.
2. Blast that library against your genome (Blast) --> generates a .out file.
3. Use the blast output to get extract hits from the genome (extractAlignTEs.pl) --> generates a lot of .fas files. Some of the .fas files from extractAlignTEs are .muscle.fas, aka aligned files with a CONSENSUS that has been generated by the person doing the work.
5. Collect all CONSENSUS sequences from each individual .muscle.fas file into a single file and degap them to be re-blasted by repeating step 2.
Automating as much as possible will save weeks of work for upcoming projects.
This week's task:
Use python to:
1. run blast: blastn –query <TE library> -db <genome> -out <arbitrary name of outfile> -outfmt 6 (is to make it tab delimited)
2. run extractAlignTEs.pl: extractAlignTEs.pl –genome <genome> –blast <blast output file> –consTEs <TE library> --seqBuffer <# of bps flanking TE> --seqNum <# of closest blast hits> --align
Starting files are as follows:
cPer_rn.fa.gz
extractAlignTEs.pl
starting_library.fas
Can be found at: /lustre/work/daray/extractAlign_assignment