Genomes contain many repeated elements and regions of high sequence similarity. When mapping sequencing reads, it is important to consider whether reads fall into such regions, and either to ignore those regions or to decide how many multimapping reads to allow.
This resource uses two methods to generate such "mappability" tracks:
(1) a python script and the bowtie mapper; (2) the GEM package.
As the zebrafish genomes currently available lack such a resource, danRer10 and danRer7 mappability tracks that may be useful are provided here. Finished tracks are available as hubs in the UCSC genome browser:
UCSC genome browser hubs for danRer7/Zv9 Mappability:
http://userweb.molbiol.ox.ac.uk/public/dariag/mappability/hub_qd_zv9.txt
http://userweb.molbiol.ox.ac.uk/public/dariag/mappability/hub_gem_zv9.txt
UCSC genome browser hubs for GRCz10/danRer10 Mappability:
http://userweb.molbiol.ox.ac.uk/public/dariag/mappability/hub_qd_zv10.txt
http://userweb.molbiol.ox.ac.uk/public/dariag/mappability/hub_gem_zv10.txt
This resource was generated as part of the FoxD3 project carried out in the laboratory of T. Sauka-Spengler at the Weatherall Institute of Molecular Medicine at the University of Oxford, in the UK.
A preliminary manuscript for this project is available on bioRxiv: https://www.biorxiv.org/content/biorxiv/early/2017/11/22/213611.full.pdf
For additional information on the project, please see the tsslab/foxd3 GitHub repository.
Method 1: python script and bowtie
- Get the fasta sequence of the genome.
- Make a read set covering the whole genome using the genomeFasta2reads.py script.
To use:
genomeFasta2reads.py <genome.fa> <desired_read_length>
This generates fastq-formatted "reads" of the desired length, tiled across the genome with a step of 1, with every base given the phred quality score H.
Example:
@synthetic_read-danRer10:chr1:0:40
GATCTTAAACATTTATTCCCCCTGCAAACATTTTCAATCA
+
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
@synthetic_read-danRer10:chr1:1:41
ATCTTAAACATTTATTCCCCCTGCAAACATTTTCAATCAT
+
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
@synthetic_read-danRer10:chr1:2:42
TCTTAAACATTTATTCCCCCTGCAAACATTTTCAATCATT
+
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
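The tiling described above can be sketched in a few lines of Python. This is a minimal illustration, not the genomeFasta2reads.py script itself: the function names `fasta_records` and `synthetic_reads` and the `assembly` argument are hypothetical, and the header layout simply mimics the example records shown above.

```python
import io

def fasta_records(handle):
    """Yield (name, sequence) pairs from a FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def synthetic_reads(handle, read_length, assembly="danRer10"):
    """Yield 4-line FASTQ records tiling each sequence with a step of 1."""
    quality = "H" * read_length  # constant phred quality, as in the example
    for name, seq in fasta_records(handle):
        for start in range(len(seq) - read_length + 1):
            end = start + read_length
            header = f"@synthetic_read-{assembly}:{name}:{start}:{end}"
            yield f"{header}\n{seq[start:end]}\n+\n{quality}\n"

# Toy genome: one 14 bp sequence split over two FASTA lines.
demo = io.StringIO(">chr1\nGATCTTAAAC\nATTT\n")
reads = list(synthetic_reads(demo, 12))
print("".join(reads), end="")
```

A sequence of length n yields n - read_length + 1 reads, so the output for a whole genome is large; streaming the records to a compressed file rather than holding them in a list would be the sensible choice at that scale.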
Links to the fastq.gz files:
- Map fastqs back to genome:
Download bowtie v.1.1.0 index for danRer7
Download bowtie v.1.1.0 index for danRer10
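Map the "synthetic" fastqs of different lengths back to the genome using bowtie (v.1.1.0), varying the -m parameter, which suppresses all alignments for a read that has more than that number of reportable alignments. Shown here for -m 1, 2 and 3 with 40 bp reads: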
R1=danRer7_40bp_reads.fastq
bowtie -S -p 24 -m 1 danRer7 $R1 --chunkmb 500 | samtools view -bS - > zv7_on_zv7_m1_40bp_b1.bam
bowtie -S -p 24 -m 2 danRer7 $R1 --chunkmb 500 | samtools view -bS - > zv7_on_zv7_m2_40bp_b1.bam
bowtie -S -p 24 -m 3 danRer7 $R1 --chunkmb 500 | samtools view -bS - > zv7_on_zv7_m3_40bp_b1.bam
If in a hurry, split the fastqs into multiple smaller fastqs, map each individually, and then merge the resulting BAM files using bamtools merge.
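One way to do the splitting is sketched below in Python. It assumes strictly 4-line FASTQ records, which holds for the synthetic reads shown earlier; the function name `split_fastq` and the chunk naming scheme are hypothetical. Each chunk can then be mapped with the same bowtie command as above.

```python
import itertools
import os
import tempfile

def split_fastq(path, records_per_chunk, prefix):
    """Split a 4-line-per-record FASTQ into numbered chunks; return their paths."""
    chunk_paths = []
    with open(path) as handle:
        while True:
            # Read the next block of whole records (4 lines each).
            lines = list(itertools.islice(handle, 4 * records_per_chunk))
            if not lines:
                break
            chunk_path = f"{prefix}_{len(chunk_paths):04d}.fastq"
            with open(chunk_path, "w") as out:
                out.writelines(lines)
            chunk_paths.append(chunk_path)
    return chunk_paths

# Demo on a throwaway file with 5 records, split 2 records per chunk.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "reads.fastq")
    with open(source, "w") as out:
        for i in range(5):
            out.write(f"@synthetic_read-demo:chr1:{i}:{i + 4}\nACGT\n+\nHHHH\n")
    chunks = split_fastq(source, 2, os.path.join(tmp, "reads_part"))
    record_counts = []
    for chunk in chunks:
        with open(chunk) as handle:
            record_counts.append(sum(1 for _ in handle) // 4)
    print(record_counts)
```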
- Use bedtools genomeCoverageBed to generate bedgraph files. To visualise, make bigwigs using bedGraphToBigWig from the UCSC genome browser utilities.
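To see what the coverage signal looks like for step-1 synthetic reads, the following Python sketch computes bedGraph-style runs from read intervals on a toy chromosome. It is an illustration only, not a replacement for bedtools; the function name `coverage_runs` is hypothetical, and unlike a plain bedgraph it also reports any zero-depth runs.

```python
from itertools import groupby

def coverage_runs(intervals, chrom_length):
    """Return (start, end, depth) runs for one chromosome.

    Intervals are 0-based and half-open, as in BED files.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (chrom_length + 1)
    for start, end in intervals:
        diff[start] += 1
        diff[end] -= 1
    depth, per_base = 0, []
    for delta in diff[:chrom_length]:
        depth += delta
        per_base.append(depth)
    # Collapse consecutive positions with equal depth into runs.
    runs, pos = [], 0
    for value, group in groupby(per_base):
        n = len(list(group))
        runs.append((pos, pos + n, value))
        pos += n
    return runs

read_length, chrom_length = 4, 12
all_reads = [(s, s + read_length) for s in range(chrom_length - read_length + 1)]

# Every read mapped: interior positions reach a depth equal to the read length.
full = coverage_runs(all_reads, chrom_length)
print(full)

# Reads starting at 4 and 5 dropped (as if suppressed by -m): depth falls below 4.
partial = coverage_runs([r for r in all_reads if r[0] not in (4, 5)], chrom_length)
print(partial)
```

Positions whose depth equals the read length are those where every overlapping synthetic read was retained, which is the signal used in the next step.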
Links to the bigwig files:
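- Parse the bedgraph file to select regions whose signal equals the read length (40 in the commands that follow); with reads tiled at a step of 1, these are the positions where every overlapping synthetic read was aligned. Then use bedtools complement to get the complement (all regions that do not have that signal).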
awk '$4 == 40' bmerged_zv7_on_zv7_m1_40bp_b1.bg > bmerged_zv7_on_zv7_m1_40bp_b1_eq.bed
sort -k1,1 -k2,2n bmerged_zv7_on_zv7_m1_40bp_b1_eq.bed > bmerged_zv7_on_zv7_m1_40bp_b1_eq.sort.bed
bedtools complement -i bmerged_zv7_on_zv7_m1_40bp_b1_eq.sort.bed -g danRer7.chrom.sizes > com_bmerged_zv7_on_zv7_m1_40bp_b1_eq.sort.bed
awk '$4 == 40' bmerged_zv10_on_zv10_m1_40bp_b1.bg > bmerged_zv10_on_zv10_m1_40bp_b1_eq.bed
sort -k1,1 -k2,2n bmerged_zv10_on_zv10_m1_40bp_b1_eq.bed > bmerged_zv10_on_zv10_m1_40bp_b1_eq.sort.bed
bedtools complement -i bmerged_zv10_on_zv10_m1_40bp_b1_eq.sort.bed -g danRer10.chrom.sizes > com_bmerged_zv10_on_zv10_m1_40bp_b1_eq.sort.bed
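The logic of the awk filter and the complement step can be sketched in plain Python. This is a toy re-implementation for illustration, not how bedtools works internally; the function names and the example intervals and chromosome sizes are made up.

```python
def full_coverage_intervals(bedgraph_lines, read_length):
    """Keep bedGraph intervals whose value equals read_length (like awk '$4 == 40')."""
    kept = []
    for line in bedgraph_lines:
        chrom, start, end, value = line.split()[:4]
        if float(value) == read_length:
            kept.append((chrom, int(start), int(end)))
    return sorted(kept)  # by chromosome name, then start

def complement(intervals, chrom_sizes):
    """Return the regions of each chromosome not covered by the sorted intervals."""
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    gaps = []
    for chrom, size in chrom_sizes.items():
        cursor = 0  # end of the covered region seen so far
        for start, end in by_chrom.get(chrom, []):
            if start > cursor:
                gaps.append((chrom, cursor, start))
            cursor = max(cursor, end)
        if cursor < size:
            gaps.append((chrom, cursor, size))
    return gaps

# Hypothetical bedGraph lines and chromosome sizes.
bedgraph = [
    "chr1\t0\t39\t10",
    "chr1\t39\t100\t40",
    "chr1\t100\t150\t12",
    "chr1\t150\t300\t40",
    "chr2\t0\t50\t40",
]
sizes = {"chr1": 320, "chr2": 80}

mappable = full_coverage_intervals(bedgraph, 40)
to_exclude = complement(mappable, sizes)
for chrom, start, end in to_exclude:
    print(f"{chrom}\t{start}\t{end}")
```

The resulting intervals play the role of the "regions to exclude" bed files: everything on each chromosome that did not reach full coverage.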
Links to the bed.gz files to regions to exclude:
Method 2: GEM package
- Create the genome index file
gem-indexer -i danRer7.fa -o danRer7_index
danRer7 .gem file
danRer10 .gem file
- Generate .mappability file
gem-mappability -I danRer7_index.gem -l 40 -o danRer7_40 -T 24
Links to .mappability files:
- Generate .wig file and bigWig files
Links to GEM bigwig files:
GC content
GC content bigwig file for danRer7
GC content bigwig file for danRer10