kspham/mdup

Latest commit

199ba68 · Jun 10, 2018


Overview

Developed by BioTuring (www.bioturing.com), mdup is a tool that preprocesses cloud-read data (reads that carry a barcode). mdup does the following:

  • Removes duplicate reads, non-primary reads, secondary alignments, and unmapped reads.
  • Detects molecules by clustering reads that share the same barcode into groups.
  • Reports statistics on sequencing and GEM performance.

Two reads are considered duplicates if they share the same mapped position, mapped target, CIGAR string, and mate information (for paired-end reads).
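The duplicate rule above can be sketched in a few lines of Python. This is only an illustration over plain SAM text records, not mdup's actual implementation: the function names are hypothetical, "mate info" is assumed to mean the RNEXT and PNEXT columns, and keeping the first record of each duplicate set is an assumption as well.

```python
from typing import Iterable, List, Tuple


def duplicate_key(sam_line: str) -> Tuple[str, ...]:
    """Build a comparison key from the mandatory SAM columns.

    Columns used: FLAG (2), RNAME (3), POS (4), CIGAR (6),
    and, for paired reads only, RNEXT (7) and PNEXT (8).
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    key = (fields[2], fields[3], fields[5])  # target, position, CIGAR
    if flag & 0x1:  # 0x1 = read is paired
        key += (fields[6], fields[7])  # assumed meaning of "mate info"
    return key


def drop_duplicates(sam_lines: Iterable[str]) -> List[str]:
    """Keep the first record for each key (assumption: first one wins)."""
    seen = set()
    kept = []
    for line in sam_lines:
        key = duplicate_key(line)
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept
```

Two records that differ only in read name, sequence, or quality collapse into one here, while a different mate position keeps them apart.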

Install

git clone https://github.com/kspham/mdup.git
cd mdup
bash build.sh

Usage

mdup takes a BAM file as input. The BAM file must be sorted by coordinate and indexed. We recommend using BWA to align the cloud-reads to the reference. Every alignment record must carry a BX:Z: tag holding the barcode.
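Before running mdup it can be useful to confirm that every record really carries a barcode. The following is a minimal sketch that works on SAM text lines (for example, the output of `samtools view in.bam`); the helper names are hypothetical and not part of mdup.

```python
from typing import Iterable, List, Optional


def barcode_of(sam_line: str) -> Optional[str]:
    """Return the BX:Z: barcode of one SAM record, or None if it is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional tags start at column 12
        if tag.startswith("BX:Z:"):
            return tag[len("BX:Z:"):]
    return None


def reads_missing_barcode(sam_lines: Iterable[str]) -> List[str]:
    """List the read names of records that have no BX:Z: tag."""
    missing = []
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        if barcode_of(line) is None:
            missing.append(line.split("\t", 1)[0])
    return missing
```

An empty list from `reads_missing_barcode` means the input satisfies the barcode requirement stated above; it does not check sorting or indexing.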

mdup generates the following files in the output directory:

  • output.bam : new BAM file with the unneeded reads removed.
  • molecule.tsv : information on every detected molecule.
  • summary.inf : statistics on sequencing and GEM performance.
  • plot.html : plots of selected metrics from the statistics.

./mdup [option] in.bam

Optional arguments:
  -t INT                number of threads [default: 1]
  -o DIR                output directory [default: "./mdup_out/"]
  -g FILE               reference file used to generate the BAM file (for better stats)
  -n INT                minimum number of reads required for a molecule [default: 4]
  -l INT                minimum length required for a molecule [default: 1000]
  -k                    do not mark duplicates
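
To show how the -n and -l thresholds could act on molecule detection, here is a rough Python sketch. It is not mdup's algorithm: the function name, the input tuple layout, the `max_gap` split rule (and its value), and measuring length from read start positions only are all assumptions made for illustration. Only the defaults of 4 reads and 1000 bases come from the option list above.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Read = Tuple[str, str, int]                # (barcode, target, start position)
Molecule = Tuple[str, str, int, int, int]  # (barcode, target, start, end, n_reads)


def detect_molecules(reads: Iterable[Read],
                     min_reads: int = 4,       # mirrors -n
                     min_length: int = 1000,   # mirrors -l
                     max_gap: int = 50_000     # hypothetical split distance
                     ) -> List[Molecule]:
    """Group reads by barcode and target, split on large gaps, then filter."""
    groups = defaultdict(list)
    for barcode, target, pos in reads:
        groups[(barcode, target)].append(pos)

    molecules: List[Molecule] = []

    def emit(barcode: str, target: str, run: List[int]) -> None:
        # Length is approximated as last start minus first start (assumption).
        if len(run) >= min_reads and run[-1] - run[0] >= min_length:
            molecules.append((barcode, target, run[0], run[-1], len(run)))

    for (barcode, target), positions in groups.items():
        positions.sort()
        run = [positions[0]]
        for pos in positions[1:]:
            if pos - run[-1] > max_gap:  # gap too large: start a new molecule
                emit(barcode, target, run)
                run = [pos]
            else:
                run.append(pos)
        emit(barcode, target, run)

    return sorted(molecules)
```

With the defaults, a barcode whose reads form one cluster of four reads spanning 1900 bases is reported, while a cluster of three reads, or four reads spanning only 300 bases, is discarded.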

Contacts

Please report any issues directly to the GitHub issue tracker.