| layout | title | parent |
|---|---|---|
| default | SAMtools | 2. Program guides |
SAMtools is a set of utilities for manipulating alignment files. It imports from and exports to the SAM, BAM & CRAM formats; does sorting, merging & indexing; and allows reads in any region to be retrieved swiftly.
Sequence Alignment Map (SAM/`.sam`) is a text-based file format for sequence alignments.
Its binary equivalent is Binary Alignment Map (BAM/`.bam`), which stores the same data as a compressed binary file.
A binary file is preferable to a text file for a sequence alignment, as it is smaller and faster for tools to read and process.
A SAM alignment file (`example_alignment.sam`) can be converted to a BAM alignment file using `samtools view`.
$ samtools view -@ n -Sb -o example_alignment.bam example_alignment.sam
In this command...
- `-@` sets the number (`n`) of threads/CPUs to be used. This flag is optional and can be used with other `samtools` commands.
- `-Sb` specifies that the input is in SAM format (`S`) and the output will be in BAM format (`b`). Current SAMtools versions detect the input format automatically and ignore `-S`, which is kept only for compatibility with older versions.
- `-o` sets the name of the output file (`example_alignment.bam`).
- `example_alignment.sam` is the name of the input file.
Now that the example alignment is in BAM format, we can sort it using `samtools sort`.
Sorting this alignment by coordinate (the default order) is what allows us to create an index.
$ samtools sort -O bam -o sorted_example_alignment.bam example_alignment.bam
In this command...
- `-O` specifies the output format (`bam`, `sam`, or `cram`).
- `-o` sets the name of the output file (`sorted_example_alignment.bam`).
- `example_alignment.bam` is the name of the input file.
This sorted BAM alignment file can now be indexed using `samtools index`.
Indexing allows fast random access to this alignment, so the reads in a given region can be retrieved without reading through the whole file.
$ samtools index sorted_example_alignment.bam
In this command...
- `sorted_example_alignment.bam` is the name of the input file. The index is written alongside it as `sorted_example_alignment.bam.bai`.
In this video, `samtools` is used to convert `example_alignment.sam` into a BAM file, sort that BAM file, and index it.
`wgsim` is a SAMtools program that can simulate short paired-end sequencing reads from a reference genome.
This is useful for creating FASTQ files to practice with.
$ wgsim example_nucleotide_sequence.fasta example_reads_1.fastq example_reads_2.fastq
In this command...
- `example_nucleotide_sequence.fasta` is the reference genome input.
- `example_reads_1.fastq` and `example_reads_2.fastq` are the names of the simulated read output files (one file for each read of a pair).
In this video, `wgsim` is used to simulate reads from `example_nucleotide_sequence.fasta`.
SAMtools can be used to index a FASTA file as follows...
$ samtools faidx file.fasta
After running this command, an index file (`file.fasta.fai`) is created next to `file.fasta`, which can now be used as a reference by tools such as bcftools.
- Alignment formats
- The `samtools` manual: https://www.htslib.org/doc/samtools.html