pipelines_somatic_exome.cwl

Documentation for somatic_exome.cwl

This page is auto-generated. Do not edit.

Overview

somatic_exome: exome alignment and somatic variant detection

Introduction

somatic_exome is designed to process mutant/wildtype (tumor/normal) H. sapiens exome sequencing data. It features BQSR-corrected alignments, four-caller variant detection (Mutect, Strelka, VarScan, and Pindel), and VEP annotations. Structural and copy-number variants are detected with manta and CNVkit. In addition, QC metrics are collected, including somalier concordance metrics.

Example input file: analysis_workflows/example_data/somatic_exome.yaml
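A job file modeled on that example can be screened for misspelled input names before it is submitted. The Python sketch below is illustrative and not part of analysis-workflows: `unknown_inputs` and `DOCUMENTED_INPUTS` are hypothetical names, and the set is copied by hand from the Inputs table on this page, so it will drift if the workflow changes.

```python
# Input names copied from the Inputs table on this page (54 entries).
DOCUMENTED_INPUTS = {
    "reference", "tumor_sequence", "tumor_name", "normal_sequence",
    "normal_name", "trimming", "bqsr_known_sites", "bqsr_intervals",
    "bait_intervals", "target_intervals", "target_interval_padding",
    "per_base_intervals", "per_target_intervals", "summary_intervals",
    "omni_vcf", "picard_metric_accumulation_level",
    "qc_minimum_mapping_quality", "qc_minimum_base_quality",
    "strelka_cpu_reserved", "scatter_count",
    "mutect_artifact_detection_mode",
    "mutect_max_alt_allele_in_normal_fraction",
    "mutect_max_alt_alleles_in_normal_count", "varscan_strand_filter",
    "varscan_min_coverage", "varscan_min_var_freq", "varscan_p_value",
    "varscan_max_normal_freq", "pindel_insert_size", "docm_vcf",
    "filter_docm_variants", "filter_somatic_llr_threshold",
    "filter_somatic_llr_tumor_purity",
    "filter_somatic_llr_normal_contamination_rate", "vep_cache_dir",
    "vep_ensembl_assembly", "vep_ensembl_version", "vep_ensembl_species",
    "synonyms_file", "annotate_coding_only", "vep_pick", "cle_vcf_filter",
    "variants_to_table_fields", "variants_to_table_genotype_fields",
    "vep_to_table_fields", "vep_custom_annotations", "manta_call_regions",
    "manta_non_wgs", "manta_output_contigs", "somalier_vcf",
    "tumor_sample_name", "normal_sample_name", "validated_variants",
    "cnvkit_target_average_size",
}


def unknown_inputs(job: dict) -> list[str]:
    """Return the keys of a CWL job object that are not documented inputs."""
    return sorted(key for key in job if key not in DOCUMENTED_INPUTS)
```

To check a YAML job file, load it first (for example with PyYAML's `yaml.safe_load`) and pass the resulting dict to `unknown_inputs`. The sketch only catches unknown keys; it does not check types or whether non-optional inputs are set, since somatic_exome.cwl may supply defaults that this page does not list.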

Inputs

Name	Label	Description	Type	Secondary Files
reference	reference: Reference fasta file for a desired assembly	reference contains the nucleotide sequence for a given assembly (e.g. GRCh37, GRCh38) in FASTA format for the entire genome. This is what reads will be aligned to. Appropriate files can be found on Ensembl (https://ensembl.org/info/data/ftp/index.html). When providing the reference, the secondary files containing the reference indices must be located in the same directory as the reference itself. These files can be created with samtools faidx, bwa index, and picard CreateSequenceDictionary.	['string', 'File']	['.fai', '^.dict', '.amb', '.ann', '.bwt', '.pac', '.sa']
tumor_sequence	tumor_sequence: MT sequencing data and readgroup information	tumor_sequence represents the sequencing data for the MT sample as either FASTQs or BAMs with accompanying readgroup information. Note that the ID and SM fields of the @RG line are required.	../types/sequence_data.yml#sequence_data[]
tumor_name	tumor_name: String specifying the name of the MT sample	tumor_name is the name by which the MT sample will be referred to in the various outputs, for example the VCF files.	string?
normal_sequence	normal_sequence: WT sequencing data and readgroup information	normal_sequence represents the sequencing data for the WT sample as either FASTQs or BAMs with accompanying readgroup information. Note that the ID and SM fields of the @RG line are required.	../types/sequence_data.yml#sequence_data[]
normal_name	normal_name: String specifying the name of the WT sample	normal_name is the name by which the WT sample will be referred to in the various outputs, for example the VCF files.	string?
trimming			['../types/trimming_options.yml#trimming_options', 'null']
bqsr_known_sites	bqsr_known_sites: One or more databases of known polymorphic sites used to exclude regions around known polymorphisms from analysis.	Known polymorphic indels recommended by GATK for a variety of tools, including the BaseRecalibrator. This is part of the GATK resource bundle (http://www.broadinstitute.org/gatk/guide/article?id=1213). Files should be in VCF format and tabix-indexed.	File[]	['.tbi']
bqsr_intervals	bqsr_intervals: Array of strings specifying regions for base quality score recalibration	bqsr_intervals provides an array of genomic intervals for which to apply GATK base quality score recalibration. Typically intervals are given as entire chromosomes (chr1, chr2, etc.); these names should match the contig names in the reference file.	string[]
bait_intervals	bait_intervals: interval_list file of baits used in the sequencing experiment	bait_intervals is an interval_list corresponding to the baits used in the sequencing reagent. These are essentially the coordinates of the regions for which probes could be designed in the reagent. Typically the reagent provider makes this information available in BED format, and it can be converted to an interval_list with Picard BedToIntervalList. AstraZeneca also maintains a repository of baits for common sequencing reagents at https://github.com/AstraZeneca-NGS/reference_data	File
target_intervals	target_intervals: interval_list file of targets used in the sequencing experiment	target_intervals is an interval_list corresponding to the targets for the capture reagent. BED files with this information can be converted to interval_lists with Picard BedToIntervalList. In general, for a WES reagent, bait_intervals and target_intervals are the same.	File
target_interval_padding	target_interval_padding: number of bp flanking each target region in which to allow variant calls	The effective coverage of capture products generally extends out beyond the actual regions targeted. This parameter allows variants to be called in these wingspan regions, extending this many base pairs from each side of the target regions.	int
per_base_intervals	per_base_intervals: additional intervals over which to summarize coverage/QC at a per-base resolution	per_base_intervals is a list of regions (in interval_list format) over which to summarize coverage/QC at a per-base resolution.	../types/labelled_file.yml#labelled_file[]
per_target_intervals	per_target_intervals: additional intervals over which to summarize coverage/QC at a per-target resolution	per_target_intervals is a list of regions (in interval_list format) over which to summarize coverage/QC at a per-target resolution.	../types/labelled_file.yml#labelled_file[]
summary_intervals			../types/labelled_file.yml#labelled_file[]
omni_vcf			File	['.tbi']
picard_metric_accumulation_level			string
qc_minimum_mapping_quality			int?
qc_minimum_base_quality			int?
strelka_cpu_reserved			int?
scatter_count		scatters each supported variant detector (varscan, pindel, mutect) into this many parallel jobs	int
mutect_artifact_detection_mode			boolean
mutect_max_alt_allele_in_normal_fraction			float?
mutect_max_alt_alleles_in_normal_count			int?
varscan_strand_filter			int?
varscan_min_coverage			int?
varscan_min_var_freq			float?
varscan_p_value			float?
varscan_max_normal_freq			float?
pindel_insert_size			int
docm_vcf		The set of alleles that GATK HaplotypeCaller will force-call regardless of evidence	File	['.tbi']
filter_docm_variants			boolean?
filter_somatic_llr_threshold		Sets the stringency (log-likelihood ratio) used to filter out non-somatic variants. Typical values are 10=high stringency, 5=normal, 3=low stringency. Low stringency may be desirable when read depths are low (as in WGS) or when tumor samples are impure.	float
filter_somatic_llr_tumor_purity		Sets the tumor purity used in the somatic llr filter, which removes non-somatic variants. Probably only needs to be adjusted for low-purity tumors (< 50%). Range is 0 to 1.	float
filter_somatic_llr_normal_contamination_rate		Sets the fraction of tumor present in the normal sample, used in the somatic llr filter. Useful for heavily contaminated adjacent normals. Range is 0 to 1.	float
vep_cache_dir		path to the vep cache directory, available at: https://useast.ensembl.org/info/docs/tools/vep/script/vep_cache.html#pre	['string', 'Directory']
vep_ensembl_assembly		genome assembly to use in vep. Examples: GRCh38 or GRCm38	string
vep_ensembl_version		ensembl version - Must be present in the cache directory. Example: 95	string
vep_ensembl_species		ensembl species - Must be present in the cache directory. Examples: homo_sapiens or mus_musculus	string
synonyms_file		synonyms_file allows the use of different chromosome identifiers in vep inputs or annotation files (cache, database, GFF, custom file, fasta). File should be tab-delimited with the primary identifier in column 1 and the synonym in column 2.	File?
annotate_coding_only		if set to true, vep only returns consequences that fall in the coding regions of transcripts	boolean?
vep_pick		configures how vep will annotate genomic features that each variant overlaps; for a detailed description of each option see https://useast.ensembl.org/info/docs/tools/vep/script/vep_other.html#pick_allele_gene_eg	['null', {'type': 'enum', 'symbols': ['pick', 'flag_pick', 'pick_allele', 'per_gene', 'pick_allele_gene', 'flag_pick_allele', 'flag_pick_allele_gene']}]
cle_vcf_filter			boolean
variants_to_table_fields		The names of one or more standard VCF fields or INFO fields to include in the output table	string[]
variants_to_table_genotype_fields		The names of one or more genotype fields to include in the output table	string[]
vep_to_table_fields		VEP fields in final output	string[]
vep_custom_annotations		custom type; see ../types/vep_custom_annotation.yml in the types directory for the input format	../types/vep_custom_annotation.yml#vep_custom_annotation[]
manta_call_regions		bgzip-compressed, tabix-indexed BED file specifying regions to which manta structural variant analysis is limited	File?	['.tbi']
manta_non_wgs		toggles on or off manta settings for WES vs. WGS mode for structural variant detection	boolean?
manta_output_contigs		if set to true configures manta to output assembled contig sequences in the final VCF files	boolean?
somalier_vcf		a vcf file of known polymorphic sites for somalier to compare normal and tumor samples for identity; sites files can be found at: https://github.com/brentp/somalier/releases	File
tumor_sample_name			string
normal_sample_name			string
validated_variants		An optional VCF with variants that will be flagged as 'VALIDATED' if found in this pipeline's main output VCF	File?	['.tbi']
cnvkit_target_average_size		approximate size of split target bins for CNVkit; if not set a suitable window size will be set by CNVkit automatically	int?
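Several inputs above declare secondary files. In CWL, a pattern such as `.fai` is appended to the primary file path, while each leading `^` first strips one extension, so `^.dict` next to `GRCh38.fa` means `GRCh38.dict`. The following minimal Python sketch applies that rule, for example to confirm that a reference directory is complete before a run. `resolve_secondary` and `missing_secondary_files` are hypothetical helpers; treating a trailing `?` as "optional" follows CWL v1.1 and later, and using `os.path.splitext` means dots in directory names are ignored.

```python
import os

# Secondary-file patterns declared for the `reference` input on this page.
REFERENCE_PATTERNS = [".fai", "^.dict", ".amb", ".ann", ".bwt", ".pac", ".sa"]


def resolve_secondary(primary: str, pattern: str) -> str:
    """Apply a CWL secondaryFiles pattern to the path of a primary file."""
    # A trailing '?' only marks the file as optional; it is not part of the name.
    if pattern.endswith("?"):
        pattern = pattern[:-1]
    # Each leading '^' removes one extension (the last '.' and what follows).
    while pattern.startswith("^"):
        pattern = pattern[1:]
        primary = os.path.splitext(primary)[0]
    # Whatever remains is appended to the (possibly shortened) path.
    return primary + pattern


def missing_secondary_files(primary: str, patterns: list[str]) -> list[str]:
    """Return required secondary files that do not exist on disk."""
    missing = []
    for pattern in patterns:
        if pattern.endswith("?"):
            continue  # optional secondary files may legitimately be absent
        path = resolve_secondary(primary, pattern)
        if not os.path.exists(path):
            missing.append(path)
    return missing
```

For example, `missing_secondary_files("/refs/GRCh38.fa", REFERENCE_PATTERNS)` (path hypothetical) lists any of the seven index files that still need to be created with samtools faidx, bwa index, or picard CreateSequenceDictionary.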

Outputs

Name	Type	Secondary Files
tumor_cram	File
tumor_mark_duplicates_metrics	File
tumor_insert_size_metrics	File
tumor_alignment_summary_metrics	File
tumor_hs_metrics	File
tumor_per_target_coverage_metrics	File[]
tumor_per_target_hs_metrics	File[]
tumor_per_base_coverage_metrics	File[]
tumor_per_base_hs_metrics	File[]
tumor_summary_hs_metrics	File[]
tumor_flagstats	File
tumor_verify_bam_id_metrics	File
tumor_verify_bam_id_depth	File
normal_cram	File
normal_mark_duplicates_metrics	File
normal_insert_size_metrics	File
normal_alignment_summary_metrics	File
normal_hs_metrics	File
normal_per_target_coverage_metrics	File[]
normal_per_target_hs_metrics	File[]
normal_per_base_coverage_metrics	File[]
normal_per_base_hs_metrics	File[]
normal_summary_hs_metrics	File[]
normal_flagstats	File
normal_verify_bam_id_metrics	File
normal_verify_bam_id_depth	File
mutect_unfiltered_vcf	File	['.tbi']
mutect_filtered_vcf	File	['.tbi']
strelka_unfiltered_vcf	File	['.tbi']
strelka_filtered_vcf	File	['.tbi']
varscan_unfiltered_vcf	File	['.tbi']
varscan_filtered_vcf	File	['.tbi']
pindel_unfiltered_vcf	File	['.tbi']
pindel_filtered_vcf	File	['.tbi']
docm_filtered_vcf	File	['.tbi']
final_vcf	File	['.tbi']
final_filtered_vcf	File	['.tbi']
final_tsv	File
vep_summary	File
tumor_snv_bam_readcount_tsv	File
tumor_indel_bam_readcount_tsv	File
normal_snv_bam_readcount_tsv	File
normal_indel_bam_readcount_tsv	File
intervals_antitarget	File?
intervals_target	File?
normal_antitarget_coverage	File
normal_target_coverage	File
reference_coverage	File?
cn_diagram	File?
cn_scatter_plot	File?
tumor_antitarget_coverage	File
tumor_target_coverage	File
tumor_bin_level_ratios	File
tumor_segmented_ratios	File
diploid_variants	File?	['.tbi']
somatic_variants	File?	['.tbi']
all_candidates	File	['.tbi']
small_candidates	File	['.tbi']
tumor_only_variants	File?	['.tbi']
somalier_concordance_metrics	File
somalier_concordance_statistics	File
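Most VCF outputs above are expected to carry a `.tbi` index. Assuming the workflow's output object has been saved as JSON (cwltool, for example, prints it to standard output), a short Python sketch can confirm that each of those outputs came back with an index. `outputs_missing_tbi` is a hypothetical helper; it treats the three `File?` outputs as allowed to be null and looks only at the `location` of each secondary file.

```python
# Outputs that the Outputs table lists with a '.tbi' secondary file.
TBI_OUTPUTS = [
    "mutect_unfiltered_vcf", "mutect_filtered_vcf",
    "strelka_unfiltered_vcf", "strelka_filtered_vcf",
    "varscan_unfiltered_vcf", "varscan_filtered_vcf",
    "pindel_unfiltered_vcf", "pindel_filtered_vcf",
    "docm_filtered_vcf", "final_vcf", "final_filtered_vcf",
    "diploid_variants", "somatic_variants", "all_candidates",
    "small_candidates", "tumor_only_variants",
]
# Of those, the ones typed File? (they may be null in the output object).
OPTIONAL_OUTPUTS = {"diploid_variants", "somatic_variants", "tumor_only_variants"}


def outputs_missing_tbi(outputs: dict) -> list[str]:
    """Names of indexed outputs that are absent or lack a .tbi secondary file."""
    problems = []
    for name in TBI_OUTPUTS:
        file_obj = outputs.get(name)
        if file_obj is None:
            # Missing or null is only acceptable for File? outputs.
            if name not in OPTIONAL_OUTPUTS:
                problems.append(name)
            continue
        secondaries = file_obj.get("secondaryFiles") or []
        if not any(s.get("location", "").endswith(".tbi") for s in secondaries):
            problems.append(name)
    return problems
```

With a hypothetical `outputs.json`, `outputs_missing_tbi(json.load(open("outputs.json")))` returns an empty list when every expected index is present.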

Steps

Name	CWL Run
tumor_alignment_and_qc	pipelines/alignment_exome.cwl
normal_alignment_and_qc	pipelines/alignment_exome.cwl
concordance	tools/concordance.cwl
pad_target_intervals	tools/interval_list_expand.cwl
detect_variants	pipelines/detect_variants.cwl
cnvkit	tools/cnvkit_batch.cwl
manta	tools/manta_somatic.cwl
tumor_bam_to_cram	tools/bam_to_cram.cwl
tumor_index_cram	tools/index_cram.cwl
normal_bam_to_cram	tools/bam_to_cram.cwl
normal_index_cram	tools/index_cram.cwl
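The pad_target_intervals step widens the target intervals by target_interval_padding so that variants in the flanking wingspan regions can be called. The Python sketch below shows the general idea under stated assumptions: intervals are 1-based and inclusive as in Picard interval_list files, padding is clamped to the contig bounds, and intervals that overlap or abut after padding are merged. It is not the code behind tools/interval_list_expand.cwl, whose exact merging behavior is not described on this page, and `pad_intervals` is a hypothetical name.

```python
# (contig, start, end): 1-based, inclusive coordinates as in an interval_list.
Interval = tuple[str, int, int]


def pad_intervals(intervals: list[Interval], padding: int,
                  contig_lengths: dict[str, int]) -> list[Interval]:
    """Pad each interval by `padding` bp per side, clamp, then merge."""
    padded = []
    for contig, start, end in intervals:
        # Keep the padded interval inside the contig: [1, contig length].
        padded.append((contig,
                       max(1, start - padding),
                       min(contig_lengths[contig], end + padding)))
    # Sorting groups intervals by contig (name order, not reference order)
    # and orders them by start within each contig.
    padded.sort()
    merged: list[Interval] = []
    for contig, start, end in padded:
        last = merged[-1] if merged else None
        if last is not None and last[0] == contig and start <= last[2] + 1:
            # Overlapping or abutting: extend the previous interval.
            merged[-1] = (contig, last[1], max(last[2], end))
        else:
            merged.append((contig, start, end))
    return merged
```

For example, with a padding of 50, the targets chr1:100-200 and chr1:260-300 become the single interval chr1:50-350.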
