pipelines_somatic_exome.cwl
Travis CI User edited this page Aug 8, 2020
This page is auto-generated. Do not edit.
somatic_exome: exome alignment and somatic variant detection
somatic_exome is designed to process paired mutant/wildtype (tumor/normal) H. sapiens exome sequencing data. It produces BQSR-corrected alignments, detects variants with four callers (mutect, strelka, varscan, and pindel), and adds VEP-style annotations. Structural variants are detected via manta and cnvkit. In addition, QC metrics are collected, including somalier concordance metrics.
example input file = analysis_workflows/example_data/somatic_exome.yaml
Inputs
Name | Label | Description | Type | Secondary Files |
---|---|---|---|---|
reference | reference: Reference fasta file for a desired assembly | reference contains the nucleotide sequence for a given assembly (GRCh37/hg19, GRCh38/hg38, etc.) in fasta format for the entire genome. This is what reads will be aligned to. Appropriate files can be found on Ensembl at https://ensembl.org/info/data/ftp/index.html When providing the reference, the secondary files corresponding to the reference indices must be located in the same directory as the reference itself. These files can be created with samtools faidx, bwa index, and Picard CreateSequenceDictionary. | ['string', 'File'] | ['.fai', '^.dict', '.amb', '.ann', '.bwt', '.pac', '.sa'] |
tumor_sequence | tumor_sequence: yml file specifying the location of MT sequencing data | tumor_sequence is a yml file that describes the sequencing data (i.e. fastq files) for a single sample. If a sample has more than one fastq file, as is the case when there are multiple instrument data, the sequence tag is simply repeated with the additional data (see example input file). Note that the ID and SM tags are required in the @RG field. | ../types/sequence_data.yml#sequence_data[] | |
tumor_name | tumor_name: String specifying the name of the MT sample | tumor_name provides the string by which the MT sample will be referred to in the various outputs, for example the VCF files. | string? | |
normal_sequence | normal_sequence: yml file specifying the location of WT sequencing data | normal_sequence is a yml file that describes the sequencing data (i.e. fastq files) for a single sample. If a sample has more than one fastq file, as is the case when there are multiple instrument data, the sequence tag is simply repeated with the additional data (see example input file). Note that the ID and SM tags are required in the @RG field. | ../types/sequence_data.yml#sequence_data[] | |
normal_name | normal_name: String specifying the name of the WT sample | normal_name provides the string by which the WT sample will be referred to in the various outputs, for example the VCF files. | string? | |
trimming | | | ['../types/trimming_options.yml#trimming_options', 'null'] | |
mills | mills: File specifying common polymorphic indels from Mills et al. | mills provides known polymorphic indels recommended by GATK for a variety of tools including the BaseRecalibrator. This file is part of the GATK resource bundle (available at http://www.broadinstitute.org/gatk/guide/article?id=1213 ). Essentially, it is a list of known indels originally discovered by Mills et al. ( https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1557762/ ). The file should be in vcf format and tabix indexed. | File | ['.tbi'] |
known_indels | known_indels: File specifying common polymorphic indels from 1000G | known_indels provides known indels recommended by GATK for a variety of tools including the BaseRecalibrator. This file is part of the GATK resource bundle (available at http://www.broadinstitute.org/gatk/guide/article?id=1213 ). Essentially, it is a list of known indels from the 1000 Genomes Phase I indel calls. The file should be in vcf format and tabix indexed. | File | ['.tbi'] |
dbsnp_vcf | dbsnp_vcf: File specifying common polymorphic indels from dbSNP | dbsnp_vcf provides known indels recommended by GATK for a variety of tools including the BaseRecalibrator. This file is part of the GATK resource bundle (available at http://www.broadinstitute.org/gatk/guide/article?id=1213 ). Essentially, it is a list of known indels from dbSNP. The file should be in vcf format and tabix indexed. | File | ['.tbi'] |
bqsr_intervals | bqsr_intervals: Array of strings specifying regions for base quality score recalibration | bqsr_intervals provides an array of genomic intervals over which to apply GATK base quality score recalibration. Typically intervals are given for entire chromosomes (e.g. chr1, chr2, etc.); these names should match the format in the reference file. | string[] | |
bait_intervals | bait_intervals: interval_list file of baits used in the sequencing experiment | bait_intervals is an interval_list corresponding to the baits used in the sequencing reagent. These are essentially the coordinates of the regions for which probes could be designed in the reagent. Typically the reagent provider has this information available in bed format, and it can be converted to an interval_list with Picard BedToIntervalList. AstraZeneca also maintains a repo of baits for common sequencing reagents, available at https://github.com/AstraZeneca-NGS/reference_data | File | |
target_intervals | target_intervals: interval_list file of targets used in the sequencing experiment | target_intervals is an interval_list corresponding to the targets for the capture reagent. Bed files with this information can be converted to interval_lists with Picard BedToIntervalList. In general, for an exome reagent, bait_intervals and target_intervals are the same. | File | |
target_interval_padding | target_interval_padding: number of bp flanking each target region in which to allow variant calls | The effective coverage of capture products generally extends out beyond the actual regions targeted. This parameter allows variants to be called in these wingspan regions, extending this many base pairs from each side of the target regions. | int | |
per_base_intervals | per_base_intervals: additional intervals over which to summarize coverage/QC at a per-base resolution | per_base_intervals is a list of regions (in interval_list format) over which to summarize coverage/QC at a per-base resolution. | ../types/labelled_file.yml#labelled_file[] | |
per_target_intervals | per_target_intervals: additional intervals over which to summarize coverage/QC at a per-target resolution | per_target_intervals is a list of regions (in interval_list format) over which to summarize coverage/QC at a per-target resolution. | ../types/labelled_file.yml#labelled_file[] | |
summary_intervals | | | ../types/labelled_file.yml#labelled_file[] | |
omni_vcf | | | File | ['.tbi'] |
picard_metric_accumulation_level | | | string | |
qc_minimum_mapping_quality | | | int? | |
qc_minimum_base_quality | | | int? | |
cosmic_vcf | | | File? | ['.tbi'] |
panel_of_normals_vcf | | | File? | ['.tbi'] |
strelka_cpu_reserved | | | int? | |
mutect_scatter_count | | | int | |
mutect_artifact_detection_mode | | | boolean | |
mutect_max_alt_allele_in_normal_fraction | | | float? | |
mutect_max_alt_alleles_in_normal_count | | | int? | |
varscan_strand_filter | | | int? | |
varscan_min_coverage | | | int? | |
varscan_min_var_freq | | | float? | |
varscan_p_value | | | float? | |
varscan_max_normal_freq | | | float? | |
pindel_insert_size | | | int | |
docm_vcf | | The set of alleles that GATK HaplotypeCaller will use to force-call regardless of evidence | File | ['.tbi'] |
filter_docm_variants | | | boolean? | |
vep_cache_dir | | path to the vep cache directory, available at: https://useast.ensembl.org/info/docs/tools/vep/script/vep_cache.html#pre | ['string', 'Directory'] | |
vep_ensembl_assembly | | genome assembly to use in vep. Examples: GRCh38 or GRCm38 | string | |
vep_ensembl_version | | ensembl version - Must be present in the cache directory. Example: 95 | string | |
vep_ensembl_species | | ensembl species - Must be present in the cache directory. Examples: homo_sapiens or mus_musculus | string | |
synonyms_file | | synonyms_file allows the use of different chromosome identifiers in vep inputs or annotation files (cache, database, GFF, custom file, fasta). File should be tab-delimited with the primary identifier in column 1 and the synonym in column 2. | File? | |
annotate_coding_only | | if set to true, vep only returns consequences that fall in the coding regions of transcripts | boolean? | |
vep_pick | | configures how vep will annotate genomic features that each variant overlaps; for a detailed description of each option see https://useast.ensembl.org/info/docs/tools/vep/script/vep_other.html#pick_allele_gene_eg | ['null', {'type': 'enum', 'symbols': ['pick', 'flag_pick', 'pick_allele', 'per_gene', 'pick_allele_gene', 'flag_pick_allele', 'flag_pick_allele_gene']}] | |
cle_vcf_filter | | | boolean | |
variants_to_table_fields | | The names of one or more standard VCF fields or INFO fields to include in the output table | string[] | |
variants_to_table_genotype_fields | | The name of a genotype field to include in the output table | string[] | |
vep_to_table_fields | | VEP fields in final output | string[] | |
vep_custom_annotations | | custom type, check types directory for input format | ../types/vep_custom_annotation.yml#vep_custom_annotation[] | |
manta_call_regions | | bgzip-compressed, tabix-indexed BED file specifying regions to which manta structural variant analysis is limited | File? | ['.tbi'] |
manta_non_wgs | | toggles on or off manta settings for WES vs. WGS mode for structural variant detection | boolean? | |
manta_output_contigs | | if set to true, configures manta to output assembled contig sequences in the final VCF files | boolean? | |
somalier_vcf | | a vcf file of known polymorphic sites for somalier to compare normal and tumor samples for identity; sites files can be found at: https://github.com/brentp/somalier/releases | File | |
tumor_sample_name | | | string | |
normal_sample_name | | | string | |
known_variants | | Previously discovered variants to be flagged in this pipeline's output vcf | File? | ['.tbi'] |
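The tumor_sequence and normal_sequence descriptions above note that the ID and SM tags are required in the @RG field. A minimal Python sketch of such a check follows; it is not part of the pipeline, the function names are hypothetical, and the read group values in the usage lines are made up for illustration.

```python
def parse_read_group(readgroup):
    """Split an @RG header line into a {tag: value} dictionary."""
    # Accept a real tab as well as the two-character sequence backslash-t,
    # since read groups in input files are often written with the latter.
    fields = readgroup.replace("\\t", "\t").split("\t")
    if fields[0] != "@RG":
        raise ValueError("read group must start with '@RG'")
    tags = {}
    for field in fields[1:]:
        key, sep, value = field.partition(":")
        if not sep or not value:
            raise ValueError("malformed read group field: %r" % field)
        tags[key] = value
    return tags


def missing_required_tags(readgroup, required=("ID", "SM")):
    """Return the required tags that are absent from the read group."""
    tags = parse_read_group(readgroup)
    return [tag for tag in required if tag not in tags]


# Hypothetical read groups, for illustration only.
print(missing_required_tags("@RG\tID:1\tSM:tumor_sample\tLB:lib1"))
print(missing_required_tags("@RG\\tID:1\\tLB:lib1"))
```

Running a check like this on each read group before launching the workflow can surface a missing tag early instead of partway through alignment.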
Outputs
Name | Label | Description | Type | Secondary Files |
---|---|---|---|---|
tumor_cram | | | File | |
tumor_mark_duplicates_metrics | | | File | |
tumor_insert_size_metrics | | | File | |
tumor_alignment_summary_metrics | | | File | |
tumor_hs_metrics | | | File | |
tumor_per_target_coverage_metrics | | | File[] | |
tumor_per_target_hs_metrics | | | File[] | |
tumor_per_base_coverage_metrics | | | File[] | |
tumor_per_base_hs_metrics | | | File[] | |
tumor_summary_hs_metrics | | | File[] | |
tumor_flagstats | | | File | |
tumor_verify_bam_id_metrics | | | File | |
tumor_verify_bam_id_depth | | | File | |
normal_cram | | | File | |
normal_mark_duplicates_metrics | | | File | |
normal_insert_size_metrics | | | File | |
normal_alignment_summary_metrics | | | File | |
normal_hs_metrics | | | File | |
normal_per_target_coverage_metrics | | | File[] | |
normal_per_target_hs_metrics | | | File[] | |
normal_per_base_coverage_metrics | | | File[] | |
normal_per_base_hs_metrics | | | File[] | |
normal_summary_hs_metrics | | | File[] | |
normal_flagstats | | | File | |
normal_verify_bam_id_metrics | | | File | |
normal_verify_bam_id_depth | | | File | |
mutect_unfiltered_vcf | | | File | ['.tbi'] |
mutect_filtered_vcf | | | File | ['.tbi'] |
strelka_unfiltered_vcf | | | File | ['.tbi'] |
strelka_filtered_vcf | | | File | ['.tbi'] |
varscan_unfiltered_vcf | | | File | ['.tbi'] |
varscan_filtered_vcf | | | File | ['.tbi'] |
pindel_unfiltered_vcf | | | File | ['.tbi'] |
pindel_filtered_vcf | | | File | ['.tbi'] |
docm_filtered_vcf | | | File | ['.tbi'] |
final_vcf | | | File | ['.tbi'] |
final_filtered_vcf | | | File | ['.tbi'] |
final_tsv | | | File | |
vep_summary | | | File | |
tumor_snv_bam_readcount_tsv | | | File | |
tumor_indel_bam_readcount_tsv | | | File | |
normal_snv_bam_readcount_tsv | | | File | |
normal_indel_bam_readcount_tsv | | | File | |
intervals_antitarget | | | File? | |
intervals_target | | | File? | |
normal_antitarget_coverage | | | File | |
normal_target_coverage | | | File | |
reference_coverage | | | File? | |
cn_diagram | | | File? | |
cn_scatter_plot | | | File? | |
tumor_antitarget_coverage | | | File | |
tumor_target_coverage | | | File | |
tumor_bin_level_ratios | | | File | |
tumor_segmented_ratios | | | File | |
diploid_variants | | | File? | ['.tbi'] |
somatic_variants | | | File? | ['.tbi'] |
all_candidates | | | File | ['.tbi'] |
small_candidates | | | File | ['.tbi'] |
tumor_only_variants | | | File? | ['.tbi'] |
somalier_concordance_metrics | | | File | |
somalier_concordance_statistics | | | File | |
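The Secondary Files columns in the tables above list suffix patterns rather than full file names. As a rough sketch, the Python below resolves such patterns following the CWL secondaryFiles convention, in which each leading caret removes the last extension of the primary file name before the rest of the pattern is appended. The helper names and the /refs/genome.fa path are hypothetical and are not taken from this pipeline.

```python
def secondary_file_path(primary, pattern):
    """Resolve one secondaryFiles pattern against a primary file path."""
    path = primary
    while pattern.startswith("^"):
        pattern = pattern[1:]
        # Strip the last extension from the file name only, so that dots
        # in directory names are left alone.
        head, sep, name = path.rpartition("/")
        stem, dot, _ext = name.rpartition(".")
        if dot and stem:
            name = stem
        path = head + sep + name
    return path + pattern


def expected_secondary_files(primary, patterns):
    """List every secondary file expected next to the primary file."""
    return [secondary_file_path(primary, p) for p in patterns]


# Patterns copied from the reference input row; the path is made up.
reference_patterns = [".fai", "^.dict", ".amb", ".ann", ".bwt", ".pac", ".sa"]
for path in expected_secondary_files("/refs/genome.fa", reference_patterns):
    print(path)
```

Under these assumptions, '^.dict' resolves to /refs/genome.dict while '.fai' resolves to /refs/genome.fa.fai, which is why the dictionary and the index files must sit in the same directory as the reference.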
Steps
Name | CWL Run |
---|---|
tumor_alignment_and_qc | pipelines/alignment_exome.cwl |
normal_alignment_and_qc | pipelines/alignment_exome.cwl |
concordance | tools/concordance.cwl |
pad_target_intervals | tools/interval_list_expand.cwl |
detect_variants | pipelines/detect_variants.cwl |
cnvkit | tools/cnvkit_batch.cwl |
manta | tools/manta_somatic.cwl |
tumor_bam_to_cram | tools/bam_to_cram.cwl |
tumor_index_cram | tools/index_cram.cwl |
normal_bam_to_cram | tools/bam_to_cram.cwl |
normal_index_cram | tools/index_cram.cwl |
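The pad_target_intervals step relates to the target_interval_padding input, which is described as extending the target regions by a number of base pairs on each side. The Python below is only an illustration of that idea on 1-based, inclusive coordinates as used by interval_list files. It is not the implementation behind tools/interval_list_expand.cwl, it does not merge intervals that overlap after padding, and the function name, coordinates, and contig length are hypothetical.

```python
def pad_intervals(intervals, padding, contig_lengths=None):
    """Extend each (contig, start, end) interval by `padding` bp per side.

    Coordinates are treated as 1-based and inclusive. Starts are clamped
    to 1, and ends are clamped to the contig length when one is supplied.
    """
    padded = []
    for contig, start, end in intervals:
        new_start = max(1, start - padding)
        new_end = end + padding
        if contig_lengths is not None and contig in contig_lengths:
            new_end = min(new_end, contig_lengths[contig])
        padded.append((contig, new_start, new_end))
    return padded


# Made-up targets on a made-up contig of length 1000.
targets = [("chr1", 50, 200), ("chr1", 900, 980)]
print(pad_intervals(targets, 100, contig_lengths={"chr1": 1000}))
```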