pipelines_somatic_exome.cwl
APipe Tester edited this page Jul 14, 2021 · 31 revisions
This page is auto-generated. Do not edit.
somatic_exome: exome alignment and somatic variant detection
somatic_exome processes mutant/wildtype (MT/WT, i.e. tumor/normal) H. sapiens exome sequencing data. It features BQSR-corrected alignments, four-caller variant detection, and VEP-style annotations. Structural variants are detected via manta and cnvkit. In addition, QC metrics are computed, including somalier concordance metrics.
example input file = analysis_workflows/example_data/somatic_exome.yaml
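As a rough illustration of what such an input file contains, the sketch below assembles a partial job object in Python and serializes it as JSON, which CWL runners generally accept in place of YAML. Only a few of the simpler inputs from the inputs table are shown. Every path and value is a placeholder rather than a recommended setting, and the structured inputs (for example tumor_sequence and normal_sequence) are omitted, so this is not a complete job file.

```python
import json


def cwl_file(path):
    """Return a CWL File object literal for a (placeholder) path."""
    return {"class": "File", "path": path}


# Partial job object: input names come from the inputs table,
# all values and paths are hypothetical.
job = {
    "reference": cwl_file("/path/to/reference.fa"),
    "tumor_name": "tumor_sample",
    "normal_name": "normal_sample",
    "bqsr_intervals": ["chr1", "chr2"],
    "bait_intervals": cwl_file("/path/to/baits.interval_list"),
    "target_intervals": cwl_file("/path/to/targets.interval_list"),
    "target_interval_padding": 100,
    "scatter_count": 50,
    "vep_ensembl_assembly": "GRCh38",
    "vep_ensembl_version": "95",  # typed as string in the table, not int
    "vep_ensembl_species": "homo_sapiens",
}

job_text = json.dumps(job, indent=2)
print(job_text)
```

The example file named above remains the authoritative template; this sketch only shows the general shape of File-typed and scalar inputs.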
Inputs
Name | Label | Description | Type | Secondary Files |
---|---|---|---|---|
reference | reference: Reference fasta file for a desired assembly | reference contains the nucleotide sequence for a given assembly (hg37, hg38, etc.) in fasta format for the entire genome. This is what reads will be aligned to. Appropriate files can be found on Ensembl at https://ensembl.org/info/data/ftp/index.html. When providing the reference, secondary files corresponding to the reference indices must be located in the same directory as the reference itself. These files can be created with samtools faidx, bwa index, and picard CreateSequenceDictionary. | ['string', 'File'] | ['.fai', '^.dict', '.amb', '.ann', '.bwt', '.pac', '.sa'] |
tumor_sequence | tumor_sequence: MT sequencing data and readgroup information | tumor_sequence represents the sequencing data for the MT sample as either FASTQs or BAMs with accompanying readgroup information. Note that in the @RG field ID and SM are required. | ../types/sequence_data.yml#sequence_data[] | |
tumor_name | tumor_name: String specifying the name of the MT sample | tumor_name provides a string for what the MT sample will be referred to in the various outputs, for example the VCF files. | string? | |
normal_sequence | normal_sequence: WT sequencing data and readgroup information | normal_sequence represents the sequencing data for the WT sample as either FASTQs or BAMs with accompanying readgroup information. Note that in the @RG field ID and SM are required. | ../types/sequence_data.yml#sequence_data[] | |
normal_name | normal_name: String specifying the name of the WT sample | normal_name provides a string for what the WT sample will be referred to in the various outputs, for example the VCF files. | string? | |
trimming | | | ['../types/trimming_options.yml#trimming_options', 'null'] | |
bqsr_known_sites | bqsr_known_sites: One or more databases of known polymorphic sites used to exclude regions around known polymorphisms from analysis. | Known polymorphic indels recommended by GATK for a variety of tools including the BaseRecalibrator. This is part of the GATK resource bundle available at http://www.broadinstitute.org/gatk/guide/article?id=1213. Files should be in VCF format and tabix-indexed. | File[] | ['.tbi'] |
bqsr_intervals | bqsr_intervals: Array of strings specifying regions for base quality score recalibration | bqsr_intervals provides an array of genomic intervals for which to apply GATK base quality score recalibrations. Typically intervals are given for entire chromosomes (chr1, chr2, etc.); these names should match the format in the reference file. | string[] | |
bait_intervals | bait_intervals: interval_list file of baits used in the sequencing experiment | bait_intervals is an interval_list corresponding to the baits used in the sequencing reagent. These are essentially coordinates for regions you were able to design probes for in the reagent. Typically the reagent provider has this information available in bed format and it can be converted to an interval_list with Picard BedToIntervalList. AstraZeneca also maintains a repo of baits for common sequencing reagents available at https://github.com/AstraZeneca-NGS/reference_data | File | |
target_intervals | target_intervals: interval_list file of targets used in the sequencing experiment | target_intervals is an interval_list corresponding to the targets for the capture reagent. Bed files with this information can be converted to interval_lists with Picard BedToIntervalList. In general, for an exome (WES) reagent, bait_intervals and target_intervals are the same. | File | |
target_interval_padding | target_interval_padding: number of bp flanking each target region in which to allow variant calls | The effective coverage of capture products generally extends out beyond the actual regions targeted. This parameter allows variants to be called in these wingspan regions, extending this many base pairs from each side of the target regions. | int | |
per_base_intervals | per_base_intervals: additional intervals over which to summarize coverage/QC at a per-base resolution | per_base_intervals is a list of regions (in interval_list format) over which to summarize coverage/QC at a per-base resolution. | ../types/labelled_file.yml#labelled_file[] | |
per_target_intervals | per_target_intervals: additional intervals over which to summarize coverage/QC at a per-target resolution | per_target_intervals is a list of regions (in interval_list format) over which to summarize coverage/QC at a per-target resolution. | ../types/labelled_file.yml#labelled_file[] | |
summary_intervals | | | ../types/labelled_file.yml#labelled_file[] | |
omni_vcf | | | File | ['.tbi'] |
picard_metric_accumulation_level | | | string | |
qc_minimum_mapping_quality | | | int? | |
qc_minimum_base_quality | | | int? | |
strelka_cpu_reserved | | | int? | |
scatter_count | | scatters each supported variant detector (varscan, pindel, mutect) into this many parallel jobs | int | |
mutect_artifact_detection_mode | | | boolean | |
mutect_max_alt_allele_in_normal_fraction | | | float? | |
mutect_max_alt_alleles_in_normal_count | | | int? | |
varscan_strand_filter | | | int? | |
varscan_min_coverage | | | int? | |
varscan_min_var_freq | | | float? | |
varscan_p_value | | | float? | |
varscan_max_normal_freq | | | float? | |
pindel_insert_size | | | int | |
docm_vcf | | The set of alleles that gatk haplotype caller will use to force-call regardless of evidence | File | ['.tbi'] |
filter_docm_variants | | | boolean? | |
filter_somatic_llr_threshold | | Sets the stringency (log-likelihood ratio) used to filter out non-somatic variants. Typical values are 10=high stringency, 5=normal, 3=low stringency. Low stringency may be desirable when read depths are low (as in WGS) or when tumor samples are impure. | float | |
filter_somatic_llr_tumor_purity | | Sets the purity of the tumor used in the somatic llr filter, used to remove non-somatic variants. Probably only needs to be adjusted for low-purity samples (< 50%). Range is 0 to 1. | float | |
filter_somatic_llr_normal_contamination_rate | | Sets the fraction of tumor present in the normal sample, used in the somatic llr filter. Useful for heavily contaminated adjacent normals. Range is 0 to 1. | float | |
vep_cache_dir | | path to the vep cache directory, available at: https://useast.ensembl.org/info/docs/tools/vep/script/vep_cache.html#pre | ['string', 'Directory'] | |
vep_ensembl_assembly | | genome assembly to use in vep. Examples: GRCh38 or GRCm38 | string | |
vep_ensembl_version | | ensembl version - Must be present in the cache directory. Example: 95 | string | |
vep_ensembl_species | | ensembl species - Must be present in the cache directory. Examples: homo_sapiens or mus_musculus | string | |
synonyms_file | | synonyms_file allows the use of different chromosome identifiers in vep inputs or annotation files (cache, database, GFF, custom file, fasta). File should be tab-delimited with the primary identifier in column 1 and the synonym in column 2. | File? | |
annotate_coding_only | | if set to true, vep only returns consequences that fall in the coding regions of transcripts | boolean? | |
vep_pick | | configures how vep will annotate genomic features that each variant overlaps; for a detailed description of each option see https://useast.ensembl.org/info/docs/tools/vep/script/vep_other.html#pick_allele_gene_eg | ['null', {'type': 'enum', 'symbols': ['pick', 'flag_pick', 'pick_allele', 'per_gene', 'pick_allele_gene', 'flag_pick_allele', 'flag_pick_allele_gene']}] | |
cle_vcf_filter | | | boolean | |
variants_to_table_fields | | The names of one or more standard VCF fields or INFO fields to include in the output table | string[] | |
variants_to_table_genotype_fields | | The name of a genotype field to include in the output table | string[] | |
vep_to_table_fields | | VEP fields in final output | string[] | |
vep_custom_annotations | | custom type, check types directory for input format | ../types/vep_custom_annotation.yml#vep_custom_annotation[] | |
manta_call_regions | | bgzip-compressed, tabix-indexed BED file specifying regions to which manta structural variant analysis is limited | File? | ['.tbi'] |
manta_non_wgs | | toggles on or off manta settings for WES vs. WGS mode for structural variant detection | boolean? | |
manta_output_contigs | | if set to true configures manta to output assembled contig sequences in the final VCF files | boolean? | |
somalier_vcf | | a vcf file of known polymorphic sites for somalier to compare normal and tumor samples for identity; sites files can be found at: https://github.com/brentp/somalier/releases | File | |
tumor_sample_name | | | string | |
normal_sample_name | | | string | |
validated_variants | | An optional VCF with variants that will be flagged as 'VALIDATED' if found in this pipeline's main output VCF | File? | ['.tbi'] |
cnvkit_target_average_size | | approximate size of split target bins for CNVkit; if not set a suitable window size will be set by CNVkit automatically | int? | |
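The Secondary Files column uses CWL secondaryFiles patterns: a plain suffix such as '.fai' is appended to the primary file name, while each leading '^' first removes one extension, so '^.dict' replaces the final extension. The following is a minimal sketch of that rule, useful for checking that the expected companions of the reference sit next to it before launching a run. The helper names and the example path are hypothetical and not part of the workflow.

```python
def secondary_path(primary, pattern):
    """Apply a CWL secondaryFiles pattern to a primary file path.

    Each leading '^' removes the last extension from the file name;
    the rest of the pattern is then appended.
    """
    while pattern.startswith("^"):
        # Strip the extension from the file name only, not from directories.
        head, sep, name = primary.rpartition("/")
        stem = name.rsplit(".", 1)[0] if "." in name else name
        primary = head + sep + stem
        pattern = pattern[1:]
    return primary + pattern


# Patterns listed for the `reference` input in the table above.
REFERENCE_PATTERNS = [".fai", "^.dict", ".amb", ".ann", ".bwt", ".pac", ".sa"]


def expected_reference_files(reference):
    """List the companion files expected beside the reference fasta."""
    return [secondary_path(reference, p) for p in REFERENCE_PATTERNS]


# Hypothetical reference location, for illustration only.
for path in expected_reference_files("/data/ref/genome.fa"):
    print(path)
```

The same helper applies to the ['.tbi'] entries: a VCF input named sites.vcf.gz would be expected to have sites.vcf.gz.tbi alongside it.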
Outputs
Name | Label | Description | Type | Secondary Files |
---|---|---|---|---|
tumor_cram | | | File | |
tumor_mark_duplicates_metrics | | | File | |
tumor_insert_size_metrics | | | File | |
tumor_alignment_summary_metrics | | | File | |
tumor_hs_metrics | | | File | |
tumor_per_target_coverage_metrics | | | File[] | |
tumor_per_target_hs_metrics | | | File[] | |
tumor_per_base_coverage_metrics | | | File[] | |
tumor_per_base_hs_metrics | | | File[] | |
tumor_summary_hs_metrics | | | File[] | |
tumor_flagstats | | | File | |
tumor_verify_bam_id_metrics | | | File | |
tumor_verify_bam_id_depth | | | File | |
normal_cram | | | File | |
normal_mark_duplicates_metrics | | | File | |
normal_insert_size_metrics | | | File | |
normal_alignment_summary_metrics | | | File | |
normal_hs_metrics | | | File | |
normal_per_target_coverage_metrics | | | File[] | |
normal_per_target_hs_metrics | | | File[] | |
normal_per_base_coverage_metrics | | | File[] | |
normal_per_base_hs_metrics | | | File[] | |
normal_summary_hs_metrics | | | File[] | |
normal_flagstats | | | File | |
normal_verify_bam_id_metrics | | | File | |
normal_verify_bam_id_depth | | | File | |
mutect_unfiltered_vcf | | | File | ['.tbi'] |
mutect_filtered_vcf | | | File | ['.tbi'] |
strelka_unfiltered_vcf | | | File | ['.tbi'] |
strelka_filtered_vcf | | | File | ['.tbi'] |
varscan_unfiltered_vcf | | | File | ['.tbi'] |
varscan_filtered_vcf | | | File | ['.tbi'] |
pindel_unfiltered_vcf | | | File | ['.tbi'] |
pindel_filtered_vcf | | | File | ['.tbi'] |
docm_filtered_vcf | | | File | ['.tbi'] |
final_vcf | | | File | ['.tbi'] |
final_filtered_vcf | | | File | ['.tbi'] |
final_tsv | | | File | |
vep_summary | | | File | |
tumor_snv_bam_readcount_tsv | | | File | |
tumor_indel_bam_readcount_tsv | | | File | |
normal_snv_bam_readcount_tsv | | | File | |
normal_indel_bam_readcount_tsv | | | File | |
intervals_antitarget | | | File? | |
intervals_target | | | File? | |
normal_antitarget_coverage | | | File | |
normal_target_coverage | | | File | |
reference_coverage | | | File? | |
cn_diagram | | | File? | |
cn_scatter_plot | | | File? | |
tumor_antitarget_coverage | | | File | |
tumor_target_coverage | | | File | |
tumor_bin_level_ratios | | | File | |
tumor_segmented_ratios | | | File | |
diploid_variants | | | File? | ['.tbi'] |
somatic_variants | | | File? | ['.tbi'] |
all_candidates | | | File | ['.tbi'] |
small_candidates | | | File | ['.tbi'] |
tumor_only_variants | | | File? | ['.tbi'] |
somalier_concordance_metrics | | | File | |
somalier_concordance_statistics | | | File | |
Steps
Name | CWL Run |
---|---|
tumor_alignment_and_qc | pipelines/alignment_exome.cwl |
normal_alignment_and_qc | pipelines/alignment_exome.cwl |
concordance | tools/concordance.cwl |
pad_target_intervals | tools/interval_list_expand.cwl |
detect_variants | pipelines/detect_variants.cwl |
cnvkit | tools/cnvkit_batch.cwl |
manta | tools/manta_somatic.cwl |
tumor_bam_to_cram | tools/bam_to_cram.cwl |
tumor_index_cram | tools/index_cram.cwl |
normal_bam_to_cram | tools/bam_to_cram.cwl |
normal_index_cram | tools/index_cram.cwl |
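The pad_target_intervals step pairs with the target_interval_padding input, which is described as extending each target region by a number of base pairs on each side. The arithmetic can be sketched as follows, assuming 1-based inclusive coordinates as used by interval_list files. This is an illustration of the idea only: the function name and coordinates are hypothetical, and the sketch does not reproduce tools/interval_list_expand.cwl, which may also merge overlapping intervals or clamp to contig lengths.

```python
def pad_interval(chrom, start, end, padding):
    """Extend a 1-based, inclusive interval by `padding` bp on each side.

    The start is clamped at position 1; the end is not clamped because
    contig lengths are not known in this sketch.
    """
    if padding < 0:
        raise ValueError("padding must be non-negative")
    return (chrom, max(1, start - padding), end + padding)


# Made-up target regions, padded by 100 bp.
targets = [("chr1", 50, 200), ("chr1", 1000, 1200)]
padded = [pad_interval(c, s, e, 100) for c, s, e in targets]
print(padded)
```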