This repository has been archived by the owner on Jun 11, 2024. It is now read-only.

Workflow 2

Matthew Brush edited this page Aug 24, 2018 · 2 revisions

Question Template

Identify candidate [rare disease] “modifier” genes that could make promising drug targets for treating [rare or related common disease].

Fanconi Anemia (FA) Instance

Identify candidate Fanconi Anemia (FA) “modifier” genes that could make promising drug targets for treating FA or FA-related common diseases (e.g. bone marrow failure, AML, head/neck cancers).

Background

We are interested in identifying novel disease modifier genes in FA and understanding their mechanisms of modification. Disease modifiers are genes whose function (or dysfunction) has the potential to affect disease mechanisms or pathways. For example, Aldh2 is a modifier of FA: reduced Aldh2 activity exacerbates the DNA damage that FA patients are unable to repair. Modifier genes (and their variants) may affect phenotypic presentation, clinical course, and therapeutic sensitivity, and may also be potential drug targets for treating or preventing FA. For instance, Aldh2 variants that co-occur with core FA gene mutations can lead to early onset of FA phenotypes or to more severe FA presentation. Knowledge about FA disease modifiers can inform novel prevention, detection, treatment, or disease management strategies, such as alcohol avoidance in patients with defects in both Aldh2 and FA genes, or treatment with Aldh2 agonists.

Links/Resources

  • A slide deck describing the workflow can be found here.
  • The Bid Matrix for this Workflow is here.
  • Tickets related to this workflow are tagged with the 'Workflow 2' label in the issue tracker.

Workflow Overview

The proposed Workflow consists of five 'Modules'. Figure 1 shows a high-level overview of the Workflow.


Figure 1: High-level overview of the workflow

Conceptually, the workflow has two levels:

  • Level 1 is about finding genes with some connection to the disease, either via similarity to some feature(s) of known disease genes or via potential impact on known disease mechanisms/features. Approaches are executed across all genes in the genome, with the goal of identifying a manageable list to move on to more targeted validation/prioritization approaches in Level 2.
  • Level 2 aims to find additional evidence for prioritizing gene ‘hits’ of interest from Level 1. This is required to identify and act upon meaningful candidates for further research. Approaches here may require starting with a narrower, meaningful subset of genes (the output from Level 1). For example, an enrichment analysis requires some subset of genes in which to try to identify meaningful signals/correlations, and a PheWAS analysis requires starting with a specific gene/variant and identifying a correlation with a phenotypic feature. Genes with additional evidence from Level 2 may be true candidates worth further exploration.
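Why an enrichment analysis needs a narrower subset can be made concrete with a small example. Below is a minimal sketch, assuming gene lists are plain sets of identifiers and using a one-sided hypergeometric test, which is one common choice for over-representation analysis but not necessarily the method this workflow uses. The function names and toy gene IDs are hypothetical. With patients in place of genes (all patients as background, patients with a phenotype as "annotated", variant carriers as "candidates"), the same test is equivalent to a one-sided Fisher's exact test and could serve as a simple per-phenotype check in a PheWAS-style comparison.

```python
from math import comb


def hypergeom_upper_tail(k: int, n_drawn: int, n_success: int, n_total: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(n_total, n_success, n_drawn)."""
    # math.comb(n, r) returns 0 when r > n, so impossible terms contribute nothing.
    numerator = sum(
        comb(n_success, i) * comb(n_total - n_success, n_drawn - i)
        for i in range(k, min(n_drawn, n_success) + 1)
    )
    return numerator / comb(n_total, n_drawn)


def enrichment_p(candidates: set[str], annotated: set[str], background: set[str]) -> float:
    """One-sided over-representation p-value for an annotation among candidates.

    background: all genes considered (e.g. the genome-wide Level 1 search space)
    annotated:  background genes carrying the annotation (e.g. one GO term)
    candidates: the narrower Level 1 output list being tested
    """
    candidates = candidates & background
    annotated = annotated & background
    overlap = len(candidates & annotated)
    return hypergeom_upper_tail(overlap, len(candidates), len(annotated), len(background))


if __name__ == "__main__":
    # Toy data only: 20 background genes, 4 annotated, 4 candidates of which 3 are annotated.
    background = {f"G{i}" for i in range(1, 21)}
    annotated = {"G1", "G2", "G3", "G4"}
    candidates = {"G1", "G2", "G3", "G10"}
    print(f"{enrichment_p(candidates, annotated, background):.4f}")  # prints 0.0134 (= 65/4845)
```

Passing the whole background as the candidate list always yields p = 1.0, because the overlap is then fixed at the number of annotated genes; this is one way to see why the test only carries information when given a narrower subset.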

Figure 2 shows an expanded view of specific implementations of the workflow modules. Each module can be implemented in many ways, using different data types and methods. Some are shown here and documented in this repo, but the workflow is open to other approaches/ideas.


Figure 2: Expanded view of the proposed workflow modules, showing specific approaches for each module that could be implemented in parallel.

Modules

  • Module 1: Gene Similarity Functions: Module 1 uses paths that find genes 'similar' in some way to known disease genes (e.g. based on GO functional similarity, phenotype similarity, co-expression, siRNA PGMs, or chemical association data), on the premise that such genes have an increased likelihood of impacting FA pathways and modifying the FA disease. There are many dimensions along which genes may show similarity to FA genes, and each can be implemented in parallel as a separate 'submodule' of Module 1. The resulting gene lists can then be integrated and analyzed to identify promising candidates to promote for further exploration in Level 2 of the Workflow.

  • Module 2: Disease Feature Associated Genes: Paths to find genes predicted to impact a known feature or mechanism of the disease (e.g. for FA, genes that may impact the types of DNA damage contributing to DNA replication fork stalls).

  • Module 3: List Integration: Generic module for combining/integrating gene lists, which may support tasks such as identifying intersections or relationships among list members.

  • Module 4: General Bioinformatic Techniques: Approaches for identifying patterns/signals in list members based on their features (e.g. enrichment or clustering analyses based on annotations to GO terms, pathways, protein domains, etc.).

  • Module 5: Patient Geno-Pheno Analyses: Applies case-level phenotype and genotype data to evaluate/validate potential modifier genes (e.g. PheWAS analysis, genetic interaction analysis).
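To illustrate what a single Module 1 submodule might compute, the sketch below scores every gene by its best Jaccard similarity to a seed set of known disease genes. It assumes each gene's annotations (e.g. GO terms or phenotype terms) are available as a set of term identifiers. Jaccard overlap is a simple stand-in chosen for brevity; ontology-aware semantic similarity measures are common alternatives, and this is not necessarily the measure the workflow adopted. The function names, the seed label, and the toy term IDs are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_similarity(
    annotations: dict[str, set[str]], seeds: set[str], top_n: int
) -> list[tuple[str, float]]:
    """Score each non-seed gene by its maximum similarity to any seed gene,
    then return the top_n genes (ties broken by gene ID for determinism)."""
    scored = []
    for gene, terms in annotations.items():
        if gene in seeds:
            continue  # known disease genes are inputs, not candidates
        best = max(
            (jaccard(terms, annotations.get(seed, set())) for seed in seeds),
            default=0.0,
        )
        scored.append((gene, best))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


if __name__ == "__main__":
    # Toy annotations; "SEED1" stands in for a known FA gene.
    annotations = {
        "SEED1": {"term_a", "term_b", "term_c"},
        "G1": {"term_a", "term_b"},
        "G2": {"term_c", "term_d"},
        "G3": {"term_e"},
    }
    print(rank_by_similarity(annotations, seeds={"SEED1"}, top_n=2))
```

Keeping each similarity dimension in its own function like this is what lets the submodules run in parallel and emit independent ranked lists for later integration.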

For a deeper dive into the details, considerations, and proposals for the Modules, see the Issues created in this repo for each one.

Summary

In this workflow, Level 1 creates a gene list enriched for possible modifiers. Level 2 evaluates that list to find the most promising candidates; the approaches proposed here require starting with a smaller, meaningful subset of genes (the output from Level 1). There are many ways to implement each module using different data types and methods, and also various possible ways to order the modules in the workflow; we will have to settle on the best ordering before our final implementation for the MVP. For example, we could use Module 4 to refine candidates that are input to Module 5 (as opposed to these being parallel paths). Alternatively, we could use Module 5 as an alternate implementation of Module 2 to identify candidates to pass on to Level 2 for evaluation/prioritization.
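The ordering question can be sketched by treating each module as a function from a gene set to a gene set. This is a simplifying assumption (real modules also return scores and evidence), and the two stub modules below are hypothetical placeholders. For modules that are pure filters, running Module 4 before Module 5 yields the same genes as intersecting their parallel outputs; the ordering starts to matter when a module is expensive per gene (so running it on a refined list saves work) or applies list-dependent cutoffs such as top-N selection or multiple-testing correction.

```python
from typing import Callable

GeneSet = frozenset[str]
Module = Callable[[GeneSet], GeneSet]


def run_sequential(genes: GeneSet, modules: list[Module]) -> GeneSet:
    """Feed each module's output into the next (e.g. Module 4, then Module 5)."""
    for module in modules:
        genes = module(genes)
    return genes


def run_parallel(genes: GeneSet, modules: dict[str, Module]) -> dict[str, GeneSet]:
    """Give every module the same Level 1 list; keep outputs separate for integration."""
    return {name: module(genes) for name, module in modules.items()}


# Hypothetical stubs standing in for real module implementations.
def module4_stub(genes: GeneSet) -> GeneSet:
    return genes & {"G1", "G2", "G3"}  # e.g. genes in an enriched cluster


def module5_stub(genes: GeneSet) -> GeneSet:
    return genes & {"G2", "G3", "G4"}  # e.g. genes with a patient geno-pheno signal


if __name__ == "__main__":
    level1 = frozenset({"G1", "G2", "G3", "G4", "G5"})
    seq = run_sequential(level1, [module4_stub, module5_stub])
    par = run_parallel(level1, {"module4": module4_stub, "module5": module5_stub})
    print(sorted(seq), {name: sorted(out) for name, out in par.items()})
```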

At many steps, there is an option to perform ortholog expansion and leverage model organism data. Parallel construction in some branches would allow toggling paths on/off to see how results are affected (e.g. omitting phenotype data). Data availability may affect our ability to perform Modules 1D (chemical data), 2 (molecular defect data), and 5 (clinical geno-pheno data).
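Ortholog expansion and path toggling can be sketched together, assuming an ortholog map from model organism gene IDs to human gene IDs is available (e.g. exported from an orthology resource) and that each parallel branch yields a set of human genes. Integration here is a simple support count in the spirit of Module 3; that choice is an assumption, not the workflow's chosen method. All gene IDs and branch names are hypothetical.

```python
from collections import Counter


def expand_orthologs(model_hits: set[str], ortholog_map: dict[str, set[str]]) -> set[str]:
    """Map model organism hits to human genes; hits without an ortholog are dropped."""
    return set().union(*(ortholog_map.get(gene, set()) for gene in model_hits))


def integrate(branches: dict[str, set[str]], enabled: set[str], min_support: int) -> set[str]:
    """Keep genes reported by at least min_support of the enabled branches."""
    support = Counter(
        gene for name, genes in branches.items() if name in enabled for gene in genes
    )
    return {gene for gene, count in support.items() if count >= min_support}


if __name__ == "__main__":
    ortholog_map = {"mouse_g1": {"G1"}, "fly_g7": {"G2", "G5"}}
    branches = {
        "go_similarity": {"G1", "G2", "G3"},
        "phenotype": {"G1", "G4"},
        "model_organism": expand_orthologs({"mouse_g1", "fly_g7", "fly_g9"}, ortholog_map),
    }
    all_on = integrate(branches, set(branches), min_support=2)
    no_models = integrate(branches, set(branches) - {"model_organism"}, min_support=2)
    # Toggling a branch off shows which candidates depend on it.
    print(sorted(all_on), sorted(no_models), sorted(all_on - no_models))
```

In this toy run, switching off the model organism branch drops G2 from the consensus, which is the kind of sensitivity check that parallel construction makes cheap.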
