
Data Ingest Strategy

Peter Robinson edited this page Feb 12, 2020 · 3 revisions

Welcome to the IDGKG-gen wiki!

Let's record our strategies for ingesting various data sources here. We will want to reuse these notes for whatever website or supplemental material we wind up making for this data. There are a lot of choices to be made about what data to ingest. For now, I am trying to rely on the authors' assessments. For instance, if they have a supplemental file with 900 primary hits but then do downstream work and identify 200 confirmed hits, we should take the confirmed hits. If possible, we should also get negative examples. For these, we should take only gene pairs that were negative in all experiments, and not grey-zone hits (i.e., positive in the primary screen but negative in the secondary).
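The selection rule above can be sketched as a small labelling function. This is only an illustration under stated assumptions: the function name, the label strings, and the two-stage (primary/secondary) layout are hypothetical, and real supplemental files will need their own parsing.

```python
from typing import Optional


def classify_pair(primary_positive: bool, secondary_positive: Optional[bool]) -> str:
    """Label one gene pair according to the ingest strategy.

    primary_positive:   hit in the primary screen (hypothetical field).
    secondary_positive: result of the follow-up experiment, or None if the
                        pair was never followed up (hypothetical field).
    """
    # Confirmed hit: positive in the primary screen and in the follow-up.
    if primary_positive and secondary_positive:
        return "positive"
    # Negative example: negative in every experiment that was performed.
    if not primary_positive and not secondary_positive:
        return "negative"
    # Everything else is grey zone (e.g. primary hit that failed or was
    # never confirmed) and should not be ingested.
    return "grey_zone"


# Keep only confirmed positives and clean negatives.
pairs = {
    ("GENE_A", "GENE_B"): (True, True),
    ("GENE_A", "GENE_C"): (True, False),
    ("GENE_A", "GENE_D"): (False, None),
}
ingest = {
    pair: classify_pair(*result)
    for pair, result in pairs.items()
    if classify_pair(*result) != "grey_zone"
}
```

Treating an unconfirmed primary hit as grey zone is a conservative choice made here for the sketch; whether that is right will depend on how each paper reports its follow-up work.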

  1. Vizeacoumar FJ, et al. (2013) A negative genetic interaction map in isogenic cancer cell lines reveals cancer cell vulnerabilities. Mol Syst Biol. 2013 Oct 8;9:696. PubMed PMID: 24104479.

Six isogenic cell lines were screened in parallel using a standardized genome-scale pooled shRNA screening pipeline. The HCT116 genetic background was chosen because it is near diploid with intact DNA damage and spindle checkpoints. The ‘query’ genotypes chosen were PTTG1−/−, BLM−/−, MUS81−/−, PTEN−/− and KRAS+/−. We screened the parental cell line and each of the five query or ‘mutant’ cell lines in biological triplicate using a pool of 78,432 unique shRNAs targeting 16,056 human genes. The abundance trend of each hairpin at different timepoints was used to compute a dynamic hairpin-level score termed shARP (Supplementary Figures S1G–K and Supplementary Table S1) and, from this, a gene-level essentiality score termed GARP, which is the average of the shARP scores for the top two performing shRNAs for each gene. To reduce the number of false positives inherent to RNAi screens, we also performed genome-scale microarray gene expression profiling experiments on parental HCT116 cells and in all five query cell lines in order to measure target mRNA levels, and used these levels to determine a threshold for the presence/absence of target gene expression. Therefore, using stringent negative dGARP scores (P<0.05) and filtering for mRNA target gene expression (i.e., the presence/absence; see Supplementary Table S3), we generated a network of negative genetic interactions across the five query genes consisting of 2014 nodes and 2617 edges, which we will refer to hereafter as a differential essentiality (DiE) network.
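To make the scoring described above concrete, here is a minimal sketch of the gene-level GARP score and of the edge filter. It is not the authors' pipeline. It assumes that the "top two performing" shRNAs are the two with the most negative shARP scores (strongest dropout), and it takes the dGARP score, its p-value, and the expression call as given inputs rather than computing them. The function names and the `alpha` parameter are hypothetical.

```python
from statistics import mean
from typing import Sequence


def garp(sharp_scores: Sequence[float]) -> float:
    """Gene-level score: mean shARP of the top two hairpins for one gene.

    Assumption: "top performing" means most negative shARP, so the scores
    are sorted ascending and the first two are averaged. A gene with a
    single hairpin falls back to that one score; an empty list raises
    statistics.StatisticsError.
    """
    top_two = sorted(sharp_scores)[:2]
    return mean(top_two)


def is_die_edge(dgarp: float, p_value: float, expressed: bool,
                alpha: float = 0.05) -> bool:
    """Keep a query-gene/target-gene pair as a negative interaction.

    The pair is kept only if the dGARP score is negative, its p-value is
    below alpha, and the target gene is called present by expression.
    """
    return dgarp < 0 and p_value < alpha and expressed


# Example with made-up shARP scores for one gene targeted by four hairpins.
score = garp([-3.0, -1.0, 0.5, -2.0])  # mean of -3.0 and -2.0
```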

Supplemental Figure S1 (panels L-P) shows the frequency distribution of dGARP scores for each screen, with the top SSL hits (p-value < 0.05) highlighted as black bars. These panels can be used to parse Supplemental Table S2, which has the corresponding dGARP scores. It is hard to figure out the exact numbers from the figure, though.
