feat: add PPI analysis to Tutorial notebook
brucetony committed Sep 9, 2021
1 parent cf0f00d commit e4c8cf3
Showing 1 changed file with 225 additions and 5 deletions.
230 changes: 225 additions & 5 deletions notebooks/Tutorial.ipynb
@@ -973,18 +973,238 @@
"cell_type": "markdown",
"id": "2d7e196d",
"metadata": {},
"source": [
"# PPI Analysis\n",
"Next, we perform an analysis of the identified proteins using information gathered by [e(BE:L)](https://github.com/e-bel/ebel).\n",
"The following commands download data from four major PPI databases (BioGRID, Pathway Commons, StringDB, and IntAct)\n",
"and check which pathways and interactions are known for each identified secondary target."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "afe6b98c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Please insert OrientDB root password: ········\n"
]
}
],
"source": [
"# Uncomment the following line if you need to install e(BE:L)\n",
"#!pip install ebel git+https://github.com/orientechnologies/pyorient\n",
"\n",
"import pandas as pd\n",
"from ebel import Bel\n",
"bel = Bel()"
]
},
{
"cell_type": "markdown",
"id": "95c1793d",
"metadata": {},
"source": [
"## Download PPI Information\n",
"The following cell downloads information from the PPI databases and inserts it into an RDBMS (SQLite [default] or MySQL).\n",
"**WARNING:** This step may take a while."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c93c7640",
"metadata": {},
"outputs": [],
"source": [
"bel.biogrid.update()\n",
"bel.intact.update()\n",
"bel.stringdb.update()\n",
"bel.pathway_commons.update()"
]
},
{
"cell_type": "markdown",
"id": "86a160e3",
"metadata": {},
"source": [
"## Gather Hits\n",
"Now we check each PPI database for associated information on each secondary target."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2b7476fd",
"metadata": {},
"outputs": [],
"source": [
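"# Render the targets as a SQL list literal such as ('A', 'B'); a Python list would format as ['A', 'B'], which is not valid SQL\n",
"proteins = \"(\" + \", \".join(f\"'{p}'\" for p in summary.Target.unique()) + \")\""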
]
},
{
"cell_type": "markdown",
"id": "6d42612b",
"metadata": {},
"source": [
"### Pathway Commons"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "722461b2",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"sql = f\"\"\"Select\n",
" pc.participant_a a,\n",
" pc.interaction_type int_type,\n",
" pc.participant_b b,\n",
" group_concat(distinct pn.name) pathway_names,\n",
" group_concat(distinct s.source) sources,\n",
" group_concat(distinct p.pmid) pmids\n",
"from\n",
" pathway_commons pc left join\n",
" pathway_commons__pathway_name pc_pn on (pc.id=pc_pn.pathway_commons_id) left join\n",
" pathway_commons_pathway_name pn on (pc_pn.pathway_commons_pathway_name_id = pn.id) left join\n",
" pathway_commons__source pc_s on (pc.id=pc_s.pathway_commons_id) left join\n",
" pathway_commons_source s on (pc_s.pathway_commons_source_id=s.id) left join\n",
" pathway_commons_pmid p on (p.pathway_commons_id=pc.id)\n",
"where\n",
" (pc.participant_a in {proteins} and pc.participant_b = 'MAPT') or\n",
" (pc.participant_b in {proteins} and pc.participant_a = 'MAPT')\n",
"group by\n",
" pc.participant_a, pc.interaction_type, pc.participant_b\"\"\"\n",
"\n",
"pc_hits = pd.read_sql(sql, engine)"
]
},
{
"cell_type": "markdown",
"id": "95d849ed",
"metadata": {},
"source": [
"### BioGRID"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a7b5fe4d",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"sql = f\"\"\"Select\n",
" ia.symbol a,\n",
" ib.symbol b,\n",
" bes.experimental_system,\n",
" bes.experimental_system_type\n",
"from\n",
" biogrid b inner join\n",
" biogrid_interactor ia on (b.biogrid_a_id=ia.biogrid_id) inner join\n",
" biogrid_interactor ib on (b.biogrid_b_id=ib.biogrid_id) inner join\n",
" biogrid_experimental_system bes on (b.experimental_system_id=bes.id)\n",
"where\n",
" (ia.symbol = 'MAPT' and ib.symbol in {proteins}) or\n",
" (ib.symbol = 'MAPT' and ia.symbol in {proteins})\"\"\"\n",
"\n",
"biogrid_hits = pd.read_sql(sql, engine)"
]
},
{
"cell_type": "markdown",
"id": "9f7fffcf",
"metadata": {},
"source": [
"### IntAct"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fa77bf4e",
"metadata": {},
"outputs": [],
"source": [
"sql = f\"\"\"Select\n",
" ha.symbol as symbol_a,\n",
" hb.symbol as symbol_b,\n",
" i.confidence_value, \n",
" i.detection_method, \n",
" i.interaction_type, \n",
" i.pmid\n",
"from \n",
" intact i inner join \n",
" hgnc_uniprot hua on (i.int_a_uniprot_id=hua.accession) inner join \n",
" hgnc ha on (hua.hgnc_id=ha.id) inner join \n",
" hgnc_uniprot hub on (i.int_b_uniprot_id=hub.accession) inner join \n",
" hgnc hb on (hub.hgnc_id=hb.id)\n",
"where \n",
" (ha.symbol='MAPT' and hb.symbol in {proteins}) or\n",
" (hb.symbol='MAPT' and ha.symbol in {proteins})\n",
"order by confidence_value desc\n",
"\"\"\"\n",
"intact_hits = pd.read_sql(sql, engine)"
]
},
{
"cell_type": "markdown",
"id": "fb4d1f5e",
"metadata": {},
"source": [
"### StringDB"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f3527343",
"metadata": {},
"outputs": [],
"source": [
"sql = f\"\"\"Select * \n",
"from \n",
" stringdb \n",
"where \n",
" (symbol1='MAPT' and symbol2 in {proteins}) or\n",
" (symbol2='MAPT' and symbol1 in {proteins})\n",
"order by combined_score desc\n",
"\"\"\"\n",
"stringdb_hits = pd.read_sql(sql, engine)"
]
},
{
"cell_type": "markdown",
"id": "aa57a3ca",
"metadata": {},
"source": [
"# Connecting to a Different Knowledge Graph\n",
-"By default, this package connects to the Alzheimer's Disease based Knowledge Graph (KG) developed under the MAVO project, available at https://graphstore.scai.fraunhofer.de. There are other KGs available, however, and here you can choose to connect to a different one if desired. \n",
+"By default, this package connects to the Alzheimer's Disease based Knowledge Graph (KG) developed under the MAVO project, available at https://graphstore.scai.fraunhofer.de. There are other KGs available, however, and here you can choose to connect to a different one if desired.\n",
"\n",
"The commented-out code shows how to connect to the COVID KG instead."
]
},
{
"cell_type": "code",
-"execution_count": 4,
-"id": "afe6b98c",
-"metadata": {},
+"execution_count": null,
+"id": "b6604375",
+"metadata": {
+"pycharm": {
+"name": "#%%\n"
+}
+},
"outputs": [],
"source": [
"from ebel_rest import connect\n",
@@ -1008,7 +1228,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.9.6"
+"version": "3.8.2"
}
},
"nbformat": 4,
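
Note on the query cells in this commit: each one splices `{proteins}` straight into the SQL text. One possible alternative is to bind the symbols as parameters, so the driver handles quoting. The sketch below uses only the standard-library `sqlite3` module and a made-up three-column `stringdb` table; the table contents and the helper name `mapt_partner_rows` are assumptions for illustration, not part of e(BE:L) or the notebook.

```python
import sqlite3

# Hypothetical stand-in for the `stringdb` table queried in the notebook;
# the real table is created by e(BE:L) and has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stringdb (symbol1 TEXT, symbol2 TEXT, combined_score INTEGER)"
)
conn.executemany(
    "INSERT INTO stringdb VALUES (?, ?, ?)",
    [
        ("MAPT", "GSK3B", 999),
        ("CDK5", "MAPT", 950),
        ("MAPT", "APP", 900),
        ("TP53", "MDM2", 999),
    ],
)


def mapt_partner_rows(conn, proteins, anchor="MAPT"):
    """Return rows linking `anchor` to any symbol in `proteins`, best score first."""
    if not proteins:
        # "IN ()" is not portable SQL, so short-circuit on an empty list.
        return []
    # One "?" placeholder per symbol; only placeholders are formatted into the SQL.
    marks = ", ".join("?" for _ in proteins)
    sql = f"""SELECT symbol1, symbol2, combined_score
              FROM stringdb
              WHERE (symbol1 = ? AND symbol2 IN ({marks}))
                 OR (symbol2 = ? AND symbol1 IN ({marks}))
              ORDER BY combined_score DESC"""
    # Parameter order must follow the order of the placeholders in the SQL.
    params = [anchor, *proteins, anchor, *proteins]
    return conn.execute(sql, params).fetchall()


rows = mapt_partner_rows(conn, ["GSK3B", "CDK5"])
```

`pd.read_sql` also accepts a `params` argument, so the same idea should carry over to the notebook cells. The placeholder style (`?`, `%s`, or named) depends on the database driver behind `engine`, so treat the `?` form here as SQLite-specific.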
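
Note on the Pathway Commons cell: it chains several `left join`s and then uses `group_concat(distinct ...)`. A minimal sketch of why the `distinct` matters, using the standard-library `sqlite3` module and invented table names (`interaction`, `interaction_source`, `interaction_pmid` are assumptions, not the e(BE:L) schema): joining two one-to-many tables multiplies the rows per interaction, and `distinct` collapses the repeated values again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature schema: one interaction with 2 sources and 3 PMIDs,
# and one interaction with no annotations at all.
conn.executescript(
    """
    CREATE TABLE interaction (id INTEGER PRIMARY KEY, a TEXT, b TEXT);
    CREATE TABLE interaction_source (interaction_id INTEGER, source TEXT);
    CREATE TABLE interaction_pmid (interaction_id INTEGER, pmid INTEGER);
    INSERT INTO interaction VALUES (1, 'MAPT', 'GSK3B'), (2, 'MAPT', 'FYN');
    INSERT INTO interaction_source VALUES (1, 'reactome'), (1, 'kegg');
    INSERT INTO interaction_pmid VALUES (1, 111), (1, 222), (1, 333);
    """
)

sql = """
SELECT i.a, i.b,
       group_concat(DISTINCT s.source) AS sources,
       count(*) AS joined_rows
FROM interaction i
LEFT JOIN interaction_source s ON s.interaction_id = i.id
LEFT JOIN interaction_pmid p ON p.interaction_id = i.id
GROUP BY i.a, i.b
"""

# Map (a, b) -> (concatenated sources, number of joined rows in the group).
result = {(a, b): (sources, n) for a, b, sources, n in conn.execute(sql)}
```

In this toy data the MAPT/GSK3B group contains 2 x 3 = 6 joined rows, yet each source appears once in `sources`. The MAPT/FYN interaction survives the left joins as a single row whose `sources` is `None`, which is also how unannotated interactions would show up as missing values in a DataFrame.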
