Skip to content

Latest commit

 

History

History
123 lines (91 loc) · 3.94 KB

README.md

File metadata and controls

123 lines (91 loc) · 3.94 KB

DTi2Vec: Drug-Target interaction prediction using network embedding and ensemble learning

Submitted: 16 December 2020 Accepted: 05 September 2021 Published: 22 September 2021

Link: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00552-w


This code is implemented using Python 3.7

For any qutions please contact the first author:

Maha Thafar

Email: [email protected]

Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST) - Collage of Computers and Information Technology, Taif University (TU)


Getting Started

DTi2Vec Workflow

image

Prerequisites:

There are several required Python packages to run the code:

  • gensim (for node2vec code)
  • numpy
  • Scikit-learn
  • imblearn
  • pandas
  • xgboost

These packages can be installed using pip or conda as the follwoing example

pip install -r requirements.txt

Files Description:

The important folders are (to run the code):

1.(Input) folder: that includes four folder for 5 datasets include:

  • Nuclear Receptor dataset (nr),
  • G-protein-coupled receptor (gpcr),
  • Ion Channel (ic),
  • Enzyme (e)
  • FDA_DrugBank (DrugBank) which each one of them has all required data of drug-target interactions (in Adjacency matrix and edgelist format) and drug-drug similarity and target-target similarity in (square matrix format)

2.(EMBED) folder: that has also five folders coressponding for five datasets, each folder contains the generated node2vec Embedding file for each fold of training data (coressponding to the same seed of CV in the main node2vec code) - to access the main code of node2vec: https://github.com/aditya-grover/node2vec, or you can install node2vec library using pip install node2vec


There are 3 files of the implementation:

  • load_datasets.py --> to read the input data for each dataset sperately

  • tow main function 1- Random CV Setting --> - DTi2Vec_main.py 2- New Drug Setting --> - DTi2vec_newDrug_seting_generatedEMBED.py


Installing:

To get the development environment runining, the code get 3 parameters from the user which are:

  • the dataset name data:(nr, gpcr, ic, e, DrugBank)

  • the boosting classifier classifier: AdaBoost(ab), XGBoost (xgbc)

  • the fusion function func: (Concat, Hadmard, AVG, WL1, WL2)

  • (the defual values are: dataset:nr , classifier:ab, fusion function:Hadmard )

  • to run the code for random CV settings (to obtain best results for each dataset run the following:

python DTi2Vec_main.py --data nr --classifier ab --func WL1
python DTi2Vec_main.py --data gpcr --classifier xgbc --func Hadamard
python DTi2Vec_main.py --data ic --classifier xgbc --func Concat
python DTi2Vec_main.py --data e --classifier xgbc --func Concat
python DTi2Vec_main.py --data DrugBank --classifier xgbc --func Hadamard

--

For new drug setting: (It takes 2 args: dataset name and the fusion function)

  • The classifier is XGBoost for all dataset
  • The Embeddings are generated using node2vec for new drugs CV and can be found in:'EMBED/newDrug_EMBED'
  • and then read the generated embeddings in this code.
  • The best obtianed results for all datasets using WL1 except for DB using Hadmard

**Examples to run newDrug setting:

python DTi2vec_newDrug_seting_generatedEMBED.py --data nr  --func WL1
python DTi2vec_newDrug_seting_generatedEMBED.py --data DrugBank --func Hadamard

For citation:


Thafar, Maha A., Rawan S. Olayan, Somayah Albaradei, Vladimir B. Bajic, Takashi Gojobori, Magbubah Essack, and Xin Gao. "DTi2Vec: Drug–target interaction prediction using network embedding and ensemble learning." Journal of Cheminformatics 13, no. 1 (2021): 1-18.