Graph Classification with GIN and GCN with Global Pooling

Project Overview

This project demonstrates graph classification with the Graph Isomorphism Network (GIN) architecture, applied to molecular structures for both classification and regression tasks. Molecules are read as SMILES strings and converted to graph representations. The project covers data processing, model implementation with PyTorch Geometric, and a study of an ensemble that combines GIN and Graph Convolutional Network (GCN) models.

Features

Molecular Data Processing: Includes steps to process SMILES data into graph-compatible formats.
Classification and Regression: Models both binary classification (HIV activity) and regression tasks (lipophilicity prediction).
Ensemble Modeling: Tests the performance of combining GCN and GIN architectures.

Installation

Prerequisites

Python 3.x
Jupyter Notebook or Google Colab
Required packages:
- torch
- torch-geometric
- rdkit
- ogb

Installation Steps

Clone the repository:

git clone https://github.com/btarun13/Graph_classification_related.git
cd Graph_classification_related

Open the notebook (Graph_classification_with_GIN.ipynb) locally in Jupyter, or upload it to Google Colab.
(Optional) Install additional dependencies if using Google Colab.

# Run in a cell in Colab
!pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-2.2.1+cu121.html
!pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-2.2.1+cu121.html
!pip install torch-geometric
!pip install rdkit
!pip install ogb
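
After installing, it can help to confirm that the required packages are importable before opening the notebook. The snippet below is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of this repository. Note that the import name for torch-geometric is `torch_geometric`.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the import names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Import names, not pip names (torch-geometric is imported as torch_geometric).
required = ["torch", "torch_geometric", "rdkit", "ogb"]
print("Missing packages:", missing_packages(required) or "none")
```

An empty result means every package in the list was found in the current environment.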

Usage

Load Data: Ensure the datasets for HIV and Lipophilicity are in your environment or specify their paths. Example datasets can be downloaded from MoleculeNet.
Data Processing: Convert SMILES data to graph structures suitable for the GIN model.
Training and Evaluation: Follow the steps in the notebook to train GIN and GCN models, evaluate performance, and explore ensemble approaches.
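
To make the data processing step more concrete, the sketch below shows the kind of graph structure a molecule is turned into: one feature per atom plus an edge list that stores every bond in both directions, which is the `[2, num_edges]` layout PyTorch Geometric expects for `edge_index`. This is an illustration only, not the notebook's code: the atoms and bonds for ethanol are written out by hand instead of being parsed from the SMILES string, and the helper name `bonds_to_edge_index` is hypothetical.

```python
def bonds_to_edge_index(bonds):
    """Turn undirected bonds [(i, j), ...] into a two-row edge list."""
    sources, targets = [], []
    for i, j in bonds:
        # Store each bond in both directions so messages flow both ways.
        sources += [i, j]
        targets += [j, i]
    return [sources, targets]


# Ethanol, SMILES "CCO": atoms indexed C=0, C=1, O=2 (hydrogens implicit).
atomic_numbers = [6, 6, 8]     # one simple feature per atom
bonds = [(0, 1), (1, 2)]       # C-C and C-O

edge_index = bonds_to_edge_index(bonds)
print(edge_index)  # [[0, 1, 1, 2], [1, 0, 2, 1]]
```

In practice a cheminformatics library such as RDKit would supply the atoms and bonds from the SMILES string, and the lists would be converted to tensors.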

Example Commands

Data loading and preprocessing:

import pandas as pd

hiv_data = pd.read_csv("/path/to/HIV.csv")
lipo_data = pd.read_csv("/path/to/Lipophilicity.csv")

Model Training:

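# Train GIN model
from torch_geometric.nn import GINConv
gin_model = GINConv(...)  # arguments elided; GINConv is a single GIN layer, not a complete model

Follow the training steps in the notebook.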
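
For readers new to GIN and global pooling, the following library-free sketch shows the two operations the title refers to. It assumes the standard GIN update, in which each node combines `(1 + eps)` times its own features with the sum of its neighbors' features, followed by a sum readout that collapses all node features of a graph into one vector. The function names `gin_aggregate` and `sum_pool` are hypothetical, and the MLP that a real GIN layer applies after aggregation is omitted, so this is not the notebook's model.

```python
def gin_aggregate(x, edge_index, eps=0.0):
    """Compute (1 + eps) * x[v] plus the sum of neighbor features for each node v."""
    out = [[(1.0 + eps) * value for value in row] for row in x]
    sources, targets = edge_index
    for src, dst in zip(sources, targets):
        for k, value in enumerate(x[src]):
            out[dst][k] += value
    return out


def sum_pool(x, batch, num_graphs):
    """Sum node features per graph; batch[v] is the graph id of node v."""
    dim = len(x[0])
    pooled = [[0.0] * dim for _ in range(num_graphs)]
    for row, graph_id in zip(x, batch):
        for k, value in enumerate(row):
            pooled[graph_id][k] += value
    return pooled


# One 3-node path graph (0 - 1 - 2) with 1-dimensional node features.
x = [[1.0], [2.0], [3.0]]
edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]

h = gin_aggregate(x, edge_index)
print(h)                           # [[3.0], [6.0], [5.0]]
print(sum_pool(h, [0, 0, 0], 1))   # [[14.0]]
```

The pooled vector is what a final classifier or regressor would consume to produce one prediction per molecule.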

Results

The notebook provides an evaluation of:

Classification Accuracy for HIV activity prediction.
Ensemble Comparison between GIN, GCN, and combined models.

The final helper function returns a probability estimate for a given SMILES string and the model chosen for the estimate, e.g. smile_to_hiv_prob(i, best_model).item(), where i is the SMILES string and best_model is the trained model.
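
This README does not state how the GIN and GCN predictions are combined in the ensemble. One common approach, assumed here purely for illustration, is a weighted average of the per-model probabilities. The function name `ensemble_probability` and the example values are hypothetical and are not results from the notebook.

```python
def ensemble_probability(probs, weights=None):
    """Weighted average of per-model probabilities for one molecule."""
    if weights is None:
        weights = [1.0] * len(probs)  # equal weighting by default
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probs)) / total


# Made-up probabilities from two models for the same molecule.
p_gin, p_gcn = 0.75, 0.5
print(ensemble_probability([p_gin, p_gcn]))               # 0.625
print(ensemble_probability([p_gin, p_gcn], [3.0, 1.0]))   # 0.6875
```

Unequal weights let the stronger individual model contribute more to the combined estimate.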

Acknowledgments

Special thanks to the creators of the datasets provided by MoleculeNet, and to the developers of PyTorch Geometric and the Open Graph Benchmark (OGB) team.

License

This project is licensed under the MIT License.
