Code for the BSc Thesis "Empirical Validations of Graph Structure Learning methods for Citation Network Applications" by Hoang Thien Ly at Faculty of Mathematics and Information Science, Warsaw University of Technology.
This Bachelor’s Thesis aims to examine the classification accuracy of graph structure learning methods in graph neural networks domain, with a focus on classifying a paper in citation network datasets. Graph neural networks (GNNs) have recently emerged as a powerful machine learning concept allowing to generalize successful deep neural archi- tectures to non-Euclidean structured data with high performance. However, one of the limitations of the majority of current GNNs is the assumption that the underlying graph is known and fixed. In practice, real-world graphs are often noisy and incomplete or might even be completely unknown. In such cases, it would be helpful to infer the graph structure directly from the data. Additionally, graph structure learning permits learning latent structures, which may increase the understanding of GNN models by providing edge weights among entities in the graph structure, allowing modellers to further analysis.
As part of the work, we will:
- review the current state-of-the-art graph structure learning (GSL) methods.
- empirically validate GSL methods by accuracy scores with citation network datasets.
- analyze the mechanism of these approaches and analyze the influence of hyperparameters on model’s behavior.
- discuss future work.
Keywords: graph neural network, graph structure learning, empirical validations, citation network applications.
Create a Conda virtual environment and install all the necessary packages
conda create -n DGMenv python=3.8
conda activate DGMenv
conda install -c anaconda cmake=3.19
conda install pytorch==1.8.1 torchvision==0.9.1 torchaudio==0.8.1 cudatoolkit=10.1 -c pytorch
pip install pytorch_lightning==1.3.8
pip install torch-scatter==2.0.8 -f https://data.pyg.org/whl/torch-1.8.1+cu101.html
pip install torch-sparse==0.6.12 -f https://data.pyg.org/whl/torch-1.8.1+cu101.html
pip install torch-geometric
For the dDGM framework, to train a model with the default options run the following command:
python train.py
Other frameworks, codes are presented in the Jupyter notebook.
- model_dDGM.py: discrete DGM model definition, training/validation code
- layers.py: include some different layers for experiments
- Euclidean distance
- Poincare distance
- Discrete DGM module
- Continuous DGM module
- MLP module
- Identity module
Main file to run experiments: simply run python train.py with parameters:
- --num_gpus: total number of gpus
- --dataset: which dataset used to train (UKBiobank, Tadpole require additional changes)
- --fold: k-fold validation (for UKBiobank, Tadpole)
- --conv_layers: number of convolutional layers
- --dgm_layers: number of dgm layers
- --fc_layers: number of linear layers
- --pre_lc: pre linear layer setting
- --gfun: diffusion function types: use state-of-the-art layers: gcn, gat, edgeconv
- --ffun: graph embedding function types: use state-of-the-art layers: gcn, gat (+mlp, id for experiments)
- --k: k param for k-gumbel sampling
- --pooling: pooling type (default = add)
- --dropout: drop out probability during training
- --lr: learning rate
- --test_eval: number of epoch for evaluation