This repository contains the code for implementing and evaluating the Heterogeneous Graph Transformer (HGT) architecture proposed in the master's thesis "Improving Heterogeneous Graph Transformer Architecture for Processing Heterogeneous Graphs".
Install the dependencies:

$ pip install -r requirements.txt
Preprocessed ACM is available at https://pan.baidu.com/s/1V2iOikRqHPtVvaANdkzROw (key: 50k2) or https://github.com/Jhy1993/HAN/tree/master/data/acm
Preprocessed DBLP is available at https://pan.baidu.com/s/1Qr2e97MofXsBhUvQqgJqDg (key: 6b3h) or https://github.com/Jhy1993/HAN/tree/master/data/DBLP_four_area
Preprocessed IMDB is available at https://pan.baidu.com/s/199LoAr5WmL3wgx66j-qwaw (key: qkec) or https://raw.githubusercontent.com/Jhy1993/HAN/master/data/imdb/movie_metadata.csv
The HGT architecture is implemented using the code provided by the authors of the original paper (https://arxiv.org/pdf/2003.01332.pdf). The most important files in this project are as follows:
- Model
  - model_base.ipynb --> Simple HGT implementation
  - model_com_embedding.ipynb --> HGT with community embeddings integrated as trainable vectors
- Data
  - data.ipynb --> Dataset download and graph building
  - ogb-3.ipynb --> OGB dataset download and graph building
- Experiments
  - train_ComEmb.ipynb --> Model training with community embeddings
  - train_Optuna_FastRP.ipynb --> Model training and FastRP hyperparameter tuning via Optuna
  - train_base.ipynb --> Base model training
  - train_base_optuna.ipynb --> Model training and HGT hyperparameter tuning via Optuna
- Feature Engineering
  - communities.ipynb --> Add features from communities
  - node_embedding.ipynb --> Add features from node embeddings
  - utils.ipynb --> Add features from the properties and structure of the graph
- FastRP
  - fastRP_impl.ipynb --> Implementation of the FastRP algorithm
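
For readers unfamiliar with FastRP, the general idea can be sketched as follows. This is a simplified illustration and not the code from fastRP_impl.ipynb: the function name `fastrp_embed`, its parameters, and the default iteration weights are assumptions made for this example, and the degree-based scaling of the projection matrix described in the FastRP paper is omitted. The sketch projects the nodes with a very sparse random matrix, propagates the projection through powers of the row-normalised adjacency matrix, and sums the L2-normalised results with per-iteration weights.

```python
import numpy as np


def fastrp_embed(adj, dim=8, weights=(0.0, 1.0, 1.0), s=3.0, seed=0):
    """Simplified FastRP-style node embeddings (illustrative sketch only)."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    rng = np.random.default_rng(seed)

    # Very sparse random projection: entries are +sqrt(s), 0, -sqrt(s)
    # with probabilities 1/(2s), 1 - 1/s, 1/(2s).
    proj = rng.choice(
        [np.sqrt(s), 0.0, -np.sqrt(s)],
        size=(n, dim),
        p=[1 / (2 * s), 1 - 1 / s, 1 / (2 * s)],
    )

    # Row-normalised adjacency (random-walk transition matrix D^-1 A).
    # Isolated nodes keep an all-zero row instead of dividing by zero.
    deg = adj.sum(axis=1)
    transition = adj / np.where(deg > 0, deg, 1.0)[:, None]

    embedding = np.zeros((n, dim))
    current = proj
    for weight in weights:
        # One more propagation step: current = (D^-1 A)^k @ proj.
        current = transition @ current
        # L2-normalise rows before adding them to the weighted sum.
        norms = np.linalg.norm(current, axis=1, keepdims=True)
        normalised = current / np.where(norms > 0, norms, 1.0)
        embedding += weight * normalised
    return embedding


# Toy graph: a triangle (nodes 0-2) plus one isolated node (node 3).
toy_adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
toy_embedding = fastrp_embed(toy_adj, dim=8, seed=1)
```

In this sketch the embedding dimension and the iteration weights are the kind of hyperparameters one would tune, for example with Optuna as in train_Optuna_FastRP.ipynb; the actual search space used in that notebook is not reproduced here.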