Source code of "Graph convolutional and attention models for entity classification in multilayer networks"
The aim of this work is to generalize GNN approaches by proposing a GNN framework for representation learning and semi-supervised classification in multilayer networks with attributed entities, an arbitrary number of layers, and both intra-layer and inter-layer connections between nodes.
We instantiated our framework with two new formulations of the GCN and GAT models, namely ML-GCN and ML-GAT, specifically designed for arbitrary, attributed multilayer networks.
Zangari, L., Interdonato, R., Calió, A., Tagarelli, A.
Graph convolutional and attention models for entity classification in multilayer networks.
Appl Netw Sci 6, 87 (2021). https://doi.org/10.1007/s41109-021-00420-4.
The figure below illustrates the main components of our framework.
Create a new folder named <dataset_name>, containing the following files (a minimal sketch that generates them is shown after this list):
1. A file named meta_info.txt, containing information about the input network, namely:
- N, the number of entities.
- L, the number of layers.
- E, whether the multilayer graph is directed (DIRECTED) or undirected (UNDIRECTED).
- TYPE, indicating the type of the input multilayer network, i.e., whether it is a multiplex network (MPX) or a general (arbitrary) multilayer network (GML).
For example, given an undirected multiplex graph with 20 entities and 3 layers, the meta_info.txt file is:
N L E TYPE
20 3 UNDIRECTED MPX
2. A file named nodes.txt: a column vector containing the numerical labels of the entities, where the i-th row contains the label associated with the i-th entity.
3. A file named net.edges, containing the edge information of the input multilayer graph.
- If the network is multiplex (MPX), the format must be <layer source-node dest-node>.
- If the network is a general multilayer network (GML), the format must be <source-layer source-node dest-layer dest-node>.
Please note that node identifiers must be numeric, progressive, and start from 0. Layer identifiers must start from 1 and be progressive.
4. The feature matrix for the entities (if present), which must be in CSV format with a comma separator.
Note that the files described in points 1, 2 and 3 are mandatory.
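The snippet below is a minimal sketch (not part of the repository) that writes a toy dataset folder in the layout described above. The root directory, dataset name, labels, edges, and feature values are all hypothetical; only the file names and formats follow the description in points 1-4.

```python
# Sketch: build a toy multiplex (MPX) dataset folder following the layout above.
# All numbers, labels, and edges below are hypothetical.
import csv
import os
import random

random.seed(0)

root_dir = "data"          # hypothetical root directory
dataset_name = "toy_mpx"   # hypothetical dataset name
n_entities, n_layers = 20, 3

out_dir = os.path.join(root_dir, dataset_name)
os.makedirs(out_dir, exist_ok=True)

# 1. meta_info.txt: header row followed by the values for this network.
with open(os.path.join(out_dir, "meta_info.txt"), "w") as f:
    f.write("N L E TYPE\n")
    f.write(f"{n_entities} {n_layers} UNDIRECTED MPX\n")

# 2. nodes.txt: one numerical label per line; the i-th line refers to entity i.
labels = [random.randint(0, 2) for _ in range(n_entities)]
with open(os.path.join(out_dir, "nodes.txt"), "w") as f:
    f.writelines(f"{label}\n" for label in labels)

# 3. net.edges: multiplex (MPX) format "<layer source-node dest-node>",
#    with layer identifiers starting from 1 and node identifiers from 0.
with open(os.path.join(out_dir, "net.edges"), "w") as f:
    for layer in range(1, n_layers + 1):
        for _ in range(30):  # 30 random edges per layer
            u, v = random.sample(range(n_entities), 2)
            f.write(f"{layer} {u} {v}\n")

# 4. features.csv (optional): one comma-separated feature row per entity.
with open(os.path.join(out_dir, "features.csv"), "w", newline="") as f:
    writer = csv.writer(f)
    for _ in range(n_entities):
        writer.writerow([round(random.random(), 4) for _ in range(8)])
```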
To train the models, run the train.py script as follows:
python training/train.py --data "root_dir" --dataset "dataset_name"
which trains the models on the dataset named "dataset_name" located inside the directory "root_dir". To run with input features:
python training/train.py --data "root_dir" --dataset "dataset_name" --feat-distribution='features.csv'
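For instance, assuming the toy dataset produced by the sketch above is stored under data/toy_mpx (both names are hypothetical), the command would be:
python training/train.py --data "data" --dataset "toy_mpx" --feat-distribution='features.csv'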
For the list of all hyper-parameters, see the utils/params.py script.
The required libraries for code execution are listed in requirements.txt.