This project implements a Graph Convolutional Network (GCN) to analyze a dataset of terrorist attacks. The dataset has two main components: the attributes of the attack entities and the links that connect these entities into a graph. The goal is to classify each entity into one of the label classes based on its attributes and links.
The project begins by downloading the dataset from an external source. The dataset is a subset of terrorism-related data collected by the MIND Lab at UMD.
- Dataset download link: https://linqs-data.soe.ucsc.edu/public/lbc/TerrorAttack.tgz
This subset was designed for classification and contains two types of information about terrorist attack entities: the attributes of the entities and the links that connect them into a graph. Using the files in this directory, one can construct two different graphs of terrorist attacks: one based on co-located attacks, and another based on co-located attacks organized by the same terrorist organization. See this paper (http://www.cs.umd.edu/~sen/pubs/sna2006/RelClzPIT.pdf) for more information about the experimental setup.
Below we give some more information about the files in this directory:
- terrorist_attack.labels: Contains the labels that we want to assign to each terrorist attack entity.
- terrorist_attack.nodes: Contains the terrorist attack entities. Each line begins with the unique id of the entity, followed by a binary (0/1) vector indicating which attributes are present or absent. The last entry in the line is the correct class label of the entity.
- terrorist_attack_loc.edges: Each line contains two ids of terrorist attacks. This file defines the edges connecting co-located terrorist attacks.
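Based on the `.nodes` format described above (id, then a 0/1 vector, then the label), a single line can be split as follows. This is a minimal sketch that assumes the fields are whitespace-separated; the example line is hypothetical, not taken from the real file.

```python
from typing import List, Tuple

def parse_node_line(line: str) -> Tuple[str, List[int], str]:
    """Split one line of terrorist_attack.nodes into (id, features, label).

    Assumes fields are separated by whitespace (tabs or spaces).
    """
    fields = line.split()
    node_id = fields[0]                          # first field: unique entity id
    features = [int(v) for v in fields[1:-1]]    # middle fields: 0/1 attribute vector
    label = fields[-1]                           # last field: class label
    return node_id, features, label

# Hypothetical line; real ids and label names differ.
node_id, features, label = parse_node_line("attack_42\t0\t1\t1\t0\tclass_A")
print(node_id, features, label)  # attack_42 [0, 1, 1, 0] class_A
```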
The downloaded dataset is extracted from its compressed archive, and the extracted files are listed to verify the extraction.
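The extraction step could look like the sketch below, which uses only the standard library `tarfile` module. The function name `extract_archive` and the output directory are illustrative assumptions, not the project's code.

```python
import tarfile
from pathlib import Path
from typing import List

def extract_archive(archive_path: str, out_dir: str) -> List[str]:
    """Extract a gzip-compressed tar archive and return the sorted relative paths of the extracted files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters that reject unsafe members.
            tar.extractall(out, filter="data")
        else:
            tar.extractall(out)
    # List every extracted file so the extraction can be verified.
    return sorted(str(p.relative_to(out)) for p in out.rglob("*") if p.is_file())

# Usage (hypothetical paths): extract_archive("TerrorAttack.tgz", "data")
```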
The raw data files are then converted to CSV format, and the resulting CSV files are used in the subsequent analysis.
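A minimal sketch of the CSV conversion, assuming the raw files are whitespace-separated text. The helper name `whitespace_to_csv` and the output file names are hypothetical.

```python
import csv
from typing import Iterable, TextIO

def whitespace_to_csv(lines: Iterable[str], out: TextIO) -> int:
    """Write each whitespace-separated input line as one CSV row; return the number of rows written."""
    writer = csv.writer(out)
    rows = 0
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        writer.writerow(fields)
        rows += 1
    return rows

# Usage (hypothetical file names):
# with open("terrorist_attack.nodes") as src, open("nodes.csv", "w", newline="") as dst:
#     whitespace_to_csv(src, dst)
```

Opening the output with `newline=""` follows the `csv` module's recommendation and avoids blank lines between rows on Windows.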
The labels in the dataset are mapped to integer IDs to facilitate classification tasks.
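One way to map labels to integer ids is to sort the distinct label strings so that the mapping is deterministic. This is a sketch; the sorted ordering and the label names in the example are assumptions.

```python
from typing import Dict, Iterable, List, Tuple

def build_label_map(labels: Iterable[str]) -> Tuple[Dict[str, int], List[int]]:
    """Assign each distinct label an integer id (in sorted order) and encode the label sequence."""
    labels = list(labels)
    label_to_id = {lab: i for i, lab in enumerate(sorted(set(labels)))}
    encoded = [label_to_id[lab] for lab in labels]
    return label_to_id, encoded

label_to_id, y = build_label_map(["class_B", "class_A", "class_B"])
print(label_to_id)  # {'class_A': 0, 'class_B': 1}
print(y)            # [1, 0, 1]
```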
Nodes (entities) and features are organized to create a graph structure for further analysis.
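Organizing the nodes usually means giving each entity id a contiguous row index, so that row `i` of the feature matrix and entry `i` of the label list refer to the same node. A sketch, assuming input tuples shaped like `(id, features, label)`; the function name is hypothetical.

```python
from typing import Dict, List, Tuple

def index_nodes(
    rows: List[Tuple[str, List[int], str]]
) -> Tuple[Dict[str, int], List[List[int]], List[str]]:
    """Map each entity id to a row index and collect features and labels in that same order."""
    id_to_idx: Dict[str, int] = {}
    features: List[List[int]] = []
    labels: List[str] = []
    for node_id, feats, label in rows:
        if node_id in id_to_idx:
            raise ValueError(f"duplicate node id: {node_id}")
        id_to_idx[node_id] = len(features)  # next free row index
        features.append(feats)
        labels.append(label)
    return id_to_idx, features, labels
```

The `id_to_idx` dictionary is what later lets edge endpoints, given as entity ids, be translated into row indices.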
The GCN model is implemented with DGL (Deep Graph Library) and is designed to work with graph-structured data.
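For intuition about what a GCN layer computes, here is the standard Kipf and Welling propagation rule, $H' = \mathrm{ReLU}(\tilde D^{-1/2} \tilde A \tilde D^{-1/2} H W)$ with $\tilde A = A + I$, written in plain NumPy. This is an illustrative sketch, not the project's DGL code. DGL's `GraphConv` layer (default `norm='both'`) applies the same kind of symmetric normalization, with self-loops typically added to the graph beforehand.

```python
import numpy as np

def gcn_layer(adj: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops so each node keeps its own features
    deg = a_hat.sum(axis=1)                     # degrees including the self-loop, so all >= 1
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt  # symmetric normalization
    return np.maximum(norm_adj @ h @ w, 0.0)    # aggregate neighbors, transform, apply ReLU

# Two connected nodes with identity features and weights.
adj = np.array([[0.0, 1.0], [1.0, 0.0]])
out = gcn_layer(adj, np.eye(2), np.eye(2))
print(out)  # every entry is 0.5
```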
The graph is created using node features, labels, and adjacency information from the dataset.
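The edge file lists pairs of entity ids, while a graph library needs node indices. The sketch below converts the pairs into source and destination index lists. It treats co-location as symmetric and adds both directions, which is an assumption; edges with unknown endpoints are skipped. Lists like these can be passed to `dgl.graph((src, dst), num_nodes=...)`.

```python
from typing import Dict, Iterable, List, Tuple

def edges_to_index_pairs(
    edge_lines: Iterable[str], id_to_idx: Dict[str, int]
) -> Tuple[List[int], List[int]]:
    """Convert 'id_a id_b' lines into src/dst index lists, adding both directions of each edge."""
    src: List[int] = []
    dst: List[int] = []
    for line in edge_lines:
        fields = line.split()
        if len(fields) != 2:  # skip blank or malformed lines
            continue
        a, b = fields
        if a not in id_to_idx or b not in id_to_idx:  # skip edges to unknown entities
            continue
        i, j = id_to_idx[a], id_to_idx[b]
        src += [i, j]
        dst += [j, i]
    return src, dst
```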
The dataset is split into training, validation, and test sets for model evaluation.
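A simple way to produce such a split is to shuffle the node indices with a fixed seed and cut them into three disjoint parts. The 60/20/20 fractions and the seed below are hypothetical defaults; the actual split used by the project is not stated here.

```python
import random
from typing import List, Tuple

def split_indices(
    n: int, train_frac: float = 0.6, val_frac: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle node indices 0..n-1 and cut them into disjoint train/val/test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded, so the split is reproducible
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```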
A GCN model is defined and trained for a specified number of epochs with a specified learning rate.
The best validation and test accuracies are tracked during training.
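A common convention is to report the test accuracy from the epoch with the best validation accuracy, rather than the highest test accuracy overall, which would leak test information into model selection. The sketch below assumes per-epoch accuracy lists; the function name is hypothetical.

```python
from typing import List, Tuple

def best_by_validation(val_accs: List[float], test_accs: List[float]) -> Tuple[int, float, float]:
    """Return (epoch, val_acc, test_acc) for the epoch with the highest validation accuracy.

    Ties keep the earliest epoch; the test accuracy reported is the one from that epoch.
    """
    best_epoch, best_val, best_test = -1, float("-inf"), 0.0
    for epoch, (val, test) in enumerate(zip(val_accs, test_accs)):
        if val > best_val:
            best_epoch, best_val, best_test = epoch, val, test
    return best_epoch, best_val, best_test

print(best_by_validation([0.5, 0.7, 0.7, 0.6], [0.4, 0.65, 0.8, 0.9]))  # (1, 0.7, 0.65)
```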
The trained model is saved for future use.
For inference, input data (for example, a CSV file) is preprocessed into a DGL graph.
The trained GCN model is then used for inference on this input data.
Predicted class labels are obtained from the model's output.
The predicted class labels are mapped back to their original labels.
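The last two steps, taking the predicted class from the model output and mapping it back to the original label, amount to an argmax per node followed by a reverse lookup in the label map. A sketch, assuming the output is one row of scores per node and reusing a `label_to_id` dictionary like the one built earlier; names and values are illustrative.

```python
from typing import Dict, List, Sequence

def decode_predictions(logits: Sequence[Sequence[float]], label_to_id: Dict[str, int]) -> List[str]:
    """Take the argmax class of each row of scores and map it back to the original label string."""
    id_to_label = {i: lab for lab, i in label_to_id.items()}  # invert the label map
    preds = []
    for row in logits:
        best = max(range(len(row)), key=lambda k: row[k])  # argmax; first index wins ties
        preds.append(id_to_label[best])
    return preds

print(decode_predictions([[0.1, 2.0], [3.0, -1.0]], {"class_A": 0, "class_B": 1}))
# ['class_B', 'class_A']
```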
Louvain community detection is applied to the graph.
Nodes are assigned to different communities, and node colors are determined based on these community assignments.
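Which Louvain implementation the project uses is not stated here. NetworkX's `louvain_communities` (NetworkX 2.8+) returns a list of node sets, while the `python-louvain` package's `best_partition` returns a node-to-community dictionary. The sketch below converts the first form into the second and then picks one color per node. The palette and helper names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Sequence, Set

def sets_to_partition(communities: Iterable[Set[Hashable]]) -> Dict[Hashable, int]:
    """Turn a list of node sets into a node -> community-id dictionary."""
    return {node: cid for cid, members in enumerate(communities) for node in members}

def community_colors(
    partition: Dict[Hashable, int], nodes: Sequence[Hashable], palette: Sequence[str]
) -> List[str]:
    """Return one color per node, chosen by its community id and cycling through the palette."""
    return [palette[partition[n] % len(palette)] for n in nodes]

partition = sets_to_partition([{"a", "b"}, {"c"}])
print(community_colors(partition, ["a", "b", "c"], ["red", "blue"]))  # ['red', 'red', 'blue']
```

The resulting color list can be passed as `node_color` to a drawing function such as `networkx.draw`, provided the node order matches the graph's node order.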
The graph, with nodes colored by their community, is visualized for community analysis.
This project demonstrates the implementation of a Graph Convolutional Network on a terrorist attack dataset. It covers data preprocessing, model training, inference, and community analysis. The final results may include classification accuracies, community information, and visualization.
- Avnish Singh - https://github.com/avnishs17
- Himanshu Tiwari - https://github.com/Himanshutiwari15