This is the code repository for "OUTRE: An Out-of-core De-Redundancy Framework for GNN Training on Massive Graphs within A Single Server". OUTRE is built on Ginex, an existing GNN training framework. The Bloom Filter implementation in OUTRE is from here.
- Disable `read_ahead` on Linux.

      sudo -s
      echo 0 > /sys/block/$block_device_name/queue/read_ahead_kb
- Install the necessary Linux packages, then upgrade pip.

      sudo apt-get install -y build-essential
      sudo apt-get install -y cgroup-tools
      sudo apt-get install -y unzip
      sudo apt-get install -y python3-pip
      pip3 install --upgrade pip
- Compatible NVIDIA CUDA driver and toolkit.
- Install Python packages.
  - PyTorch
  - ogb
  - PyG
  - DGL (version >= 1.0)
  - any others as necessary
- Install ninja.

      sudo wget https://github.com/ninja-build/ninja/releases/download/v1.8.2/ninja-linux.zip
      sudo unzip ninja-linux.zip -d /usr/local/bin/
      sudo update-alternatives --install /usr/bin/ninja ninja /usr/local/bin/ninja 1 --force
- Use cgroup to limit the memory size. For example, to limit the host memory size to 64 GB:

      sudo -s
      cgcreate -g memory:64gb
      echo 64000000000 > /sys/fs/cgroup/memory/64gb/memory.limit_in_bytes
- Allocate enough swap area.
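
The privileged setup steps above (disabling read-ahead, creating the memory cgroup, adding swap) can be collected into one script. The following is a minimal sketch and not part of the OUTRE repository: the helper names, the `DRY_RUN` switch, the `/swapfile` path, and the example sizes are all assumptions, and the cgroup path follows the cgroup v1 layout used above. By default it only prints the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch only: with DRY_RUN=1 (the default) every command is printed, not run.
# Helper names, /swapfile, and the sizes are illustrative assumptions.

DRY_RUN="${DRY_RUN:-1}"

# Print a command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Convert decimal gigabytes to bytes (64 -> 64000000000, as in the cgroup step).
gb_to_bytes() {
    echo $(( $1 * 1000 * 1000 * 1000 ))
}

# Set read_ahead_kb to 0 for one block device (e.g. nvme0n1; see `lsblk`).
disable_read_ahead() {
    dev="$1"
    run sudo sh -c "echo 0 > /sys/block/${dev}/queue/read_ahead_kb"
}

# Create a cgroup v1 memory group named "<N>gb" and set its byte limit.
limit_memory() {
    gb="$1"
    run sudo cgcreate -g "memory:${gb}gb"
    run sudo sh -c "echo $(gb_to_bytes "$gb") > /sys/fs/cgroup/memory/${gb}gb/memory.limit_in_bytes"
}

# Create and enable a swap file; $1 = size (e.g. 64G), $2 = path (optional).
make_swap() {
    size="$1"
    file="${2:-/swapfile}"
    run sudo fallocate -l "$size" "$file"
    run sudo chmod 600 "$file"
    run sudo mkswap "$file"
    run sudo swapon "$file"
}
```

For example, `limit_memory 64` prints the two cgroup commands for a 64 GB limit and `make_swap 64G` prints the swap commands; set `DRY_RUN=0` to execute them. Swap files created with `fallocate` are not supported on every filesystem, so adjust that step to your system.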
- Prepare dataset

      python3 prepare_dataset_mag.py --dataset mag240m

- Partition the original graph

      python3 partition_fennel_twolevel.py --dataset mag240m

- Create neighbor cache

      python3 create_neigh_cache.py --neigh-cache-size 10000000000

- Get `PYTHONPATH`

      python3 get_pythonpath.py
- Run OUTRE on mag240m-cite. Replace `PYTHONPATH=xxx` with the outcome of the previous step (`get_pythonpath.py`).

      sudo PYTHONPATH=xxx cgexec -g memory:64gb python3 -W ignore run_profiling.py --neigh-cache-size 10000000000 --feature-cache-size 30000000000 --dataset mag240m
      sudo PYTHONPATH=xxx cgexec -g memory:64gb python3 -W ignore run_main.py --neigh-cache-size 10000000000 --feature-cache-size 30000000000 --num-epochs 1 --dataset mag240m
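
Because the profiling and training commands share most of their flags, they can be generated from one set of variables. The sketch below is an illustration only: `outre_cmd` and the variable names are hypothetical, and it assumes that `get_pythonpath.py` prints the bare `PYTHONPATH` value to standard output. It prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: build the two run commands from shared settings.
# The values mirror the commands above; the names are illustrative.

DATASET="mag240m"
NEIGH_CACHE=10000000000
FEAT_CACHE=30000000000
CGROUP="memory:64gb"

# Print one OUTRE command line.
#   $1 = PYTHONPATH value, $2 = script name, remaining args = extra flags
outre_cmd() {
    pp="$1"
    script="$2"
    shift 2
    echo "sudo PYTHONPATH=$pp cgexec -g $CGROUP python3 -W ignore $script" \
         "--neigh-cache-size $NEIGH_CACHE --feature-cache-size $FEAT_CACHE" \
         "$@" "--dataset $DATASET"
}

# Usage (assumes get_pythonpath.py prints the path on stdout):
#   PP="$(python3 get_pythonpath.py)"
#   outre_cmd "$PP" run_profiling.py
#   outre_cmd "$PP" run_main.py --num-epochs 1
```

Review the printed lines before pasting them into a shell. Keeping the cache sizes in one place helps ensure that profiling and training use the same `--neigh-cache-size` and `--feature-cache-size`.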