This repository contains code implementing the measurements and dynamical processes on hypergraphs described in:
The code was written under Python 3.9.15 and relies on standard libraries including numpy, scipy, matplotlib, and networkx. It also relies heavily on xgi, the compleX Group Interactions package.
This repository does not implement a package, so beyond installing the requirements there is nothing to install.
The results from the paper can be reproduced using the corresponding files as listed below. Note that the simulations on real data were run on a large computer over a long period.
- Figures 2, 4 => notebooks/plot_overlap_dists.ipynb
- Figure 3 => notebooks/compare_randomizations.ipynb (after notebooks/dag_components.ipynb)
- Figure 5 => notebooks/dag_shortest_paths.ipynb
- Figure 6 => notebooks/dag_visualisation.ipynb
- Figure 7 => src/rnhm_single_seed_simulation.py
- Figure 8 => src/rnhm_seed_simulation.py
- Figures 9, 10 => notebooks/plot_simultaneous_simulations.ipynb (after generating results using src/run_simulation.py as described below)
There are two main ways to run simulations of encapsulation dynamics. The file run_simulation.py can be used in conjunction with a configuration file to run simulations from the command line. Alternatively, simulation parameters can be specified directly in a dictionary and passed to one of the run_*_simulations functions found in simulations.py.
To run a simulation from the command line, first navigate to the src directory and check the output of python run_simulation.py -h, which lists the required and optional arguments to the simulation code.
You will need to set up a configuration file, or simply use the configurations.ini already included in the repository. The configuration file is parsed using the configparser module and needs two sections: one to set default paths, and another for the simulation you want to run. Here is an example:
# Default paths to data and results directories
[default-local]
data_prefix = ../data/
results_prefix = ../results/
# Configuration for a simulation on the primary school contact data
[contact-primary-school]
dataset_name = contact-primary-school
initial_active = 1
steps = 25
active_threshold = 1
num_simulations = 25
read_function = read_data
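As a rough illustration of how a file like this can be read, here is a minimal sketch using the standard configparser module. It is not the parsing code from run_simulation.py: the explicit type conversions and the way the two sections are combined into one dictionary are assumptions made for the example.

```python
import configparser

CONFIG_TEXT = """
# Default paths to data and results directories
[default-local]
data_prefix = ../data/
results_prefix = ../results/

# Configuration for a simulation on the primary school contact data
[contact-primary-school]
dataset_name = contact-primary-school
initial_active = 1
steps = 25
active_threshold = 1
num_simulations = 25
read_function = read_data
"""

parser = configparser.ConfigParser()
# For a file on disk, parser.read("configurations.ini") would be used instead.
parser.read_string(CONFIG_TEXT)

paths = parser["default-local"]
sim = parser["contact-primary-school"]

# configparser returns strings, so numeric options are converted explicitly.
settings = {
    "data_prefix": paths["data_prefix"],
    "results_prefix": paths["results_prefix"],
    "dataset_name": sim["dataset_name"],
    "initial_active": sim.getint("initial_active"),
    "steps": sim.getint("steps"),
    "active_threshold": sim.getint("active_threshold"),
    "num_simulations": sim.getint("num_simulations"),
    "read_function": sim["read_function"],  # a function name, still a string here
}
print(settings)
```

Note that read_function is only a name at this point; how the repository maps it to the actual function is not shown here.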
Once you have created this configuration file, you can run the simulations with:
python run_simulation.py configurations.ini contact-primary-school simultaneous encapsulation-immediate NCPU
Using the configuration file, this command runs 25 simulations of 25 steps each on NCPU CPUs (NCPU is an integer number of CPUs to use), with a threshold of 1 active subhyperedge. The seed is a single hyperedge chosen uniformly at random (the default seed function combined with initial_active=1 in the configuration file; use the command line option --seed_funct to control the seed function).
Both the number of seeds and the active threshold can also be controlled from the command line for convenience when running many simulations with different parameters. When given, command line inputs take precedence over configuration file parameters.
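The precedence rule can be pictured with a small argparse sketch. This is an assumption-laden illustration, not the argument handling in run_simulation.py: the flag names --initial_active and --active_threshold are hypothetical stand-ins (check python run_simulation.py -h for the real ones), and merge_settings is a helper invented for this example.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Precedence sketch (hypothetical flags)")
    # A default of None lets us tell "not given" apart from an explicit value.
    parser.add_argument("--initial_active", type=int, default=None)    # hypothetical flag name
    parser.add_argument("--active_threshold", type=int, default=None)  # hypothetical flag name
    return parser


def merge_settings(file_settings, cli_args):
    """Return the file settings with any command line values laid on top."""
    merged = dict(file_settings)
    for key, value in vars(cli_args).items():
        if value is not None:
            merged[key] = value
    return merged


# Values as they might come from the configuration file.
file_settings = {"initial_active": 1, "active_threshold": 1, "steps": 25}

# Only --initial_active is given, so only that value is overridden.
cli_args = build_parser().parse_args(["--initial_active", "10"])
merged = merge_settings(file_settings, cli_args)
print(merged)  # {'initial_active': 10, 'active_threshold': 1, 'steps': 25}
```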
Results are saved as serialized pickle files in the ../results/ directory.
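To inspect saved results later, the pickle files can be loaded back with the standard pickle module. The sketch below is a generic loader, not code from this repository: load_results is a hypothetical helper, the "*.pickle" file pattern is an assumption (adjust it to the actual file names), and the structure of the stored objects is not described here. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path


def load_results(results_dir, pattern="*.pickle"):
    """Load every matching pickle file in results_dir into a dict keyed by file name."""
    loaded = {}
    for path in sorted(Path(results_dir).glob(pattern)):
        with path.open("rb") as handle:
            loaded[path.name] = pickle.load(handle)
    return loaded


# Demonstration on a temporary directory with a placeholder object,
# standing in for the ../results/ directory.
with tempfile.TemporaryDirectory() as tmp:
    with open(Path(tmp) / "example.pickle", "wb") as handle:
        pickle.dump({"placeholder": [1, 2, 3]}, handle)
    results = load_results(tmp)

print(results)
```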
You can also run simulations from within a python session by specifying a configuration dictionary directly. Here is an example:
from utils import read_data, largest_connected_component
from layer_randomization import layer_randomization
from seed_functions import smallest_first_seed
from simulation import run_many_simulations
from plot_simulation_results import plot_cumulative_averages

# Load the dataset and keep only its largest connected component
dataset_name = "email-Enron"
dataset = f"../data/{dataset_name}/{dataset_name}-"
hyperedges = read_data(dataset, multiedges=False)
hyperedges = largest_connected_component(hyperedges, remove_single_nodes=True)

# Simulation parameters
configuration = {
    "seeding_strategy": "edge",
    "seed_function": smallest_first_seed,
    "initial_active": 10,
    "num_simulations": 10,
    "steps": 10,
    "active_threshold": 1,
    "selection_name": "simultaneous",
    "selection_function": None,
    "update_name": "encapsulation-immediate",
    "update_function": None,
    "encapsulation_update": True,
    "node_assumption": False,
}

# Simulate on the observed hypergraph
output_observed = run_many_simulations(hyperedges, configuration)

# Simulate on a layer-randomized version of the hypergraph
random_hyperedges = largest_connected_component(
    layer_randomization(hyperedges), remove_single_nodes=True
)
output_random = run_many_simulations(random_hyperedges, configuration)

# Plot the observed and randomized results together
fig, axs = plot_cumulative_averages(output_observed, output_random, normalized=False)
If you want to run simulations in parallel, you can swap run_many_simulations(hyperedges, configuration) for run_many_parallel(hyperedges, configuration, ncpus), where ncpus is an int corresponding to the number of CPUs to use.
The relevant options for dynamics, controlled by the update_funct positional argument (command line) and the configuration['update_name'] dictionary entry (API), are:
- 'encapsulation': encapsulation dynamics including all encapsulation relationships
- 'encapsulation-immediate': encapsulation dynamics including only immediate encapsulation relationships (i.e., k->k-1 DAG edges)
- 'encapsulation-empirical': relaxation of encapsulation-immediate dynamics including relationships between hyperedges of size k and subhyperedges of the maximum size k'<k existing in the hypergraph (e.g., if a hyperedge of size 5 has no encapsulation relationships with hyperedges of size 4, but some with sizes 3 and 2, the edges to the size 3 hyperedges will "count" for the dynamics)
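The difference between the three options can be seen on a toy hypergraph. The sketch below is a plain-Python reading of the descriptions above, not the implementation in this repository or in xgi; the function name encapsulated_subhyperedges is hypothetical, and treating "encapsulated" as "proper subset that is itself a hyperedge" is the assumption it rests on.

```python
def encapsulated_subhyperedges(hyperedges, update_name="encapsulation"):
    """Map each hyperedge to the subhyperedges that count under the given option."""
    edges = {frozenset(edge) for edge in hyperedges}
    relationships = {}
    for edge in edges:
        # Proper subsets of `edge` that are themselves hyperedges.
        subs = [other for other in edges if other < edge]
        if update_name == "encapsulation":
            kept = subs
        elif update_name == "encapsulation-immediate":
            # Only size k-1 subhyperedges of a size k hyperedge.
            kept = [s for s in subs if len(s) == len(edge) - 1]
        elif update_name == "encapsulation-empirical":
            # Only the largest subhyperedges that actually exist.
            largest = max((len(s) for s in subs), default=0)
            kept = [s for s in subs if len(s) == largest]
        else:
            raise ValueError(f"unknown update_name: {update_name!r}")
        relationships[edge] = set(kept)
    return relationships


# A size 5 hyperedge with no size 4 subhyperedges, but some of sizes 3 and 2.
toy = [{1, 2, 3, 4, 5}, {1, 2, 3}, {4, 5}, {1, 2}]
big = frozenset({1, 2, 3, 4, 5})

for name in ("encapsulation", "encapsulation-immediate", "encapsulation-empirical"):
    subs = encapsulated_subhyperedges(toy, name)[big]
    print(name, sorted(sorted(s) for s in subs))
```

For the size 5 hyperedge, 'encapsulation' keeps all three subhyperedges, 'encapsulation-immediate' keeps none, and 'encapsulation-empirical' keeps only the size 3 one.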
Dynamics are strict by default. For non-strict dynamics, specify --node_assumption or set configuration['node_assumption']=True.
To use the "all encapsulated hyperedges" threshold, specify --encapsulation_all_thresh or set configuration['active_threshold']='all'.
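One way to picture the two kinds of threshold is as a check on how many encapsulated hyperedges are currently active. The sketch below is only an interpretation, not the update rule implemented in this repository: meets_threshold is a hypothetical helper, "at least active_threshold" is an assumed reading of the integer case, and the treatment of hyperedges with no encapsulated hyperedges under 'all' is an assumption as well.

```python
def meets_threshold(num_active, num_encapsulated, active_threshold):
    """Decide whether a hyperedge has enough active encapsulated hyperedges."""
    if active_threshold == "all":
        # Assumption: with nothing encapsulated, the 'all' rule never fires.
        return num_encapsulated > 0 and num_active == num_encapsulated
    # Assumption: an integer threshold means "at least this many".
    return num_active >= int(active_threshold)


print(meets_threshold(1, 3, 1))      # True: one active subhyperedge is enough
print(meets_threshold(2, 3, "all"))  # False: one of the three is still inactive
print(meets_threshold(3, 3, "all"))  # True: every encapsulated hyperedge is active
```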
The file encapsulation_dag.py implements separate computation of the encapsulation and overlap line graphs. You can also find implementations of these functions in the xgi package, specifically in the convert module (permalink to merge commit).
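For intuition about the overlap side, here is a minimal plain-Python sketch that links every pair of hyperedges sharing at least one node and records the number of shared nodes. It does not reproduce the functions in encapsulation_dag.py or xgi: overlap_line_graph is a hypothetical name, and using the intersection size as the edge weight is an assumption.

```python
from itertools import combinations


def overlap_line_graph(hyperedges):
    """Return {(i, j): shared_node_count} for every overlapping pair of hyperedges."""
    edges = [frozenset(edge) for edge in hyperedges]
    weights = {}
    for (i, first), (j, second) in combinations(enumerate(edges), 2):
        shared = len(first & second)
        if shared > 0:
            weights[(i, j)] = shared
    return weights


toy = [{1, 2, 3}, {2, 3, 4}, {5, 6}]
print(overlap_line_graph(toy))  # {(0, 1): 2}
```

Keys are pairs of positions in the input list, so the third hyperedge, which shares no nodes with the others, does not appear.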
The datasets used in the paper were made available by Austin Benson and can be found on his website: https://www.cs.cornell.edu/~arb/data/, along with appropriate citations. The function read_data in utils.py can read these datasets (thanks to Phil Chodrow for the function, which I adapted from his repository here).
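For readers unfamiliar with the data layout, the sketch below shows one way to read such a dataset. It is not the read_data function from utils.py. It assumes the layout in which a NAME-nverts.txt file lists the size of each hyperedge and a NAME-simplices.txt file lists the nodes of all hyperedges back to back; read_benson_hyperedges and its keep_duplicates parameter are hypothetical names introduced for this example.

```python
import tempfile
from pathlib import Path


def read_benson_hyperedges(prefix, keep_duplicates=True):
    """Read hyperedges from '<prefix>nverts.txt' and '<prefix>simplices.txt'."""
    sizes = [int(token) for token in Path(f"{prefix}nverts.txt").read_text().split()]
    nodes = [int(token) for token in Path(f"{prefix}simplices.txt").read_text().split()]
    if sum(sizes) != len(nodes):
        raise ValueError("nverts and simplices files are inconsistent")

    hyperedges = []
    start = 0
    for size in sizes:
        # Each hyperedge is the next `size` entries of the node list.
        hyperedges.append(tuple(sorted(nodes[start:start + size])))
        start += size

    if not keep_duplicates:
        # dict.fromkeys drops repeats while preserving first-seen order.
        hyperedges = list(dict.fromkeys(hyperedges))
    return hyperedges


# Demonstration on a tiny dataset written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    prefix = f"{tmp}/toy-"
    Path(f"{prefix}nverts.txt").write_text("3\n2\n2\n")
    Path(f"{prefix}simplices.txt").write_text("1\n2\n3\n2\n3\n3\n2\n")
    edges = read_benson_hyperedges(prefix, keep_duplicates=False)

print(edges)  # [(1, 2, 3), (2, 3)]
```

The prefix plays the same role as the dataset string in the Python session example, which ends in a trailing hyphen.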
This README is a work in progress. If you have any questions, open an issue or email me at [email protected].