Repository associated to paper "Proxying Betweenness Centrality Rankings in Temporal Networks"
The temporal network file must contain one line for each temporal edge (a temporal edge is a triple (u,v,t)
, where u
and v
are two nodes and t
is one time in which the edge from u
to v
is available). All networks are considered as directed: hence, if the graph is undirected, then both the temporal edge (u,v,t)
and the temporal edge (v,u,t)
have to be included in the file. Nodes can be identified by any string (not necessarily a number), while times have to be integer values. The three elements of a temporal edge can be separated by any string, which can be specified while using the load_temporal_graph
function (see below). In the following, we assume that this string is just a space.
The file cannot include duplicate lines (that is, two identical temporal edges) or self-loop lines (that is, temporal edges from a node to itself). In case your file contains duplicate lines or self-loops lines, you should first execute the following bash
command.
awk '($1!=$2)' <in_file_name> | sort | uniq > <out_file_name>.txt
You can then use the newly generated file as input to the software.
We assume that the julia
REPL has been started within the directory TSBProxy
and the following command has been already executed (after having installed all the required packages).
include("src/ProxyingTSB.jl");
Assuming that the file workplace.txt
is contained in the directory graphs
included in the directory TSBProxy
, the corresponding temporal network can be loaded by executing the following command (note that, in this case, the separator between the three elements of a temporal edge is the space character).
tg = load_temporal_graph("graphs/workplace.txt", " ");
We can obtain some basic statistics about the temporal network by executing the following command.
print_stats(tg, graph_name="Workplace network");
The result in this case should be the following output.
====================================================
Temporal network: Workplace network
====================================================
Number of nodes 92
Number temporal of edges 19654
Number of unique time stamps 7104
====================================================
The values of the temporal shortest betweenness (in short TSB) of the graph tg
can be computed by executing the following command.
tsb, t_tsb = temporal_shortest_betweenness(tg, 50, false);
The second parameter specifies after how many processed nodes a message has to be printed on the console (in order to verify the status of the computation). If this parameter is 0
, then no ouptut is produced. The third parameter specifies whether the big integer implementation has to be used (in case there too many shortest paths). The execution of the above command should require less than a minute. The values returned are the array of the TSB values and the execution time.
The values of the temporal shortest betweenness and the execution time can be saved in the scores
and the times
directory, respectively, (included in the TSBProxy
directory) as follows.
save_results("workplace", "tsb", tsb, t_tsb);
Similarly to the computation of the TSB values, we can compute and save the values of the proxies described in the paper, that is, the static betweenness (in short, BC), the prefix foremost betweenness (in short, PREFIX), the shortest ego-betweenness (in short, EGOTSB), the prefix foremost ego-betweenness (in short, EGOPREFIX), and the pass-through degree (in short, PTD).
bc, t = betweenness(tg, true);
save_results("workplace", "bc", bc, t);
prefix, t = temporal_prefix_foremost_betweenness(tg, 50);
save_results("workplace", "prefix", prefix, t);
egotsb, t = temporal_ego_betweenness_centrality(tg, t_tsb, 50);
save_results("workplace", "egotsb", egotsb, t);
egoprefix, t = temporal_ego_prefix_foremost_betweenness(tg, t_tsb, 50);
save_results("workplace", "egoprefix", egoprefix, t);
ptd, t = pass_through_degree(tg);
save_results("workplace", "ptd", ptd, t);
The second parameter of the function betweenness
indicates whether feedback messages have to be printed (in order to verify the status of the computation), while the second parameter of the functions temporal_ego_betweenness_centrality
and temporal_ego_prefix_foremost_betweenness
denotes the maximum execution time, which can be usually set equal to the execution time for computing the TSB, and the third parameter specifies after how many processed nodes a message has to be printed on the console (in order to verify the status of the computation). In our example, the computation of the EGOTSB took more time than the computation of the TSB and, hence, no EGOTSB value file is created.
In the case of ONBRA, we have to decide the size of the sample of node pairs to be used to apply the sampling approximation method. In the paper, we consider the following strategy. We first estimate the execution time avg_t
of a BFS with input a pair of nodes, by computing the average execution time over a random sample of pairs od nodes (tipically of size between 100 and 500). We then divide the execution time for computing the TSB by avg_t
: the resulting value max_ss
is an estimate of the sample size which would cause the ONBRA algorithm executes as long as the computation of TSB.
max_ss = onbra_sample_size("workplace", 100, false, false);
The second parameter is the size of the sample to be used for computing the average time of a BFS, while the second and the third parameters specify whether the big integer data structure has to be used and the feedback messages have to be printed, respectively.
We then execute ONBRA for a sample sample size between 1 and max_ss
. In the following code we use a sample size equal to the maximum one.
onbra("workplace", tg, max_ss, 50, false);
The centrality values for each pair in the sample are saved in one file within the directory scores
and the statistics concerning the execution time are saved in another file within the directory times
.
We are now ready to evaluate the proxies in terms of their approximation of the TSB ranking. To this last aim, we use both the weighted Kendall tau correlation index between the proxie and the TSB rankings and the Jaccard index of the top k nodes in the proxie and the TSB rankings. For instance, in order to evaluate the static betweenness we can execute the following code.
tsb = read_centrality_values("scores/workplace/tsb.txt");
proxy = read_centrality_values("scores/workplace/bc.txt");
_, _, ktau = compute_correlations(tsb, proxy, false);
jproxy = jaccard(tsb, proxy, 75);
iproxy = intersection(tsb, proxy, 75);
(in the case of ONBRA, the second instruction has to be susbstituted by the following one
proxy = read_onbra_centrality_values("scores/workplace/onbra.txt", max_ss, length(tsb));
where max_ss
has to be substituted by the used sample size). The weighted Kendal tau and the Jaccard index corresponding to the first 50 nodes can then be printed by executing the following instructions.
println(ktau[1], " ", jproxy[50], " ", iproxy[50]);
In this example, the Kendall tau is approximately equal to 0.89, the Jaccard index is approximately equal to 0.82, and the intersection of the top 50 nodes is equalt to 45. Note that, in the case of ONBRA, these values would be approximately 0.92, 0.79, and 44 (clearly, these values can change because of different samples of pairs of nodes).
If you want to compare the KADABRA approximation of the static betweenness with the other proxies, you should first compute the KADABRA values (by using the implementation made available by the authors of KADABRA and by respecting the node indices) and save them in the kadabra.txt
file in the scores
directory.