layout | title |
---|---|
post |
Python Binding Getting Started |
{:.no_toc}
This is a tutorial for getting started with GraphIt and using the python bindings to compile GraphIt programs as a library and call it from python. If you are looking for a tutorial on building GraphIt programs as standalone application, you can visit the Standalone Getting Started tutorial.
In this tutorial we will write the PagerankDelta algorithm in GraphIt and run it from python on a graph loaded in python. We will also measure the performance of this algorithm from python. We will learn how to separate initialization from actual execution to get meaningful timing numbers. This tutorial is broadly devided in the following sections -
- auto-gen TOC: {:toc}
Make sure you have all the correct Open Source Software installed. First follow the README file here to clone and install graphIt. You will need either CILK or OPENMP to allow you to run the C++ code in parallel. If you dont have either you can get both by simply downloading GCC. Alternatively if you already have CILK or OPENMP you can use those too. This tutorial will go through how to use GraphIt via both CILK and OPENMP.
Clone graphit by going to GraphIt Something to note for the following tutorial. Everything will be done graphit/build/bin
If you have not yet already please read the basic information on the GraphIt Language.
element Vertex end
element Edge end
const edges : edgeset{Edge}(Vertex,Vertex);
const vertices : vertexset{Vertex};
const cur_rank : vector{Vertex}(float);
const ngh_sum : vector{Vertex}(float);
const delta : vector{Vertex}(float);
const out_degree : vector{Vertex}(int);
const damp : float = 0.85;
const beta_score : float;
const epsilon2 : float = 0.1;
const epsilon : float = 0.0000001;
const init_delta: float;
func updateEdge(src : Vertex, dst : Vertex)
ngh_sum[dst] += delta[src]/out_degree[src];
end
func updateVertexFirstRound(v : Vertex) -> output : bool
delta[v] = damp*(ngh_sum[v]) + beta_score;
cur_rank[v] += delta[v];
delta[v] = delta[v]-1.0/edges.getVertices();
output = (fabs(delta[v]) > epsilon2*cur_rank[v]);
ngh_sum[v] = 0;
end
func updateVertex(v : Vertex) -> output : bool
delta[v] = ngh_sum[v]*damp;
cur_rank[v] += delta[v];
ngh_sum[v] = 0;
output = fabs(delta[v]) > epsilon2*cur_rank[v];
end
func initVectors(v : Vertex)
cur_rank[v] = 0.0;
ngh_sum[v] = 0.0;
delta[v] = init_delta;
end
export func set_graph(edges_args: edgeset{Edge}(Vertex, Vertex))
edges = edges_args;
vertices = edges.getVertices();
cur_rank = new vector{Vertex}(float)();
ngh_sum = new vector{Vertex}(float)();
delta = new vector{Vertex}(float)();
out_degree = edges.getOutDegrees();
beta_score = (1.0 - damp) / edges.getVertices();
init_delta = 1.0 / edges.getVertices();
vertices.apply(initVectors);
end
export func do_pagerank_delta() -> final_ranks: vector{Vertex} (float)
var n : int = edges.getVertices();
var frontier : vertexset{Vertex} = new vertexset{Vertex}(n);
for i in 1:10
#s1# edges.from(frontier).apply(updateEdge);
var output : vertexset{Vertex};
if i == 1
output = vertices.filter(updateVertexFirstRound);
else
output = vertices.filter(updateVertex);
end
delete frontier;
frontier = output;
end
delete frontier;
final_ranks = cur_rank;
end
Page Rank Delta in Graphit
Here we will go through an example of writing GraphIt code for the Page Rank Delta application and then calling it from python using the python bindings. You can find the code used in this example along with a few other applicaitons under graphit/apps directory here.
Additionally here is a link to the GraphIt OOPSLA18 paper or the arxiv report here. Sections 4 and 5 give the complete breakdown of the PageRankDelta code. Please look here if you want a more detailed breakdown of the functionality of GraphIt.
element Vertex end
element Edge end
element definitions
Here we construct the basic Elements, Vertex
and Edge
with the element
keyword that will be used by graphit. Note these basic Elements do not have to be named "Vertex" or "Edge". Most Graph Algorithms will require that you have both of these. GraphIt supports multiple types of user-defined vertices and edges, which is important for algorithms that work on multiple graphs.
A quick refresher on Variables
const edges : edgeset{Edge}(Vertex,Vertex);
const vertices : vertexset{Vertex};
vertexset and edgeset definitions
Once we have declared the basic element types, we can use these to construct the vertexsets and edgesets. These two lines declare these edgeset, edges
and the vertexset vertices
. Each element of the edgeset is of Edge
type (specified between "{ }"), and the source and destination of the edge is type Vertex
. The source and destinations can also be of different types (for instance in case of a bipartite graph). Here we have just declared these two sets but haven't initialized them to anything. This is because these sets will be created using the graph passed to GraphIt from the python code.
We have added the const
keyword before the declarations to indicate these vertexsets and edgesets are globally accessible.
const cur_rank : vector{Vertex}(float);
const ngh_sum : vector{Vertex}(float);
const delta : vector{Vertex}(float);
const out_degree : vector{Vertex}(int);
vector definitions
Data for vertices and edges are defined as vectors associated with the element type denoted using the { }
syntax. For example, cur_rank
is associated with Vertex
, and is of type float
(specified in with the ()
syntax). This is similar to fields of a struct in C or C++, but is stored as a separate vector.
Notice again, that we have only delcared these vectors and haven't allocated them or initialized their data because the size and values of these will depend on the Graph passed in. We will be explicity initializing them in the in the set_graph
function.
const damp : float = 0.85;
const beta_score : float;
const epsilon2 : float = 0.1;
const epsilon : float = 0.0000001;
const init_delta: float;
scalar constants declaration
Next we declare the constants and scalar variables required for the algorithm. We also initialize those scalars that have values independent of the graph (for example the damp
float). We will initialize the variables that depend on the graph in the set_graph
function.
Next we move on to take a look at the functions used in the program.
A quick refresher on Functions
The algorithm uses three user-defined functions, updateEdge
, updateVertexFirstRound
, and updateVertex
.
func updateEdge(src : Vertex, dst : Vertex)
ngh_sum[dst] += delta[src]/out_degree[src];
end
updateEdge
takes in src
and dst
of an edge as arguments. The function adds to the current ngh_sum of the destination vertex dst
, the delta
divided by the out_degree
of the source vertex src
. This function is later used in the main function and applied to every edge in the edgeset.
func updateVertexFirstRound(v : Vertex) -> output : bool
delta[v] = damp*(ngh_sum[v]) + beta_score;
cur_rank[v] += delta[v];
delta[v] = delta[v]-1.0/edges.getVertices();
output = (fabs(delta[v]) > epsilon2*cur_rank[v]);
ngh_sum[v] = 0;
end
updateVertexFirstRound
takes in a vertex, v
, and returns a boolean. It does this by multiplying the ngh_sum
with the damping factor and adding the basescore. From this it computes the rank and using the delta it computes whether or not it exceeds a certain threshold. If this threshold is exceeded than it returns a boolean True and if not a boolean False. Then it sets the ngh_sum
back to 0.
func updateVertex(v : Vertex) -> output : bool
delta[v] = ngh_sum[v]*damp;
cur_rank[v] += delta[v];
ngh_sum[v] = 0;
output = fabs(delta[v]) > epsilon2*cur_rank[v];
end
updateVertex
also takes in a vertex and returning a boolean. However in this case it does not add the base score to delta
when determining Delta. Similarly then by comparing if the delta exceeded the threshold of epilson times the rank it outputs a True or False.
updateVertexFirstRound
and updateVertex
will be used later on to filter out the "active vertices". These are the vertices that will used in the next iteration of the algorithm. These active vertices are also known as the frontier. The reason for two functions is that the first time we update the vertexs some additional computation needs to be done as described above that isn't needed later on. Therefore the second function is run only once in the beginning of the algorithm.
func initVectors(v : Vertex)
cur_rank[v] = 0.0;
ngh_sum[v] = 0.0;
delta[v] = init_delta;
end
export func set_graph(edges_args: edgeset{Edge}(Vertex, Vertex))
edges = edges_args;
vertices = edges.getVertices();
cur_rank = new vector{Vertex}(float)();
ngh_sum = new vector{Vertex}(float)();
delta = new vector{Vertex}(float)();
out_degree = edges.getOutDegrees();
beta_score = (1.0 - damp) / edges.getVertices();
init_delta = 1.0 / edges.getVertices();
vertices.apply(initVectors);
end
initialization functions
We now create the functions that will initialize the vectors and other variables that depend on the graph. Firstly, the set_graph
function. Notice this function is marked with the export
keyword, which means that this function is to be called from python and should be a part of the module loaded into python. This function takes 1 argument, which is the edgeset passed from the python code (as scipy.sparse.csr_matrix
) and does not return anything.
You can read more about the export
keyword and how the GraphIt types interface with the python types in the export section of the language manual.
The first thing we initialize is the edgeset edges
by assigning the passed in argument to it. We then initialize the vertexset vertices
by using the getVertices()
function from edges
. This function returns the vertices from the graph passed in as argument.
We then allocate memory for the vectors associated with the vertices with new Vector{Vertex}(float)()
. This is okay to do now because the number of elements is known. We do the same for the variables beta_score
and init_delta
. The vector out_degree
is initialized by calling the getOutDegrees()
function from edges
.
We initialize the values for the vectors by using a user defined function initVectors
. This function takes in an argument the vertex to initialize and sets the initial value of the vectors for that vertex. We invoke this function on all the vertices in the set_graph
function by using the vertexset.apply
operator.
You should note here that we have created a separate set_graph
function to isolate it from the actual algorithm in the do_pagerank_delta
function. We do this so we can measure the performance of the actual algorithm and not the cost of initializing the graph related datastructures. If you are not timing the execution of the algorith, it is perfectly fine to add an arguement to the main do_pagerank_delta
function and do all the initializations there.
Now, all the data structures are initialized and we are ready to perfom the pagerank_delta operation.
export func do_pagerank_delta() -> final_ranks: vector{Vertex} (float)
var n : int = edges.getVertices();
var frontier : vertexset{Vertex} = new vertexset{Vertex}(n);
for i in 1:10
#s1# edges.from(frontier).apply(updateEdge);
var output : vertexset{Vertex};
if i == 1
output = vertices.filter(updateVertexFirstRound);
else
output = vertices.filter(updateVertex);
end
delete frontier;
frontier = output;
end
delete frontier;
final_ranks = cur_rank;
end
main function of PageRankDelta
The main
function is where your program comes together and runs together with all the user-defined functions. Similar to C and C++, you have to explicitly name the function main
.
GraphIt is designed to separate edge processing logic from edge traversal, edge filtering (from, to, srcFilter, and dstFilter), atomic synchronization, and modified vertex deduplication and tracking logic (apply and applyModified). This separation enables the compiler to represent the algorithm from a high level, exposing opportunities for edge traversal and vertex data layout optimizations. Moreover, it frees the programmer from specifying low-level implementation details, such as synchronization and deduplication logic.
The algorithm iterates for 10 iterations to update each vertex's cur_rank
value. The cur_rank
is assumed to converge after 10 iterations, and represents the importance of each vertex based on its topological structure. In each iteration, the algorithm maintains the set of vertices whose rank has changed greatly from previous iterations, which is known as the frontier
. We start with having all vertices in the frontier with frontier initalization ( var frontier : vertexset{Vertex} = new vertexset{Vertex}(n);
). The frontier generated by
output = vertices.filter(updateVertexFirstRound)
in the first iteration, which applies the updateVertexFirstRound
function to all the vertices
. In later iterations the frontier is generated by
output = vertices.filter(updateVertex)
.
The user has to explicitly manage the memory and swapping of the frontier with delete frontier;
and frontier = output;
In each iteration we update the delta
values using the updateEdge
function in #s1# edges.from(frontier).apply(updateEdge);
. We use the operator from
to obtain the set of edges that comes out from the frontier
. Then we use apply
to apply the updateEdge
function on these edges. The label #s1
is used as a way to identify the edgeset operator for performance tuning using the scheduling.
The vector cur_rank
is returned by assigning it to the final_ranks
variable.
To learn how to use the scheduling functions with GraphIt, please visit the scheduling section of the main getting-started document.
Before you can run the code above you need to first follow these steps and build GraphIt
Before we start building GraphIt we need to make sure we have all the dependencies installed. Currently GraphIt requires the following packages to be installed -
- cmake (version >= 3.5)
- g++ (version >= 5.4)
- python 2.7 or Python 3.x (for running basic test cases)
- python 3.x (x >= 5, for python bindings)
- pip3 (for installing the required packages)
- pybind11
- can be installed using the command
pip3 install pybind11
- can be installed using the command
- scipy
- can be installed using the command
pip3 install scipy
- can be installed using the command
To perform an out-of-tree build of Graphit:
First clone the repository.
git clone [email protected]:GraphIt-DSL/graphit.git
After you have cloned the directory:
cd graphit
mkdir build
cd build
cmake ..
make
To run the C++ test suite do (all tests should pass):
cd build/bin
./graphit_test
To run the Python end-to-end test suite:
start at the top level graphit directory cloned from Github, NOT the build directory (All tests would pass, but some would generate error messages from the g++ compiler. This is expected.) Currently the project supports Python 2.x and Python 3 for the basic tests (test.py and test_with_schedules.py) and Python 3.x (x >= 5) for the pybind test cases (pybind_test.py), which use scipy.
cd build
python python_tests/test.py
python python_tests/test_with_schedules.py
export PYTHONPATH=.
python3 python_tests/pybind_test.py
We are now ready to invoke the GraphIt program we wrote above from python 3.x (x >= 5). We start by writing the following python program.
import graphit
import scipy.io
from scipy.sparse import csr_matrix
import sys
import time
module = graphit.compile_and_load("pagerank_delta_export.gt")
graph = csr_matrix(scipy.io.mmread(sys.argv[1]))
module.set_graph(graph)
start_time = time.perf_counter()
ranks = module.do_pagerank_delta()
end_time = time.perf_counter()
print ("Time elapsed = " + str(end_time - start_time) + " seconds")
This file pagerank_delta.py
can be saved anywhere including your home directory. This is because the GraphIt compiler can be used anywhere as long as the PYTHONPATH
environment variable points to the right directory.
We start by importing the graphit
module. This module has helper functions to compile and invoke Graphit functions. We also import the supporting libraries required to load the graph and time the computation.
We then ask the graphit module to load the GraphIt program we wrote using the compile_and_load
function. We pass to it the path of the file we wrote above. This function returns a module that has all the functions we had exported form the GraphIt file. Notice that for the path, we have just provided "pagerank_delta_export.gt"
because the file is in the same directory. But you should provide the full path here if it is in a different directory.
Next we load the graph in the scipy.sparse.csr_matrix
format. Assuming the graph is stored in the MatrixMarket format, we use the scipy.io.mmread
function which returns the graph in the scipy.sparse.coo_matrix
format. We convert it to the csr_matrix
format and store it in the graph
variable. We will be specifying the input graph file name at run time and hence we have used the sys.argv[1]
as input file name.
We then pass this graph to the pagerank_delta program by invoking the set_graph
function we had defined. Finally we invoke the do_pagerank_delta
function and save its return value in the ranks. We also time the call to this function by calling the time.perf_counter()
function before and after the function call and substracting the time.
Save the above python program in a file named pagerank_delta.py
Before we can run the program we have to make sure that environment variable PYTHONPATH
points to GraphIt's build directory.
This can be done using the command -
export PYTHONPATH=<path/to/graphit/build>
We are ready to run this program using python3. We can invoke the command -
python3 pagerank_delta.py <path/to/graph.mtx>
Running the code outputs the time elapsed to run the computation.
The build directory of Graphit is added to your PYTHONPATH already, but just in case the above command gives an error messsage like -
ImportError: No module named 'graphit'
It means that python3 cannot find the graphit
module. You can point to it using the command -
export PYTHONPATH=<path/to/graphit/build/directory>
and then running the command again -
python3 pagerank_delta.py <path/to/graph.mtx>
All the code used in this tutorial is available at the following urls
To reproduce these examples, you can just navigate to the apps directory and run the python command given below. Some example graphs are also provided in the test/graphs directory.
python3 pagerank_delta.py ../test/graphs/4.mtx