Project page: https://github.com/paultcochrane/pbsclusterviz
$ python setup.py install
$ pbsnodes -x > pbsnodes.xml
# assuming your cluster nodes all start with 'lcn'
# and your cluster name is "Cluster"
$ pbs_gen_nodes_file -x pbsnodes.xml -n Cluster -p lcn -o nodes
$ pbs_cluster_status
Installation in a central location:
$ python setup.py install
Installation in your home directory:
$ python setup.py install --prefix=$HOME/pbsclusterviz
Then you need to extend your PYTHONPATH environment variable like so:
$ export PYTHONPATH=$PYTHONPATH:$HOME/pbsclusterviz/lib/python2.x/site-packages
and extend your PATH like so:
$ export PATH=$PATH:$HOME/pbsclusterviz/bin
Dependencies:

- Python Version 2.4+
  $ aptitude install python
  or: $ yum install python

- The Visualisation Toolkit Version 5.4+ with Python bindings
  $ aptitude install python-vtk
  or: $ yum install vtk-python
  (in the EPEL repository)

- Python XML libraries
  $ aptitude install python-libxml2
  or: $ yum install libxml2-python

- Python TkInter (for full interactive usage)
  $ aptitude install python-tk
  or: $ yum install tkinter

- Sphinx (in order to build the html docs)
  $ aptitude install python-sphinx
  or: $ yum install python-sphinx
Firstly, a nodes file needs to be generated. This specifies how the nodes of the entire cluster system will be displayed: the cluster nodes are drawn as boxes laid out in a grid, many rows of nodes at a set width (e.g. 40 nodes wide would be appropriate for very large installations; the default of 10 nodes should be fine for smaller clusters). Finding the right width may take some experimentation.
When the nodes file is initially generated, a basic structure will be created which can be later fine tuned by hand.
$ pbsnodes -x > pbsnodes.xml
$ pbs_gen_nodes_file -x pbsnodes.xml \
-n <node_section_name> -p <node_prefix> -o nodes
Many cluster installations are collections of one or more smaller clusters
of computers, each with its own naming strategy. For instance, a cluster
called "LinuxCluster" could have nodes labelled lcn01, lcn02, lcn03, etc.
Therefore, one needs to specify a node prefix so that pbs_gen_nodes_file can
pick the relevant nodes out of the pbsnodes xml file. The node section name
appears as a comment in the generated nodes file.
To account for more than one cluster in an entire cluster system, one
appends to an existing nodes file with the -a option to pbs_gen_nodes_file.
For example, with three clusters "TinyCluster", "LinuxCluster" and
"BigOldBull", where the nodes are labelled tcn<xx>, lcn<xx> and bobn<xx>
respectively, one would run pbs_gen_nodes_file like so:
$ pbs_gen_nodes_file -x pbsnodes.xml -n TinyCluster -p tcn -o nodes
$ pbs_gen_nodes_file -x pbsnodes.xml -n LinuxCluster -p lcn -o nodes -a
$ pbs_gen_nodes_file -x pbsnodes.xml -n BigOldBull -p bobn -o nodes -a
The output is a plain text file called 'nodes' which you can then alter to your heart's content. If you add a new cluster to your configuration, you merely need to use the line
$ pbs_gen_nodes_file -x pbsnodes.xml -n NewCluster -p newn -o nodes -a
to add the new cluster nodes to your load and job status visualisation.
Now that the nodes file has been generated, you're ready to begin visualising the load and job status of your cluster system.
To generate an interactive three-dimensional view of the current load of all nodes in your cluster system, you merely need to run the 'pbs_cluster_status' command:
$ pbs_cluster_status -m load
To view the current job-level utilisation of all nodes, just press the "j" key when viewing the cluster.
If you wish, you can specify a previously generated pbsnodes xml file:
$ pbs_cluster_status -m load -x pbsnodes.xml
The title of the output image is controlled by the configuration file
(default: clusterviz.conf). In the [load viewer] section, set the value of
the 'title' key to the title you wish to use. For example:
[load viewer]
title = My awesome cluster load status
To specify an alternate configuration file, you can use the -c option:
$ pbs_cluster_status -m load -c mycluster.conf
If you wish to use the program non-interactively and thereby generate an
image of the cluster status at that point in time, just use the -N option:
$ pbs_cluster_status -m load -N
When the program is called, the cluster status image appears briefly on the screen and then disappears. This image is then saved to the file 'cluster_load_status.png' by default.
In order to view the job status of your cluster system you merely need to
use the pbs_cluster_status command with the -m/--display_mode option, e.g.:
$ pbs_cluster_status -m job
To change the default title of the generated image, you need to set the
value of the 'title' key in the [job viewer] section of the configuration
file (clusterviz.conf):
[job viewer]
title = My awesome cluster job status
As before, in order to generate an output image one needs to use the
-N/--non_interactive option:
$ pbs_cluster_status -m job -N
By default this will generate an image with filename 'cluster_job_status.png'.
The display output can be updated by pressing the u key when in interactive mode.
Instead of just using a local 'pbsnodes.xml' file, one can also get this
file from a remote host. This is a practical solution when the computer
where the cluster status is being visualised is not part of the PBS-based
cluster system and so is not able to extract the pbsnodes information
itself. With the -s option to pbs_cluster_status it is now possible to
generate the 'pbsnodes.xml' file on a remote host and have it read on the
local host. Normally this occurs via 'ssh'; after adding a line similar to
the following to the [main] section of the configuration file

[main]
syscall = ssh login-node 'pbsnodes -x' > pbsnodes.xml

one then merely needs to start pbs_cluster_status like so:
$ pbs_cluster_status -s
For this to run smoothly (and without you needing to enter your password
each time the display is updated) you should enable password-less ssh
access to the remote server. This article gives a good overview:
http://www.howtoforge.com/ssh-best-practices
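As a minimal sketch of such a setup (assuming the remote host is the "login-node" from the syscall example above, and that your username is the same on both machines), the following standard OpenSSH commands set up key-based access:

```shell
# Generate a key pair; the empty passphrase (-N "") is what makes the
# later ssh calls non-interactive. Omit -N "" if you prefer to use
# ssh-agent with a passphrase-protected key instead.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Install the public key in the remote host's authorized_keys file
ssh-copy-id login-node

# Verify: this should now run without a password prompt
ssh login-node 'pbsnodes -x' > pbsnodes.xml
```

If the verification step still prompts for a password, check the permissions of ~/.ssh and authorized_keys on the remote host.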
The cluster load and job status images are saved to disk with the
respective filenames cluster_load_status.png and cluster_job_status.png.
An extra image file with the current timestamp is also saved for each type
of status image. These files can then be used to create movies of the
evolution of the cluster status over time, which can give insight into
patterns not otherwise obvious from viewing the static images. The best way
to produce such movies is to run pbs_cluster_status -m load and
pbs_cluster_status -m job as cron jobs. For instance, one could save images
every ten minutes; then, after a day or even a week, one can generate an
mpeg movie file from the collected images.
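A possible setup might look like the following. The crontab lines, the working directory, and the glob pattern for the timestamped filenames are all assumptions — check the actual filenames pbs_cluster_status writes on your system and adjust accordingly:

```shell
# Example crontab entries (edit with `crontab -e`) saving a
# non-interactive status image every ten minutes:
#
#   */10 * * * * cd $HOME/cluster-status && pbs_cluster_status -m load -N
#   */10 * * * * cd $HOME/cluster-status && pbs_cluster_status -m job -N
#
# Later, assemble the collected timestamped images into a movie, for
# instance with ffmpeg. The glob pattern below is an assumption about
# how the timestamped filenames look.
ffmpeg -framerate 6 -pattern_type glob -i 'cluster_load_status_*.png' \
       -c:v mpeg2video -q:v 4 cluster_load_status.mpg
```

A framerate of around 6 frames per second turns a day of ten-minute snapshots (144 images) into a movie of roughly 24 seconds.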
In the examples/ directory of the distribution you will find some
pre-generated pbsnodes xml files and an example configuration file.
The first example works for the RRZN cluster system (http://www.rrzn.uni-hannover.de/computeserver.html). Change into the examples/ directory and run the following command:
$ pbs_cluster_status -x pbsnodes_rrzn.xml -n nodes.rrzn -c rrznviz.conf -i
The documentation is distributed with 'pbsclusterviz' in the doc/
directory. Make sure that you have installed the 'sphinx' package so that
you can build the documentation.
To build the html documentation, change into the doc/ directory and run
$ make html
then point your browser to .../doc/_build/html/index.html.
To build the pdf documentation, change into the doc/ directory and run
$ make latex
$ cd _build/latex
$ make all-pdf
then open the file PBSClusterViz.pdf with your favourite PDF-viewer.