morrone edited this page Aug 1, 2011 · 7 revisions

Installation has been simplified in LMT version 3. It can be done in three stages.

Quick Install Instructions

To get ltop working

  • Install cerebro and lmt-server-agent on Lustre servers (OSS, MDS)
  • Install cerebro and lmt-server on management node
  • Restart cerebrod on Lustre servers.
  • Run ltop (included in lmt-server) on management node.

To get MySQL working on management node

  • Install mysql-server package
  • Run mysql_secure_installation or equivalent, then, as root, run mysql -p < /usr/share/lmt/mkusers.sql after customizing that script for your site.
  • Set up /etc/lmt/lmt.conf
  • Create databases for each file system to be monitored with lmtinit -a fsname.
  • Restart cerebrod.

To get lwatch (GUI) working

  • Add cron job for aggregation scripts on management node.
  • Install additional packages: lmt-gui and java-1.5.0-ibm (or other JRE package) on desktop or management node.
  • Set up ~/.lmtrc on desktop or management node and run lwatch.

NOTE: The lmt-gui package is available in a separate repository.

Detailed Installation Instructions

Get ltop Working

The lmt-server-agent package contains Cerebro plugins that periodically read Lustre /proc values, convert them to strings, and push the strings into the Cerebro monitoring network. Install this package on Lustre servers, including all MDS, OSS, and optionally LNET routers, and test that LMT is correctly parsing /proc. Pick an OSS and run this on it (output edited for readability):

    # /usr/sbin/lmtmetric -m ost
    ost: 2;tycho1;0.137269;9.561285;
    lc1-OST0000;112367742;113304682;449470968;1929120176;6344724;4294968016;0;137;0;0;0;694;245;COMPLETE 138/138 0s remaining;
    lc1-OST0008;114812669;115748383;459250676;1929120176;6344724;4294968016;0;137;0;0;0;694;262;COMPLETE 138/138 0s remaining;
    lc1-OST0010;113690420;114627066;454761680;1929120176;6344724;4294968016;0;137;0;0;0;694;240;COMPLETE 138/138 0s remaining;
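
Because the real output arrives as a single semicolon-delimited line, a quick sanity check is to split it and list the OST names it contains. The following is a minimal sketch, not part of LMT: the function name list_osts is hypothetical, and it assumes only that OST names end in -OST followed by four hex digits, as in the sample above. It makes no claim about what the other fields mean.

```shell
# Hypothetical helper, not shipped with LMT.
# Reads lmtmetric output on stdin and prints one OST name per line.
# Returns non-zero (from grep) if no OST name is found.
list_osts() {
    tr ';' '\n' | sed 's/^[[:space:]]*//' | grep -E -- '-OST[0-9a-fA-F]{4}$'
}

# Example using a shortened, illustrative record (not real measurements):
printf '%s\n' 'ost: 2;tycho1;0.1;9.5;lc1-OST0000;1;2;lc1-OST0008;3;4;' | list_osts
```

On an OSS this would be used as /usr/sbin/lmtmetric -m ost | list_osts, and the names printed should match the OSTs that server is expected to host.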

Run this on your MDS (output edited for readability):

    # /usr/sbin/lmtmetric -m osc
    osc: 1;tycho-mds1;lc1-OST0000;F;lc1-OST0001;F;lc1-OST0002;F;lc1-OST0003;F;lc1-OST0004;F;lc1-OST0005;F;lc1-OST0006;F;...
    # /usr/sbin/lmtmetric -m mdt
    mdt: 1;tycho-mds1;0.015762;1.269578;

If there are any errors, open a bug in the LMT issue tracker.

Now restart Cerebro on your Lustre servers:

    # /sbin/service cerebrod restart

Verify that cerebrod is still running on your Lustre servers. Because LMT plugins are shared libraries running in cerebrod's address space, a fault in a plugin (a segfault, for example) can crash cerebrod. If this occurs, open a bug in the LMT issue tracker.

LMT data should now be present on the Cerebro monitoring network. Install the lmt-server package on your LMT server node and verify that you can see live data with ltop:

    $ /usr/bin/ltop
    Filesystem: lc1
        Inodes:    443.956m total,     49.295m used ( 11%),    394.662m free
         Space:    172.188t total,    129.573t used ( 75%),     42.615t free
       Bytes/s:      0.000g read,       0.000g write,                 0 IOPS
       MDops/s:      0 open,        0 close,       0 getattr,       0 setattr
                     0 link,        0 unlink,      0 mkdir,         0 rmdir
                     0 statfs,      0 rename,      0 getxattr
     OST S        OSS   Exp   CR rMB/s wMB/s  IOPS   LOCKS  LGR  LCR %cpu %mem %spc 
    0000 F     tycho1   137    0     0     0     0       0    0    0    1   10   77
    0001 F     tycho2   137    0     0     0     0       0    0    0    0    9   76
    0002 F     tycho3   137    0     0     0     0       0    0    0    0    9   76
    0003 F     tycho4   137    0     0     0     0       0    0    0    0   10   76
    0004 F     tycho5   137    0     0     0     0       0    0    0    1    9   76
    0005 F     tycho6   137    0     0     0     0       0    0    0    0   10   76
    0006 F     tycho7   137    0     0     0     0       0    0    0    0    9   76
    0007 F     tycho8   137    0     0     0     0       0    0    0    0    9   76
    0008 F     tycho1   137    0     0     0     0       0    0    0    1   10   76
    0009 F     tycho2   137    0     0     0     0       0    0    0    0    9   76
    000a F     tycho3   137    0     0     0     0       0    0    0    0    9   76
    000b F     tycho4   137    0     0     0     0       0    0    0    0   10   77
    000c F     tycho5   137    0     0     0     0       0    0    0    1    9   75
    000d F     tycho6   137    0     0     0     0       0    0    0    0   10   76
    000e F     tycho7   137    0     0     0     0       0    0    0    0    9   76
    000f F     tycho8   137    0     0     0     0       0    0    0    0    9   74

If you don't see live data, you may need to debug your Cerebro configuration. A few things to check:

  • Does cerebro-stat -l show _lmt_ prefixed metrics?
  • Does cerebro-stat -m metricname show live data?
  • Are you speaking/listening on the correct interfaces?
  • Can the network you are using pass multicast traffic?
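
The first check in the list can be scripted. This is a minimal sketch under stated assumptions: the helper name has_lmt_metrics is hypothetical, and it simply looks for the string lmt_ anywhere in the metric list rather than assuming an exact naming scheme. The only Cerebro command used is cerebro-stat -l, as mentioned above.

```shell
# Hypothetical helper: succeeds if any metric name on stdin contains "lmt_".
has_lmt_metrics() {
    grep -q 'lmt_'
}

# Guarded so the sketch is harmless on hosts without Cerebro installed.
if command -v cerebro-stat >/dev/null 2>&1; then
    if cerebro-stat -l | has_lmt_metrics; then
        echo "LMT metrics are visible to Cerebro"
    else
        echo "no lmt_ metrics found; check the cerebrod configuration" >&2
    fi
fi
```

If the metrics are listed but ltop still shows nothing, the remaining items (interfaces and multicast) are the more likely cause.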

Visit the Cerebro SourceForge project site for further Cerebro information.

Get MySQL Working

Install and configure the MySQL server on your LMT server node. Users starting from scratch with MySQL will want to do something like this:

    # /sbin/service mysqld start
    # mysql_secure_installation

The database will need two LMT users, one for read-write access to the database, and one for read-only access. Use /usr/share/lmt/mkusers.sql as a template, running as a privileged mysql user:

    # Example script for creating LMT MySQL users.
    CREATE USER 'lwatchclient'@'localhost';
    GRANT SHOW DATABASES        ON *.* TO 'lwatchclient'@'localhost';
    GRANT SELECT                ON *.* TO 'lwatchclient'@'localhost';
    CREATE USER 'lwatchadmin'@'localhost' IDENTIFIED BY 'mypass';
    GRANT SHOW DATABASES        ON *.* TO 'lwatchadmin'@'localhost';
    GRANT SELECT,INSERT,DELETE  ON *.* TO 'lwatchadmin'@'localhost';
    GRANT CREATE,DROP           ON *.* TO 'lwatchadmin'@'localhost';

Next, configure /etc/lmt/lmt.conf with the usernames and passwords you just added to the database:

    lmt_db_host = nil
    lmt_db_port = 0
    lmt_db_rouser = "lwatchclient"
    lmt_db_ropasswd = nil
    lmt_db_rwuser = "lwatchadmin"
    lmt_db_rwpasswd = "mypass"

Alternatively you can put the password (all by itself) in /etc/lmt/rwpasswd, make that file readable only by root, and use the following lines in place of the lmt_db_rwpasswd line in your lmt.conf:

    -- Read the read-write password from a root-only file, if present.
    f = io.open("/etc/lmt/rwpasswd")
    if (f) then
      lmt_db_rwpasswd = f:read("*l")
      f:close()
    else
      lmt_db_rwpasswd = nil
    end
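
One way to create that password file with owner-only permissions is sketched below. The helper name write_secret_file is hypothetical and not part of LMT. The sketch assumes you are comfortable typing the password into a shell, where it may end up in shell history.

```shell
# Hypothetical helper: write a one-line secret to a file that only its
# owner can read or write.
# Usage: write_secret_file PATH SECRET
write_secret_file() {
    # The restrictive umask ensures the file is never created world-readable;
    # chmod covers the case where the file already existed with looser modes.
    ( umask 077 && printf '%s\n' "$2" > "$1" ) && chmod 600 "$1"
}

# As root, for example:
#   write_secret_file /etc/lmt/rwpasswd 'mypass'
```

Run as root, the resulting file is owned by root with mode 600, which matches the "readable only by root" requirement.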

This restricts write access to the database to only the root user. Next, add a database for each file system you wish to monitor. For example if your file system is named test:

    # /usr/sbin/lmtinit -a test

lmtinit will use the read-write account from lmt.conf when adding or deleting databases. You can list the file systems that have databases with:

    # /usr/sbin/lmtinit -l

lmtinit will use the read-only account from lmt.conf when listing databases.

Restart Cerebro on the lmt-server node so the LMT Cerebro plugin will pick up the new database info:

    # /sbin/service cerebrod restart

To see if data is being added to your database, run the lmtsh utility a couple of times and watch the tables that end in _DATA. The row count should be increasing:

    # /usr/sbin/lmtsh -f test
    test> t
    Available tables for test:
                                Table Name   Row Count
                                EVENT_DATA   0
                                EVENT_INFO   0
                  FILESYSTEM_AGGREGATE_DAY   225
                 FILESYSTEM_AGGREGATE_HOUR   4977
                 FILESYSTEM_AGGREGATE_WEEK   45
                 FILESYSTEM_AGGREGATE_YEAR   9
                           FILESYSTEM_INFO   1
                         MDS_AGGREGATE_DAY   125
                        MDS_AGGREGATE_HOUR   2770
                       MDS_AGGREGATE_MONTH   10
                        MDS_AGGREGATE_WEEK   25
                        MDS_AGGREGATE_YEAR   5
                                  MDS_DATA   398042
                                  MDS_INFO   1
                              MDS_OPS_DATA   31270065
                         MDS_VARIABLE_INFO   7
                            OPERATION_INFO   81
                                  OSS_DATA   1589675
                                  OSS_INFO   4
                        OSS_INTERFACE_DATA   0
                        OSS_INTERFACE_INFO   0
                         OSS_VARIABLE_INFO   7
                         OST_AGGREGATE_DAY   10800
                        OST_AGGREGATE_HOUR   238896
                       OST_AGGREGATE_MONTH   864
                        OST_AGGREGATE_WEEK   2160
                        OST_AGGREGATE_YEAR   432
                                  OST_DATA   19074487
                                  OST_INFO   48
                              OST_OPS_DATA   0
                         OST_VARIABLE_INFO   11
                      ROUTER_AGGREGATE_DAY   3
                     ROUTER_AGGREGATE_HOUR   3
                    ROUTER_AGGREGATE_MONTH   3
                     ROUTER_AGGREGATE_WEEK   3
                     ROUTER_AGGREGATE_YEAR   3
                               ROUTER_DATA   428097
                               ROUTER_INFO   1
                      ROUTER_VARIABLE_INFO   3
                            TIMESTAMP_INFO   470209
                                   VERSION   0
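
Comparing two listings by eye is error-prone with this many tables. The sketch below assumes you have saved two copies of the table listing above into text files a short time apart (for instance by copy and paste); how to capture lmtsh output non-interactively is not covered here. The helper name data_growth is hypothetical.

```shell
# Hypothetical helper, not part of LMT.
# Usage: data_growth BEFORE_FILE AFTER_FILE
# Each file holds lines of the form "TABLE_NAME ROW_COUNT" as printed by lmtsh.
# Prints "TABLE_NAME DELTA" for every table whose name ends in _DATA.
data_growth() {
    awk '
        $1 ~ /_DATA$/ && $2 ~ /^[0-9]+$/ {
            if (pass == 1)
                before[$1] = $2                       # remember first snapshot
            else if ($1 in before)
                printf "%s %d\n", $1, $2 - before[$1] # growth since then
        }
    ' pass=1 "$1" pass=2 "$2"
}
```

A positive delta means rows are arriving. Note that some _DATA tables in the sample listing above hold 0 rows, so a delta of 0 on a table is not by itself proof of a problem.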

Get lwatch (GUI) Working

NOTE: The lwatch program is part of the lmt-gui package, the source of which is found in a separate repository.

lwatch requires an optimization in the database so that queries over long time periods complete quickly. Data gathered at 5s intervals is aggregated and inserted into coarse-grained versions of the same tables.
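
To see why the coarse-grained tables matter, consider how many raw rows a single monitored target accumulates at that rate. The arithmetic below is only an illustration. It assumes exactly one row per 5-second sample per target, which the text implies but does not state.

```shell
interval=5                        # seconds between samples, as stated above
per_hour=$((3600 / interval))     # samples per hour
per_day=$((per_hour * 24))        # samples per day
per_year=$((per_day * 365))       # samples per (non-leap) year
echo "$per_hour $per_day $per_year"
# prints: 720 17280 6307200
```

A year-long graph built from raw samples would therefore touch millions of rows per target, whereas the aggregate tables hold far fewer rows for the same period.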

Set up the following cron job to run as root on your LMT server machine:

    20 * * * *      /usr/share/lmt/cron/lmt_agg.cron

After it has run, you can use lmtsh as above to verify that the tables with names containing _AGGREGATE_ are getting populated.
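
If you prefer to install the cron entry from a script, the sketch below adds it to a crontab without creating duplicates on repeated runs. The helper name with_lmt_agg_entry is hypothetical. Be aware that it drops any existing line mentioning the lmt_agg.cron path, including commented-out ones.

```shell
LMT_AGG_ENTRY='20 * * * *      /usr/share/lmt/cron/lmt_agg.cron'

# Hypothetical helper: reads an existing crontab on stdin and prints it with
# exactly one copy of the aggregation entry appended.
with_lmt_agg_entry() {
    # Drop any previous copy of the entry; "|| true" because grep exits
    # non-zero when nothing is left to print.
    grep -vF '/usr/share/lmt/cron/lmt_agg.cron' || true
    printf '%s\n' "$LMT_AGG_ENTRY"
}

# As root, for example:
#   crontab -l 2>/dev/null | with_lmt_agg_entry | crontab -
```

The schedule field 20 * * * * runs the script at minute 20 of every hour.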

The lmt-gui package contains the lwatch Java client, which can be used to view live data much as ltop does (but read through the database) or to graph historical LMT data from the database. It also contains the lstat text client, which can be used for testing. You may wish to install this package on your desktop machine and configure the MySQL server to allow the read-only LMT user to access the database remotely. It can also be installed on the LMT server. Install the package and its prerequisites (a Java runtime; it seems to work best with IBM's).

lwatch and lstat read a .lmtrc file in your home directory. This file should contain a stanza for each file system you wish to monitor. Each stanza includes the read-only database account information and the name of the file system.

You should now be able to start lwatch and explore its GUI. Double-click on a value to open the dialog for graphing its historical values.
