check_openmanage - Nagios plugin for checking the hardware status on Dell servers running OpenManage
check_openmanage [OPTION]...
check_openmanage -H hostname [OPTION]...
check_openmanage is a plugin for Nagios which checks the hardware health of Dell servers running OpenManage Server Administrator (OMSA). The plugin checks the health of the storage subsystem, power supplies, memory modules, temperature probes etc., and gives an alert if any of the components are faulty or operate outside normal parameters.
check_openmanage is designed to be used by either locally (using NRPE or similar) or remotely (using SNMP). In either mode, the output is (nearly) the same. Note that checking the alert log is not supported in SNMP mode.
- -f, --config FILE
-
Specify a configuration file. For reference on the config file syntax and options, consult the check_openmanage.conf(5) manual page.
- -t, --timeout SECONDS
-
The number of seconds after which the plugin will abort. Default timeout is 30 seconds if the option is not present.
- -p, --perfdata [multline or minimal]
-
Collect performance data. Performance data collected include temperatures (in Celsius) and fan speeds (in rpm). On systems that support it, power consumption is also collected (in Watts). This option takes one of two arguments, both of which are optional.
If the argument
minimal
is specified, the plugin will use shorter names for the performance data labels, e.g.t0
instead oftemp_0_system_board_ambient
. This can be used as a workaround in cases where the plugin output needs shortening, for example if the 1024 character limit of NRPE is reached.If given the argument
multiline
, the plugin will output the performance data on multiple lines, for Nagios 3.x and above. - --legacy-perfdata
-
With version 3.7.0, performance data output changed. The new format is not compatible with the old format. Users who wish to postpone switching to the new performance data API may set this option.
- -w, --warning STRING or FILE
-
Override the machine-default temperature warning thresholds. Syntax is
id1=max[/min],id2=max[/min],...
. The following example sets warning limits to max 50C for probe 0, and max 45C and min 10C for probe 1:check_openmanage -w 0=50,1=45/10
The minimum limit can be omitted, if desired. Most often, you are only interested in setting the maximum thresholds.
This parameter can be either a string with the limits, or a file containing the limits string. The option can be specified multiple times.
NOTE: This option should only be used to narrow the field of OK temperatures wrt. the OMSA defaults. To expand the field of OK temperatures, increase the OMSA thresholds. See the plugin web page for more information.
- -c, --critical STRING or FILE
-
Override the machine-default temperature critical thresholds. Syntax and behaviour is the same as for warning thresholds described above.
- -F, --fahrenheit
-
Set Fahrenheit as unit for all temperatures. This option will override the
--tempunit
option, if used simultaneously. - --tempunit CHAR
-
Set temperature unit. Legal values are
F
for Fahrenheit,C
for Celsius,K
for Kelvin andR
for Rankine. - -o, --ok-info NUMBER
-
This option lets you define how much output you want the plugin to give when everything is OK, i.e. the verbosity level. The default value is 0 (one line of output). The output levels are cumulative.
- 0
-
- Only one line (default)
- 1
-
- BIOS and firmware info on a separate line
- 2
-
- Storage controller and enclosure info on separate lines
- 3
-
- OMSA version on separate line
The reason that OMSA version is separated from the rest is that finding it requires running a really slow omreport command, when the plugin is run locally via NRPE.
- -B, --show-blacklist
-
If used together with blacklisting, this option will make the plugin output all blacklistings that are being used. The output will have the correct blacklisting syntax, and will make it easy to maintain control over which blacklistings that are used for each server, as any blacklistings can be viewed from Nagios.
When blacklisting is not used, this option has no effect.
- --omreport OMREPORT PATH
-
Specify full path to omreport, if it is not installed in any of the regular places. Usually this option is only needed on Windows, if omreport is not installed on the C: drive.
- -i, --info
-
Prefix any alerts with the service tag.
- -e, --extinfo
-
Display a short summary of system information (model and service tag) in case of an alert.
- -I, --htmlinfo [CODE]
-
Using this option will make the servicetag and model name into clickable HTML links in the output. The model name link will point to the official Dell documentation for that model, while the servicetag link will point to a website containing support info for that particular server.
This option takes an optional argument, which should be your country code or
me
for the middle east. If the country code is omitted the servicetag link will still work, but it will not be speficic for your country or area. Example for Germany:check_openmanage --htmlinfo de
If this option is used together with either the --extinfo or --info options, it is particularly useful. Only the most common country codes is supported at this time.
- --postmsg STRING or FILE
-
User specified post message. Useful for displaying arbitrary or various system information at the end of alerts. The argument is either a string with the message, or a file containing that string. You can control the format with the following interpreted sequences:
- %m
-
System model
- %s
-
Service tag
- %b
-
BIOS version
- %d
-
BIOS release date
- %o
-
Operating system name
- %r
-
Operating system release
- %p
-
Number of physical drives
- %l
-
Number of logical drives
- %n
-
Line break. Will be a regular line break if run from a TTY, else an HTML line break.
- %%
-
A literal
%
- -s, --state
-
Prefix each alert with its corresponding service state (i.e. warning, critical etc.). This is useful in case of several alerts from the same monitored system.
- -S, --short-state
-
Same as the --state option above, except that the state is abbreviated to a single letter (W=warning, C=critical etc.).
- --hide-servicetag
-
This option will replace the servicetag (serial number) in the output with
XXXXXXX
. Use this option to suppress or censor the servicetag in the plugin output. - --linebreak STRING
-
check_openmanage will sometimes report more than one line, e.g. if there are several alerts. If the script has a TTY, it will use regular linebreaks. If not (which is the case with NRPE) it will use HTML linebreaks. Sometimes it can be useful to control what the plugin uses as a line separator, and this option provides that control.
The argument is the exact string to be used as the line separator. There are two exceptions, i.e. two keywords that translates to the following:
- REG
-
Regular linebreaks, i.e. "\n".
- HTML
-
HTML linebreaks, i.e. "<br/>".
This is a rather special option that is normally not needed. The default behaviour should be sufficient for most users.
- -d, --debug
-
Debug output. Will report status on everything, even if status is ok. Blacklisted or unchecked components are ignored (i.e. no output).
NOTE: This option is intended for diagnostics and debugging purposes only. Do not use this option from within Nagios, i.e. in the Nagios config.
- -h, --help
-
Display help text.
- -V, --version
-
Display version info.
- -H, --hostname HOSTNAME
-
The transport address of the destination SNMP device. Using this option triggers SNMP mode.
- -P, --protocol PROTOCOL
-
SNMP protocol version. This option is optional and expects either of the following:
"1" : SNMP version 1 "2c" or "2" : SNMP version 2c "3" : SNMP version 3
The default is
2c
. - -C, --community COMMUNITY
-
This option expects a string that is to be used as the SNMP community name when using SNMP version 1 or 2c. By default the community name is set to
public
if the option is not present. - --port PORT
-
SNMP port of the remote (monitored) system. Defaults to the well-known SNMP port 161.
- -6, --ipv6
-
This option will cause the plugin to use IPv6. The default is IPv4 if the option is not present.
- --tcp
-
This option will cause the plugin to use TCP as transport protocol. The default is UDP if the option is not present.
- -U, --username SECURITYNAME
-
[SNMPv3] The User-based Security Model (USM) used by SNMPv3 requires that a securityName be specified. This option is required when using SNMP version 3, and expects a string 1 to 32 octets in lenght.
- --authpassword PASSWORD, --authkey KEY
-
[SNMPv3] By default a securityLevel of
noAuthNoPriv
is assumed. If the --authpassword option is specified, the securityLevel becomesauthNoPriv
. The --authpassword option expects a string which is at least 1 octet in length as argument.Optionally, instead of the --authpassword option, the --authkey option can be used so that a plain text password does not have to be specified in a script. The --authkey option expects a hexadecimal string produced by localizing the password with the authoritativeEngineID for the specific destination device. The
snmpkey
utility included with the Net::SNMP distribution can be used to create the hexadecimal string (see snmpkey). - --authprotocol ALGORITHM
-
[SNMPv3] Two different hash algorithms are defined by SNMPv3 which can be used by the Security Model for authentication. These algorithms are HMAC-MD5-96
MD5
(RFC 1321) and HMAC-SHA-96SHA-1
(NIST FIPS PUB 180-1). The default algorithm used by the plugin is HMAC-MD5-96. This behavior can be changed by using this option. The option expects either the stringmd5
orsha
to be passed as argument to modify the hash algorithm. - --privpassword PASSWORD, --privkey KEY
-
[SNMPv3] By specifying the options --privkey or --privpassword, the securityLevel associated with the object becomes
authPriv
. According to SNMPv3, privacy requires the use of authentication. Therefore, if either of these two options are present and the --authkey or --authpassword arguments are missing, the creation of the object fails. The --privkey and --privpassword options expect the same input as the --authkey and --authpassword options respectively. - --privprotocol ALGORITHM
-
[SNMPv3] The User-based Security Model described in RFC 3414 defines a single encryption protocol to be used for privacy. This protocol, CBC-DES
DES
(NIST FIPS PUB 46-1), is used by default or if the stringdes
is passed to the --privprotocol option. The Net::SNMP module also supports RFC 3826 which describes the use of CFB128-AES-128AES
(NIST FIPS PUB 197) in the USM. The AES encryption protocol can be selected by passingaes
oraes128
to the --privprotocol option.One of the following arguments are required: des, aes, aes128, 3des, 3desde
- --use-get_table
-
This option exists as a workaround when using check_openmanage with SNMPv3 on Windows with net-snmp. Using this option will make check_openmanage use the Net::SNMP function get_table() instead of get_entries() while fetching values via SNMP. The latter is faster and is the default.
- -b, --blacklist STRING or FILE
-
Blacklist missing and/or failed components, if you do not plan to fix them. The parameter is either the blacklist string, or a file (that may or may not exist) containing the string. The blacklist string contains component names with component IDs separated by slash (/). Blacklisted components are left unchecked.
TIP: Use the option
-d
(or--debug
) to get the blacklist ID for devices. The ID is listed in a separate column in the debug output.NOTE: If blacklisting is in effect, the global health of the system is not checked.
- Syntax:
-
component1=id1[,id2,...]/component2=id1[,id2,...]/...
The ID part can also be
all
, in which all components of that type is blacklisted. - Example:
-
check_openmanage -b ps=0/fan=3,5/pdisk=1:0:0:1/ctrl_driver=all
In the example we blacklist powersupply 0, fans 3 and 5, physical disk 1:0:0:1, and warnings about out-of-date drivers for all controllers. Legal component names include:
- ctrl
-
Storage controller. Note that if a controller is blacklisted, all components on that controller (such as physical and logical drives) are blacklisted as well.
- ctrl_fw
-
Suppress the special warning message about old controller firmware. Use this if you can not or will not upgrade the firmware.
- ctrl_driver
-
Suppress the special warning message about old controller driver. Particularly useful on systems where you can not upgrade the driver.
- ctrl_stdr
-
Suppress the special warning message about old Storport driver on Windows.
- ctrl_pdisk
-
This blacklisting keyword exists as a possible workaround for physical drives with bad firmware which makes Openmanage choke. It takes the controller number as argument. Use this option to blacklist all physical drives on a specific controller. This blacklisting keyword is only available in local mode, i.e. not with SNMP.
- pdisk
-
Physical disk.
- pdisk_cert
-
Suppress warning message about non-certified physical disk.
- pdisk_foreign
-
Suppress warning message about foreign physical disk.
- vdisk
-
Logical drive (virtual disk)
- bat
-
Controller cache battery
- bat_charge
-
Ignore warnings related to the controller cache battery charging cycle, which happens approximately every 40 days on Dell servers. Note that using this blacklist keyword makes check_openmanage ignore non-critical cache battery errors.
- conn
-
Connector (channel)
- encl
-
Enclosure
- encl_fan
-
Enclosure fan
- encl_ps
-
Enclosure power supply
- encl_temp
-
Enclosure temperature probe
- encl_emm
-
Enclosure management module (EMM)
- dimm
-
Memory module
- fan
-
Fan
- ps
-
Powersupply
- temp
-
Temperature sensor
- cpu
-
Processor (CPU)
- volt
-
Voltage probe
- bp
-
System battery
- amp
-
Amperage probe (power consumption monitoring)
- intr
-
Intrusion sensor
- sd
-
SD card
- --no-storage
-
Turn off storage checking. This is an alias for
--check storage=0
. - --only KEYWORD
-
This option can be specifed once and expects a keyword. The different keywords and the behaviour of check_openmanage is described below.
- critical
-
Print only critical alerts. With this option any warning alerts are suppressed.
- warning
-
Print only warning alerts. With this option any critical alerts are suppressed.
- chassis
-
Check all chassis components and nothing else.
- storage
-
Only check storage
- memory
-
Only check memory modules
- fans
-
Only check fans
- power
-
Only check power supplies
- temp
-
Only check temperatures
- cpu
-
Only check processors
- voltage
-
Only check voltage probes
- batteries
-
Only check batteries
- amperage
-
Only check power usage
- intrusion
-
Only check chassis intrusion
- sdcard
-
Only check SD cards
- esmhealth
-
Only check ESM log overall health, i.e. fill grade
- servicetag
-
Only check for sane service tag
- esmlog
-
Only check the event log (ESM) content
- alertlog
-
Only check the alert log content
- --check STRING or FILE
-
This parameter allows you to adjust which components that should be checked at all. This is a rougher approach than blacklisting, which require that you specify component id or index. The parameter should be either a string containing the adjustments, or a file containing the string. No errors are raised if the file does not exist.
Note: This option is ignored with alternate basenames.
- Example:
-
check_openmanage --check storage=0,intrusion=1
Legal values are described below, along with the default value.
- storage
-
Check storage subsystem (controllers, disks etc.). Default: ON
- memory
-
Check memory (dimms). Default: ON
- fans
-
Check chassis fans. Default: ON
- power
-
Check power supplies. Default: ON
- temp
-
Check temperature sensors. Default: ON
- cpu
-
Check CPUs. Default: ON
- voltage
-
Check voltage sensors. Default: ON
- batteries
-
Check system batteries. Default: ON
- amperage
-
Check amperage probes. Default: ON
- intrusion
-
Check chassis intrusion. Default: ON
- sdcard
-
Check SD cards. Default: ON
- esmhealth
-
Check the ESM log health, i.e. fill grade. Default: ON
- servicetag
-
Check that the service tag (serial number) is sane and not empty. Default: ON
- esmlog
-
Check the ESM log content. Default: OFF
- alertlog
-
Check the alert log content. Default: OFF
The option --debug
(or -d
) can be specified to display all monitored components.
If SNMP is requested, the perl module Net::SNMP is required. Otherwise, only a regular perl distribution is required to run the script. On the target (monitored) system, Dell Openmanage Server Administrator (OMSA) must be installed and running.
If no errors are discovered, a value of 0 (OK) is returned. An exit value of 1 (WARNING) signifies one or more non-critical errors, while 2 (CRITICAL) signifies one or more critical errors.
The exit value 3 (UNKNOWN) is reserved for errors within the script, or errors getting values from Dell OMSA.
Written by Trond H. Amundsen <[email protected]>
Storage info is not collected or checked on very old PowerEdge models and/or old OMSA versions, due to limitations in OMSA. The overall support on those models/versions by this plugin is not well tested.
The plugin should work with the Nagios embedded perl interpreter (ePN). However, this is not thoroughly tested.
Report bugs to <[email protected]>
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
check_openmanage.conf(5) http://folk.uio.no/trondham/software/check_openmanage.html