Server room temperature monitoring using Zenoss and IPMI

16.01.2012.

One of the major risks in a modern server room is server overheating. It can be caused by a variety of factors, such as fan failure, blocked airflow, a loose CPU cooler, etc. We have encountered one more likely cause: improper air conditioning in our server room. Besides outright failure of an air-conditioning unit, we found a problem that occurs after a short power outage. Although our UPS happily kept all the servers running, the air-conditioning units did not restart automatically. When this happened over a weekend, by Monday morning the ambient temperature had risen above 40 degrees Celsius.

Our first approach was to look for network-aware temperature sensors. Server equipment manufacturers do provide a wide variety of devices for detecting fire, power, flood and high-temperature events. They can have multiple probes, long cables, IP cameras, and SNMP, e-mail and SMS notifications. But such equipment comes with a price tag that can exceed 1000 USD. There are also low-cost devices that usually come with a USB port and measurement software. In our case, we wanted a cost-effective solution that could be obtained quickly from our local distributor.

Most modern servers designed for rack environments have one or more temperature sensors, mostly to prevent overheating and the associated damage to the equipment. The server line that we use comes equipped with several temperature sensors, so we decided to combine this hardware feature with our Zenoss-based network monitoring system. The first issue was how to read the sensors. Here the Intelligent Platform Management Interface (IPMI) standard, adopted by a wide variety of server manufacturers, came in handy. Since the majority of our servers run Linux, it was straightforward to read the sensors using the IPMI tools.

Server-side installation

# yum install OpenIPMI OpenIPMI-tools

Depending on the Linux (RHEL) flavour, it may be necessary to load the IPMI kernel modules manually:

# insmod /lib/modules/`uname -r`/kernel/drivers/char/ipmi/ipmi_msghandler.ko
# insmod /lib/modules/`uname -r`/kernel/drivers/char/ipmi/ipmi_watchdog.ko
# insmod /lib/modules/`uname -r`/kernel/drivers/char/ipmi/ipmi_devintf.ko
# insmod /lib/modules/`uname -r`/kernel/drivers/char/ipmi/ipmi_si.ko
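Because insmod needs the full path and does not resolve dependencies, modprobe is often more convenient; this is a swapped-in alternative, not what we used above. A minimal sketch, assuming bash and awk, that reads lsmod output and lists which of the four modules are still missing, so that only those get loaded. The helper name missing_ipmi_modules is hypothetical:

```shell
# Hypothetical helper: reads `lsmod` output on stdin and prints the IPMI
# modules from the list above that are not loaded yet, one per line.
missing_ipmi_modules() {
    local loaded mod
    # The first column of lsmod is the module name; skip the "Module" header.
    loaded=$(awk 'NR > 1 { print $1 }')
    for mod in ipmi_msghandler ipmi_watchdog ipmi_devintf ipmi_si; do
        # -x: whole-line match, so ipmi_si does not match ipmi_si_foo
        if ! printf '%s\n' "$loaded" | grep -qx "$mod"; then
            echo "$mod"
        fi
    done
}

# Typical use as root (modprobe finds the module file and its dependencies):
#   for m in $(lsmod | missing_ipmi_modules); do modprobe "$m"; done
```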

Also enable the watchdog in /etc/sysconfig/ipmi:

IPMI_WATCHDOG=yes

Then start the service:

# service ipmi start

And now the sensors can be read:

# ipmitool sdr | grep -i temp
Temp             | -51 degrees C     | ok
Temp             | -44 degrees C     | ok
Temp             | 39 degrees C      | ok
Temp             | 37 degrees C      | ok
Ambient Temp     | 37 degrees C      | ok
Ambient Temp     | 36 degrees C      | ok
Temp             | 36 degrees C      | ok
Temp             | 41 degrees C      | ok
Temp             | 37 degrees C      | ok
Ambient Temp     | 29 degrees C      | ok
Planar Temp      | 45 degrees C      | ok
CPU Temp Interf  | Not Readable      | ns
Mem Overtemp     | 0x01              | ok
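The readings can be turned into simple name=value pairs for further processing. A minimal sketch, assuming bash and awk; sdr_numeric_readings is a hypothetical helper name. It splits each line on "|" rather than on spaces, because sensor names such as "Ambient Temp" contain a space, so a space-based awk field number would point at different columns for "Temp" and "Ambient Temp" lines:

```shell
# Hypothetical helper: reads `ipmitool sdr` output on stdin and prints
# "name=value" for every sensor whose reading starts with a whole number.
# Readings such as "Not Readable" or hex states like 0x01 are skipped.
sdr_numeric_readings() {
    awk -F'|' '{
        name = $1
        sub(/[ \t]+$/, "", name)      # trim the padding after the name
        split($2, reading, " ")       # first word of the reading column
        if (reading[1] ~ /^-?[0-9]+$/)
            print name "=" reading[1]
    }'
}
```

For example, ipmitool sdr | grep -i temp | sdr_numeric_readings would print lines such as Temp=-51 and Ambient Temp=37 for the listing above.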

Enabling remote sensor reading

The next step is to allow remote sensor readings. First, edit /etc/ipmi_conf:

addr mgmt_if_ip
priv_limit admin
allowed_auths_callback md5
allowed_auths_user md5
allowed_auths_operator md5
allowed_auths_admin md5
user 2 true "username" "password" user 1 md5
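Since this file stores the password in clear text, it is worth keeping it readable by root only. A minimal sketch, assuming bash, of writing the same configuration from a setup script; write_ipmilan_conf is a hypothetical helper, and the address, user name and password become parameters instead of the literal placeholders:

```shell
# Hypothetical helper: writes the configuration shown above to the given
# file, with owner-only permissions because it contains the password.
write_ipmilan_conf() {
    local conf_file=$1 addr=$2 user=$3 pass=$4
    (
        # umask in a subshell, so the caller's umask is left unchanged
        umask 077
        cat > "$conf_file" <<EOF
addr $addr
priv_limit admin
allowed_auths_callback md5
allowed_auths_user md5
allowed_auths_operator md5
allowed_auths_admin md5
user 2 true "$user" "$pass" user 1 md5
EOF
    )
    # umask only affects newly created files; tighten an existing one too
    chmod 600 "$conf_file"
}
```

For example, write_ipmilan_conf /etc/ipmi_conf 192.0.2.10 monitor secret, where the address and credentials are illustrative only.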

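After that, the firewall needs to be updated to accept incoming requests on UDP port 623, the standard IPMI-over-LAN (RMCP) port. The INPUT rule is inserted at the top of the chain (-I) so that it comes before any final REJECT rule; the OUTPUT rule only matters if outgoing traffic is filtered as well:

# iptables -I INPUT -p udp --dport 623 -m state --state NEW,ESTABLISHED -j ACCEPT
# iptables -A OUTPUT -p udp --sport 623 -m state --state NEW,ESTABLISHED -j ACCEPT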

The remaining step is to start the IPMI LAN daemon:

# ipmilan
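The firewall rule and the ipmilan daemon should both survive a reboot. On a SysV-init RHEL system (an assumption about the setup), service iptables save writes the current rules to /etc/sysconfig/iptables, and one simple way to start ipmilan at boot is a line in /etc/rc.d/rc.local. The append_once helper below is hypothetical; it appends a line only if that exact line is not already present, so the setup can be re-run safely:

```shell
# Hypothetical helper: append a line to a file only if that exact line is
# not already present (a missing file is created).
append_once() {
    local file=$1 line=$2
    # -q quiet, -x whole line, -F fixed string (no regex interpretation)
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Typical use as root on a SysV-init RHEL system (assumption):
#   service iptables save
#   append_once /etc/rc.d/rc.local "$(command -v ipmilan)"
```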

The network service should now be running and reachable from the monitoring side. Make sure that the firewall change and the startup of the ipmilan daemon are made permanent, so that they survive a reboot. To test the service, log in to the management server, install the IPMI packages and run the following command:

# ipmitool -H managed_server -U username -P password sdr | grep -i temp
Temp             | -48 degrees C     | ok
Temp             | -47 degrees C     | ok
Temp             | 39 degrees C      | ok
Temp             | 37 degrees C      | ok
Ambient Temp     | 37 degrees C      | ok
Ambient Temp     | 36 degrees C      | ok
Temp             | 37 degrees C      | ok
Temp             | 42 degrees C      | nc
Temp             | 37 degrees C      | ok
Ambient Temp     | 29 degrees C      | ok
Planar Temp      | 46 degrees C      | ok
CPU Temp Interf  | Not Readable      | ns
Mem Overtemp     | 0x01              | ok
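The last column of the listing is the sensor status. In ipmitool's output, ok means the reading is within limits, ns means no reading is available, and codes such as nc (non-critical), cr (critical) and nr (non-recoverable) mean a threshold has been crossed, as for the 42 degrees C sensor above. A minimal sketch, assuming bash and awk, that lists only such sensors; sdr_alerts is a hypothetical helper name:

```shell
# Hypothetical helper: reads `ipmitool sdr` output on stdin and prints
# "name: status" for every sensor whose status is neither "ok" nor "ns".
sdr_alerts() {
    awk -F'|' '{
        name = $1;   sub(/[ \t]+$/, "", name)   # trim padding after the name
        status = $3; gsub(/[ \t]/, "", status)  # strip spaces around status
        if (status != "" && status != "ok" && status != "ns")
            print name ": " status
    }'
}
```

For example, ipmitool -H managed_server -U username -P password sdr | sdr_alerts would print Temp: nc for the listing above.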

Management server configuration

After verifying that remote sensor reading works as expected, we can proceed with configuring Zenoss to monitor the ambient temperature. If a single sensor reading is of interest, open the device, add a new monitoring template and add a COMMAND data source with the following command. Since the server reports several "Ambient Temp" sensors, head -n 1 keeps only the first reading:

echo 'Ambient temperature |temp='`ipmitool -H <hostname> -U <username> -P <password> sdr | grep "^Ambient Temp" | head -n 1 | awk '{ print $4 }'`';40;50;0;100'

The values 40, 50, 0 and 100 are the warning threshold, critical threshold, minimum value and maximum value. This format follows the Nagios plugin output convention (status text, then performance data after the "|"), so the same approach can be used to monitor the ambient temperature with any Nagios-compatible software. A more capable script follows:

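#!/bin/bash
# Usage: <script> <ipmi_host> <username> <password> <sensor name pattern>
# Prints Nagios-style output with one performance value per matching sensor
# and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).
WARNTEMP=40
CRITTEMP=50
MINTEMP=0
MAXTEMP=100
IPMI_HOSTNAME=$1
IPMI_USERNAME=$2
IPMI_PASSWORD=$3
IPMI_VARIABLE=$4
# Split each "name | reading | status" line on "|" and keep the first word
# of the reading, so both "Temp | 39 degrees C" and "Ambient Temp | 37
# degrees C" yield the number; non-numeric readings are skipped.
RESULT=$(ipmitool -H "$IPMI_HOSTNAME" -U "$IPMI_USERNAME" -P "$IPMI_PASSWORD" sdr \
    | grep -- "$IPMI_VARIABLE" \
    | awk -F'|' '{ split($2, r, " "); if (r[1] ~ /^-?[0-9]+$/) print r[1] }')
MSG_VALUES='|'
MSG_LEVEL="OK"
EXIT_CODE=0
INDEX=0
for val in $RESULT
do
   INDEX=$((INDEX + 1))
   # Report the worst state found among all matching sensors
   if [ "$val" -ge "$CRITTEMP" ]; then
      MSG_LEVEL="CRITICAL"
      EXIT_CODE=2
   elif [ "$val" -ge "$WARNTEMP" ] && [ "$EXIT_CODE" -lt 1 ]; then
      MSG_LEVEL="WARNING"
      EXIT_CODE=1
   fi
   MSG_VALUES="${MSG_VALUES}var$INDEX=$val;$WARNTEMP;$CRITTEMP;$MINTEMP;$MAXTEMP "
done
if [ "$INDEX" -eq 0 ]; then
   echo "$IPMI_VARIABLE: UNKNOWN - no numeric readings"
   exit 3
fi
echo "$IPMI_VARIABLE: $MSG_LEVEL $MSG_VALUES"
exit $EXIT_CODE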

With this script it is easy to graph sensor readings, raise alerts and distribute them like any other event notification in the network monitoring system. The advantage of the proposed solution is that it requires no additional hardware, and the software is limited to standard packages available as part of the operating system installation. Using the ipmilan service also avoids the need to allow remote shell access from the monitoring server to the managed server, which improves the security of the system.
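Graphing relies on the performance data after the first "|": a space-separated list of label=value;warn;crit;min;max items. To check by hand what the monitoring system will see, a minimal sketch, assuming bash; parse_perfdata is a hypothetical helper that handles only simple labels without spaces or quotes, such as the var1, var2 labels the script produces:

```shell
# Hypothetical helper: splits Nagios-style plugin output read on stdin into
# one "label value warn crit min max" line per performance-data item.
# Returns 1 if the output contains no "|" (no performance data).
parse_perfdata() {
    local line perf item label rest
    IFS= read -r line
    perf=${line#*|}                      # everything after the first "|"
    [ "$perf" = "$line" ] && return 1    # no "|" found
    for item in $perf; do
        label=${item%%=*}                # text before "="
        rest=${item#*=}                  # "value;warn;crit;min;max"
        echo "$label ${rest//;/ }"
    done
}
```

For example, piping the output of the script above (saved under any name) into parse_perfdata prints one line per matching sensor, such as var1 37 40 50 0 100.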