1. Introduction

In a well-organized data center, monitoring is crucial to guarantee the continuity of mission-critical applications, services and devices.

One way of monitoring is to use SNMP, which stands for Simple Network Management Protocol, with the emphasis on the first word: Simple.

This method of monitoring is nothing new: almost all vendors have implemented it for many years. In many software solutions it is well understood and well integrated; in others it is not understood at all, or wrongly integrated.

2. Our solution: RSCALA

We wrote a script called RSCALA, which stands for REXX Scala, where scala is the Latin word for stairs ("trap" in Dutch).

This script will be triggered by the snmptrapd daemon bundled with the net-snmp package shipped by most Linux distributions.

The main tasks of this script are:

  • read and interpret the raw information sent by the snmptrapd daemon

  • log the analyzed information in a human-readable format

  • write to a database server, such as MySQL (optional)

  • send a mail message (optional)

3. How RSCALA works

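RSCALA is triggered by the snmptrapd daemon. For every trap it receives, snmptrapd starts the program named in its trapHandle directive and passes the trap on standard input: first the host name of the sender, then its transport address, then one variable binding (OID and value) per line.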
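
The input that snmptrapd hands to a trap handler can be taken apart in a few lines. The following is a minimal sketch in Python, not the REXX code of RSCALA itself; the names Trap and parse_trap are made up for this illustration, and the sample text is the Cisco 3750 trap shown in the cases section.

```python
from typing import List, NamedTuple, Tuple

# Sample handler input, copied from the Cisco 3750 case in this document.
SAMPLE = """lan6
UDP: [10.2.1.2]:54599
DISMAN-EVENT-MIB::sysUpTimeInstance 0:0:31:06.05
SNMPv2-MIB::snmpTrapOID.0 SNMPv2-SMI::enterprises.9.9.41.2.0.1
SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.2.22 "LINK"
SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.3.22 4
SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.4.22 "UPDOWN"
SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.5.22 "Interface GigabitEthernet1/0/1, changed state to up"
SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.6.22 0:0:31:06.05
"""


class Trap(NamedTuple):
    """Hypothetical container for one received trap."""
    hostname: str
    transport: str
    varbinds: List[Tuple[str, str]]  # (oid, value) pairs in arrival order


def parse_trap(text: str) -> Trap:
    """Split handler input into host name, transport address and varbinds."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    hostname, transport = lines[0], lines[1]
    varbinds = []
    for line in lines[2:]:
        # Each remaining line is "OID VALUE"; the value may contain spaces.
        oid, _, value = line.partition(" ")
        varbinds.append((oid, value.strip().strip('"')))
    return Trap(hostname, transport, varbinds)


trap = parse_trap(SAMPLE)
print(trap.hostname, trap.transport)
for oid, value in trap.varbinds:
    print(oid, "=>", value)
```

A real handler would read the same text from standard input (sys.stdin.read()) instead of a constant.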

4. Installing RSCALA

yum install net-snmp

vi /etc/snmp/snmptrapd.conf
#
trapHandle default /bin/rscala
disableAuthorization yes
doNotLogTraps no
doNotRetainNotificationLogs no
logOption f /var/log/snmp/snmptrapd.log

wget -N -nv http://d01cid.ddns.net/sharel/img/rscala/rscala -P /bin/

vi /etc/rscala.conf
#
# config file used by rscala (c) alain rykaert - 01jul2009-12jan2010-08may2011-22nov2011
#

[MAIL]
 SendMail = "0"                                     /* 1=on, 0=off*/
 MailAddress = "traps@cid.net"                   /* e-mail address*/

[LOG]
 WhiteLogFile = /var/log/snmp/rscala.log         /* white log file*/
 BlackLogFile = /var/log/snmp/rscala.xxx         /* black log file*/

[MYSQL]
 SendmySQL = "0"                                    /* 1=on, 0=off*/
 mySQLServer = "localhost"                     /* mysql servername*/
 mySQLUserid = "root"                              /* mysql userid*/
 mySQLPasswd = "4627037737371605"                /* mysql password*/
 mySQLdbName = "rscala"                     /* mysql database name*/
 autoAck = "1"                          /* auto acknowledge option*/

[SOUND]
 Beep = 1                                           /* 1=on, 0=off*/
 BeepExec = /usr/bin/beep                          /* beep program*/
 BlackBeep = 100, 1                         /* frequency and count*/
 WhiteBeep = 3000, 1                        /* frequency and count*/

[DEBUG]
 Debug = 1

[BLACKHOSTS]
 plx00013
 srv133

[BLACKMESSAGES]
 "logon at terminal"
 "Remote Login Successful"
 "Remote login successful"
 "Remote logoff successful"
 "windows"
 "boes"
 "GigabitEthernet2/0/1"
 "GigabitEthernet2/0/2"
 "GigabitEthernet2/0/3"
 "GigabitEthernet2/0/4"
 "GigabitEthernet2/0/5"

[MIB]
# trapgen
  1824.1.0.0.1

# dummy
  1.2.3.4 = "de slimste mens" /* this is a demo */

# printserver
  11.2.3.9.1

# bladeservers
;  2.6.158.3.1.1.3
  2.6.158.3.1.1.18
  2.6.158.3.1.1.5
  2.6.158.3.1.1.11
  2.6.158.3.1.1.15
  2.6.158.3.1.1.8

# x3650 IMM
  2.6.158.5.1.9

# ds3xx ds4xx
  1123.4.300.1.1.6         /* version 9*/
  1123.4.500.1.1.7         /* version 10*/
  789.1123.1.500.1.1.7

# san switch qlogic
  94.1.11.1.9.16.0.0.5.30.2.78.63.0.0.0.0.0.0.0.0.1029
  94.1.11.1.9.16.0.0.192.221.13.141.133.0.0.0.0.0.0.0.0.1
  94.1.11.1.9.16.0.0.192.221.13.143.182.0.0.0.0.0.0.0.0.1

# digital china dcs-3950
  1.3.6.1.2.1.2.2.1.1

# cisco 3750
  9.9.41.1.2.3.1.5.*

# cisco lan switch in blade
  9.9.41.1.2.3.1.5.88
  9.9.41.1.2.3.1.5.89

# cisco ap1200
; 9.2.2.1.1.20.1
; 9.9.41.1.2.3.1.4.3
; 9.9.41.1.2.3.1.5.3
; 9.9.41.1.2.3.1.4.5
; 9.9.41.1.2.3.1.5.5

# rsa (westcom)
# 2197.20.16.6.0 /* applications traps */
  674.10892.1.5000.10.3.0

# ts3310               # http://community-downloads.quest.com/management-extensions/htm/Quantum_description.htm
  3764.1.10.10.3.0.101 = "Startup Sequence Completed"
  3764.1.10.10.3.0.102 = "Shutdown Sequence Initiated"
  3764.1.10.10.3.0.103 = "Change in Online state of the Physical Library"
  3764.1.10.10.3.0.104 = "Change in main Chassis door status"
  3764.1.10.10.3.0.105 = "Change in IE door status"
  3764.1.10.10.3.0.106 = "Robotics changed state to ready"
  3764.1.10.10.3.0.107 = "Robotics changed state to not ready"
  3764.1.10.10.3.0.108 = "Partition changed online state"
  3764.1.10.10.3.0.109 = "Partition changed online state"
  3764.1.10.10.3.0.110 = "RAS status of the Control SubSystem Changed"
  3764.1.10.10.3.0.111 = "RAS status of the Cooling SubSystem Changed"
  3764.1.10.10.3.0.112 = "RAS status of the Drive SubSystem Changed"
  3764.1.10.10.3.0.113 = "RAS status of the Media SubSystem Changed"
  3764.1.10.10.3.0.114 = "RAS status of the Power SubSystem Changed"
  3764.1.10.10.3.0.115 = "RAS status of the Robotics SubSystem Changed"
  3764.1.10.10.3.0.116 = "Operator intervention is required"
  3764.1.10.10.3.0.117 = "Drive changed online state"

# ts3500
#  2.6.182.1.2.11.1
  2.6.182.1.2.71.1

# gpfs
  2.6.212.2.1
  2.6.212.1.6.1.1
# 2.6.212.*

# ups                 # http://community-downloads.quest.com/management-extensions/htm/Powerware_description.htm
  534.1.11.4.1.0.0.1  = "534.1.11.4.1.0.0.1"
  534.1.11.4.1.0.0.2  = "534.1.11.4.1.0.0.2"
  534.1.11.4.1.0.0.3  = "534.1.11.4.1.0.0.3"
  534.1.11.4.1.0.0.4  = "534.1.11.4.1.0.0.4"
  534.1.11.4.1.0.0.5  = "534.1.11.4.1.0.0.5"
  534.1.11.4.1.0.0.6  = "534.1.11.4.1.0.0.6"
  534.1.11.4.1.0.0.7  = "534.1.11.4.1.0.0.7"
  534.1.11.4.1.0.0.8  = "534.1.11.4.1.0.0.8"
  534.1.11.4.1.0.0.9  = "534.1.11.4.1.0.0.9"
  534.1.11.4.1.0.0.10 = "534.1.11.4.1.0.0.10"
  534.1.11.4.1.0.0.11 = "534.1.11.4.1.0.0.11"
  534.1.11.4.1.0.0.12 = "534.1.11.4.1.0.0.12"
  534.1.11.4.1.0.0.13 = "534.1.11.4.1.0.0.13"
  534.1.11.4.1.0.0.14 = "534.1.11.4.1.0.0.14"
  534.1.11.4.1.0.0.15 = "534.1.11.4.1.0.0.15"

# sonas
  2.6.219.2.2.8

systemctl restart snmptrapd.service
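
The layout of /etc/rscala.conf (section headers in square brackets, "key = value" lines, bare entries, "#" and ";" comment lines, and trailing /* ... */ remarks) can be read with a small parser. The sketch below is in Python and only illustrates the file format; parse_conf and is_blacklisted are hypothetical names, and the blacklist rule (exact host name match, case-sensitive substring match on the message) is an assumption, not a description of what RSCALA does internally.

```python
import re

# A shortened copy of the configuration file shown above.
SAMPLE_CONF = '''\
[MAIL]
 SendMail = "0"                                     /* 1=on, 0=off*/
 MailAddress = "traps@cid.net"                   /* e-mail address*/

[BLACKHOSTS]
 plx00013
 srv133

[BLACKMESSAGES]
 "logon at terminal"
 "windows"

[MIB]
# dummy
  1.2.3.4 = "de slimste mens" /* this is a demo */
; 9.2.2.1.1.20.1
  9.9.41.1.2.3.1.5.*
'''


def parse_conf(text):
    """Return {SECTION: [(key, value), ...]}; bare entries get value None."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = re.sub(r"/\*.*?\*/", "", raw).strip()  # drop /* ... */ remarks
        if not line or line[0] in "#;":               # blank or comment line
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1].upper(), [])
            continue
        if current is None:
            continue  # entry before the first section header: ignored here
        key, sep, value = line.partition("=")
        key = key.strip().strip('"')
        current.append((key, value.strip().strip('"') if sep else None))
    return sections


def is_blacklisted(conf, hostname, message):
    """Assumed rule: blacklisted host, or message containing a listed phrase."""
    hosts = {key for key, _ in conf.get("BLACKHOSTS", [])}
    phrases = [key for key, _ in conf.get("BLACKMESSAGES", [])]
    return hostname in hosts or any(phrase in message for phrase in phrases)


conf = parse_conf(SAMPLE_CONF)
print(conf["MAIL"])
print(is_blacklisted(conf, "srv133", "Port 3 Paper Out"))
print(is_blacklisted(conf, "prt001", "Port 3 Paper Out"))
```

Under this assumed rule, a trap from a blacklisted host or with a blacklisted phrase would go to the BlackLogFile, and everything else to the WhiteLogFile.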

5. Sample

cat /var/log/snmp/rscala.log
09Mar2010 12:29:00 . amm200       - 100.0.0.200  : 8886E1G # YK17808B615G # srv100 # Blade_05 # Blade powered on
09Mar2010 12:28:24 . lan205       - 100.0.0.205  : Interface GigabitEthernet0/11, changed state to down
16Mar2010 22:24:47 . acp032       - 10.2.1.32    : Packet to client 001e.a99a.3942 reached max retries, removing the client
08Jun2010 15:08:22 . prt001       - 10.2.10.2    : Port 3 Paper Out
07Feb2011 16:21:48 . imm4         - 10.0.64.47   : Management Controller Test Alert Generated by USERID
22Feb2011 19:11:42 . amm01        - 10.2.1.233   : 88861MG # 99A0109 # SERVPROC # Event log 75% full
24Oct2011 14:40:56 . san202       - 10.2.1.202   : Enclosure 85, Slot 1 # Drive in array or hot spare in use removed
02Dec2011 11:47:18 . tap091       - 10.1.3.91    : The library has been manually turned offline and is unavailable for use.
02Dec2011 11:47:28 . tap091       - 10.1.3.91    : All library doors are closed.  The library will inventory and resume operations
02Dec2011 11:49:44 . tap091       - 10.1.3.91    : A library configuration setting has been changed.
10Jan2012 16:27:10 . imm222       - 10.2.1.222   : Power Supply 1 has lost input
10Jan2012 16:27:11 . imm222       - 10.2.1.222   : Non-redundant:Insufficient Resources for Power Group 1 has asserted
10Jan2012 16:27:21 . imm222       - 10.2.1.222   : Power Supply 1 has returned to a Normal Input State
10Jan2012 16:27:22 . imm222       - 10.2.1.222   : Non-redundant:Sufficient Resources from Redundancy Degraded
10Jan2012 16:27:29 . imm222       - 10.2.1.222   : Non-redundant:Insufficient Resources for Power Group 1 has deasserted
22Jan2012 18:53:47 . srv201       - 10.2.1.201   : nsNotifyShutdown
22Jan2012 19:33:30 . srv082       - 10.2.1.82    : nsNotifyShutdown
27Jan2012 19:47:42 . prt002       - 10.2.10.2    : Port 3 Paper Out
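
Because every log line has the same layout (date, time, a one-character marker, host name, IP address, message), the log is easy to process with other tools. The following Python sketch splits one line into fields; the field names and the function parse_log_line are invented for this example, and the meaning of the one-character marker is not documented here, so it is only captured as "flag". Parsing the month name with %b assumes an English (C) locale.

```python
import re
from datetime import datetime

# Pattern derived from the sample lines above; field names are illustrative.
LOG_LINE = re.compile(
    r"^(?P<date>\d{2}[A-Za-z]{3}\d{4}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<flag>\S) (?P<host>\S+)\s+- (?P<ip>\S+)\s+: (?P<message>.*)$"
)


def parse_log_line(line):
    """Return a dict of fields for one rscala.log line, or None if no match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    fields = match.groupdict()
    # Combine date and time into one datetime value for sorting or filtering.
    fields["timestamp"] = datetime.strptime(
        fields["date"] + " " + fields["time"], "%d%b%Y %H:%M:%S"
    )
    return fields


record = parse_log_line(
    "10Jan2012 16:27:10 . imm222       - 10.2.1.222   : Power Supply 1 has lost input"
)
print(record["timestamp"], record["host"], record["ip"], record["message"])
```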

6. MySQL

curl -s http://d01cid.ddns.net/sharel/bin/inst-mariadb-server | sh
mysql -u'root' -p'Passw0rd' -e "create database rscala;"

vi rscala.sql
alter database character set utf8;
set names 'utf8';

CREATE TABLE `history` (
  `date`      char(20)  default NULL,
  `time`      char(20)  default NULL,
  `status`    int(1)    default NULL,
  `hostname`  char(15)  default NULL,
  `ipaddress` char(15)  default NULL,
  `message`   char(255) default NULL,
  `id`        char(20)  NOT NULL,
  PRIMARY KEY  (`id`)
);

mysql -u'root' -p'Passw0rd' rscala < rscala.sql
mysql -u'root' -p'Passw0rd' rscala -e "select * from history;"
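
To show how one trap maps onto the columns of the history table, here is a small Python sketch. It uses the sqlite3 module from the standard library as a stand-in for MySQL/MariaDB so that it runs without a server; the column types are simplified accordingly. The functions open_history and store_trap, the value used for status and the way the id is built are all assumptions made for this example, since the source does not describe how RSCALA fills these columns.

```python
import sqlite3


def open_history(path=":memory:"):
    """Open a database and create a history table with the same columns."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS history ("
        " date TEXT, time TEXT, status INTEGER, hostname TEXT,"
        " ipaddress TEXT, message TEXT, id TEXT PRIMARY KEY)"
    )
    return db


def store_trap(db, date, time, status, hostname, ipaddress, message, row_id):
    """Insert one trap; long values are cut to the MySQL column widths."""
    db.execute(
        "INSERT INTO history VALUES (?, ?, ?, ?, ?, ?, ?)",
        (date, time, status, hostname[:15], ipaddress[:15],
         message[:255], row_id[:20]),
    )
    db.commit()


db = open_history()
# status 0 and the id below are placeholders, not values produced by RSCALA.
store_trap(db, "10Jan2012", "16:27:10", 0, "imm222", "10.2.1.222",
           "Power Supply 1 has lost input", "20120110162710-0001")
for row in db.execute("SELECT date, time, hostname, message FROM history"):
    print(row)
```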

7. Cases

The following are samples of received traps, captured from the /dev/shm directory.

trap received from a BladeCenter Advanced Management Module
amm01.cid.net
UDP: [10.2.1.233]:161
DISMAN-EVENT-MIB::sysUpTimeInstance 0:0:07:05.43
SNMPv2-MIB::snmpTrapOID.0 SNMPv2-SMI::enterprises.2.6.158.3.0.35
SNMPv2-SMI::enterprises.2.6.158.3.1.1.1 "Date(m/d/y)=09/07/09, Time(h:m:s)=13:56:01"
SNMPv2-SMI::enterprises.2.6.158.3.1.1.2 "BladeCenter Advanced Management Module"
SNMPv2-SMI::enterprises.2.6.158.3.1.1.3 "amm01"
SNMPv2-SMI::enterprises.2.6.158.3.1.1.4 "350514C49EDE11DC8D5100145EE02430"
SNMPv2-SMI::enterprises.2.6.158.3.1.1.5 "99A0109"
SNMPv2-SMI::enterprises.2.6.158.3.1.1.6 35
SNMPv2-SMI::enterprises.2.6.158.3.1.1.7 4
SNMPv2-SMI::enterprises.2.6.158.3.1.1.8 "Event log 75% full"
SNMPv2-SMI::enterprises.2.6.158.3.1.1.9 "admin"
SNMPv2-SMI::enterprises.2.6.158.3.1.1.10 "forum rubens"
SNMPv2-SMI::enterprises.2.6.158.3.1.1.11 ""
SNMPv2-SMI::enterprises.2.6.158.3.1.1.12 ""
SNMPv2-SMI::enterprises.2.6.158.3.1.1.13 ""
SNMPv2-SMI::enterprises.2.6.158.3.1.1.14 113
SNMPv2-SMI::enterprises.2.6.158.3.1.1.15 "SERVPROC"
SNMPv2-SMI::enterprises.2.6.158.3.1.1.16 0
SNMPv2-SMI::enterprises.2.6.158.3.1.1.17 "10.2.1.233"
SNMPv2-SMI::enterprises.2.6.158.3.1.1.18 "88861MG"
SNMPv2-SMI::enterprises.2.6.158.3.1.1.19 ""
SNMP-COMMUNITY-MIB::snmpTrapAddress.0 10.2.1.233
SNMP-COMMUNITY-MIB::snmpTrapCommunity.0 "public"
SNMPv2-MIB::snmpTrapEnterprise.0 SNMPv2-SMI::enterprises.2.6.158.3

trap received from a Cisco 3750 switch
lan6
UDP: [10.2.1.2]:54599
DISMAN-EVENT-MIB::sysUpTimeInstance 0:0:31:06.05
SNMPv2-MIB::snmpTrapOID.0 SNMPv2-SMI::enterprises.9.9.41.2.0.1
SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.2.22 "LINK"
SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.3.22 4
SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.4.22 "UPDOWN"
SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.5.22 "Interface GigabitEthernet1/0/1, changed state to up"
SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.6.22 0:0:31:06.05

trap received from a Cisco switch in a BladeCenter chassis
lan7.sms.local
UDP: [10.17.1.7]:61681
DISMAN-EVENT-MIB::sysUpTimeInstance 119:3:51:53.12
SNMPv2-MIB::snmpTrapOID.0 SNMPv2-SMI::enterprises.9.9.41.2.0.1
SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.2.176 "PLATFORM_ENV"
SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.3.176 2
SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.4.176 "TEMP"
SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.5.176 "Abnormal temperature detected"
SNMPv2-SMI::enterprises.9.9.41.1.2.3.1.6.176 119:3:51:53.12

trap received from an SMClient for a DS4800
dsmon01.sms.local
UDP: [172.18.40.23]:50039
DISMAN-EVENT-MIB::sysUpTimeInstance 11:21:59:29.45
SNMPv2-MIB::snmpTrapOID.0 SNMPv2-SMI::enterprises.1123.4.500.0.2
SNMPv2-SMI::enterprises.1123.4.500.1.1.1 0
SNMPv2-SMI::enterprises.1123.4.500.1.1.2 "                                        "
SNMPv2-SMI::enterprises.1123.4.500.1.1.3 "                  "
SNMPv2-SMI::enterprises.1123.4.500.1.1.4 "DS4800-Acerta                 "
SNMPv2-SMI::enterprises.1123.4.500.1.1.5 "151a                "
SNMPv2-SMI::enterprises.1123.4.500.1.1.6 "Feb 17, 2010 12:46:21 PM                "
SNMPv2-SMI::enterprises.1123.4.500.1.1.7 "Optical link speed detection failure                                  "
SNMPv2-SMI::enterprises.1123.4.500.1.1.8 "Channel                                                     "
SNMPv2-SMI::enterprises.1123.4.500.1.1.9 "Drive-side: channel 3                   "

trap received from a tape library TS3310
tape226
UDP: [10.17.1.226]:1027
DISMAN-EVENT-MIB::sysUpTimeInstance 0:0:38:18.34
SNMPv2-MIB::snmpTrapOID.0 SNMPv2-SMI::enterprises.3764.1.10.10.3.0.105
SNMPv2-SMI::enterprises.3764.1.10.10.1.5.0 "131"
SNMPv2-SMI::enterprises.3764.1.10.10.14.3.0 1
SNMP-COMMUNITY-MIB::snmpTrapAddress.0 10.17.1.226
SNMP-COMMUNITY-MIB::snmpTrapCommunity.0 "public"
SNMPv2-MIB::snmpTrapEnterprise.0 SNMPv2-SMI::enterprises.3764.1.10.10.3
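
Comparing these raw traps with the [MIB] section of /etc/rscala.conf and with the sample log suggests how a log message is put together: the values of the listed enterprise OIDs, in the order of the configuration file, with empty values skipped and " # " as separator. The Python sketch below works under that assumption; it is an interpretation of the samples, not the actual RSCALA code, and build_message is a hypothetical name. The "*" wildcard is handled with fnmatch.

```python
from fnmatch import fnmatchcase

PREFIX = "SNMPv2-SMI::enterprises."

# Values taken from the BladeCenter AMM trap above (quotes already removed).
AMM_VARBINDS = [
    ("SNMPv2-SMI::enterprises.2.6.158.3.1.1.5", "99A0109"),
    ("SNMPv2-SMI::enterprises.2.6.158.3.1.1.8", "Event log 75% full"),
    ("SNMPv2-SMI::enterprises.2.6.158.3.1.1.11", ""),
    ("SNMPv2-SMI::enterprises.2.6.158.3.1.1.15", "SERVPROC"),
    ("SNMPv2-SMI::enterprises.2.6.158.3.1.1.18", "88861MG"),
]

# The active "bladeservers" entries of the [MIB] section, in file order.
BLADE_MIB = [
    "2.6.158.3.1.1.18",
    "2.6.158.3.1.1.5",
    "2.6.158.3.1.1.11",
    "2.6.158.3.1.1.15",
    "2.6.158.3.1.1.8",
]


def build_message(varbinds, mib_entries):
    """Join the non-empty values of the varbinds selected by mib_entries."""
    parts = []
    for pattern in mib_entries:          # keep the order of the config file
        for oid, value in varbinds:
            if not oid.startswith(PREFIX):
                continue
            # Compare the OID below "enterprises." with the entry; "*" is a wildcard.
            if fnmatchcase(oid[len(PREFIX):], pattern) and value:
                parts.append(value)
    return " # ".join(parts)


print(build_message(AMM_VARBINDS, BLADE_MIB))
```

For the AMM trap this reproduces the message part of the log line of 22Feb2011 in the sample section; for the Cisco 3750 trap, the entry 9.9.41.1.2.3.1.5.* selects the "Interface GigabitEthernet1/0/1, changed state to up" text in the same way.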

8. Testing RSCALA with trapgen

On CentOS 7:

yum -y install compat-libstdc++-33.i686
curl -s http://d01cid.ddns.net/sharel/img/trapgen/trapgen.tgz | tar xzvP -C /
trapgen -d 10.1.1.121 -v "1824.1.0.0.1" STRING "hello $RANDOM"