XIV and SNMP – let's walk the walk

A new XIV MIB has been posted on the MIB download site for XIV, found here:
http://www-01.ibm.com/support/docview.wss?uid=ssg1S4000913

This reminded me of my plan to blog about probing the XIV with an SNMP walk. In this post I will describe how to do this from a Windows workstation. The whole point of the exercise is to explore how SNMP can be used to confirm the status of the XIV.

Firstly we need to confirm that the SNMP community name set on the XIV has not been changed. The following xcli command pulls the community name from the XIV. To run it, change the IP address and password to match your own machine:

xcli -m 10.10.1.10 -u admin -p adminadmin config_get name=snmp_community

The expected response is:

Name             Value
snmp_community   XIV

My first surprise was that the SNMP community name is XIV by default, not public.   But hey… this is XIV and we love the number 14   #;-)

We now need to install net-snmp onto our Windows workstation.   This gives us a tool to compile the MIB and actually issue SNMP commands.  I downloaded the latest version from here:

http://sourceforge.net/projects/net-snmp/files/net-snmp%20binaries/5.5-binaries/

Having downloaded and installed net-snmp I downloaded the XIV MIB to the following folder:
C:\usr\share\snmp\mibs

You get the MIB from the download site linked at the top of this post.

Now we force a re-compile of all MIBs so that the XIV MIB gets picked up:

C:\usr\bin>snmptranslate -Dparse-mibs

Look for messages like these.   The module number may be different, depending on how many MIBs already exist in that folder:

XIV-MIB is in C:/usr/share/snmp/mibs/XIV-MIB-10.2.4.txt
Module 72 XIV-MIB is in C:/usr/share/snmp/mibs/xiv-10.2.x-mib.txt
Checking file: C:/usr/share/snmp/mibs/XIV-MIB-10.2.4.txt...
XIV-MIB is now in C:/usr/share/snmp/mibs/XIV-MIB-10.2.4.txt

Now we are ready to walk the SNMP walk starting with the ‘xiv’ OID.   The only thing you need to change in this line is the IP address of the XIV:

C:\usr\bin>snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10 xiv

To understand the command:

-v 2c           forces SNMP version 2c
-c XIV          is the community name, it is CASE sensitiVE
-m XIV-MIB      forces the use of the XIV-MIB
10.10.1.10      is the XIV management module IP address
xiv             is the root of the MIB
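
If you want to drive the walk from a script, the options above map onto an argument list quite directly. What follows is only a minimal Python sketch: the helper names build_snmpwalk_cmd and run_snmpwalk are made up for illustration, and it assumes the net-snmp binaries are on your PATH.

```python
import subprocess


def build_snmpwalk_cmd(host, oid="xiv", community="XIV", mib="XIV-MIB"):
    """Build the snmpwalk argument list described above (hypothetical helper)."""
    return ["snmpwalk", "-v", "2c", "-c", community, "-m", mib, host, oid]


def run_snmpwalk(host, oid="xiv"):
    """Run the walk and return its text output.

    Assumes snmpwalk is on the PATH. Walking the whole 'xiv' tree can take
    a long time, so pass a single OID such as 'xivFailedDisks' when scripting.
    """
    cmd = build_snmpwalk_cmd(host, oid)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # Print the command instead of running it, so this works without an XIV.
    print(" ".join(build_snmpwalk_cmd("10.10.1.10")))
```

Keeping the command as a list rather than one string avoids quoting problems if the community name ever contains spaces.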

The output will go on for some time; press Ctrl-C when you get bored.
Here is an example of the output you will get:

C:\usr\bin>snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10 xiv
XIV-MIB::xivMachineStatus = STRING: "Full Redundancy"
XIV-MIB::xivFailedDisks = INTEGER: 0
XIV-MIB::xivUtilizationSoft = Gauge32: 69
XIV-MIB::xivUtilizationHard = Gauge32: 90
XIV-MIB::xivFreeSpaceSoft = INTEGER: 49770
XIV-MIB::xivFreeSpaceHard = INTEGER: 7954
XIV-MIB::xivIfIOPS.1004 = Gauge32: 6519
XIV-MIB::xivIfIOPS.1005 = Gauge32: 6773
XIV-MIB::xivIfIOPS.1006 = Gauge32: 6515
XIV-MIB::xivIfIOPS.1007 = Gauge32: 6557
XIV-MIB::xivIfIOPS.1008 = Gauge32: 6517
XIV-MIB::xivIfIOPS.1009 = Gauge32: 6575
XIV-MIB::xivIfStatus.1004 = STRING: "OK"
XIV-MIB::xivIfStatus.1005 = STRING: "OK"
XIV-MIB::xivIfStatus.1006 = STRING: "OK"
XIV-MIB::xivIfStatus.1007 = STRING: "OK"
XIV-MIB::xivIfStatus.1008 = STRING: "OK"
XIV-MIB::xivIfStatus.1009 = STRING: "OK"
XIV-MIB::xivEventCode.5 = STRING: START_WORK
XIV-MIB::xivEventCode.9 = STRING: USER_SHUTDOWN

You can cut the output back by specifying just a single field (OID):

snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10 xivFailedDisks
XIV-MIB::xivFailedDisks = INTEGER: 0
snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10 xivFreeSpaceSoft
XIV-MIB::xivFreeSpaceSoft = INTEGER: 17111

What do all these fields mean?   Let me explain:

xivMachineStatus:    Shows if a disk rebuild or redistribution is occurring.
xivFailedDisks:      The number of failed disks in the XIV.
xivUtilizationSoft:  The percentage of total soft space that is allocated to pools.
xivUtilizationHard:  The percentage of total hard space that is allocated to pools.
xivFreeSpaceSoft:    The amount of soft space that is unallocated in GB.
xivFreeSpaceHard:    The amount of hard space that is unallocated in GB.
xivIfIOPS.1004:      The number of IOPS being executed by module 4 at that moment.
xivIfIOPS.1005:      The number of IOPS being executed by module 5 at that moment.
xivIfIOPS.1006:      The number of IOPS being executed by module 6 at that moment.
xivIfIOPS.1007:      The number of IOPS being executed by module 7 at that moment.
xivIfIOPS.1008:      The number of IOPS being executed by module 8 at that moment.
xivIfIOPS.1009:      The number of IOPS being executed by module 9 at that moment.
xivIfStatus.1004:    The status of module 4 at that moment.
xivIfStatus.1005:    The status of module 5 at that moment.
xivIfStatus.1006:    The status of module 6 at that moment.
xivIfStatus.1007:    The status of module 7 at that moment.
xivIfStatus.1008:    The status of module 8 at that moment.
xivIfStatus.1009:    The status of module 9 at that moment.
xivEventCode.5:      This is the first event code in the event log.
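
Given those meanings, the walk output can be turned into a simple health check. The Python sketch below is one possible approach, not an official tool: parse_walk and health_problems are hypothetical names, and the rules (status must be "Full Redundancy", zero failed disks, every xivIfStatus must be "OK") are assumptions based on the sample output shown earlier.

```python
import re

# Matches lines such as: XIV-MIB::xivFailedDisks = INTEGER: 0
LINE_RE = re.compile(r"^XIV-MIB::(\S+) = (?:(\w+): )?(.*)$")


def parse_walk(text):
    """Return {field name: value string} for every XIV-MIB line in the output."""
    values = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            name, _type, value = match.groups()
            values[name] = value.strip().strip('"')
    return values


def health_problems(values):
    """Return a list of problem descriptions; an empty list means healthy.

    A missing xivMachineStatus or xivFailedDisks field is also reported.
    """
    problems = []
    if values.get("xivMachineStatus") != "Full Redundancy":
        problems.append("machine status: %s" % values.get("xivMachineStatus"))
    if values.get("xivFailedDisks") != "0":
        problems.append("failed disks: %s" % values.get("xivFailedDisks"))
    for name, value in sorted(values.items()):
        if name.startswith("xivIfStatus.") and value != "OK":
            problems.append("%s is %s" % (name, value))
    return problems


if __name__ == "__main__":
    sample = '''XIV-MIB::xivMachineStatus = STRING: "Full Redundancy"
XIV-MIB::xivFailedDisks = INTEGER: 0
XIV-MIB::xivIfStatus.1004 = STRING: "OK"'''
    print(health_problems(parse_walk(sample)) or "healthy")
```

Bear in mind that this only covers what the walk exposes; other hardware faults would not show up here.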

The example above stops after the first event, but in fact if you leave the walk running, it will eventually list every event in the event log.   This will take a very long time.   It is possible to list the events in the event log by specific event ID and it is also possible to get detailed information for each event.    In the examples below, I pull out information for event ID 1000.   I was not able to get this information with a single command.

C:\>snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10 xivEventCode.1000
XIV-MIB::xivEventCode.1000 = STRING: VOLUME_CREATE
C:\>snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10 xivEventDescription.1000
XIV-MIB::xivEventDescription.1000 = STRING: Volume was created with name 'nestor_103' and size 17GB in Storage Pool with name 'test_pool'.
C:\>snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10 xivEventTime.1000
XIV-MIB::xivEventTime.1000 = STRING: 2011-02-23 20:05:48
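
One possible way to cut this to a single command is net-snmp's snmpget, which accepts several OIDs in one invocation. The sketch below is untested against an XIV, so treat it as an assumption that the agent answers all three in one request. The helper names are hypothetical; parse_event works equally well on the concatenated output of the three walks above.

```python
EVENT_FIELDS = ("xivEventCode", "xivEventDescription", "xivEventTime")


def build_event_get_cmd(host, event_id, community="XIV"):
    """Build one snmpget command that asks for all three event fields."""
    oids = ["%s.%d" % (field, event_id) for field in EVENT_FIELDS]
    return ["snmpget", "-v", "2c", "-c", community, "-m", "XIV-MIB", host] + oids


def parse_event(output, event_id):
    """Collect the fields for one event ID from snmpget or snmpwalk output."""
    event = {}
    suffix = ".%d" % event_id
    for line in output.splitlines():
        left, sep, right = line.partition(" = ")
        if not sep or not left.startswith("XIV-MIB::"):
            continue  # skip prompts, blank lines and other noise
        name = left[len("XIV-MIB::"):]
        if not name.endswith(suffix):
            continue  # a different event ID
        field = name[:-len(suffix)]
        _type, type_sep, value = right.partition(": ")
        event[field] = value if type_sep else right
    return event


if __name__ == "__main__":
    print(" ".join(build_event_get_cmd("10.10.1.10", 1000)))
```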

If you start the walk without a start point you get something more interesting. It's interesting because it tells you the Linux version being run by the XIV modules and the uptime of the module you are probing.

C:\usr\share\snmp\mibs>snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10
SNMPv2-SMI::mib-2.1.1.0 = STRING: "Linux nextra-WallStCO-module-4 2.6.16.46-268-xiv-220-x86_64-ixss10.2.4 #1 SMP Tue Nov 16 00:43:46 UTC 2010 x86_64"
SNMPv2-SMI::mib-2.1.2.0 = OID: SNMPv2-SMI::enterprises.8072.3.2.10
SNMPv2-SMI::mib-2.1.3.0 = Timeticks: (235647520) 27 days, 6:34:35.20
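
The Timeticks value in the last line is the uptime in hundredths of a second, and the text after it is the same number made readable. As a small worked example in Python (format_timeticks is just an illustrative name):

```python
def format_timeticks(ticks):
    """Convert SNMP Timeticks (hundredths of a second) to 'D days, H:MM:SS.hh'."""
    seconds, hundredths = divmod(ticks, 100)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return "%d days, %d:%02d:%02d.%02d" % (days, hours, minutes, seconds, hundredths)


if __name__ == "__main__":
    print(format_timeticks(235647520))  # prints: 27 days, 6:34:35.20
```

This is handy if your monitoring tool hands you the raw tick count rather than the formatted string.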

So now I had two thoughts.   Firstly, could I get this same information using xcli commands?    To get the capacity information we can use the following command:

C:\>xcli -m 10.10.1.10  -u admin -p adminadmin system_capacity_list
Soft   Hard   Max_Pool_Size   Free Hard   Free Soft   Spare Modules   Spare Disks   Target Spare Modules   Target SpareDisks
160013 79113  79113           7954        49770       1               3             1                      3

In the output above we can see that the Free Soft value matches the xivFreeSpaceSoft output from the SNMP walk, while the Free Hard value matches the xivFreeSpaceHard output. But this information is only useful if you want to confirm how much space is NOT allocated to a pool. If you allocated all usable space to your pools on day one, then this command will simply confirm that there is no free space available OUTSIDE your existing pools. There may well be lots of free space WITHIN your pools. Examining free space within your pools is a different art, and I will explore that in a future post.
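
As a cross-check, the utilization percentages from the SNMP walk appear to follow from these capacity numbers. The formula below, (total - free) / total, is an inference from this one example rather than something documented here, but it does reproduce the 69 and 90 reported by xivUtilizationSoft and xivUtilizationHard:

```python
def utilization_percent(total_gb, free_gb):
    """Percentage of space allocated to pools, rounded to a whole number."""
    return round(100 * (total_gb - free_gb) / total_gb)


if __name__ == "__main__":
    # Values taken from the system_capacity_list output above.
    print(utilization_percent(160013, 49770))  # soft: prints 69
    print(utilization_percent(79113, 7954))    # hard: prints 90
```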

To see if I have any failed disks or modules, I could use the following commands. In this example we can see that there are no failed disks or modules (from the component_list command) and that the machine is fully redundant (from the state_list command). Fully redundant means every block of data has two copies (it does not mean that every component is working).

C:\>xcli -m 10.10.1.10 -u admin -p adminadmin component_list filter=notok
No components match the given criteria
C:\>xcli -m 10.10.1.10 -u admin -p adminadmin state_list
Category            Value
system_state        on
target_state        on
safe_mode           no
shutdown_reason     No Shutdown
off_type            off
redundancy_status   Full Redundancy
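
The two outputs above are easy to check from a script. Here is a minimal Python sketch, assuming the output text looks exactly like the examples in this post; parse_state_list and xcli_health_ok are hypothetical names, and the "No components match" message is simply matched as a string.

```python
def parse_state_list(output):
    """Turn the two-column state_list output into a {category: value} dict."""
    state = {}
    for line in output.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2 and parts[0] != "Category":
            state[parts[0]] = parts[1].strip()
    return state


def xcli_health_ok(component_output, state_output):
    """True if no component is flagged and the machine is fully redundant."""
    no_failed = "No components match the given criteria" in component_output
    state = parse_state_list(state_output)
    return no_failed and state.get("redundancy_status") == "Full Redundancy"


if __name__ == "__main__":
    components = "No components match the given criteria"
    states = "Category          Value\nsystem_state      on\nredundancy_status Full Redundancy"
    print(xcli_health_ok(components, states))
```

Both conditions are needed, since full redundancy does not mean that every component is working.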

So far so good. I have a way, using both SNMP and xcli, to get usage information and component availability information. I can script this to give me a way to confirm the health of my XIV.

My next challenge is to set up Nagios and IBM Director to monitor an XIV using SNMP. I am working with a client right now to achieve this. I would love to hear examples from people who are also doing this.

About Anthony Vandewerdt

I am an IT Professional who lives and works in Melbourne Australia. This blog is totally my own work. It does not represent the views of any corporation. Constructive and useful comments are very very welcome.

8 Responses to XIV and SNMP – let's walk the walk

  1. Steven Lennon says:

    I am assisting one of my Delivery teams with monitoring using the WhatsUp Gold tool set. They have an XIV and the Storage Engineer uses the command line and GUI tools for looking at status and performance information regarding the XIV. These tools seem to be quite capable being specifically designed for the XIV, however I was hoping we could augment these day to day administration and fault finding capabilities by incorporating the XIV MIB into WhatsUp Gold. This would give the rest of the support team visibility on XIV status and correlation with connected systems.

    Like your challenge with Nagios and IBM Director, we are trying to do the same thing with WhatsUp Gold and IBM Director to monitor the XIV using SNMP. However, I am wondering whether we will achieve any additional benefits over the existing XIV tools by doing this, and exactly what we should be monitoring through SNMP vs command line/GUI. Also, what sorts of SNMP traps should trigger notifications, as opposed to being purely informational?

    One thing I have noticed is that IBM Support seems to get notified of things such as disk errors (i.e. a potential disk failure) that we cannot see through the command line or GUI. These seem to be reported at a lower level and I am wondering whether the SNMP interface will give the team better visibility of this, so our support also has early notification of such events.

    • avandewerdt says:

      The challenge with using SNMP traps is that there are 100s of different events, so the decision process on event handling can become complex.
      The other issue, which you have already noted, is that some events are marked as ‘internal’, meaning they do not appear in the event log even though they are sent to IBM. This leads to some confusion when IBM contacts the client to inform them of an issue that the client did not get alerted for. Disk failures, however, should not be one of those.
      I do not believe SNMP walks are a sufficient tool since they would not reveal failed power supplies.
      This means you need to either also monitor traps or event emails, or use the ‘component_list’ command I suggested, as that is a better tool for determining if there is a failed piece of hardware.

  2. Ramesh says:

    What kind of SNMP traps can be sent by the IBM System Storage DS4300?

    I have an IBM System Storage DS4300 and I want to monitor it using the Tivoli Monitoring Agent. To do this, I have a MIB file (provided with the DS4000/DS5000 TotalStorage Manager) which can be uploaded to the Tivoli Monitoring Agent, but after uploading that file, the MIB is not collecting any data.
    Now my question is, what kind of SNMP traps can the DS4300 send?

    • So we need to go back to basics.
      Are the traps getting out of the DS4300? Confirm with network sniffer. What I do is direct connect to the device and set my laptop as destination, then do a test Trap. If that works… then do the same with your intended trap destination. There are several freeware trap receivers you can test with.
      Use Wireshark for sniffing.

      Are the traps arriving at Tivoli? Confirm with network sniffer.
      Is MIB compiled?
      Are traps being recognized as using that MIB?

  3. llagos says:

    The URL doesn’t work anymore…???

  4. GunBoat says:

    You can generate the MIB directly from the XIV -> xcli.py mib_get
