A new XIV MIB has been posted on the MIB download site for XIV, found here:
http://www-01.ibm.com/support/docview.wss?uid=ssg1S4000913
This reminded me of my plan to blog on how to use SNMP to probe the XIV using an SNMP walk. In this post I will describe how to do this using a Windows workstation. The whole point of this exercise is to explore how to use SNMP to confirm the status of the XIV.
Firstly we need to confirm the SNMP community name set on the XIV has not been changed. The following is an xcli command that will pull the community name from the XIV. To run this command you would need to change the IP address and password to match your own machines:
xcli -m 10.10.1.10 -u admin -p adminadmin config_get name=snmp_community
The expected response is:
Name            Value
snmp_community  XIV
My first surprise was that the SNMP community name is XIV by default, not public. But hey… this is XIV and we love the number 14 #;-)
We now need to install net-snmp onto our Windows workstation. This gives us a tool to compile the MIB and actually issue SNMP commands. I downloaded the latest version from here:
http://sourceforge.net/projects/net-snmp/files/net-snmp%20binaries/5.5-binaries/
Having downloaded and installed net-snmp I downloaded the XIV MIB to the following folder:
C:\usr\share\snmp\mibs
You get the MIB from here.
Now we force a re-compile of all MIBs so that the XIV MIB gets compiled:
C:\usr\bin>snmptranslate -Dparse-mibs
Look for messages like these. The module number may be different, depending on how many MIBs already exist in that folder:
XIV-MIB is in C:/usr/share/snmp/mibs/XIV-MIB-10.2.4.txt
Module 72 XIV-MIB is in C:/usr/share/snmp/mibs/xiv-10.2.x-mib.txt
Checking file: C:/usr/share/snmp/mibs/XIV-MIB-10.2.4.txt...
XIV-MIB is now in C:/usr/share/snmp/mibs/XIV-MIB-10.2.4.txt
Now we are ready to walk the SNMP walk starting with the ‘xiv’ OID. The only thing you need to change in this line is the IP address of the XIV:
C:\usr\bin>snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10 xiv
To understand the command:
-v 2c forces SNMP version 2c
-c XIV is the community name, it is CASE sensitiVE
-m XIV-MIB forces the use of the XIV-MIB
10.10.1.10 is the XIV management module IP address
xiv is the root of the MIB
The output will go for some time, control-C when you get bored.
Here is an example of the output you will get:
C:\usr\bin>snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10 xiv
XIV-MIB::xivMachineStatus = STRING: "Full Redundancy"
XIV-MIB::xivFailedDisks = INTEGER: 0
XIV-MIB::xivUtilizationSoft = Gauge32: 69
XIV-MIB::xivUtilizationHard = Gauge32: 90
XIV-MIB::xivFreeSpaceSoft = INTEGER: 49770
XIV-MIB::xivFreeSpaceHard = INTEGER: 7954
XIV-MIB::xivIfIOPS.1004 = Gauge32: 6519
XIV-MIB::xivIfIOPS.1005 = Gauge32: 6773
XIV-MIB::xivIfIOPS.1006 = Gauge32: 6515
XIV-MIB::xivIfIOPS.1007 = Gauge32: 6557
XIV-MIB::xivIfIOPS.1008 = Gauge32: 6517
XIV-MIB::xivIfIOPS.1009 = Gauge32: 6575
XIV-MIB::xivIfStatus.1004 = STRING: "OK"
XIV-MIB::xivIfStatus.1005 = STRING: "OK"
XIV-MIB::xivIfStatus.1006 = STRING: "OK"
XIV-MIB::xivIfStatus.1007 = STRING: "OK"
XIV-MIB::xivIfStatus.1008 = STRING: "OK"
XIV-MIB::xivIfStatus.1009 = STRING: "OK"
XIV-MIB::xivEventCode.5 = STRING: START_WORK
XIV-MIB::xivEventCode.9 = STRING: USER_SHUTDOWN
You can cut the output back by specifying just a single field (OID):
snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10 xivFailedDisks
XIV-MIB::xivFailedDisks = INTEGER: 0
snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10 xivFreeSpaceSoft
XIV-MIB::xivFreeSpaceSoft = INTEGER: 17111
What do all these fields mean? Let me explain:
xivMachineStatus: Shows if a disk rebuild or redistribution is occurring.
xivFailedDisks: The number of failed disks in the XIV.
xivUtilizationSoft: The percentage of total soft space that is allocated to pools.
xivUtilizationHard: The percentage of total hard space that is allocated to pools.
xivFreeSpaceSoft: The amount of soft space that is unallocated in GB.
xivFreeSpaceHard: The amount of hard space that is unallocated in GB.
xivIfIOPS.1004: The number of IOPS being executed by module 4 at that moment.
xivIfIOPS.1005: The number of IOPS being executed by module 5 at that moment.
xivIfIOPS.1006: The number of IOPS being executed by module 6 at that moment.
xivIfIOPS.1007: The number of IOPS being executed by module 7 at that moment.
xivIfIOPS.1008: The number of IOPS being executed by module 8 at that moment.
xivIfIOPS.1009: The number of IOPS being executed by module 9 at that moment.
xivIfStatus.1004: The status of module 4 at that moment.
xivIfStatus.1005: The status of module 5 at that moment.
xivIfStatus.1006: The status of module 6 at that moment.
xivIfStatus.1007: The status of module 7 at that moment.
xivIfStatus.1008: The status of module 8 at that moment.
xivIfStatus.1009: The status of module 9 at that moment.
xivEventCode.5: This is the first event code in the event log.
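If you want to work with these fields in a script rather than read them by eye, the walk output is regular enough to parse. The following is a minimal Python sketch, not something from the MIB itself: the function names parse_walk and module_number are made up for illustration, and the rule that index 1004 means module 4 is an assumption taken from the field list above.

```python
import re

# Matches lines such as: XIV-MIB::xivIfIOPS.1004 = Gauge32: 6519
LINE_RE = re.compile(
    r'^(?P<mib>[\w-]+)::(?P<name>\w+)(?:\.(?P<index>\d+))?'
    r'\s*=\s*(?P<type>\w+):\s*(?P<value>.*)$'
)


def parse_walk(text):
    """Return a list of (name, index, type, value) tuples from snmpwalk output."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip the command prompt, blank lines, anything unrecognised
        index = int(match.group("index")) if match.group("index") else None
        value = match.group("value").strip().strip('"')
        # Convert plain numeric values; leave anything else as a string.
        if match.group("type") in ("INTEGER", "Gauge32") and re.fullmatch(r"-?\d+", value):
            value = int(value)
        rows.append((match.group("name"), index, match.group("type"), value))
    return rows


def module_number(index):
    """Assumption: interface index 1004 refers to module 4, 1005 to module 5, and so on."""
    return index - 1000
```

Feeding it the saved output of a walk gives one tuple per line, for example ("xivIfIOPS", 1004, "Gauge32", 6519), which is easier to filter or graph than the raw text.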
The example above stops after the first event, but in fact if you leave the walk running, it will eventually list every event in the event log. This will take a very long time. It is possible to list the events in the event log by specific event ID and it is also possible to get detailed information for each event. In the examples below, I pull out information for event ID 1000. I was not able to get this information with a single command.
C:\>snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10 xivEventCode.1000
XIV-MIB::xivEventCode.1000 = STRING: VOLUME_CREATE
C:\>snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10 xivEventDescription.1000
XIV-MIB::xivEventDescription.1000 = STRING: Volume was created with name 'nestor_103' and size 17GB in Storage Pool with name 'test_pool'.
C:\>snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10 xivEventTime.1000
XIV-MIB::xivEventTime.1000 = STRING: 2011-02-23 20:05:48
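One thing that may be worth trying is net-snmp's snmpget, which accepts several OIDs in a single request. The Python sketch below only builds and runs such a command for the three event fields used above. It has not been verified against an XIV, so treat it as an experiment; the helper names event_command and fetch_event are hypothetical.

```python
import subprocess

# The three per-event columns queried in the examples above.
EVENT_FIELDS = ("xivEventCode", "xivEventDescription", "xivEventTime")


def event_command(host, event_id, community="XIV"):
    """Build one snmpget command line covering all three fields of an event."""
    oids = [f"{field}.{event_id}" for field in EVENT_FIELDS]
    return ["snmpget", "-v", "2c", "-c", community, "-m", "XIV-MIB", host, *oids]


def fetch_event(host, event_id, community="XIV"):
    """Run the command and return its raw output (needs net-snmp on the PATH)."""
    result = subprocess.run(
        event_command(host, event_id, community),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling fetch_event("10.10.1.10", 1000) would issue a single snmpget for xivEventCode.1000, xivEventDescription.1000 and xivEventTime.1000. If the agent rejects it, the three separate walks shown above remain the fallback.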
If you start the walk without a start point you get something more interesting. It's interesting because it tells you the Linux version being run by the XIV modules and the uptime of the module you are probing.
C:\usr\share\snmp\mibs>snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10
SNMPv2-SMI::mib-2.1.1.0 = STRING: "Linux nextra-WallStCO-module-4 2.6.16.46-268-xiv-220-x86_64-ixss10.2.4 #1 SMP Tue Nov 16 00:43:46 UTC 2010 x86_64"
SNMPv2-SMI::mib-2.1.2.0 = OID: SNMPv2-SMI::enterprises.8072.3.2.10
SNMPv2-SMI::mib-2.1.3.0 = Timeticks: (235647520) 27 days, 6:34:35.20
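The number in brackets on the Timeticks line is the raw uptime counter. SNMP TimeTicks count hundredths of a second, so a script can turn the raw value into a duration itself. A small Python sketch (the function name is only for illustration):

```python
from datetime import timedelta


def timeticks_to_timedelta(ticks):
    """Convert SNMP TimeTicks (hundredths of a second) into a timedelta."""
    # One tick is 10 milliseconds; integer arithmetic avoids float rounding.
    return timedelta(milliseconds=ticks * 10)


uptime = timeticks_to_timedelta(235647520)
print(uptime)  # 27 days, 6:34:35.200000
```

This agrees with the "27 days, 6:34:35.20" that snmpwalk printed next to the raw value.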
So now I had two thoughts. Firstly, could I get this same information using xcli commands? To get the capacity information we can use the following command:
C:\>xcli -m 10.10.1.10 -u admin -p adminadmin system_capacity_list
Soft    Hard   Max_Pool_Size  Free Hard  Free Soft  Spare Modules  Spare Disks  Target Spare Modules  Target Spare Disks
160013  79113  79113          7954       49770      1              3            1                     3
In the output above we can see the Free Soft matches the xivFreeSpaceSoft output from the SNMP walk command, while the Free Hard value matches the xivFreeSpaceHard output from the SNMP walk command. But this information is only useful if you want to confirm how much space is NOT allocated to a pool. If you allocated all usable space to your pools on day one, then this command will simply confirm that there is no free space available OUTSIDE your existing pools. There may well be lots of free space WITHIN your pools. To examine free space within your pools is a different art, and I will explore that in a future post.
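The same xcli figures also appear to account for the two utilization percentages in the earlier walk. The Python sketch below assumes utilization is simply (total minus free) divided by total, rounded to a whole percent; that formula is an inference from the numbers, not something stated in the MIB.

```python
def utilization_percent(total_gb, free_gb):
    """Percentage of capacity allocated to pools, assuming (total - free) / total."""
    return round(100 * (total_gb - free_gb) / total_gb)


# Figures taken from the system_capacity_list output above.
soft = utilization_percent(160013, 49770)
hard = utilization_percent(79113, 7954)
print(soft, hard)  # 69 90
```

These match the xivUtilizationSoft (69) and xivUtilizationHard (90) values returned by SNMP, which suggests the two interfaces are reporting the same underlying counters.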
To see if I have any failed disks or modules, I could use the following commands. In this example the component_list command shows that there are no failed disks or modules, and the state_list command shows that the machine is fully redundant. Fully redundant means every block of data has two copies (it does not mean that every component is working).
C:\>xcli -m 10.10.1.10 -u admin -p adminadmin component_list filter=notok
No components match the given criteria
C:\>xcli -m 10.10.1.10 -u admin -p adminadmin state_list
Category            Value
system_state        on
target_state        on
safe_mode           no
shutdown_reason     No Shutdown
off_type            off
redundancy_status   Full Redundancy
So far so good. Using both SNMP and xCLI, I have a way to get usage information and component availability information. I can script this to give me a way to confirm the health of my XIV.
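As a rough idea of what such a script could look like on the SNMP side, here is a Python sketch. It assumes net-snmp is installed and on the PATH, that "Full Redundancy" and "OK" are the healthy values (as seen in the walk output earlier), and the function names snmp_walk and find_problems are invented for this example.

```python
import re
import subprocess


def snmp_walk(host, oid="xiv", community="XIV"):
    """Run snmpwalk against the XIV and return its raw text output."""
    cmd = ["snmpwalk", "-v", "2c", "-c", community, "-m", "XIV-MIB", host, oid]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def find_problems(walk_text):
    """Return a list of human-readable problems found in snmpwalk output."""
    problems = []

    status = re.search(r'xivMachineStatus = STRING: "?([^"\n]*)"?', walk_text)
    if status is None:
        problems.append("xivMachineStatus missing from output")
    elif status.group(1).strip() != "Full Redundancy":
        problems.append(f"machine status is {status.group(1).strip()}")

    failed = re.search(r"xivFailedDisks = INTEGER: (\d+)", walk_text)
    if failed is None:
        problems.append("xivFailedDisks missing from output")
    elif int(failed.group(1)) > 0:
        problems.append(f"{failed.group(1)} failed disk(s)")

    # Every xivIfStatus row should report OK.
    for index, state in re.findall(r'xivIfStatus\.(\d+) = STRING: "?([^"\n]*)"?', walk_text):
        if state.strip() != "OK":
            problems.append(f"interface {index} status is {state.strip()}")

    return problems
```

A scheduled job could call find_problems(snmp_walk("10.10.1.10")) and raise an alert whenever the list is not empty. Because a walk does not cover every hardware fault, it makes sense to pair this with the xcli component_list filter=notok check shown above.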
My next challenge is to set up Nagios and IBM Director to monitor an XIV using SNMP. I am working with a client right now to achieve this. I would love to hear examples of people who are also doing this.
I am assisting one of my Delivery teams with monitoring using the WhatsUp Gold tool set. They have an XIV and the Storage Engineer uses the command line and GUI tools for looking at status and performance information regarding the XIV. These tools seem to be quite capable, being specifically designed for the XIV. However, I was hoping we could augment these day-to-day administration and fault-finding capabilities by incorporating the XIV MIB into WhatsUp Gold. This would give the rest of the support team visibility of XIV status and correlation with connected systems.
Like your challenge with Nagios and IBM Director, we are trying to do the same thing with WhatsUp Gold and IBM Director to monitor the XIV using SNMP. However, I am wondering whether we will achieve any additional benefit over the existing XIV tools by doing this, and exactly what we should be monitoring through SNMP versus the command line or GUI. Also, which SNMP traps should trigger notifications, and which are just informational?
One thing I have noticed is that IBM Support seems to get notified of things such as disk errors (i.e. a potential disk failure) that we cannot see through the command line or GUI. These seem to be reported at a lower level and I am wondering whether the SNMP interface will give the team better visibility of this, so our support also has early notification of such events.
The challenge with using SNMP traps is that there are 100s of different events, so the decision process on event handling can become complex.
The other issue, which you have already noted, is that some events are marked as ‘internal’, meaning they do not even appear in the event log, even though they are sent to IBM. This leads to some confusion when IBM contacts the client to inform them of an issue that the client did not get alerted for. Disk failures, however, should not be one of those.
I do not believe SNMP walks are a sufficient tool since they would not reveal failed power supplies.
This means you need to either also monitor traps or event emails, or use the component_list command I suggested, as that is a better tool for determining if there is a failed piece of hardware.
What kind of SNMP traps can be sent by an IBM System Storage DS4300?
I have an IBM System Storage DS4300 and I want to monitor that system using the Tivoli Monitoring Agent. In order to perform monitoring, I have the MIB file (which is provided with DS4000/DS5000 TotalStorage Manager), which can be uploaded to the Tivoli Monitoring Agent, but after uploading that file, no data is being collected.
Now my question is, what kind of SNMP traps can DS4300 send?
So we need to go back to basics.
Are the traps getting out of the DS4300? Confirm with network sniffer. What I do is direct connect to the device and set my laptop as destination, then do a test Trap. If that works… then do the same with your intended trap destination. There are several freeware trap receivers you can test with.
Use Wireshark for sniffing.
Are the traps arriving at Tivoli? Confirm with network sniffer.
Is the MIB compiled?
Are traps being recognized as using that MIB?
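If no trap receiver is to hand for the first two checks, a few lines of Python can at least confirm that something arrives on the trap port. This is only a sketch: it listens for any UDP datagram (SNMP traps normally go to UDP port 162, which usually needs administrator rights to bind) and does not decode the trap. The function names are invented for this example.

```python
import socket


def open_listener(host="0.0.0.0", port=162):
    """Open a UDP socket on the given port (162 is the usual SNMP trap port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def wait_for_datagram(sock, timeout=30.0):
    """Wait for one datagram; return (sender_ip, size_in_bytes) or None on timeout."""
    sock.settimeout(timeout)
    try:
        data, (sender_ip, _sender_port) = sock.recvfrom(65535)
    except socket.timeout:
        return None
    return sender_ip, len(data)


# Example use while sending a test trap from the storage device:
#   listener = open_listener()
#   print(wait_for_datagram(listener))
#   listener.close()
```

If this prints None while a test trap is being sent, the trap is not reaching the workstation, and the problem is upstream of any MIB or Tivoli configuration. Wireshark remains the better tool for seeing what the trap actually contains.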
The URL doesn’t work anymore…???
The URL from IBM, I mean… it goes to an "Apologies, page not found" page… :(
Yes it is proving a touch annoying that they clearly moved the MIB.
I changed the link to the XIV Resource Center but even there I am struggling to find the MIB!
Even Google cannot find it… just finds my blog… sigh….
You can generate the MIB directly from the XIV -> xcli.py mib_get