A new XIV MIB has been posted on the MIB download site for XIV, found here:
This reminded me of my plan to blog on how to use SNMP to probe the XIV using an SNMP walk. In this post I will describe how to do this using a Windows workstation. The whole point of this exercise is to explore how to use SNMP to confirm the status of the XIV.
Firstly we need to confirm the SNMP community name set on the XIV has not been changed. The following is an xcli command that will pull the community name from the XIV. To run this command you would need to change the IP address and password to match your own machines:
xcli -m 10.10.1.10 -u admin -p adminadmin config_get name=snmp_community
The expected response is:
Name            Value
snmp_community  XIV
My first surprise was that the SNMP community name is XIV by default, not public. But hey… this is XIV and we love the number 14 #;-)
We now need to install net-snmp onto our Windows workstation. This gives us a tool to compile the MIB and actually issue SNMP commands. I downloaded the latest version from here:
Having downloaded and installed net-snmp I downloaded the XIV MIB to the following folder:
You get the MIB from here.
Now we force a recompile of all MIBs so that the XIV MIB gets compiled:
Look for messages like these. The module number may be different, depending on how many MIBs already exist in that folder:
XIV-MIB is in C:/usr/share/snmp/mibs/XIV-MIB-10.2.4.txt
Module 72 XIV-MIB is in C:/usr/share/snmp/mibs/xiv-10.2.x-mib.txt
Checking file: C:/usr/share/snmp/mibs/XIV-MIB-10.2.4.txt...
XIV-MIB is now in C:/usr/share/snmp/mibs/XIV-MIB-10.2.4.txt
Now we are ready to walk the SNMP walk starting with the ‘xiv’ OID. The only thing you need to change in this line is the IP address of the XIV:
C:\usr\bin>snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10 xiv
To understand the command:
-v 2c forces SNMP version 2c
-c XIV is the community name, it is CASE sensitiVE
-m XIV-MIB forces the use of the XIV-MIB
10.10.1.10 is the XIV management module IP address
xiv is the root of the MIB
The output will run for some time; press Control-C when you get bored.
Here is an example of the output you will get:
C:\usr\bin>snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10 xiv
XIV-MIB::xivMachineStatus = STRING: "Full Redundancy"
XIV-MIB::xivFailedDisks = INTEGER: 0
XIV-MIB::xivUtilizationSoft = Gauge32: 69
XIV-MIB::xivUtilizationHard = Gauge32: 90
XIV-MIB::xivFreeSpaceSoft = INTEGER: 49770
XIV-MIB::xivFreeSpaceHard = INTEGER: 7954
XIV-MIB::xivIfIOPS.1004 = Gauge32: 6519
XIV-MIB::xivIfIOPS.1005 = Gauge32: 6773
XIV-MIB::xivIfIOPS.1006 = Gauge32: 6515
XIV-MIB::xivIfIOPS.1007 = Gauge32: 6557
XIV-MIB::xivIfIOPS.1008 = Gauge32: 6517
XIV-MIB::xivIfIOPS.1009 = Gauge32: 6575
XIV-MIB::xivIfStatus.1004 = STRING: "OK"
XIV-MIB::xivIfStatus.1005 = STRING: "OK"
XIV-MIB::xivIfStatus.1006 = STRING: "OK"
XIV-MIB::xivIfStatus.1007 = STRING: "OK"
XIV-MIB::xivIfStatus.1008 = STRING: "OK"
XIV-MIB::xivIfStatus.1009 = STRING: "OK"
XIV-MIB::xivEventCode.5 = STRING: START_WORK
XIV-MIB::xivEventCode.9 = STRING: USER_SHUTDOWN
You can cut the output back by specifying just a single field (OID):
snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10 xivFailedDisks
XIV-MIB::xivFailedDisks = INTEGER: 0

snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10 xivFreeSpaceSoft
XIV-MIB::xivFreeSpaceSoft = INTEGER: 17111
What do all these fields mean? Let me explain:
xivMachineStatus: Shows if a disk rebuild or redistribution is occurring.
xivFailedDisks: The number of failed disks in the XIV.
xivUtilizationSoft: The percentage of total soft space that is allocated to pools.
xivUtilizationHard: The percentage of total hard space that is allocated to pools.
xivFreeSpaceSoft: The amount of soft space that is unallocated in GB.
xivFreeSpaceHard: The amount of hard space that is unallocated in GB.
xivIfIOPS.1004: The number of IOPS being executed by module 4 at that moment.
xivIfIOPS.1005: The number of IOPS being executed by module 5 at that moment.
xivIfIOPS.1006: The number of IOPS being executed by module 6 at that moment.
xivIfIOPS.1007: The number of IOPS being executed by module 7 at that moment.
xivIfIOPS.1008: The number of IOPS being executed by module 8 at that moment.
xivIfIOPS.1009: The number of IOPS being executed by module 9 at that moment.
xivIfStatus.1004: The status of module 4 at that moment.
xivIfStatus.1005: The status of module 5 at that moment.
xivIfStatus.1006: The status of module 6 at that moment.
xivIfStatus.1007: The status of module 7 at that moment.
xivIfStatus.1008: The status of module 8 at that moment.
xivIfStatus.1009: The status of module 9 at that moment.
xivEventCode.5: This is the first event code in the event log.
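If you want to work with these fields in a script rather than by eye, the walk output is regular enough to parse. The following is a minimal Python sketch, not part of net-snmp or the XIV tooling: the function name parse_snmpwalk and the regular expression are my own, and it assumes every line has the "MIB::name.index = TYPE: value" shape seen in the sample output.

```python
import re

# Matches lines such as: XIV-MIB::xivIfIOPS.1004 = Gauge32: 6519
LINE_RE = re.compile(
    r'^(?P<mib>[\w-]+)::(?P<name>\w+)(?:\.(?P<index>\d+))?'
    r'\s*=\s*(?P<type>\w+):\s*(?P<value>.*)$'
)


def parse_snmpwalk(text):
    """Return {(name, index): value}; index is None for fields without one.

    parse_snmpwalk is a hypothetical helper written for this post.
    """
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not "OID = TYPE: value"
        value = match.group("value").strip()
        if match.group("type") in ("INTEGER", "Gauge32") and value.isdigit():
            value = int(value)          # plain numeric fields become ints
        else:
            value = value.strip('"')    # strings lose their surrounding quotes
        index = match.group("index")
        result[(match.group("name"), int(index) if index else None)] = value
    return result


SAMPLE = """\
XIV-MIB::xivMachineStatus = STRING: "Full Redundancy"
XIV-MIB::xivFailedDisks = INTEGER: 0
XIV-MIB::xivUtilizationHard = Gauge32: 90
XIV-MIB::xivIfIOPS.1004 = Gauge32: 6519
XIV-MIB::xivIfStatus.1004 = STRING: "OK"
XIV-MIB::xivEventCode.5 = STRING: START_WORK
"""

fields = parse_snmpwalk(SAMPLE)
print(fields[("xivMachineStatus", None)])  # Full Redundancy
print(fields[("xivIfIOPS", 1004)])         # 6519
```

Keying on the (name, index) pair keeps the per-module fields such as xivIfIOPS.1004 separate from the single-value fields such as xivFailedDisks.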
The example above stops after the first event, but in fact if you leave the walk running, it will eventually list every event in the event log. This will take a very long time. It is possible to list the events in the event log by specific event ID and it is also possible to get detailed information for each event. In the examples below, I pull out information for event ID 1000. I was not able to get this information with a single command.
C:\>snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10 xivEventCode.1000
XIV-MIB::xivEventCode.1000 = STRING: VOLUME_CREATE

C:\>snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10 xivEventDescription.1000
XIV-MIB::xivEventDescription.1000 = STRING: Volume was created with name 'nestor_103' and size 17GB in Storage Pool with name 'test_pool'.

C:\>snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10 xivEventTime.1000
XIV-MIB::xivEventTime.1000 = STRING: 2011-02-23 20:05:48
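One thing that may be worth trying: the net-snmp snmpget command accepts several OIDs in one invocation, so the three lookups could in principle be combined. The Python sketch below only builds and prints such a command. The helper names are my own, and I have not confirmed that the XIV agent answers all three OIDs in a single request, so treat this as an untested assumption.

```python
import subprocess

# The three per-event columns used in the examples above.
EVENT_COLUMNS = ("xivEventCode", "xivEventDescription", "xivEventTime")


def build_event_command(host, event_id, community="XIV"):
    """Build one snmpget argument list covering all three event columns.

    build_event_command is a hypothetical helper, not a net-snmp API.
    """
    oids = [f"{column}.{event_id}" for column in EVENT_COLUMNS]
    return ["snmpget", "-v", "2c", "-c", community, "-m", "XIV-MIB", host] + oids


def fetch_event(host, event_id, community="XIV"):
    """Run the command and return its raw output (needs net-snmp on the PATH)."""
    completed = subprocess.run(
        build_event_command(host, event_id, community),
        capture_output=True, text=True, check=True,
    )
    return completed.stdout


# Only the command is built here; nothing is sent to the XIV.
command = build_event_command("10.10.1.10", 1000)
print(" ".join(command))
```

Calling fetch_event("10.10.1.10", 1000) would actually run the command against the XIV at that address.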
If you start the walk without a start point you get something more interesting. It's interesting because it tells you the Linux version being run by the XIV modules and the uptime of the module you are probing.
C:\usr\share\snmp\mibs>snmpwalk -v 2c -c XIV -m XIV-MIB 10.10.1.10
SNMPv2-SMI::mib-18.104.22.168 = STRING: "Linux nextra-WallStCO-module-4 22.214.171.124-268-xiv-220-x86_64-ixss10.2.4 #1 SMP Tue Nov 16 00:43:46 UTC 2010 x86_64"
SNMPv2-SMI::mib-126.96.36.199 = OID: SNMPv2-SMI::enterprises.8072.3.2.10
SNMPv2-SMI::mib-188.8.131.52 = Timeticks: (235647520) 27 days, 6:34:35.20
So now I had two thoughts. Firstly, could I get this same information using xcli commands? To get the capacity information we can use the following command:
C:\>xcli -m 10.10.1.10 -u admin -p adminadmin system_capacity_list
Soft    Hard   Max_Pool_Size  Free Hard  Free Soft  Spare Modules  Spare Disks  Target Spare Modules  Target SpareDisks
160013  79113  79113          7954       49770      1              3            1                     3
In the output above we can see that the Free Soft value matches the xivFreeSpaceSoft output from the SNMP walk command, while the Free Hard value matches the xivFreeSpaceHard output. But this information is only useful if you want to confirm how much space is NOT allocated to a pool. If you allocated all usable space to your pools on day one, then this command will simply confirm that there is no free space available OUTSIDE your existing pools. There may well be lots of free space WITHIN your pools. Examining free space within your pools is a different art, and I will explore that in a future post.
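The capacity figures also let us cross-check the utilization percentages from the SNMP walk. The small Python sketch below assumes that utilization is simply (total minus free) divided by total, rounded to the nearest whole percent. That formula is my own reading of the numbers, not something taken from XIV documentation.

```python
def utilization_percent(total_gb, free_gb):
    """Percentage of total space that is allocated, to the nearest whole percent.

    The formula is an assumption inferred from the sample figures.
    """
    return round(100 * (total_gb - free_gb) / total_gb)


# Values from the system_capacity_list output (all in GB).
soft_total, hard_total = 160013, 79113
free_soft, free_hard = 49770, 7954

print(utilization_percent(soft_total, free_soft))  # 69
print(utilization_percent(hard_total, free_hard))  # 90
```

Under that assumption the results agree with the xivUtilizationSoft value of 69 and the xivUtilizationHard value of 90 reported by the walk.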
To see if I have any failed disks or modules, I could use the following commands. In this example the output shows that there are no failed disks or modules (from the component_list command) and that the machine is fully redundant (from the state_list command). Fully redundant means every block of data has two copies (it does not mean that every component is working).
C:\>xcli -m 10.10.1.10 -u admin -p adminadmin component_list filter=notok
No components match the given criteria

C:\>xcli -m 10.10.1.10 -u admin -p adminadmin state_list
Category           Value
system_state       on
target_state       on
safe_mode          no
shutdown_reason    No Shutdown
off_type           off
redundancy_status  Full Redundancy
So far so good. I have a way, using both SNMP and xcli, to get usage information and component availability information. I can script this to give me a way to confirm the health of my XIV.
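As a rough idea of what such a script could look like, here is a minimal Python sketch of the decision logic only. It assumes the SNMP values have already been collected into a dictionary keyed by field name. The function name check_health, the 95 percent threshold and the degraded sample values are all made up for illustration.

```python
def check_health(fields, max_hard_utilization=95):
    """Return a list of problem descriptions; an empty list means healthy.

    check_health and the default threshold are hypothetical choices.
    """
    problems = []
    status = fields.get("xivMachineStatus")
    if status != "Full Redundancy":
        problems.append(f"machine status is {status!r}")
    if fields.get("xivFailedDisks", 0) > 0:
        problems.append(f"{fields['xivFailedDisks']} failed disk(s)")
    if fields.get("xivUtilizationHard", 0) >= max_hard_utilization:
        problems.append(f"hard utilization at {fields['xivUtilizationHard']}%")
    # Per-module status fields are keyed like "xivIfStatus.1004".
    for name, value in sorted(fields.items()):
        if name.startswith("xivIfStatus.") and value != "OK":
            problems.append(f"{name} reports {value!r}")
    return problems


# Values taken from the sample walk output.
healthy = {
    "xivMachineStatus": "Full Redundancy",
    "xivFailedDisks": 0,
    "xivUtilizationHard": 90,
    "xivIfStatus.1004": "OK",
    "xivIfStatus.1005": "OK",
}

# Invented values, purely to exercise the checks.
degraded = dict(healthy, **{"xivFailedDisks": 2, "xivIfStatus.1005": "Failed"})

print(check_health(healthy))   # []
print(check_health(degraded))
```

Returning a list of problems rather than a single true/false value makes it easier to pass the details on to whatever raises the alert.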
My next challenge is to set up Nagios and IBM Director to monitor an XIV using SNMP. I am working with a client right now to achieve this. I would love to hear examples of people who are also doing this.