Some scripting hints for checking your Storwize firmware version

I am in the habit of writing mini shell scripts to paste into an SVC or Storwize terminal to create mini reports.  While the GUI does make all this quite easy, I am quite often remote from the machine, or at the end of a VPN tunnel, so using the GUI is not always convenient.  So, for instance, if you want to learn just the software version of your Storwize device and the firmware version of its drives, there is no super-quick way to do that from the command line.  You can use the lssystem command to get the cluster software version, but since there is no grep command on Storwize/SVC, you need to sort through all the output yourself host-side or use some fancy bash tricks.  You can use the lsdrive command to get the drive firmware version, but lsdrive does not show the drive types or firmware versions in its summary view.  This is rather annoying, as it means you need to run lsdrive against every drive to get that level of detail.  In a perfect world I would be able to specify which fields I want in the summary view (see an example below of the rather sparse summary view):


I wrote a small script to display the firmware version of each drive.   It looks like this:

firmwareversion=$(svcinfo lssystem -delim , | while IFS="," read -ra data; do
if [ "${data[0]}" == "code_level" ]; then
echo "${data[1]}"
fi; done)
drive=$(printf "%5s%-20s%-10s%-15s \n" "ID" " DriveType" "Capacity" "Version"
svcinfo lsdrive -nohdr -delim , | while IFS="," read -ra drives; do
svcinfo lsdrive -delim , ${drives[0]} | { while IFS="," read desc data; do
[[ $desc == "id" ]] && id=$data
[[ $desc == "product_id" ]] && product_id=$data
[[ $desc == "capacity" ]] && capacity=$data
[[ $desc == "firmware_level" ]] && firmware_level=$data
done
printf "%5s%-20s%-10s%-15s \n" "$id" " $product_id" "$capacity" "$firmware_level"; }
done); echo ""; echo "Version $firmwareversion"; echo ""; echo "$drive"

Now, for those who understand shell scripts: it uses the -delim , option to separate fields with commas (since in general, commas are not allowed to appear in any data fields).  It then reads each line into an array with read -ra, telling the read command that the field delimiter is a comma with the IFS="," assignment.
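For example, here is how that read call splits a single line of lssystem output into an array (the code_level value shown is a made-up sample, not a real release number):

```shell
# A sample "field,value" line, as lssystem -delim , would emit it
# (the version string here is invented for illustration)
line="code_level,7.3.0.0 (build 0000000000)"

# Split on commas only: data[0] is the field name, data[1] the value,
# and the spaces inside the value survive because IFS is just a comma
IFS="," read -ra data <<< "$line"
echo "${data[0]}"   # code_level
echo "${data[1]}"   # 7.3.0.0 (build 0000000000)
```

Because IFS replaces the default whitespace splitting, the value keeps its embedded spaces intact, which is exactly why the comma delimiter is handy here.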

When I ran it on a Storwize V3700 running 7.3 code, it ran fine with output like this:

Version (build 97.5.1501190000)

ID  DriveType       Capacity  Version 
 0  HUS723020ALS64  1.8TB     J3K8 
 1  HUS723020ALS64  1.8TB     J3K8 
 2  HUS723020ALS64  1.8TB     J3K8 
 3  HUS723020ALS64  1.8TB     J3K8 
 4  HUS723020ALS64  1.8TB     J3K8 
 5  HUS723020ALS64  1.8TB     J3K8

But when I ran it on a V3700 running 7.4 or 7.5 code I got this:

rbash: IFS: readonly variable
rbash: IFS: readonly variable
CMMVC5709E [0,online,,member,sas_nearline_hdd,2.7TB,0,mdisk1,5,1,8,,,inactive] is not a supported parameter.

I googled the issue and found this rather helpful forum discussion:

The solution is two-fold:

  1. Don't use IFS to define field separators.
  2. Don't allow any names on your system to contain spaces.  This normally doesn't occur, but it appears VDisk names may now be allowed to contain them.
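To see why spaces matter once you rely on plain whitespace splitting, here is a minimal sketch (the column layout and object names are invented, not real CLI output):

```shell
# A hypothetical four-column summary line parses cleanly with a
# whitespace-delimited read:
read id status name use <<< "0 online vdisk0 member"
echo "$name"    # vdisk0

# The same line with a space in the object name: the name is split
# and every later column shifts by one position
read id2 status2 name2 use2 <<< "0 online my vdisk member"
echo "$name2"   # my
echo "$use2"    # vdisk member
```

Any field after the space-containing name ends up misaligned, which is why names with spaces break the whitespace-based approach.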

So I re-wrote my script to not rely on IFS to separate fields and it now looks like this (and runs fine on release 7.4 and 7.5 machines):

firmwareversion=$(svcinfo lssystem | while read desc data; do
if [ "$desc" == "code_level" ]; then
echo "$data"
fi; done)
drive=$(printf "%5s%-20s%-10s%-15s \n" "ID" " DriveType" "Capacity" "Version"
svcinfo lsdrive -nohdr | while read did status; do
svcinfo lsdrive $did | { while read desc data; do
[[ $desc == "id" ]] && id=$data
[[ $desc == "product_id" ]] && product_id=$data
[[ $desc == "firmware_level" ]] && firmware_level=$data
[[ $desc == "capacity" ]] && capacity=$data
done
printf "%5s%-20s%-10s%-15s \n" "$id" " $product_id" "$capacity" "$firmware_level"; }
done); echo ""; echo "Version $firmwareversion"; echo ""; echo "$drive"

The other suggestion from the forum post is to run this whole script externally, which is a great suggestion but not as easily done as it sounds, as running an external script, as opposed to pasting one in, can generate a lot of back-and-forth SSH traffic.

I wrote a BASH script that learns all the drives in one command and then gets all the detailed views in a single command as a second step.  So I pull everything I need about each drive with only two SSH commands (rather than one per drive).  Sorry folks, this is Unix or Mac OS only (unless you’re running some Unix tools on your Windows machine).

This script presumes you have the SSH key already set up for your userid (since I don’t specify a key, but you could add it to the script).   There are a large number of blank lines simply to make each section clear.

Simply paste it into a file like this

vi   <then hit ‘i’ and paste in the data, then shift ZZ to save and exit >
chmod 755
./ -u superuser -h   < where super user is your user and is your V7000 >

The script uses getopts to get two inputs and checks for them.  It has no error checking for an unreachable host.  If the user cannot log in with your default SSH key, it will fail.

#!/bin/bash
# Script to display SVC or Storwize firmware versions

while getopts :u:h: opt; do
 case "$opt" in
 u) username="$OPTARG";;
 h) hostname="$OPTARG";;
 esac
done

# We need a user name
if [ -z "$username" ]; then
echo "Please use a username with -u"
echo "For instance -u superuser"
exit 1
fi

# we need a host
if [ -z "$hostname" ]; then
echo "Please use a host with -h"
echo "For instance -h"
exit 1
fi

# Fetch and print the system software version
echo "Host $hostname is running Code Level: $(ssh $username@$hostname "svcinfo lssystem -delim ," | grep code_level | cut -d, -f2)"

# print the header for the drive data
printf "%5s%-15s%-20s%-15s \n" "ID" " Capacity" "DriveType" "Version"

# Fetch the drive summary view to get a list of drives
summarydrives=$(ssh $username@$hostname "svcinfo lsdrive -nohdr -delim ,")

# build the drive detailed view as one command before sending it
fetchdetailed=$(echo "$summarydrives" | while IFS="," read -ra drivedata; do
echo -n "svcinfo lsdrive -delim , ${drivedata[0]};"
done)

# now grab all the detailed drive data in one command
detailedview=$(ssh $username@$hostname "$fetchdetailed")

# now chunk through the detailed view output and print in table view
echo "$detailedview" | while IFS="," read desc data; do
[[ $desc == "id" ]] && printf "%5s" "$data"
[[ $desc == "capacity" ]] && printf "%-15s" " $data"
[[ $desc == "product_id" ]] && printf "%-20s" "$data"
[[ $desc == "firmware_level" ]] && printf "%-15s \n" "$data"
done

Hopefully this is useful to someone out there.  Suggestions always welcome!


Posted in advice, IBM Storage, SAN, Storwize V3700, Storwize V7000, SVC

IBM Releases several Data Integrity Alerts for Storwize products

IBM recently released three significant alerts for Storwize products (V3500, V3700, V5000 and V7000).

I am reproducing the text from the emails I received.   I tell you this because if IBM updates the website text, my blog post may not get updated.

1691 Error on Arrays When Using Multiple FlashCopies of The Same Source

ABSTRACT: There is an issue in the RAID software that calculates parity for systems that have multiple FlashCopies of the same source. This issue will cause the parity to be calculated incorrectly and may lead to the system logging a 1691 error and may eventually lead to an undetected data loss.

Affects: Storwize devices on 7.3 and 7.4 versions
Resolution: This issue is resolved in and

Note that 7.5.0. is not the latest version – do not install that version!
At time of writing is available. If you are on 7.3 or 7.4 then stick with

Note also that the IBM link above says that the issue affects only V7000s, but this is because there are separate alerts and pages for each Storwize model.
If you are using Storwize products of any kind with FlashCopy you are affected.  If you are not using FlashCopy, read on!

Data Integrity Issue when Using Encrypted Arrays

ABSTRACT: IBM has identified an issue which can cause data to be written to the wrong location on the drive when using encrypted arrays on Storwize V7000 Gen2 systems. This will often result in systems logging 1691 and 1322 errors, and undetected data loss.
Affects: V7000s on 7.4 and 7.5 versions
Resolution: This issue is resolved by APAR HU00820 in releases and

This really does affect only V7000s, as other models don’t offer this encryption feature.   If you are not using encryption, read on!

Data Integrity Issue when Drive Detects Unreadable Data

ABSTRACT: IBM has identified specific hard disk drive models supported by the Storwize family of products that may be exposed to possible undetected data corruption during a specific drive error recovery sequence. The corrupted data can eventually trigger the system to log a 1691 error. A firmware update that remediates against future occurrences of this issue is now available. IBM recommends that all customers with the affected drives apply these latest levels of code.

Note also that the IBM link above says that the issue affects only V7000s, but this is because there are separate alerts and pages for each Storwize model.
If you are using Storwize products of any kind with the listed Seagate disks then you are affected.

Now the website lists capacities… but again you might be fooled.
The capacities shown here are decimal, but the Storwize GUI and CLI always adhere to binary honesty (which I like).  So don’t be fooled if the GUI tells you you have 3.6 TB drives and they are not listed in the table below… they are still 4 TB drives according to the label.

Product_id   Capacity   Minimum Firmware level containing fix 
ST300MM0006    300 GB   B56S
ST600MM0006    600 GB   B56S
ST900MM0006    900 GB   B56S
ST1200MM0007   1.2 TB   B57D
ST2000NM0023     2 TB   BC5G
ST3000NM0023     3 TB   BC5G
ST4000NM0023     4 TB   BC5G
ST6000NM0014     6 TB   BC75
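If you want to check the arithmetic yourself, here is the conversion as a one-liner, using the 4 TB ST4000NM0023 from the table above as the example:

```shell
# A drive labelled 4 TB holds 4,000,000,000,000 bytes (decimal).
# Divide by 1024^4 to get the binary figure the GUI reports:
# roughly 3.64, which is displayed as about 3.6 TB
awk 'BEGIN { printf "%.2f\n", 4000000000000 / (1024 * 1024 * 1024 * 1024) }'
```

So a "3.6 TB" drive in the GUI and a "4 TB" entry in the alert table are the same physical drive.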

Also in the GUI, I found the firmware version of my drives was not shown by default; I had to add it as per the screen capture below.   Here is a quiz question…  does the screen capture show a potentially affected machine?


If you answered YES you would be correct!

To be sure we can run the software upgrade tool, or dump the script below into a CLI window (paste the whole thing!):

svcinfo lsdrive -nohdr -delim , | while IFS="," read -ra drives; do svcinfo lsdrive -delim , ${drives[0]} | { while IFS="," read desc data ; do [[ $desc == "id" ]] && id=$data; [[ $desc == "product_id" ]] && product_id=$data; [[ $desc == "firmware_level" ]] && firmware_level=$data; done; printf "%5s%10s%10s \n" "$id " "$product_id" "$firmware_level"; }; done

The output will look like this (I showed the paste so you see what your entire PuTTY session would look like).    Again, is this an affected machine?


Yes it is affected, as BC5C is below BC5G (G being later than C in the alphabet!).
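Since the fix levels appear to compare alphanumerically, a plain bash string comparison can flag a drive that is below the fix level. This is only a sketch, and it assumes the levels really do sort lexically, as the BC5C versus BC5G example suggests:

```shell
# Compare a drive's firmware level to the minimum fixed level
# using bash's lexical string comparison inside [[ ]]
fixlevel="BC5G"
for fw in BC5C BC5G; do
  if [[ "$fw" < "$fixlevel" ]]; then
    echo "$fw is below $fixlevel - drive needs updating"
  else
    echo "$fw is at or above $fixlevel"
  fi
done
```

You could combine this with the one-liner above to print a warning next to each drive instead of eyeballing the levels.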

Once you know you are affected, you can follow the upgrade instructions in the IBM Alert. It is much easier to do this on 7.4 as you can upgrade your drives from the GUI instead of using the CLI.




Posted in IBM Storage, Storwize V3700, Storwize V7000, Uncategorized

Actifio worked around the VMware CBT bug in 2012

There has been a lot of discussion lately about a VMware Change Block Tracking (CBT) bug that causes backup software to miss out on modified parts of VMDK files.  This results in corrupted backups.

The Register has had two articles about it:

Oct 27:  ESXi is telling fibs to backup software • The Register

Nov 3: VMware: Yep, ESXi bug plays ‘finders keepers’ with data backups • The Register

The articles point to this VMware Knowledgebase link, and mention that there is no fix available from VMware.  Ouch!

Well of course, since Actifio uses VMware Change Block Tracking (CBT) to capture images of VMs, my first thought was sh…    ahh, actually, this is a child-friendly blog…   but you get the idea.   Were petabytes of client data at risk of being bad?

Fortunately the answer is a resounding no.  Actifio does not depend on this particular API, because we saw the potential for a flaw like this a long time ago.  In fact, we changed the way we use the VMware APIs in 2012 to ensure this API could not affect us.

Indeed, we were so concerned about data integrity when using external APIs that we developed a feature we call Fingerprinting to ensure the integrity of our images. With every image that Actifio creates, Actifio uses a sampling technique to confirm that the image created in Actifio’s storage pools is the same as the source we were fetching data from.  This applies both to VMs and to images created by our Connector software.

So with this, Actifio customers can be assured that all available virtual images are free of corrupt data due to CBT, backup calls, or any other capture procedure.

Posted in Actifio

Monitoring IBM Storwize and IBM SVC products with Splunk

I have been playing around with Splunk recently, so I can understand what it is and why my customers may choose to use it.   For those who don’t know, Splunk (the product) captures, indexes and correlates real-time data in a searchable repository, from which it can generate graphs, reports, alerts, dashboards and visualizations.  In essence, Splunk is a really cool and smart way to look at and analyse your data.

Because Splunk is able to ingest data from almost any source we can quite easily start pulling data out of an IBM Storwize or SVC product and then investigate with Splunk.  I couldn’t find anything in Google on this subject, so here is a post that will help you along.

A common way to get data into Splunk is to use syslog.   Since Storwize can send events to syslog, all we need to do on the Storwize side is configure where the Splunk server is.

In this example I have chosen syslog level 7 (which is detailed output) and to send all events.


Then on Splunk side, ensure Splunk is listening for syslog events.   Storwize always uses UDP port 514:
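If you prefer configuring the listener in a file rather than through Splunk Web, a stanza along these lines in inputs.conf should achieve the same thing (a sketch; check the inputs.conf documentation for your Splunk version):

```
# $SPLUNK_HOME/etc/system/local/inputs.conf
[udp://514]
sourcetype = syslog
connection_host = ip
```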


However this really only captures events.   There are lots of other pieces of information we may want to pull out of our Storwize products and graph in Splunk.   So let’s teach Splunk how to get them using the CLI over SSH.

Firstly we need to supply Splunk with a user ID so it can log in to our Storwize and grab data.   I created a new user on my Storwize V3700 called Splunk, placed it in the Monitor group (so anyone with the Splunk userid and password can look but not touch) and then supplied a public SSH key, since I don’t want to store a password in any text file and using SSH keys makes things nice and easy.  In this case I am using the file for the root user of my Splunk server, since in my case Splunk is running all scripts as root.


Now from my root command prompt on the Splunk server (called av-linux) I test that access to my V3700 (on its IP address) works, using the lsmdiskgrp command.   It’s all looking good.

[root@av-linux ~]# ssh splunk@ "lsmdiskgrp -delim ,"

So I am now set up to write scripts that Splunk can fire on a regular basis to pull data from my Storwize device using SSH CLI commands.

Now here are two important things to realize about using SSH commands to pull data from Storwize and ingest them into Splunk:

  1. For historical data like logs, it is very easy to pull the same data twice.  For instance, if I grab the contents of the lseventlog command using an SSH script, I will get every event in the log, which is fine.   But if I grab it again the next day, most of the same events will be ingested again.   If I am looking to validate how often a particular event occurs, I will count the same event many times because I ingested it many times.   Ideally the Storwize CLI commands would let me filter on dates, but that functionality is not available.
  2. Real-time display commands don’t insert a date into the output, but Splunk will log the date and time at which each piece of data was collected.
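One way to work around the duplicate-ingestion problem in point 1 is to remember the highest event sequence number you have already ingested and only emit newer entries on the next pull. Here is a rough sketch; the state-file path, the example hostname, and the assumption that the first CSV field is an ascending numeric sequence number are all mine, not from the Storwize documentation:

```shell
#!/bin/bash
# Sketch of an incremental event-log pull. The state file remembers
# the newest sequence number seen so far (path invented for this example).
STATE="${STATE:-/tmp/storwize_lastseq}"

new_events() {
  local last
  last=$(cat "$STATE" 2>/dev/null || echo 0)
  while IFS="," read -r seq rest; do
    [[ "$seq" =~ ^[0-9]+$ ]] || continue   # skip the header line
    (( seq > last )) || continue           # skip already-ingested events
    echo "$seq,$rest"
    echo "$seq" > "$STATE"                 # remember the newest sequence
  done
}

# On each poll, pipe one SSH call through the filter, e.g.:
#   ssh splunk@yourstorwize "lseventlog -delim ," | new_events
```

Each poll then emits only events Splunk has not seen before, so counting occurrences of an event in Splunk gives an honest number.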

Let’s take the output of lsmdiskgrp as shown above.   If we run this once per day, we can track the space consumption of each pool over time.   Sounds good, right?   So on my Splunk server I create a script like this.  Notice I get the output in bytes; this is important, as the default output could be in MB, GB or TB.

ssh splunk@ "lsmdiskgrp -delim , -bytes"

I put the script into the /opt/splunk/bin/scripts folder and call it v37001pools.

I make it executable and give it a test run:

[root@av-linux scripts]# pwd
[root@av-linux scripts]# chmod 755 v37001pools
[root@av-linux scripts]# ./v37001pools

So now I tell Splunk I have a new input using a script:


Input the location of the script, the interval, and the fact that this is CSV (because we are using -delim with a comma).  Note my interval is crazy: every 60 seconds is way too often; even every 3600 seconds is probably too often.  I used it to get lots of samples quickly.


I now confirm I have new data I can search:


And the data itself is time stamped with all fields identified and has all the data like pool names.

Now I can start graphing this data.   With Splunk, what I find is that life is way easier if someone publishes the XML.    So I created an empty dashboard called Storwize Pools and then immediately selected Edit Source.


Now replace the default source (delete any text already in the source) with this, changing the heading and script name to your own (in red) and the pool name to one of your pools (in blue).  If you have more than one pool, add an additional chart for every pool (copy the whole chart section and just make a new chart).

In the attached Word document you will find the required XML.   For some reason WordPress kept fighting me and changing my quotes, so I have attached the XML as a doc.


And we get a lovely Dashboard that looks like this.  Because the script runs every 60 seconds, I am getting 60 second stats.


We could run it every day, or use a cron job to run it at the same time every day (which makes more sense), maybe once per day at 1am, by setting the interval to a cron value like this:   0 01 * * *
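For reference, the five cron fields are minute, hour, day of month, month and day of week, so the value above reads as follows (this is standard cron syntax, whether used in Splunk's interval field or an OS crontab):

```
# minute  hour  day-of-month  month  day-of-week
  0       01    *             *     *            ->  01:00 every day
```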


So hopefully that will help you get started with monitoring your SVC or Storwize product with Splunk.

If you would like some more examples, just leave a comment!

Posted in Uncategorized

The quest for perfect knowledge doesn’t start with a screen capture

One of the fun parts of my job is problem solving…  I won’t lie… I love it.

Step one in problem solving is always the same:  define the problem.
Step two:  get the data needed to solve the problem.
Step three:  solve it!

Simple, right?


One of the reasons IT gets it wrong again and again is simple:  the assumption of perfect knowledge.   We assume that with one sentence, or even worse, one screen capture, we have described the problem with enough depth that it can now be solved.   That the team now perfectly understands the problem, and that the solution they supply will be… wait for it…  you guessed it…  perfect!


Don’t get me wrong, I love screen captures (using my favourite tool, Snagit).   In fact, screen captures are one of my number-one tools for writing documentation.  When I worked on IBM Redbooks (one of IBM’s greatest free gifts to the IT community) I often found some chapters were more picture than text… and that was OK.   People need to see what it is you are talking about.

But when it comes to describing a problem, in the vein of “a picture is worth a thousand words”, a screen capture can be the devil itself.   The issue with screen captures is simple:   they contain information that cannot be easily searched or indexed (apart from with your eyeball).   They may show the problem, or just barely validate that the problem exists, but they rarely help in SOLVING the problem.

Last week I got my favourite kind of screen capture: the one taken of a screen with a phone (with the reflection of the photographer clearly visible in the shot).   Apart from giving me the ability to rate that person’s fashion sense, these kinds of shots are among the worst.   Amusingly, when I asked why I didn’t also get logs, I was told the customer’s security standards would not allow logs to be sent.   Yeah, right… this is the same customer who doesn’t mind you standing in the middle of their computer room taking photos of their displays with your phone?


So the next time you plan on sending a screen capture, stop for a minute and consider…  is this enough for a perfect solution?   Are there no logs I can send along with this picture?  Has the vendor supplied a tool I can use to offload data?   Or even better automatically send it?    Am I doing anything more than just describing the problem itself?

Posted in advice

Shellshock and IBM SVC and Storwize products

While blogging last week about how various vendors have responded to the Shellshock exploit, I noted that several vendors, notably Oracle and Cisco, were open about products for which they did not yet have a fix.     IBM, meanwhile, appears to announce a vulnerability only after they have the fix.   In other words, vulnerable customers are left without formal notification that they are exposed, or made aware of any workarounds, until a fix is actually available.   I am left slightly annoyed by this policy.

The formal notification for the Storwize family and IBM SVC family came out here on October 11, 2014.  At the time of writing these are the fix levels:

IBM recommends that you fix this vulnerability by upgrading affected versions of IBM SAN Volume Controller, IBM Storwize V7000, V5000, V3700 and V3500 to the following code levels or higher:

More importantly it contains this critical piece of information:

Vulnerability Details

The following vulnerabilities are only exploitable by users who already have authenticated access to the system.

In other words, the best way to manage exposure is to limit the number of users who have CLI access and to use network restrictions (such as ACLs and Firewalls) to restrict network access to your devices.

So kudos to IBM for creating fixed versions; I just wish that acknowledgement and remediation advice had been published earlier.


Posted in Uncategorized

Vale Randall Davis

I received some very sad news last week that Randall Davis has passed away.

Randall was a very experienced and capable IT professional based in Melbourne, Australia. He worked for IBM for many years, co-authored several IBM Redbooks, and fathered two wonderful boys with his wife Fiona.

Randall’s funeral will be held in the Federation Chapel, Lilydale Memorial Park, 126-128 Victoria Rd, Lilydale, on Wednesday, Oct. 8, 2014, commencing at 11.15 am.

If you knew Randall and wish to pay your respects, then please attend.

Posted in Uncategorized