Why the IBM SSIC is the best place to get your supported hardware information

It’s been a while since I last blogged, but here is something that might interest you.  To spice things up I even recorded a quick video blog of the same information you will find written here.  So skip the video and keep reading, or watch the video and skip the reading, the choice is yours!

For many years I have used the ‘Supported Hardware List’ websites from IBM to qualify SVC support.    If you want to know if an Infinidat is supported behind IBM SVC, which version and what code level,  it’s all there.

So traditionally I would go to here.  From there you get a great list of code levels, choose your code level and then look up your product:

However I have always had some tiny misgivings about these sites.  After all, they obey no law of sorting I have ever seen.   Alphabetical order anyone?   It’s like the Web Admin is a worshiper of Cthulhu and has managed to translate non-Euclidean geometry into a list of vendor names.   Or maybe the site is just TOO HARD TO MAINTAIN.

Take a look and suggest a logic to this list of vendors (please I beg you):

But there is a bigger problem:  These sites are just slightly out of date.

Let’s use the example I first raised.  If I look here I find Infinidat version 2.0 is supported with SVC 7.8:

But if I then go to the SSIC here, I get told version 3.0.x is also supported:

This was not a one-off.  I found multiple products where the SSIC seemed to reflect newer information than the Supported Hardware websites.

Moral of the story?   Always use the SSIC to confirm support, not the Supported Hardware pages.

Posted in IBM, IBM Storage, IBM XIV, SAN, Storwize V3700, Storwize V7000, SVC | Tagged | Leave a comment

Actifio at Tech Field Day 11

In this blog post I want to inform you about Actifio’s presentation to Tech Field Day 11.

Tech Field Days are one of my favourite technical information sources.    They involve a group of prominent bloggers and industry personalities being given a briefing and demonstration (normally somewhere between 2-4 hours) by an IT company about their products and viewpoint.   It is a chance for both IT entrants and established IT Companies to tell their story, explain the why, have their ideas challenged by some smart people and get some relatively free publicity, while the bloggers get the chance to gather material to write blogs, learn about our rapidly changing world and sometimes show how clever and insightful they are at the same time.   It is a genuine win-win for everybody.

Actifio were last at Tech Field Day 4 in 2010, which explains why competitive information about Actifio is often so laughably wrong.  I think other Vendors watch these (in IT terms) ancient videos and presume nothing has changed since!    The good news for Actifio’s competitors and prospective and existing customers is they can now update their knowledge of Actifio by watching Actifio present at Tech Field Day 11 in 2016.

The really nice thing is that the presentations have been split into five easily consumed videos, each about 20 minutes long. So please drop by the Tech Field Day page, take a look at the presented subjects and learn about Copy Data Management and how Actifio’s products bring a new and unique way for our customers to move to the hybrid cloud, dramatically improve their agility and modernise their business resiliency.

To make it easy, I have reposted all the Actifio video links below, but you can also get to them from here where you can also check out the other vendors who presented.

Actifio Welcome with Ash Ashutosh

Watch on Vimeo

Actifio CEO and Founder, Ash Ashutosh, introduces the company and its technology to the Field Day delegates.

Personnel: Ash Ashutosh

Actifio Architecture Overview

Watch on Vimeo

Brylan Achilles and Chandra Reddy of Actifio introduce the company’s product architecture.

Personnel: Brylan Achilles, Chandra Reddy

Actifio Resiliency Director Overview and Demo

Watch on Vimeo

Brylan Achilles and Chandra Reddy of Actifio introduce Actifio Resiliency Director.

Personnel: Brylan Achilles, Chandra Reddy

Actifio Global Manager Overview and SQL Server Demo

Watch on Vimeo

Brylan Achilles and Chandra Reddy of Actifio introduce Actifio Global Manager and demonstrate its use with Microsoft SQL Server.

Personnel: Brylan Achilles, Chandra Reddy

Actifio Global Manager Oracle and Ansible Demo

Watch on Vimeo

Brylan Achilles and Chandra Reddy of Actifio demonstrate Actifio Global Manager with Oracle and orchestration with Ansible.

Personnel: Brylan Achilles, Chandra Reddy

Actifio ReadyVault and Object Storage

Watch on Vimeo

Brylan Achilles and Chandra Reddy of Actifio, introduce Actifio ReadyVault and show how the product can work with object storage. Ash Ashutosh, CEO and Founder, then returns to answer questions.


Posted in Actifio, Uncategorized | Leave a comment

Bits are cheap! Don’t sell yourself short on key length

Using SSH keys to perform password-free login is quite common in Unix hosts and in  Appliances that have embedded Unix (like Storwize products).

You effectively have a public key which is shared and a private key (saved with a PPK extension if you use PuTTY) that is not shared.  Think of the public key like the lock in your front door, that everyone can see.   Think of the private key like the door key in your pocket or hand-bag.   If you keep your private key secure, your door is relatively secure.  If you lose your keys, your door is most likely no longer secure (unless they are down the back of the couch).

Sticking with the door analogy, the risk with a door lock is that someone could still just try to kick your door in (brute force attack) or pick your lock.   The bit length of the key can make this harder to achieve: the longer the bit length the harder it is to crack.

It is not unusual to see instructions that suggest you use a command like this to generate keys, where a bit length of 1024 is specified with an RSA key:

ssh-keygen -b 1024 -t rsa -f ~/.ssh/id_rsa

Or if using PuTTYgen to create the keys, to see instructions like this:

  1. Start PuTTYgen by clicking Start > Programs > PuTTY > PuTTYgen. The PuTTY Key Generator panel is displayed.
  2. Click SSH-2 RSA as the type of key to generate.
    Note: Leave the number of bits in a generated key value at 1024.

The problem is that these instructions are all old.  In fact using the ssh-keygen command syntax example shown above would represent a down-grade from what is now the default setting.   The wiki and man pages for ssh-keygen both confirm that for RSA, the default length is now 2048 bits (not 1024 bits).

To confirm what key length you get by default, simply make a test key and then read it back.  In this example I create a new public/private key pair called testkey  without specifying a bit length (there is no -b 1024):

anthony$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/anthony/.ssh/id_rsa): testkey
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in testkey.
Your public key has been saved in testkey.pub.
The key fingerprint is:
SHA256:or3Yhykd0W569QcHtGk4ZMSdQDYlaM9ko+TiWQZ7pp4 anthony@Anthonys-Actifio-MacBook-Pro.local
The key's randomart image is:
+---[RSA 2048]----+
| =B+.. |
| . +.Bo+ |
| B O + o |
| + O = = |
| ..@S o . |
| o=.o . . . |
| .o.B . . o |
| .oE.o . . |
| ..oo . |

I then read the file back using the -l and -f params (specifying the name of the file) and confirm the bit length, which in this case is 2048 bits as highlighted by the red text:

anthony$ ssh-keygen -l -f testkey
2048 SHA256:or3Yhykd0W569QcHtGk4ZMSdQDYlaM9ko+TiWQZ7pp4 anthony@Anthonys-Actifio-MacBook-Pro.local (RSA)

When using PuTTYGen, if you use a recent version you will note that the default bit length is now 2048 (as indicated by the red circle).   If you load a key you should see the bit length of the loaded key as indicated by the orange circle.


So if you see instructions specifying the creation of a 1024 bit key, I suggest you ignore them and use 2048 bits or at the least question this with your vendor.   Equally if you are using older keys, it is well worth checking their bit length and generating new keys, since this will give you the now default bit length of 2048, but also renew them, reducing the risk of someone using an older (and potentially leaked) key inappropriately.
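If you have a lot of older keys to check, the read-back step can be scripted.  Here is a minimal Python sketch that parses one line of ssh-keygen -l -f output (in the shape shown above) and flags RSA keys shorter than 2048 bits.  The function names and the MIN_BITS threshold are my own choices for illustration, and the sketch assumes the output always follows the "bits fingerprint comment (TYPE)" layout seen in this post.

```python
import re

MIN_BITS = 2048  # the RSA default length discussed in this post


def parse_keygen_listing(line):
    """Parse one line of `ssh-keygen -l -f <file>` output.

    Assumed shape: "<bits> <fingerprint> <comment> (<TYPE>)".
    Returns (bits, fingerprint, key_type).
    """
    match = re.match(r"^(\d+)\s+(\S+)\s+.*\((\w+)\)\s*$", line.strip())
    if match is None:
        raise ValueError(f"unrecognised ssh-keygen output: {line!r}")
    bits, fingerprint, key_type = match.groups()
    return int(bits), fingerprint, key_type


def is_weak_rsa(line, min_bits=MIN_BITS):
    """True when the listed key is RSA and shorter than min_bits."""
    bits, _, key_type = parse_keygen_listing(line)
    return key_type == "RSA" and bits < min_bits


sample = ("2048 SHA256:or3Yhykd0W569QcHtGk4ZMSdQDYlaM9ko+TiWQZ7pp4 "
          "anthony@Anthonys-Actifio-MacBook-Pro.local (RSA)")
print(parse_keygen_listing(sample)[0], is_weak_rsa(sample))
```

You would feed it the output of ssh-keygen -l -f for each key file you want to audit; any key it flags is a candidate for regeneration.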




Posted in Uncategorized | Leave a comment

Triple 000 app – a recommended app for all your smart devices

I want to draw the attention of my Australian friends to an app called “Emergency +”, available for your smart device (Apple, Android and Windows).

The scenario is simple.  You see something terrible:  a fire; a car crash; a natural disaster.  The standard response is simple:   You should dial 000 (the 911 equivalent in Australia).

One of the first things you are asked is usually:

“What is your location?  Where are you?”

Now that’s easy if you are at home….  but what if you are on the road, or at a store,  or walking the dog?

The idea is to eliminate confusion over your location.

First you open the App and see this:


Select the map and determine your location:


Then dial 000 using the App (you will get a pop-up like this):



It will start a phone call, at which point you should switch to  speaker mode (hands free) and jump back to the app.  You now have your address and your exact location (to a number of meters) for you to share with the responder on the phone.

Details of the app are here:

Look in your smart device app store for an app with this icon:


I urge you to install this app and also encourage your friends to do so too.
Sit down with your family and install it on everyone’s phone.  Do it tonight.

It might save someone’s life.

Posted in Uncategorized | Leave a comment

Exact MSP Space Accounting on a Storwize Pool

I have blogged in the past about the classic IT Story, The Cuckoo’s Egg by Clifford Stoll.   A true story that details how Clifford discovered a hacker while trying to account for 9 seconds of mainframe processing time.

I was reminded of this recently while doing an MSP Space Accounting project.  MSPs (Managed Service Providers) are understandably cost focused as they try to compete with low-cost IAAS (Infrastructure As A Service) providers like Amazon.   To control costs, shared resources are normally employed as well as thin-provisioning and its cousin over-provisioning (don’t confuse them,  thin-provisioning just means using only the exact resources needed for an objective, where over-provisioning means promising or committing to more resources than you actually have, in the hope that no one calls your bluff.   You can always use thin-provisioning without using over-provisioning).

A Storwize pool can use both thin and over-provisioning.   As an MSP if you are looking at pool usage you may want to be clear exactly how much space each client in the shared pool is using.   Now I don’t want to burn time explaining the exact workings of thin provisioning (something that Andrew Martin explains very well here), but I wanted to point out a quirk that may confuse you while trying to do space accounting.

In this example I have a Storwize pool that is 32.55 TiB in size and is showing 22.93 TiB Used.  You can clearly see we have over-allocated the 32.55 TiB of disk space by having created 75.50 TiB of virtual volumes!


Now this is significant because if I wanted to do space accounting I would expect the Used capacity of all volumes in the pool to sum to 22.93 TiB of space.  In other words if five end clients are sharing this space and I know which volumes relate to which client, I would expect the sum total of all volumes used by all clients to equal 22.93 TiB.

If I bring up the properties panel for the pool I can clearly see metrics for the pool including the extent size (in this example 2.00 GiB, remember that, it is significant later).


Now for each thin provisioned volume I get three size properties:

Used: 768.00 KiB   
Real: 1.02 GiB   
Total: 100.00 GiB  

To explain what these are:

  • Used capacity is effectively how much data has been written to the volume (which includes the B-Tree to track thin space allocation).
  • Real capacity is how much space in grains has been pre-allocated to the volume from extents allocated from the pool.
  • Total capacity is the size advertised to the hosts that can access this volume.

This means I could sum either Used capacity or Real capacity.   Since Real capacity is always larger than Used capacity, it makes more sense to sum this.  Especially if this is the number I am using to determine usage by clients inside a shared pool.

To get the used space size of all volumes we need to differentiate between fully provisioned (Generic) volumes and Thin-Provisioned volumes.

This command will grab all the Generic volumes in a specific pool (in this example called InternalPool1):

lsvdisk -bytes -delim ,  -filtervalue se_copy_count=0:mdisk_grp_name=InternalPool1

This command will grab all the thin volumes in a specific pool (in this example called InternalPool1):

lssevdiskcopy -bytes -delim , -nohdr -filtervalue mdisk_grp_name=InternalPool1

The -nohdr option (shown in the second command) suppresses the header row; add it if you wish to use these in a script.

So for the generic volumes we can sum the capacity field.   In this example pool, I used a spreadsheet and found it sums to 19,404,662,243,328 bytes.

So for the thin volumes we can sum the real capacity field.   In this example pool,  I used a spreadsheet and found it sums to 5,260,831,053,824 bytes.

This brings us to a combined total of 24,665,493,297,152 bytes which is 22.43 TiB.
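If you would rather script this than use a spreadsheet, here is a rough Python sketch of the same sum.  It assumes you captured the two command outputs with the header row kept (so without -nohdr) and that the columns are named capacity and real_capacity; the sample rows are made up and trimmed to a few columns purely for illustration.

```python
import csv
import io

TIB = 2 ** 40


def sum_column(cli_output, column):
    """Sum one numeric column of comma-delimited CLI output that has a header row."""
    reader = csv.DictReader(io.StringIO(cli_output.strip()))
    return sum(int(row[column]) for row in reader)


# Made-up rows standing in for lsvdisk / lssevdiskcopy output (assumption).
generic = "id,name,capacity\n0,vol_a,107374182400\n1,vol_b,53687091200\n"
thin = "vdisk_id,vdisk_name,real_capacity\n2,vol_c,1095216660\n"

sample_total = sum_column(generic, "capacity") + sum_column(thin, "real_capacity")
print(sample_total)

# The same conversion applied to the totals quoted in this post.
post_total = 19_404_662_243_328 + 5_260_831_053_824
print(post_total, round(post_total / TIB, 2))
```

The last line reproduces the combined total and the TiB conversion from the figures above.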

The problem here is obvious.   I expected to account for 22.93 TiB of space, but summing the combined total of actual capacity for full-fat volumes and real-capacity for thin volumes doesn’t add up to what I expect.  In fact in this example I am short by around 0.5 TiB of used capacity.  How do I allocate this space to a specific client if no volume owns up to using it?

I can actually spot this in the CLI as well using just the lsmdiskgrp command.  If I subtract real capacity 24,665,493,297,152 from total capacity 35,787,814,993,920 I get 11,122,321,696,768 bytes, which is nowhere near reported free capacity of  10,578,504,450,048 bytes.  This again reveals 543,817,246,720 bytes (0.494 TiB) of allocated space that is not showing against volumes.

IBM_Storwize:Actifio1:anthonyv>lsmdiskgrp -bytes 0
 id 0
 name InternalPool1
 status online
 mdisk_count 1
 vdisk_count 525
 capacity 35787814993920
 extent_size 2048
 free_capacity 10578504450048
 virtual_capacity 83010980413440
 used_capacity 23916077907968
 real_capacity 24665493297152
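The subtraction above can be done straight from the lsmdiskgrp output.  This is a small Python sketch, assuming the detailed view is a list of "name value" lines as shown; the function names are mine, not part of any IBM tooling.

```python
def parse_detail(output):
    """Turn 'name value' lines from a detailed CLI view into a dict of strings."""
    fields = {}
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            fields[parts[0]] = parts[1].strip()
    return fields


def unattributed_bytes(pool):
    """Space taken from the pool that no volume's real capacity accounts for."""
    allocated = int(pool["capacity"]) - int(pool["free_capacity"])
    return allocated - int(pool["real_capacity"])


sample = """
 capacity 35787814993920
 free_capacity 10578504450048
 real_capacity 24665493297152
"""
pool = parse_detail(sample)
gap = unattributed_bytes(pool)
print(gap, gap / 2 ** 40)
```

With the three values from the output above, gap comes out as the same 543,817,246,720 bytes.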

The answer is that the space is actually allocated to volumes, but is not being accounted for at a volume level.   If you scroll up to the second screen shot showing the Pool overview you can see the Extent Size is 2 GiB.   That means the minimum amount of space that gets  allocated to a volume is actually 2 GiB.  But if we look at the volume properties of a single volume, there is no indication that this volume is actually holding down 2 GiB of pool space.     In this example I can see only 1.02 GiB of space being claimed.  So for this example volume there is actually 0.98 GiB of space allocated to the volume which is not actually being acknowledged as being dedicated to that volume.
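To put numbers on that, here is a simplified Python model.  It assumes pool space is handed to a volume in whole extents, so the space held is the real capacity rounded up to the next extent boundary.  That is my reading of the behaviour described here, not a statement of how the Storwize code is implemented.

```python
GIB = 2 ** 30


def extent_backed_bytes(real_capacity, extent_size):
    """Round a volume's real capacity up to a whole number of extents."""
    extents = -(-real_capacity // extent_size)  # integer ceiling division
    return extents * extent_size


real = int(1.02 * GIB)   # the Real capacity of the example volume
extent = 2 * GIB         # the pool's extent size

held = extent_backed_bytes(real, extent)
hidden = held - real
print(held / GIB, round(hidden / GIB, 2))
```

For the example volume this gives 2 GiB held in the pool, with roughly 0.98 GiB not showing against the volume.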


So how do I cleanly allocate this 0.5 TiB?

I see two choices.   The first is to simply determine the shortfall, divide it by the number of thin allocated volumes and then add that usage to each thin volume.     In this example I have 519 thin volumes, so if I divide  543,817,246,720 by 519 that’s pretty well 1 GiB per volume that I could simply add to each volume’s space allocation.

The second is to accept it as a space tax and simply plan for it.   The issue is far less pronounced if the volume quantity is small and the volume size is large.  The issue is also far less pronounced with smaller extent sizes.   At very small extent sizes it in fact will most likely not occur at all or be truly trivial in size (like Clifford’s 9 seconds). In this example simply using 1 GiB extents would have pretty well masked the issue.    But remember that the smaller your extent size, the smaller your maximum cluster size can be.  A 2 GiB extent size means the maximum cluster size is 8 PiB.
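The first choice is easy to sketch in Python.  The shortfall and the volume count come from this example; the client names and the number of thin volumes per client are hypothetical, only there to show how the surcharge would roll up per client.

```python
GIB = 2 ** 30


def spread_shortfall(shortfall, thin_volume_count):
    """Split the unattributed bytes evenly; returns (bytes per volume, leftover bytes)."""
    return divmod(shortfall, thin_volume_count)


per_volume, leftover = spread_shortfall(543_817_246_720, 519)
print(per_volume, leftover, round(per_volume / GIB, 2))

# Hypothetical split of the 519 thin volumes between two clients.
thin_volumes_per_client = {"client_a": 300, "client_b": 219}
surcharge = {client: count * per_volume
             for client, count in thin_volumes_per_client.items()}
print(surcharge)
```

The few leftover bytes from the integer division are too small to matter, but tracking them means the per-client surcharges plus the leftover always add back up to the full shortfall.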



Posted in Uncategorized | Leave a comment

Mapping Linux RDMs to Storwize Volumes

In my previous post about MPIO software and RDMs, I suggested SDDDSM could help you map Windows volumes to Storwize volumes.    This led to the obvious question:   What about Linux VMs?

In a distant time there was a version of IBM SDD for Linux (in fact you can still download it).  But because it was closed source and used compiled binaries, it meant that users could only use specific Linux distributions/Kernel versions.    This was rather painful (especially if you upgraded your Linux version due to some other bug and then found SDD no longer worked).    Fortunately native Multipathing for Linux rapidly matured and offered a simple and native option that is definitely the way to go (and please don’t listen to the vendors pushing proprietary MPIO software, integration native to the Operating System using vendor plug-ins is in my opinion  the only acceptable MPIO solution).

Either way, it turns out you don’t even need multipath software to map a Storwize Volume to an Operating System device.

In this example I have created a volume on a Storwize V3700 with a UID that ends in 0043.


It is mapped as a pRDM to a VM, and under the Manage Paths window you can see the same UID at the top of the window (ending with 0043).


On the Linux VM that is using this volume, I want to confirm if the device /dev/sdb matches the pRDM.   In this example we use the smartctl command.   We can clearly see the matching Logical Unit ID  (ending in 0043), so we know that /dev/sdb is indeed our pRDM.

[root@demo-oracle-4 ~]# smartctl -a /dev/sdb
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-573.3.1.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor: IBM 
Product: 2145 
Revision: 0000
User Capacity: 5,368,709,120 bytes [5.36 GB]
Logical block size: 512 bytes
Logical Unit id: 0x60050763008083020000000000000043
Serial number: 00c02020c080XX00
Device type: disk
Transport protocol: Fibre channel (FCP-2)
Local Time is: Sat Apr 16 23:16:09 2016 EDT
Device does not support SMART

Error Counter logging not supported
Device does not support Self Test logging
[root@demo-oracle-4 ~]#

If you find smartctl is not installed, then install the smartmontools package:

yum install smartmontools

If we have Linux multipath configured, we can also use the multipath -l (or -ll) command to find the UID and determine which Storwize Volume is which Linux device.  Again I can easily spot that mpathb (sdb) is my Storwize volume with the UID ending in 0043.

[root@centos65 ~]# multipath -ll
mpathb(360050763008083020000000000000043) dm-6 IBM,2145
size=5G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
 `- 5:0:1:0 sdb 8:96 active ready running

So Linux users will actually find it quite easy to map OS disks back to the Storwize volume.
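If you have many volumes, eyeballing UIDs gets tedious.  Below is a rough Python sketch that pulls the UID out of multipath -ll and smartctl -a output shaped like the examples above.  It assumes the multipath WWID is the 32-character volume UID with a leading 3 in front (as in the sample), and the function names are my own.

```python
import re


def uid_from_multipath(output):
    """Map each multipath alias to (volume UID, list of sd devices)."""
    result = {}
    current = None
    for line in output.splitlines():
        header = re.match(r"^(\S+?)\s*\(3([0-9a-fA-F]{32})\)", line)
        if header:
            current = header.group(1)
            result[current] = (header.group(2).upper(), [])
            continue
        path = re.search(r"\d+:\d+:\d+:\d+\s+(sd[a-z]+)\b", line)
        if path and current:
            result[current][1].append(path.group(1))
    return result


def uid_from_smartctl(output):
    """Return the Logical Unit id from smartctl -a output, without the 0x prefix."""
    match = re.search(r"Logical Unit id:\s*0x([0-9a-fA-F]+)", output)
    return match.group(1).upper() if match else None


multipath_sample = """mpathb(360050763008083020000000000000043) dm-6 IBM,2145
size=5G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
 `- 5:0:1:0 sdb 8:96 active ready running"""
smartctl_sample = "Logical Unit id: 0x60050763008083020000000000000043"

print(uid_from_multipath(multipath_sample))
print(uid_from_smartctl(smartctl_sample))
```

Both functions return the same UID for the example volume, which you can then match against the volume list on the Storwize.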


Posted in IBM, IBM Storage, Storwize V3700, Storwize V7000 | Tagged , , | Leave a comment

Do RDMs need MPIO?

I got a great question the other day regarding VMware Raw Device Mappings:

If an RDM is a direct pass-through of a volume from Storage Device to VM, does the VM need MPIO software like a physical machine does?

The short answer is NO,  it doesn’t.  But I thought I would show why this is so, and in fact why adding MPIO software may help.

First up, to test this, I created two volumes on my Storwize V3700.


I mapped them to an ESXi server as LUN ID 2 and LUN ID 3.  Note the serials of the volumes end in 0040 and 0041:


On ESX I did a Rescan All and discovered two new volumes, which we know match the two I just made on my V3700, as the serial numbers end in 40 and 41 and the LUN IDs are 2 and 3:


I confirmed that the new devices had multiple paths, in this example only two (one to each Node Canister in the Storwize V3700):


I then mapped them to a VM as RDMs, the first one as a Virtual RDM (vRDM), the second as a Physical (pRDM):


Finally on the Windows VM I Scanned for New Devices and brought up  the properties of the two new disks.   Firstly you note that the first disk (Disk 1) is a VMware Virtual disk while the second disk (Disk 2) is an IBM 2145 Multi-Path disk.   This is because the first one was mapped as a vRDM, while the second was mapped as a pRDM.


So here is the question: if the Physical RDM is a multi-path device, does it have one path or many?      The first hint is that we only got one disk for each RDM.  But what do I see if I actually install MPIO software?    So I installed SDDDSM and displayed path status using the datapath query device command:

C:\Program Files\IBM\SDDDSM>datapath query device

Total Devices : 1

SERIAL: 60050763008083020000000000000040
Path#    Adapter/Hard Disk          State  Mode    Select Errors
    0  Scsi Port2 Bus0/Disk2 Part0  OPEN   NORMAL      86      0

C:\Program Files\IBM\SDDDSM>

What the output above shows is that there is only one path being presented to the VM, even though we know the ESXi HyperVisor can see two paths.

So this proves we didn’t actually need to install SDDDSM to manage pathing, as there is only one path being presented to the disk (the HyperVisor is handling the multiple paths using its own MPIO capability VMW-SATP-ALUA, which we can see in the ESXi pathing screen capture further up above).

Having said all that, there is one advantage from the Windows VM perspective to having SDDDSM installed, which is that I can see that Disk2 maps to the V3700 volume with a serial that ends in 40 (rather than 41).   So if I wanted to remove the vRDM volume (Disk 1) I know with safety that the volume ending in ’41’ is the correct one to target.


Posted in IBM Storage, Storwize V3700, Storwize V7000, Uncategorized, vmware | Tagged , | 7 Comments

Evergreen Storage? Can it actually work?

Pure Storage is one of several hot flash vendors in the market right now.   Despite some negativity about their recent IPO, it actually shows that the market thinks they have got their product and execution right.

One challenge for every Flash vendor out there (and there are quite a few) is to be able to explain the why.   Why my product and not another vendor’s?

One thing Pure Storage promote as a strong ‘why us‘  is their concept of Evergreen Storage, described here:


Fundamentally they are saying that as technology evolves, their modular physical design and stateless software design will allow you to upgrade components without having to move data or do any of these forklift upgrades.  Here is an image from their brochure:


Even with Storage vMotion, the need to move data between storage arrays remains a major additional cost of replacing or upgrading storage hardware, and the ability to minimise or eliminate this work is definitely a huge plus.

But can they actually do it?  Do we have working examples of other vendors achieving this?

There is actually a good working model of a product that has done exactly this since 2003: The IBM SAN Volume Controller.     When IBM released the SVC in 2003, the first model (the 4F2), had only 4 GB of RAM per node with 2 Gbps FC adapters.   Since then, IBM have released a succession of new models as Intel hardware has evolved, with the current nodes having at least 32 GB of RAM, dramatically more cores, and optional 16 Gbps FC adapters!

The neat thing is that clients who invested in licensing in 2003, have been able to upgrade their nodes, with data in place, over successive years.   The cost of new nodes has been relatively low compared to the performance and functional benefits that each release has provided.   So I know for a fact that this idea of an Evergreen storage product is not only possible, but positively demonstrated by IBM.

The challenge for any vendor trying to do this is threefold:

  1. The technology really has to support seamless upgrades.   While the IBM SVC certainly did and does, there were some minor hiccups along the way.   One example was that first model, the 4F2, could not support the later 64 bit firmware releases, which meant that if you held off upgrading for too long, upgrading to new hardware needed some special help or a double hop to get the upgrade going.    Another example is bad racking:   Racked and stacked badly, pulling one node out could result in a partner node being disturbed (something I sadly have seen).
  2. The vendor needs to remain committed to the product.   While I laud IBM’s success with the SVC (now going even stronger with its Storwize brothers),  a sister product released at the same time, the Storage File System (sometimes called Storage Tank), did not get market traction and did not progress very far before being replaced by GPFS (which was not exactly a one for one replacement).  And while the DS8000 continues going strong (long after Chuck Hollis, in a classic piece of EMC FUD,  declared it dead),  its little sister, the DS6800, truly was dead within months of being released.   Its early months were so drama laden (sometimes sadly referred to as a crit-sit in a box) that new models were never released, which was equally sad, as once the code stabilised it became a great product.
  3. The vendor needs to hang around.   This one seems fairly obvious.   Clearly if someone were to buy Pure Storage (if the structure of the company allowed someone to do this), they also need to support this strategy.

So can Pure Storage do it?   Only time will tell, but they have made a great start and the industry has shown the concept is possible.   I will watch their progress with great interest!


Posted in Uncategorized | 3 Comments

vSphere ESXi6.0 CBT (VADP) bug that affects incremental backups / snapshots.

VMware recently posted a new KB article 2136854 to advertise a new issue that has been found with their Changed Block Tracking (CBT) code.

It’s important to note that this is not the same issue as the one posted recently, also for ESXi 6.0 (KB 2114076), which is now fixed in a re-issued build of ESXi 6.0 (Build 2715440).

But it is very similar to KB 2090639 from a historical perspective.

The Issue

If you are leveraging a product that uses VMware’s VADP for backup, then chances are you are leveraging this for not just initial fulls, but regular incremental snapshots (for backup purposes). There are numerous products on the market that leverage this API, it’s virtually the industry standard to use this feature as it results in faster backups.

When the incremental changes are being requested through the API (QueryDiskChangedAreas), some of the changed blocks aren’t being correctly reported in the first place, so backup data is essentially missing. Backups based on this can be inconsistent when recovered and result in all sorts of problems.

The Challenge

Currently there is no resolution or hotfix to the issue from VMware. I hope that we will see something in the coming days due to the wide ranging impact to customers and partner products affected.

The Workarounds

The KB suggests these workarounds:

  1. Do a full backup for each backup.  That will certainly work, but it’s not really a viable fix for most customers (ouch !)
  2. Downgrade to ESX 5.5 and virtual hardware back to 10 (ouch !)
  3. Shut down the VM before doing an incremental  (ouch !)

From the testing we have done at Actifio, option 3 doesn’t actually provide a workaround either, and options 1 & 2 aren’t really ideal.

The Discovery

When Actifio Customer Success Engineers discovered the issue, we contacted VMware and proved the problem leveraging just API calls to demonstrate where the problem was. How did we discover the issue I hear you ask?  Well we managed to discover the issue via our patented fingerprinting feature that occurs post every backup job. This technique (feature) essentially has learnt to not trust the data we receive (history has proven this feature to be useful many times) but to go and verify it against our copy and the original source copy. If we receive a variance in any way, we trigger an immediate full read compare against the source and update our copy. This works like a Full Backup job, but doesn’t write out a complete copy again, it just updates our copy to line up with the source again (as we like to save disk where we can!). We’ve seen this occur from time to time with our many different capture techniques (not just VADP), so it’s a worthy bit of code to say the least that our customers benefit from.
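To illustrate the general idea of verifying a copy against its source (and only the general idea), here is a generic Python sketch that hashes both sides block by block, reports blocks that differ and rewrites just those blocks.  This is not Actifio’s patented fingerprinting implementation; the block size, the use of SHA-256 and every function name are assumptions made for the example.

```python
import hashlib


def block_digests(data, block_size):
    """SHA-256 digest of each fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def mismatched_blocks(source, copy, block_size=4096):
    """Indexes of blocks whose digests differ between source and copy."""
    if len(source) != len(copy):
        raise ValueError("this sketch assumes source and copy are the same size")
    pairs = zip(block_digests(source, block_size),
                block_digests(copy, block_size))
    return [i for i, (a, b) in enumerate(pairs) if a != b]


def repair_copy(source, copy, block_size=4096):
    """Rewrite only the differing blocks so the copy lines up with the source."""
    fixed = bytearray(copy)
    for i in mismatched_blocks(source, copy, block_size):
        start = i * block_size
        fixed[start:start + block_size] = source[start:start + block_size]
    return bytes(fixed)


source = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
stale = b"A" * 4096 + b"X" * 4096 + b"C" * 4096  # a change the tracking missed

print(mismatched_blocks(source, stale))
print(repair_copy(source, stale) == source)
```

The point of the sketch is the same as the point above: a missed changed block is caught by comparing against the source, and only the blocks that differ need to be rewritten.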

Let’s hope there’s a hotfix on the near horizon, so the many VADP / CBT vendor products that rely on it, can get back to doing what we do best and that’s protecting critical data for our customers that can be recovered without question.


Thanks to Jeff O’Connor for writing this up.   You can find his blog here:  http://copydata.tips

Posted in Actifio, vmware | 2 Comments

Accessing the Instrumentation

Here are some rather bad photos of my 1972 Holden HQ Kingswood Premier, one of my first ever cars (and one that I sadly no longer own):


This was the V8 Four Litre model (actually 253 cubic inches, often jokingly described as having all the power of a 6 cylinder with the fuel economy of an 8).   The engine bay was so huge and empty I could open the bonnet and sit on the side of the car with my feet comfortably inside the bay while I changed spark plugs or cleaned the points.

The Kingswood was not what you would call an instrumented vehicle.   The dashboard had a speedo, a fuel gauge and three lights:   Temperature, Oil and Charging.   I dubbed these three lights the idiot lights: as once they came on,  you were the idiot.  (sorry, no picture; this was the 1980s).

Modern storage infrastructure by comparison is slightly more instrumented.   A vast array of metrics are tracked and these can be used to perform all sorts of analysis.   Analysis like:

  • Are my hosts getting good response times?
  • Are specific disks or arrays being over worked?
  • Are my fibre ports being used in a balanced fashion?

So can you do this with the Storwize products?    Of course!   I documented the built-in tools here (where I talked about the Performance GUI):


And here (where I talked about the performance CLI):


But these tools have only limited usefulness.   They are not granular, in that you cannot look at specific hosts or specific arrays or specific FC ports (meaning the three analysis ideas I suggested above are not even possible).   So how can we do this analysis?

The good news is that Storwize products do track all the metrics needed to do very granular analysis and these are freely accessible.   These files are documented by IBM, here is a fairly old page that documents some of them:


But how to turn these into something useful?   There used to be a tool called svcmon but this tool appears to have been killed as per this rather sad blog post:


There is another IBM Community developed tool called qperf which you can access using the link below:


With a graphing tool here:


And another tool here:  http://www.stor2rrd.com/

And yet another one here!   https://code.google.com/p/svc-perf/

The challenge for many of these tools is that they require manual setup, usually have a limited database engine and analysis is not always easy or simple.

You can of course use IBM’s TPC:


You could also consider Intellimagic.   Although I have not looked too deeply at this one, these guys wrote IBM’s Disk Magic tool, so they certainly understand storage performance


The challenge for all Storage Admins is that they are not always experts at diagnosing performance issues.    Getting some genuine examples of the thinking process and the flow of getting from problem to solution, is vital.   This makes  BVQ  another good choice.

To see an example of how instrumented data presented in a graphical format can be used to generate a useful problem analysis, check out this blog post here:


and another one here:


I really like these posts for two reasons:

  1. They clearly show just how instrumented the product is
  2. They clearly show how using this data in a graphical format can lead to good and quick root cause analysis.

Also have a look at some of these videos:


So how are you instrumenting your Storwize?
What do you find the easiest tool to use?




Posted in Uncategorized | 5 Comments