Bits are cheap! Don’t sell yourself short on key length

Using SSH keys to perform password-free login is quite common on Unix hosts and on appliances that have embedded Unix (like Storwize products).

You effectively have a public key, which is shared, and a private key (with a PPK extension if you use PuTTY), which is not shared. Think of the public key like the lock in your front door, which everyone can see. Think of the private key like the door key in your pocket or handbag. If you keep your private key secure, your door is relatively secure. If you lose your keys, your door is most likely no longer secure (unless they are down the back of the couch).

Sticking with the door analogy, the risk with a door lock is that someone could still just try to kick your door in (brute force attack) or pick your lock.   The bit length of the key can make this harder to achieve: the longer the bit length the harder it is to crack.

It is not unusual to see instructions that suggest you use a command like this to generate keys, where a bit length of 1024 is specified with an RSA key:

ssh-keygen -b 1024 -t rsa -f ~/.ssh/id_rsa

Or, if using PuTTYgen to create the keys, you may see instructions like this:

  1. Start PuTTYgen by clicking Start > Programs > PuTTY > PuTTYgen. The PuTTY Key Generator panel is displayed.
  2. Click SSH-2 RSA as the type of key to generate.
    Note: Leave the number of bits in a generated key value at 1024.

The problem is that these instructions are all old. In fact, using the ssh-keygen command syntax shown above would represent a downgrade from what is now the default setting. The wiki and man pages for ssh-keygen both confirm that for RSA, the default length is now 2048 bits (not 1024 bits).

To confirm what key length you get by default, simply make a test key and then read it back.  In this example I create a new public/private key pair called testkey  without specifying a bit length (there is no -b 1024):

anthony$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/anthony/.ssh/id_rsa): testkey
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in testkey.
Your public key has been saved in
The key fingerprint is:
SHA256:or3Yhykd0W569QcHtGk4ZMSdQDYlaM9ko+TiWQZ7pp4 anthony@Anthonys-Actifio-MacBook-Pro.local
The key's randomart image is:
+---[RSA 2048]----+
| =B+.. |
| . +.Bo+ |
| B O + o |
| + O = = |
| ..@S o . |
| o=.o . . . |
| .o.B . . o |
| .oE.o . . |
| ..oo . |

I then read the file back using the -l and -f params (specifying the name of the file) and confirm the bit length, which in this case is 2048 bits (the first field of the output):

anthony$ ssh-keygen -l -f testkey
2048 SHA256:or3Yhykd0W569QcHtGk4ZMSdQDYlaM9ko+TiWQZ7pp4 anthony@Anthonys-Actifio-MacBook-Pro.local (RSA)
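If you have a lot of key files to check, the same test can be scripted. The sketch below is only an illustration: it assumes each line has the layout shown above (bit length first, key type in parentheses last), and the function names and the minimum_bits parameter are hypothetical, not part of ssh-keygen.

```python
def parse_keygen_listing(line):
    """Split one line of `ssh-keygen -l -f <file>` output into its parts."""
    fields = line.split()
    bits = int(fields[0])               # first field is the key length in bits
    fingerprint = fields[1]             # e.g. SHA256:...
    key_type = fields[-1].strip("()")   # last field is the key type in parentheses
    return bits, fingerprint, key_type


def is_short_rsa(line, minimum_bits=2048):
    """True when the line describes an RSA key shorter than minimum_bits."""
    bits, _, key_type = parse_keygen_listing(line)
    return key_type == "RSA" and bits < minimum_bits


# The listing line from the example above.
sample = ("2048 SHA256:or3Yhykd0W569QcHtGk4ZMSdQDYlaM9ko+TiWQZ7pp4 "
          "anthony@Anthonys-Actifio-MacBook-Pro.local (RSA)")
print(parse_keygen_listing(sample)[0], is_short_rsa(sample))
```

Feeding it the output of ssh-keygen -l -f for each key file gives a quick list of keys worth regenerating.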

When using PuTTYGen, if you use a recent version you will note that the default bit length is now 2048 (as indicated by the red circle).   If you load a key you should see the bit length of the loaded key as indicated by the orange circle.


So if you see instructions specifying the creation of a 1024-bit key, I suggest you ignore them and use 2048 bits, or at the very least question them with your vendor. Equally, if you are using older keys, it is well worth checking their bit length and generating new ones. This not only gives you the now-default bit length of 2048, it also renews the keys, reducing the risk of someone using an older (and potentially leaked) key inappropriately.




Posted in Uncategorized | Leave a comment

Triple 000 app – a recommended app for all your smart devices

I want to draw the attention of my Australian friends to an app called “Emergency +”, available for your smart device (Apple, Android and Windows).

The scenario is simple.  You see something terrible:  a fire; a car crash; a natural disaster.  The standard response is simple:   You should dial 000 (the 911 equivalent in Australia).

One of the first things you are asked is usually:

“What is your location?  Where are you?”

Now that’s easy if you are at home….  but what if you are on the road, or at a store,  or walking the dog?

The idea is to eliminate confusion over your location.

First you open the App and see this:


Select the map and determine your location:


Then dial 000 using the App (you will get a pop-up like this):



It will start a phone call, at which point you should switch to  speaker mode (hands free) and jump back to the app.  You now have your address and your exact location (to a number of meters) for you to share with the responder on the phone.

Details of the app are here:

Look in your smart device app store for an app with this icon:


I urge you to install this app and also encourage your friends to do so too.
Sit down with your family and install it on everyone’s phone.  Do it tonight.

It might save someone’s life.


Exact MSP Space Accounting on a Storwize Pool

I have blogged in the past about the classic IT story, The Cuckoo’s Egg by Clifford Stoll. It is a true story that details how Clifford discovered a hacker while trying to account for 9 seconds of mainframe processing time.

I was reminded of this recently while doing an MSP Space Accounting project. MSPs (Managed Service Providers) are understandably cost focused as they try to compete with low-cost IaaS (Infrastructure as a Service) providers like Amazon. To control costs, shared resources are normally employed, as well as thin-provisioning and its cousin over-provisioning. Don’t confuse the two: thin-provisioning just means using only the exact resources needed for an objective, whereas over-provisioning means promising or committing to more resources than you actually have, in the hope that no one calls your bluff. You can always use thin-provisioning without using over-provisioning.

A Storwize pool can use both thin and over-provisioning.   As an MSP if you are looking at pool usage you may want to be clear exactly how much space each client in the shared pool is using.   Now I don’t want to burn time explaining the exact workings of thin provisioning (something that Andrew Martin explains very well here), but I wanted to point out a quirk that may confuse you while trying to do space accounting.

In this example I have a Storwize pool that is 32.55 TiB in size and is showing 22.93 TiB Used.  You can clearly see we have over-allocated the 32.55 TiB of disk space by having created 75.50 TiB of virtual volumes!


Now this is significant because if I wanted to do space accounting I would expect the Used capacity of all volumes in the pool to sum to 22.93 TiB of space. In other words, if five end clients are sharing this space and I know which volumes relate to which client, I would expect the sum total of all volumes used by all clients to equal 22.93 TiB.

If I bring up the properties panel for the pool I can clearly see metrics for the pool including the extent size (in this example 2.00 GiB, remember that, it is significant later).


Now for each thin provisioned volume I get three size properties:

Used: 768.00 KiB   
Real: 1.02 GiB   
Total: 100.00 GiB  

To explain what these are:

  • Used capacity is effectively how much data has been written to the volume (which includes the B-Tree to track thin space allocation).
  • Real capacity is how much space in grains has been pre-allocated to the volume from extents allocated from the pool.
  • Total capacity is the size advertised to the hosts that can access this volume.

This means I could sum either Used capacity or Real capacity. Since Real capacity is always at least as large as Used capacity, it makes more sense to sum Real capacity, especially if this is the number I am using to determine usage by clients inside a shared pool.

To get the used space size of all volumes we need to differentiate between fully provisioned (Generic) volumes and Thin-Provisioned volumes.

This command will grab all the Generic volumes in a specific pool (in this example called InternalPool1):

lsvdisk -bytes -delim ,  -filtervalue se_copy_count=0:mdisk_grp_name=InternalPool1

This command will grab all the thin volumes in a specific pool (in this example called InternalPool1):

lssevdiskcopy -bytes -delim , -nohdr -filtervalue mdisk_grp_name=InternalPool1

The -nohdr option (used in the second command) suppresses the header row; add it to either command if you wish to use the output in a script.

So for the generic volumes we can sum the capacity field. In this example pool, I used a spreadsheet and found it sums to 19,404,662,243,328 bytes.

So for the thin volumes we can sum the real capacity field.   In this example pool,  I used a spreadsheet and found it sums to 5,260,831,053,824 bytes.

This brings us to a combined total of 24,665,493,297,152 bytes which is 22.43 TiB.
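If you would rather script the summing than use a spreadsheet, something like the following would do it. This is a minimal sketch under a few assumptions: the CLI output has been captured with -delim , and with the header row kept (so without -nohdr), the columns of interest are named capacity and real_capacity, and the rows shown are made-up samples trimmed to a few columns rather than real output from the example pool.

```python
import csv
import io

TIB = 2 ** 40


def sum_column(cli_output, column):
    """Sum one numeric column of comma-delimited CLI output that has a header row."""
    reader = csv.DictReader(io.StringIO(cli_output))
    return sum(int(row[column]) for row in reader)


# Made-up sample of lsvdisk output for generic volumes (columns trimmed).
generic_csv = (
    "id,name,mdisk_grp_name,capacity\n"
    "0,vol_a,InternalPool1,107374182400\n"
    "1,vol_b,InternalPool1,53687091200\n"
)

# Made-up sample of lssevdiskcopy output for thin volumes (columns trimmed).
thin_csv = (
    "vdisk_id,vdisk_name,mdisk_grp_name,capacity,used_capacity,real_capacity\n"
    "2,vol_c,InternalPool1,107374182400,786432,1095216660\n"
)

generic_total = sum_column(generic_csv, "capacity")
thin_total = sum_column(thin_csv, "real_capacity")
combined = generic_total + thin_total
print(f"combined: {combined} bytes ({combined / TIB:.2f} TiB)")
```

Filtering the rows by volume name or ID before summing gives the same figure per client.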

The problem here is obvious.   I expected to account for 22.93 TiB of space, but summing the combined total of actual capacity for full-fat volumes and real-capacity for thin volumes doesn’t add up to what I expect.  In fact in this example I am short by around 0.5 TiB of used capacity.  How do I allocate this space to a specific client if no volume owns up to using it?

I can actually spot this in the CLI as well using just the lsmdiskgrp command.  If I subtract real capacity 24,665,493,297,152 from total capacity 35,787,814,993,920 I get 11,122,321,696,768 bytes, which is nowhere near reported free capacity of  10,578,504,450,048 bytes.  This again reveals 543,817,246,720 bytes (0.494 TiB) of allocated space that is not showing against volumes.

IBM_Storwize:Actifio1:anthonyv>lsmdiskgrp -bytes 0
 id 0
 name InternalPool1
 status online
 mdisk_count 1
 vdisk_count 525
 capacity 35787814993920
 extent_size 2048
 free_capacity 10578504450048
 virtual_capacity 83010980413440
 used_capacity 23916077907968
 real_capacity 24665493297152
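The arithmetic is easy to repeat for any pool. The sketch below simply replays the subtraction using the three values from the lsmdiskgrp output above; the variable names allocated and unaccounted are illustrative, not CLI fields.

```python
TIB = 2 ** 40

# Values copied from the lsmdiskgrp -bytes output above.
capacity = 35787814993920
free_capacity = 10578504450048
real_capacity = 24665493297152

allocated = capacity - free_capacity      # space the pool has handed out
unaccounted = allocated - real_capacity   # handed out, but claimed by no volume

print(f"allocated:   {allocated / TIB:.2f} TiB")
print(f"real:        {real_capacity / TIB:.2f} TiB")
print(f"unaccounted: {unaccounted} bytes")
```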

The answer is that the space is actually allocated to volumes, but is not being accounted for at a volume level.   If you scroll up to the second screen shot showing the Pool overview you can see the Extent Size is 2 GiB.   That means the minimum amount of space that gets  allocated to a volume is actually 2 GiB.  But if we look at the volume properties of a single volume, there is no indication that this volume is actually holding down 2 GiB of pool space.     In this example I can see only 1.02 GiB of space being claimed.  So for this example volume there is actually 0.98 GiB of space allocated to the volume which is not actually being acknowledged as being dedicated to that volume.
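To put a number on this for a single volume, here is a small sketch. It assumes, as a simplification of the explanation above, that a volume's real capacity is always backed by whole extents and that the difference is the space nobody owns up to; the function name is hypothetical.

```python
GIB = 2 ** 30


def extent_overhead(real_bytes, extent_bytes):
    """Bytes held in whole extents beyond the volume's reported real capacity."""
    extents = -(-real_bytes // extent_bytes)   # ceiling division
    return extents * extent_bytes - real_bytes


# The example volume: 1.02 GiB real capacity in a pool with 2 GiB extents.
real = round(1.02 * GIB)
overhead = extent_overhead(real, 2 * GIB)
print(f"{overhead / GIB:.2f} GiB held but not reported")
```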


So how do I cleanly allocate this 0.5 TiB?

I see two choices. The first is to simply determine the shortfall, divide it by the number of thin allocated volumes and then add that usage to each thin volume. In this example I have 519 thin volumes, so if I divide 543,817,246,720 by 519 that’s pretty well 1 GiB per volume, which I could simply add to each volume’s space allocation.
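The first choice can be sketched as follows. The shortfall and volume count come from the example above, while the client figures passed to the hypothetical client_usage function are made up purely to show the calculation.

```python
GIB = 2 ** 30

shortfall = 543817246720   # unaccounted bytes in the example pool
thin_volumes = 519         # thin volumes in the example pool

tax_per_volume = shortfall / thin_volumes
print(f"{tax_per_volume / GIB:.3f} GiB added per thin volume")


def client_usage(real_bytes, thin_volume_count, tax):
    """Real capacity for a client plus an even share of the shortfall per thin volume."""
    return real_bytes + thin_volume_count * tax


# Made-up client: 10 thin volumes with 50 GiB of real capacity in total.
charged = client_usage(50 * GIB, 10, tax_per_volume)
print(f"{charged / GIB:.2f} GiB charged")
```

An even split is the simplest option; a client with many small thin volumes carries more of the tax than one with a few large ones, which roughly matches where the rounding actually occurs.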

The second is to accept it as a space tax and simply plan for it.   The issue is far less pronounced if the volume quantity is small and the volume size is large.  The issue is also far less pronounced with smaller extent sizes.   At very small extent sizes it in fact will most likely not occur at all or be truly trivial in size (like Clifford’s 9 seconds). In this example simply using 1 GiB extents would have pretty well masked the issue.    But remember that the smaller your extent size, the smaller your maximum cluster size can be.  A 2 GiB extent size means the maximum cluster size is 8 PiB.




Mapping Linux RDMs to Storwize Volumes

As a follow-up to my previous post about MPIO software and RDMs, in which I suggested SDDDSM could help you map Windows volumes to Storwize volumes, here is the obvious next question: What about Linux VMs?

In a distant time there was a version of IBM SDD for Linux (in fact you can still download it).  But because it was closed source and used compiled binaries, it meant that users could only use specific Linux distributions/Kernel versions.    This was rather painful (especially if you upgraded your Linux version due to some other bug and then found SDD no longer worked).    Fortunately native Multipathing for Linux rapidly matured and offered a simple and native option that is definitely the way to go (and please don’t listen to the vendors pushing proprietary MPIO software, integration native to the Operating System using vendor plug-ins is in my opinion  the only acceptable MPIO solution).

Either way, it turns out you don’t even need multipath software to map a Storwize Volume to an Operating System device.

In this example I have created a volume on a Storwize V3700 with a UID that ends in 0043.


It is mapped as a pRDM to a VM, and I can see the same UID (ending in 0043) at the top of the Manage Paths window.


On the Linux VM that is using this pRDM, I want to confirm if the device /dev/sdb matches the pRDM. In this example we use the smartctl command. We can clearly see the matching Logical Unit ID (ending in 0043), so we know that /dev/sdb is indeed our pRDM.

[root@demo-oracle-4 ~]# smartctl -a /dev/sdb
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-573.3.1.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen,

Vendor: IBM 
Product: 2145 
Revision: 0000
User Capacity: 5,368,709,120 bytes [5.36 GB]
Logical block size: 512 bytes
Logical Unit id: 0x60050763008083020000000000000043
Serial number: 00c02020c080XX00
Device type: disk
Transport protocol: Fibre channel (FCP-2)
Local Time is: Sat Apr 16 23:16:09 2016 EDT
Device does not support SMART

Error Counter logging not supported
Device does not support Self Test logging
[root@demo-oracle-4 ~]#

If you find smartctl is not installed, then install the smartmontools package:

yum install smartmontools

If we have Linux multipath configured, we can also use the multipath -l (or -ll) command to find the UID and determine which Storwize Volume is which Linux device. Again I can easily spot that mpathb (sdb) is my Storwize volume with the UID ending in 0043.

[root@centos65 ~]# multipath -ll
mpathb(360050763008083020000000000000043) dm-6 IBM,2145
size=5G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
 `- 5:0:1:0 sdb 8:96 active ready running

So Linux users will actually find it quite easy to map OS disks back to the Storwize volume.
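With more than a handful of volumes, matching by eye gets tedious, so the comparison can be scripted. The sketch below is an illustration with hypothetical function names: it strips the 0x prefix that smartctl prints and the leading 3 that multipath puts in front of the identifier, so both forms can be compared directly against the UID shown on the Storwize.

```python
def normalise_uid(value):
    """Reduce a volume UID, smartctl Logical Unit id or multipath WWID to bare upper-case hex."""
    uid = value.strip().upper()
    if uid.startswith("0X"):
        uid = uid[2:]
    # multipath prefixes the identifier with "3", giving 33 characters instead of 32
    if len(uid) == 33 and uid.startswith("3"):
        uid = uid[1:]
    return uid


def same_volume(a, b):
    """True when two identifiers refer to the same Storwize volume UID."""
    return normalise_uid(a) == normalise_uid(b)


smartctl_id = "0x60050763008083020000000000000043"    # from smartctl -a /dev/sdb
multipath_id = "360050763008083020000000000000043"    # from multipath -ll
print(same_volume(smartctl_id, multipath_id))
```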


Posted in IBM, IBM Storage, Storwize V3700, Storwize V7000 | Leave a comment

Do RDMs need MPIO?

I got a great question the other day regarding VMware Raw Device Mappings:

If an RDM is a direct pass-through of a volume from Storage Device to VM, does the VM need MPIO software like a physical machine does?

The short answer is NO,  it doesn’t.  But I thought I would show why this is so, and in fact why adding MPIO software may help.

First up, to test this, I created two volumes on my Storwize V3700.


I mapped them to an ESXi server as LUN ID 2 and LUN ID 3.  Note the serials of the volumes end in 0040 and 0041:


On ESX I did a Rescan All and discovered two new volumes, which we know match the two I just made on my V3700, as the serial numbers end in 40 and 41 and the LUN IDs are 2 and 3:


I confirmed that the new devices had multiple paths, in this example only two (one to each Node Canister in the Storwize V3700):


I then mapped them to a VM as RDMs, the first one as a Virtual RDM (vRDM), the second as a Physical (pRDM):


Finally on the Windows VM I Scanned for New Devices and brought up  the properties of the two new disks.   Firstly you note that the first disk (Disk 1) is a VMware Virtual disk while the second disk (Disk 2) is an IBM 2145 Multi-Path disk.   This is because the first one was mapped as a vRDM, while the second was mapped as a pRDM.


So here is the question: if the Physical RDM is a multi-path device, does it have one path or many? The first hint is that we only got one disk for each RDM. But what do I see if I actually install MPIO software? So I installed SDDDSM and displayed path status using the datapath query device command:

C:\Program Files\IBM\SDDDSM>datapath query device

Total Devices : 1

SERIAL: 60050763008083020000000000000040
Path#    Adapter/Hard Disk          State  Mode    Select Errors
    0  Scsi Port2 Bus0/Disk2 Part0  OPEN   NORMAL      86      0

C:\Program Files\IBM\SDDDSM>

What the output above shows is that there is only one path being presented to the VM, even though we know the ESXi hypervisor can see two paths.

So this proves we didn’t actually need to install SDDDSM to manage pathing, as there is only one path being presented to the disk. The hypervisor is handling the multiple paths using its own MPIO capability (VMW-SATP-ALUA), which we can see in the ESXi pathing screen capture further up.

Having said all that, there is one advantage from the Windows VM perspective to having SDDDSM installed, which is that I can see that Disk2 maps to the V3700 volume with a serial that ends in 40 (rather than 41). So if I wanted to remove the vRDM volume (Disk 1) I know with safety that the volume ending in ’41’ is the correct one to target.


Posted in IBM Storage, Storwize V3700, Storwize V7000, Uncategorized, vmware | 6 Comments

Evergreen Storage? Can it actually work?

Pure Storage is one of several hot flash vendors in the market right now.   Despite some negativity about their recent IPO, it actually shows that the market thinks they have got their product and execution right.

One challenge for every Flash vendor out there (and there are quite a few) is to be able to explain the why. Why my product and not another vendor’s?

One thing Pure Storage promote as a strong ‘why us’ is their concept of Evergreen Storage, described here:

Fundamentally they are saying that as technology evolves, their modular physical design and stateless software design will allow you to upgrade components without having to move data or do any of these forklift upgrades.  Here is an image from their brochure:


Even with Storage vMotion, the need to move data between storage arrays remains a major additional cost of replacing or upgrading storage hardware, and the ability to minimise or eliminate this work is definitely a huge plus.

But can they actually do it?  Do we have working examples of other vendors achieving this?

There is actually a good working model of a product that has done exactly this since 2003: The IBM SAN Volume Controller.     When IBM released the SVC in 2003, the first model (the 4F2), had only 4 GB of RAM per node with 2 Gbps FC adapters.   Since then, IBM have released a succession of new models as Intel hardware has evolved, with the current nodes having at least 32 GB of RAM, dramatically more cores, and optional 16 Gbps FC adapters!

The neat thing is that clients who invested in licensing in 2003, have been able to upgrade their nodes, with data in place, over successive years.   The cost of new nodes has been relatively low compared to the performance and functional benefits that each release has provided.   So I know for a fact that this idea of an Evergreen storage product is not only possible, but positively demonstrated by IBM.

The challenge for any vendor trying to do this is threefold:

  1. The technology really has to support seamless upgrades.   While the IBM SVC certainly did and does, there were some minor hiccups along the way.   One example was that first model, the 4F2, could not support the later 64 bit firmware releases, which meant that if you held off upgrading for too long, upgrading to new hardware needed some special help or a double hop to get the upgrade going.    Another example is bad racking:   Racked and stacked badly, pulling one node out could result in a partner node being disturbed (something I sadly have seen).
  2. The vendor needs to remain committed to the product.   While I laud IBM’s success with the SVC (now going even stronger with its Storwize brothers),  a sister product released at the same time, the Storage File System (sometimes called Storage Tank), did not get market traction and did not progress very far before being replaced by GPFS (which was not exactly a one for one replacement).  And while the DS8000 continues going strong (long after Chuck Hollis, in a classic piece of EMC FUD,  declared it dead),  its little sister, the DS6800, truly was dead within months of being released.   Its early months were so drama laden (sometimes sadly referred to as a crit-sit in a box) that new models were never released, which was equally sad, as once the code stabilised it became a great product.
  3. The vendor needs to hang around.   This one seems fairly obvious.   Clearly if someone were to buy Pure Storage (if the structure of the company allowed someone to do this), the new owner would also need to support this strategy.

So can Pure Storage do it?   Only time will tell, but they have made a great start and the industry has shown the concept is possible.   I will watch their progress with great interest!


Posted in Uncategorized | 3 Comments

vSphere ESXi6.0 CBT (VADP) bug that affects incremental backups / snapshots.

VMware recently posted a new KB article 2136854 to advertise a new issue that has been found with their Changed Block Tracking (CBT) code.

It’s important to note that this is not the same one as posted recently also for ESXi 6.0 (KB 2114076) – now fixed in a re-issued build of ESXi 6.0 (Build 2715440)

But it is very similar to KB 2090639 from a historical perspective.

The Issue

If you are leveraging a product that uses VMware’s VADP for backup, then chances are you are leveraging this for not just initial fulls, but regular incremental snapshots (for backup purposes). There are numerous products on the market that leverage this API; it’s virtually the industry standard to use this feature as it results in faster backups.

When the incremental changes are requested through the API (QueryDiskChangedAreas), the API is asked for the changed blocks, but unfortunately some of the changed blocks aren’t being correctly reported in the first place, so backup data is essentially missing. Backups based on this can be inconsistent when recovered and result in all sorts of problems.

The Challenge

Currently there is no resolution or hotfix to the issue from VMware. I hope that we will see something in the coming days due to the wide ranging impact to customers and partner products affected.

The Workarounds

The KB suggests the following workarounds:

  1. Do a full backup every time. That will certainly work, but it’s not really a viable fix for most customers (ouch !)
  2. Downgrade to ESXi 5.5 and take the virtual hardware back to version 10 (ouch !)
  3. Shut down the VM before doing an incremental (ouch !)

From the testing we have done at Actifio, option 3 doesn’t actually provide a workaround either, and options 1 & 2 aren’t really ideal.

The Discovery

When Actifio Customer Success Engineers discovered the issue, we contacted VMware and proved the problem leveraging just API calls to demonstrate where the problem was. How did we discover the issue I hear you ask?  Well we managed to discover the issue via our patented fingerprinting feature that occurs post every backup job. This technique (feature) essentially has learnt to not trust the data we receive (history has proven this feature to be useful many times) but to go and verify it against our copy and the original source copy. If we receive a variance in any way, we trigger an immediate full read compare against the source and update our copy. This works like a Full Backup job, but doesn’t write out a complete copy again, it just updates our copy to line up with the source again (as we like to save disk where we can!). We’ve seen this occur from time to time with our many different capture techniques (not just VADP), so it’s a worthy bit of code to say the least that our customers benefit from.
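The general idea of verifying a copy against its source, rather than trusting a list of reported changes, can be illustrated with a generic block-hash comparison. To be clear, this is not Actifio's implementation and makes no claim about how that feature works internally; the block size, the use of SHA-256 and the function names are all assumptions chosen for the sketch.

```python
import hashlib


def fingerprints(data, block_size=4096):
    """Return one SHA-256 digest per fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def blocks_to_repair(source, copy, block_size=4096):
    """List the block indexes where the copy no longer matches the source."""
    src = fingerprints(source, block_size)
    dst = fingerprints(copy, block_size)
    longest = max(len(src), len(dst))
    src += [None] * (longest - len(src))   # pad so a length mismatch shows up too
    dst += [None] * (longest - len(dst))
    return [i for i in range(longest) if src[i] != dst[i]]


# Simulate a changed block that the change-tracking layer failed to report.
source = bytearray(4 * 4096)
copy = bytes(source)        # backup copy taken before the change
source[2 * 4096] = 0xFF     # block 2 changes on the source afterwards
print(blocks_to_repair(bytes(source), copy))
```

Only the blocks that differ need to be re-read and rewritten, which is why a check like this can bring a copy back in line without writing out a complete new full.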

Let’s hope there’s a hotfix on the near horizon, so the many VADP / CBT vendor products that rely on it, can get back to doing what we do best and that’s protecting critical data for our customers that can be recovered without question.


Thanks to Jeff O’Connor for writing this up.   You can find his blog here:

Posted in Actifio, vmware | 2 Comments