Actifio Copy Data Forum – September 17 in Sydney

It's been a few weeks between posts, for the simple reason that I have been quite busy at Actifio! One thing keeping me busy is an event we are running in Sydney next week, and it would be great to see you there.

You will hear from leading organizations like Westpac NZ, NSW Ambulance Service, and other Actifio customers, and learn how they have transformed their data management with Actifio Copy Data Virtualisation solutions.

At Copy Data Forum 2014, you will learn more about the proven business impact of Actifio, including:

Improved Resiliency – through instant data access for data protection and disaster recovery.

Enhanced Agility – putting data where you need it, when you need it.

Transition to the Cloud – ensuring your data follows your applications wherever they live, including public, private, and hybrid cloud-based systems.

Dramatic Savings – up to 90% reduction in storage costs, and up to 70% reduction in network bandwidth.

Sound interesting? Register Now

Posted in Uncategorized | Leave a comment

Tortured by Tenders? What's the problem?

Many Australian organizations, both government and private enterprise, acquire IT through a tender process.

No, not that kind of tender…

Love Me Tenders

More like this kind of tender (anyone want a bridge shaped like a coat-hanger?).


The process of creating and responding to a tender actually involves what I call the three tortures:

  • The torture of creating the tender request document. I have never met a client who enjoys the creation process, and many resort to paying third parties to create them.
  • The torture of responding to a tender. I have never met a business partner or vendor who enjoys responding to one!
  • Then the final torture: reading all those tender responses and selecting the winning one. I have never personally experienced this torture, but I can imagine how hard it must be reading all those vendor weasel words.


I see five fundamental issues with tenders (well, in Australia anyway).

1)  The lawyers are writing most of them (and it’s not helping one bit)


Most tender requests contain a huge amount of legal documentation. Often less than 10% of all the words in the published documentation relate to technical or (more importantly) business requirements.
Quite seriously, they often include 70-100 pages of legalese and only 5-10 pages of truly useful back story explaining why the tender has been released at all.
I am certain that every tender response needs to sit inside a legal framework of responsibilities, but I have not seen any evidence that all of this legalese has prevented failed projects or bad solutions.

2)  Repetition in questions

I cannot overstate how bad this situation is. I have repeatedly seen tender documents that ask the same questions again and again and again (and again).

Even worse, I see questions that are clearly unfinished, or that are missing huge amounts of obvious (and necessary) subtext or back story. I have seen tenders where the quantity of question/answer documents created after the tender was published (as vendor questions were answered) exceeded the quantity of technical detail provided in the initial documentation. Quite frankly, that's just astonishing (in a bad way).

It seems that different teams each contribute to the total documentation, and the person who compiles and publishes the document has no inkling of just how much repetition has crept in. I don't blame the authors; I blame the project manager who compiles their contributions and the timelines under which these tender documents are created. Indeed, my gut feeling is that management simply doesn't give the authors anywhere near enough time or resources to do a good job.

3)  Vendor bias

When you see a tender that asks for SRDF (as opposed to sync or async replication), you know there is a serious (EMC-focused) bias. Asking for semi-sync replication is nearly as bad (that marks it as a NetApp-focused tender).
Many tenders are written with a specific outcome in mind, but all this leads to is weasel words, as all the other responders attempt to use their hammers to batter their products into the shape needed to answer the questions.

The issue is that the tender should really be about business outcomes enabled by IT, not IT solutions that someone thinks will lead to the best IT outcomes (and by implication, maybe, hopefully, the right business outcomes).

The idea that truly differentiated vendors will help you achieve better outcomes with differentiated technology simply doesn't fit a straight question-and-answer response document. The Q&A method only suits the accountants trying to score the responses. But don't worry, all of that is handled next…

4)  No connection between technical requirements and financial reality

I have no issue with any organisation trying to get the best value for its money and the best possible outcome from every major technical rework.
But if you want 2 petabytes of Tier 1 disk and your budget is $100K, you are not going to get it.
Frankly, most IT departments know full well what their maximum budget is, but if all but one tender response gets knocked back within an hour of submission because they all missed an unstated financial cut-off, you have to question the efficacy of the whole process. Invariably at least one vendor with the right contacts knows the ‘win price’. Everyone else was drifting off in the clouds.

Throw me a frickin bone

5)  Nowhere near enough lead time

My final gripe: there is nowhere near enough lead time to get the responses written. I routinely see tenders talked about for months and then released with less than 3 weeks to create the responses. This reflects the overall problem with timing: everyone is simply too busy, and the end result is rushed bids.

So do you have a better perspective on what’s going on?  Feel free to share!

Posted in advice, Uncategorized | 3 Comments

Using AIX VG mirroring in combination with hardware snapshots

One of the great things about Logical Volume Managers is how you can use them for all manner of clever solutions. I recently explored how to combine hardware snapshots and LVM mirroring to create rapid backups without using backup software (or to provide a source for a data protection product).

To do this we need to do the following:

  1. We need to present a staging disk to the host, large enough to hold the data we are trying to protect. In this example, that data is a volume group (VG) holding DB2 data. This disk could come from a different primary storage device (e.g. an XIV or a Storwize V7000) or could be an Actifio-presented disk. You need to check whether your multipathing software will work with that disk.
  2. We mirror our datavg onto our new staging disk using AIX VG mirroring.
  3. We take a hardware snapshot of that disk.
  4. We now allow the VG mirror to become stale, removing the mirroring disk load from the host.
  5. Prior to taking the next snapshot, we get the mirrors back in sync again.
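To make the ordering of steps 3 to 5 concrete, here is a minimal shell sketch of one steady-state cycle, once the initial mirrorvg has completed. It assumes a hypothetical take_snapshot placeholder standing in for whatever your storage array (or Actifio) uses to take the snapshot, and it defaults to a dry run (DRYRUN=1) that only prints the commands. The wait_for_sync loop is my own illustration based on the STALE PPs field of lsvg -L, not part of any vendor tooling.

```shell
# Sketch of one snapshot cycle: resync, snapshot, split.
# DRYRUN=1 (the default here) only prints what would be run.
VG=${VG:-db2vg}
DRYRUN=${DRYRUN:-1}

run() {
    # Print the command in dry-run mode, otherwise execute it
    if [ "$DRYRUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

wait_for_sync() {
    # Poll 'lsvg -L' until STALE PPs (field 6 of that line) reaches 0
    if [ "$DRYRUN" = 1 ]; then
        echo "+ wait until STALE PPs = 0 for $VG"
        return 0
    fi
    while :; do
        stale=$(lsvg -L "$VG" | awk '/STALE PPs:/ {print $6}')
        [ "$stale" = 0 ] && return 0
        sleep 30
    done
}

take_snapshot() {
    # HYPOTHETICAL placeholder: replace with your vendor's snapshot step
    echo "+ <snapshot the staging disk here>"
}

snapshot_cycle() {
    run joinvg "$VG"        # step 5: rejoin the stale copy
    wait_for_sync           # let the mirrors catch up
    take_snapshot           # step 3: snapshot the synced staging disk
    run splitvg -c2 "$VG"   # step 4: split copy 2 and let it go stale
}
```

Running snapshot_cycle with the defaults prints the four steps in order without touching the system.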

How you run this process depends on whether you would prefer to leave the two copies in sync or let them go stale. The advantage of letting them go stale is that you avoid the disk I/O workload needed to keep them in sync. While you will need to catch up later, the total effort to do so may well be significantly less than the continual effort of mirroring.

Example configuration

We have a VG (called db2vg) with one copy. We know only one copy exists because every logical volume in the volume group has the same number of PPs as LPs, all on a single PV.

[AIX_LPAR_5:root] / > lsvg -l db2vg
db2vg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
jfsdb2log1          jfs2log    1       1       1    open/syncd    N/A
jfsdb2log2          jfs2log    1       1       1    open/syncd    N/A
jfsdb2log3          jfs2log    1       1       1    open/syncd    N/A
db2binlv            jfs2       14      14      1    open/syncd    /db2
db2loglv            jfs2       10      10      1    open/syncd    /db2log
db2datalv           jfs2       40      40      1    open/syncd    /db2data
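A quick way to confirm the copy count from that output is to divide PPs by LPs for each LV (1 means one copy, 2 means mirrored). This small awk-based sketch assumes the default lsvg -l column layout shown above and LV names without spaces; lv_copies is just an illustrative helper name.

```shell
# Read 'lsvg -l <vg>' output on stdin and print "<lv> <copies>",
# where copies = PPs (field 4) / LPs (field 3).
# The first two lines (VG name and column header) are skipped.
lv_copies() {
    awk 'NR > 2 && $3 > 0 { printf "%s %d\n", $1, $4 / $3 }'
}
```

Usage: lsvg -l db2vg | lv_copies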

If I display the detailed view of the VG, I can see it is currently in a good state (STALE PVs and STALE PPs are both 0):

[AIX_LPAR_5:root] / > lsvg -L db2vg
VOLUME GROUP:       db2vg                    VG IDENTIFIER: 00f771ac00004c0000000144bf115a1e
VG STATE:           active                   PP SIZE:        512 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      71 (36352 megabytes)
MAX LVs:            512                      FREE PPs:       4 (2048 megabytes)
LVs:                6                        USED PPs:       67 (34304 megabytes)
OPEN LVs:           6                        QUORUM:         2 (Enabled)
TOTAL PVs:          1                        VG DESCRIPTORS: 2
STALE PVs:          0                       STALE PPs:      0
ACTIVE PVs:         1                        AUTO ON:        yes
MAX PPs per VG:     130048
MAX PPs per PV:     1016                     MAX PVs:        128
LTG size (Dynamic): 512 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      relocatable
PV RESTRICTION:     none                     INFINITE RETRY: no

We have added one new disk to the server.  We know it’s not in use because it has no VG (it says none).

[AIX_LPAR_5:root] / > lspv
hdisk0          00f771acd7988621                    None
hdisk5          00f771acbf1159f6                    db2vg           active
hdisk6          00f771ac41353d73                    rootvg          active
[AIX_LPAR_5:root] / > lsdev -Cc disk
hdisk0 Available C9-T1-01 MPIO IBM 2076 FC Disk
hdisk5 Available C9-T1-01 MPIO IBM 2076 FC Disk
hdisk6 Available C9-T1-01 MPIO IBM 2076 FC Disk
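If you are scripting this, the same check can be done by filtering lspv output for disks whose volume group column reads None. A minimal sketch, assuming the three-column lspv layout shown above (free_disks is a hypothetical helper name):

```shell
# Read 'lspv' output on stdin and print disks not assigned to any VG
# (the third column reads "None" for such disks).
free_disks() {
    awk '$3 == "None" { print $1 }'
}
```

Usage: lspv | free_disks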

We extend the VG onto the new staging disk and then mirror it. We specify the VG name (db2vg) and the name of the unused or free disk (hdisk0).

Mirroring takes a while, so we run the mirrorvg command as a background task with &.

[AIX_LPAR_5:root] / > extendvg db2vg hdisk0
[AIX_LPAR_5:root] / > mirrorvg db2vg hdisk0 &
0516-1804 chvg: The quorum change takes effect immediately.

We monitor the mirroring with a script. I did not write this script but did modify it, so the original author (W.M. Duszyk) should be acknowledged! Thanks also to Chris Gibson for his help with this.

#!/usr/bin/ksh93
### W.M. Duszyk, 3/2/12
### AVandewerdt 01/05/14
### Show the percentage of re-mirrored PPs in a volume group
### Usage: checkvg.sh vg_name
(( $# < 1 )) && { print "Usage: $0 vg_name"; exit 1; }
vg=$1
# ACTIVE PVs is reported as the number of copies (one PV per copy here)
printf "Volume Group %s has " "$vg"
lsvg -L "$vg" | awk '/ACTIVE PVs:/ {printf "%s", $3}'
printf " copies "
# Field 6 of the 'STALE PPs:' and 'TOTAL PPs:' lines holds the PP count
Stale=$(lsvg -L "$vg" | awk '/STALE PPs:/ {print $6}')
[[ $Stale = 0 ]] && { print "and is fully mirrored."; exit 2; }
Total=$(lsvg -L "$vg" | awk '/TOTAL PPs:/ {print $6}')
PercDone=$(( 100 - $(( $(( Stale * 50.0 )) / $Total )) ))
echo "and is mirrored $PercDone%."
exit 0

We can use this script to check if the VG is in sync.   You run the script and specify the name of the VG:

[AIX_LPAR_5:root] / > ./checkvg.sh db2vg
Volume Group db2vg has 2 copies and is mirrored 85%.

We wait for it to reach 100%, at which point the script reports that the VG is fully mirrored:

[AIX_LPAR_5:root] / > ./checkvg.sh db2vg
Volume Group db2vg has 2 copies and is fully mirrored.

If you want to see the exact state of the VG, let's look at the volume group details. Note how each LV now has twice as many PPs as LPs, spread over 2 PVs, and the LV state is open/syncd. An LV state of closed/syncd is not an issue if the LV is raw (rather than holding a file system) and is not currently being used by the application.

[AIX_LPAR_5:root] / > lsvg -l db2vg
db2vg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
jfsdb2log1          jfs2log    1       2       2    open/syncd    N/A
jfsdb2log2          jfs2log    1       2       2    open/syncd    N/A
jfsdb2log3          jfs2log    1       2       2    open/syncd    N/A
db2binlv            jfs2       14      28      2    open/syncd    /db2
db2loglv            jfs2       10      20      2    open/syncd    /db2log
db2datalv           jfs2       40      80      2    open/syncd    /db2data
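Rather than eyeballing the LV STATE column, you can list any LVs that are not syncd. This sketch assumes the lsvg -l column layout above (LV STATE in the sixth field); unsynced_lvs is an illustrative helper name, and empty output means every LV is in sync.

```shell
# Read 'lsvg -l <vg>' output on stdin and print "<lv> <state>" for every
# LV whose state (field 6) does not end in /syncd, e.g. open/stale.
unsynced_lvs() {
    awk 'NR > 2 && NF >= 6 && $6 !~ /\/syncd$/ { print $1, $6 }'
}
```

Usage: lsvg -l db2vg | unsynced_lvs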

Now display the mapping for one of those LVs (db2binlv). We can see that hdisk0 holds copy 2 (PV2). This is good.

[AIX_LPAR_5:root] / > lslv -m db2binlv
db2binlv:/db2
LP    PP1  PV1               PP2  PV2               PP3  PV3
0001  0002 hdisk5            0002 hdisk0
0002  0003 hdisk5            0003 hdisk0
0003  0004 hdisk5            0004 hdisk0
0004  0005 hdisk5            0005 hdisk0
0005  0006 hdisk5            0006 hdisk0
0006  0007 hdisk5            0007 hdisk0
0007  0008 hdisk5            0008 hdisk0
0008  0009 hdisk5            0009 hdisk0
0009  0010 hdisk5            0010 hdisk0
0010  0011 hdisk5            0011 hdisk0
0011  0012 hdisk5            0012 hdisk0
0012  0013 hdisk5            0013 hdisk0
0013  0014 hdisk5            0014 hdisk0
0014  0015 hdisk5            0015 hdisk0
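For a long LV, checking every row by eye gets tedious. This sketch (check_copy2_on is a hypothetical helper name) reads lslv -m output and prints any LP whose second copy is not on the expected staging disk, assuming the PV2 column is the fifth field as in the output above. It returns 1 if it finds any.

```shell
# Read 'lslv -m <lv>' output on stdin and check that the PV2 column
# (field 5) matches the disk given as $1. Prints offending LPs.
check_copy2_on() {
    disk=$1
    awk -v d="$disk" '
        NR > 2 && $5 != d { print "LP " $1 ": PV2=" ($5 == "" ? "none" : $5); bad = 1 }
        END { exit (bad ? 1 : 0) }'
}
```

Usage: lslv -m db2binlv | check_copy2_on hdisk0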

We are now ready to snapshot the staging disk, preserving it while it is in the synced state. Once the snapshot is created, we can let the mirror go stale so that there is no disk load to keep the staging disk in sync. You should coordinate this snapshot with the application writing to the disk. With Actifio we do this with the Actifio Connector software.

Once the snapshot is taken, we can split the VG to stop the workload of mirroring. We are going to split off copy 2, which is the copy on our staging disk (hdisk0):

splitvg -c2 db2vg

The new VG is called vg00 by default. You can force AIX to use a different name with the splitvg -y flag.

[AIX_LPAR_5:root] / > splitvg -c2 db2vg
[AIX_LPAR_5:root] / > lsvg
db2vg
rootvg
vg00

If we check db2vg, we can see it still shows 2 PPs per LP, but we are no longer keeping the second copy (on hdisk0) in sync.

[AIX_LPAR_5:root] / > lsvg -l db2vg
db2vg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
jfsdb2log1          jfs2log    1       2       2    open/syncd    N/A
jfsdb2log2          jfs2log    1       2       2    open/syncd    N/A
jfsdb2log3          jfs2log    1       2       2    open/syncd    N/A
db2binlv            jfs2       14      28      2    open/syncd    /db2
db2loglv            jfs2       10      20      2    open/syncd    /db2log
db2datalv           jfs2       40      80      2    open/syncd    /db2data

When we look at our newly created VG (vg00), we can see it has only one copy. Note that the LV names have been prefixed with fs and the mount points with /fs:

[AIX_LPAR_5:root] / > lsvg -l vg00
vg00:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
fsjfsdb2log1        jfs2log    1       1       1    closed/syncd  N/A
fsjfsdb2log2        jfs2log    1       1       1    closed/syncd  N/A
fsjfsdb2log3        jfs2log    1       1       1    closed/syncd  N/A
fsdb2binlv          jfs2       14      14      1    closed/syncd  /fs/db2
fsdb2loglv          jfs2       10      10      1    closed/syncd  /fs/db2log
fsdb2datalv         jfs2       40      40      1    closed/syncd  /fs/db2data

Curiously, although the LVs in db2vg still show as syncd, the checkvg script shows the mirror is already stale by 3 PPs:

[AIX_LPAR_5:root] / > chmod 755 checkvg.sh;./checkvg.sh db2vg
Volume Group db2vg has 1 copies and is mirrored 99%.

I generate some change by copying some files to /db2data to increase this difference. Of course, if DB2 is really running, changes will start occurring straight away.

[AIX_LPAR_5:root] / > ./checkvg.sh db2vg
Volume Group db2vg has 1 copies and is mirrored 97%.

If we check the state of the LVs, we can see that this file I/O has created stale partitions. This is not a problem. The speed with which partitions become stale depends on the PP size and on the address-range locality of the I/Os generated between snapshots.

[AIX_LPAR_5:root] / > lsvg -l db2vg
db2vg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
jfsdb2log1          jfs2log    1       2       2    open/stale    N/A
jfsdb2log2          jfs2log    1       2       2    open/stale    N/A
jfsdb2log3          jfs2log    1       2       2    open/stale    N/A
db2binlv            jfs2       14      28      2    open/stale    /db2
db2loglv            jfs2       10      20      2    open/stale    /db2log
db2datalv           jfs2       40      80      2    open/stale    /db2data

When we are ready to take the next snapshot we need to get the two copies back together and in sync.   To do this we rejoin the two with this command:

joinvg db2vg

We can see the two copies start coming back into sync:

[AIX_LPAR_5:root] / > ./checkvg.sh db2vg
Volume Group db2vg has 2 copies and is mirrored 98%.

As the two copies get back into sync, each LV state changes from stale to syncd. In the output below, db2datalv (the largest LV) is still catching up:

[AIX_LPAR_5:root] / > lsvg -l db2vg
db2vg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
jfsdb2log1          jfs2log    1       2       2    open/syncd    N/A
jfsdb2log2          jfs2log    1       2       2    open/syncd    N/A
jfsdb2log3          jfs2log    1       2       2    open/syncd    N/A
db2binlv            jfs2       14      28      2    open/syncd    /db2
db2loglv            jfs2       10      20      2    open/syncd    /db2log
db2datalv           jfs2       40      80      2    open/stale    /db2data

If the resync does not occur, we can force it with the syncvg command:

syncvg -v db2vg

Once we are in sync, we can do another snapshot of the staging disk.

Issues with scripting this:

One thing you may want to do is allow a non-root user to perform these commands. For instance, if we want to allow the DB2 user (in this example db2inst2) to execute the splitvg and joinvg commands, we can use sudo:

  1. Download and install sudo on the AIX host
  2. Issue this command to edit the sudo config file:   visudo
  3. Add this line:
    db2inst2 ALL = NOPASSWD: /usr/sbin/joinvg,/usr/sbin/splitvg

Log on as the DB2 user and check that it worked:

[AIX_LPAR_5:db2inst2] /home/db2inst2 > sudo -l
User db2inst2 may run the following commands on this host:
(root) NOPASSWD: /usr/sbin/joinvg
(root) NOPASSWD: /usr/sbin/splitvg

Using the snapshot with a backup host

One strategy that can be used in combination with this method is to present the snapshot to a server running backup software.  The advantage of doing this is that the backup can effectively be done off-host.   The disadvantage is that each backup will be a full backup unless the backup software can scan the disk for changed files or blocks.

Import the VG

To use the snapshot, connect to the management interface of the storage device that created the snapshot and map it to your backup host. Then log on to the backup host and discover the disks:

cfgmgr

Identify the name of the new hdisk:

lspv
lsdev -Cc disk
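If the backup host already has several disks, it can help to capture lspv before and after running cfgmgr and compare the two listings. A minimal sketch, assuming you saved each listing to a file (new_disks and the file names are illustrative):

```shell
# Print hdisk names present in the second lspv listing but not the first.
# $1 = file holding lspv output from before cfgmgr, $2 = after cfgmgr.
new_disks() {
    awk 'FILENAME == ARGV[1] { seen[$1] = 1; next }
         !($1 in seen)       { print $1 }' "$1" "$2"
}
```

Usage: lspv > /tmp/lspv.before; cfgmgr; lspv > /tmp/lspv.after; new_disks /tmp/lspv.before /tmp/lspv.after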

Then import the volume group.  You need to use -f to force an import with only half the VG members present (since you are importing a snapshot of one half of a mirrored pair).  In this example we have discovered hdisk1 and are using it to import the VG db2vg.

importvg -f -y db2vg hdisk1

Recreate the VG

If you are presenting the snapshot back to the same host that has the original VG, then we have to do two extra steps. Because the snapshot has the same PVID as the staging disk, you need to clear the PVID and use the recreatevg command, not the importvg command.

In this example I have two VGs and two disks.

[aix_lpar_4:root] / > lspv
hdisk0          00f771acc8dfb10a                    actvg           active  
hdisk2          00f771accdcbafa8                    rootvg          active

I map the snapshot I created and run cfgmgr. If you are sharp-eyed, you will spot that I don't have any PVID clashes. Actually, I don't even have the original DB2 VG on this host, but the method is still valid.

[aix_lpar_4:root] / > cfgmgr
[aix_lpar_4:root] / > lspv
hdisk0          00f771acc8dfb10a                    actvg           active  
hdisk2          00f771accdcbafa8                    rootvg          active  
hdisk4          00f771acd7988621                    None
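Before choosing between importvg and recreatevg, you can check the lspv output for duplicate PVIDs, which is exactly the clash a snapshot mapped back to its source host would cause. A sketch assuming the PVID is the second lspv column (dup_pvids is an illustrative name); no output means no clash, as in the listing above.

```shell
# Read 'lspv' output on stdin and print "<pvid>: <disks...>" for any PVID
# that appears on more than one disk. Disks with PVID "none" are ignored.
dup_pvids() {
    awk '$2 != "none" { n[$2]++; disks[$2] = disks[$2] " " $1 }
         END { for (p in n) if (n[p] > 1) print p ":" disks[p] }'
}
```

Usage: lspv | dup_pvids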

To bring the VG online, we first clear the PVID:

[aix_lpar_4:root] / > chdev -l hdisk4 -a pv=clear
hdisk4 changed
[aix_lpar_4:root] / > lspv
hdisk0          00f771acc8dfb10a                    actvg           active  
hdisk2          00f771accdcbafa8                    rootvg          active  
hdisk4          none                                None

We now build a new VG using the VG name db2restorevg on hdisk4.

[aix_lpar_4:root] / > recreatevg -f -y db2restorevg hdisk4
db2restorevg
[aix_lpar_4:root] / > lsvg -l db2restorevg
db2restorevg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
fsjfsdb2log1        jfs2log    1       1       1    closed/syncd  N/A
fsjfsdb2log2        jfs2log    1       1       1    closed/syncd  N/A
fsjfsdb2log3        jfs2log    1       1       1    closed/syncd  N/A
fsdb2binlv          jfs2       14      14      1    closed/syncd  /fs/db2
fsdb2loglv          jfs2       10      10      1    closed/syncd  /fs/db2log
fsdb2datalv         jfs2       40      40      1    closed/syncd  /fs/db2data

Again, if you are sharp-eyed, you will spot in the output above that every LV has fs added to its name. In other words, db2binlv, which was mounted on /db2, is recreated as fsdb2binlv mounted on /fs/db2. This is done because the recreatevg command assumes you are creating this VG on a host that already has the original VG, so it renames LVs and mount points to prevent name clashes. If for some reason you don't want this renaming to occur, you can prevent it in the recreatevg command like this, where -L / and -Y NA stop the command from renaming any labels or LVs. Use this with care.

recreatevg -f -L / -Y NA -y db2restorevg hdisk4 

Backups without backup software or file system scans

If the staging disk is presented by Actifio, then Actifio tracks every changed block and only needs to read the changed blocks to create a new backup image of the snapshot. The VG PP size plays a role in determining the quantity of changed blocks, since AIX tracks staleness per PP and copies each stale PP when the mirror is resynced. This effectively allows backups without backup software, since the Actifio Dedup engine can read blocks straight from snapshots created by Actifio. This is a very neat trick. Also, since we presented the staging disk from the Actifio snapshot pool, we now have a copy that we can present at will for instant test and dev or analytics purposes.
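To put rough numbers on the PP size effect: assuming that each stale PP is copied in full when the mirror is rejoined, the worst-case data rewritten on the staging disk is simply the number of stale PPs multiplied by the PP size. A minimal sketch (resync_mb is an illustrative name), using the 512 MB PP size reported by lsvg -L earlier and the 3 stale PPs seen after the split:

```shell
# Worst-case MB rewritten on resync, ASSUMING whole stale PPs are copied:
# stale PPs x PP size in MB. Integer shell arithmetic only.
resync_mb() {
    stale_pps=$1
    pp_size_mb=$2
    echo $(( stale_pps * pp_size_mb ))
}
```

With those figures, resync_mb 3 512 gives 1536 (MB), so a smaller PP size would shrink this worst case proportionally.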

Scripting for Application Consistency

When creating the snapshot, you ideally want the whole process orchestrated as a regular job that runs on a schedule. The process should get the VG mirror back into sync, get the application into a consistent state (such as hot backup mode), create a snapshot and then let the VG mirror go stale again.

The Actifio Connector can be used to coordinate application consistency. Clearly, if your staging disk is coming from a different storage product, you will need to use that vendor's method. Every time Actifio starts a snapshot job (which can be automated by the Actifio SLA scheduling engine), it can call the Actifio Connector installed on the host to help orchestrate the snapshot. It does so in phases: init, freeze, thaw, fini and, if necessary, abort. We set the database path, database name and VG name at the start of the script. The init phase re-syncs the VG and waits until it is fully mirrored; the freeze phase suspends DB2 writes (SET WRITE SUSPEND) just before the snapshot; the thaw phase resumes writes (SET WRITE RESUME) once the snapshot is taken; the fini phase splits the VG again. The abort phase resumes DB2 writes so that a failed job does not leave the database suspended.

#!/bin/sh
# Actifio Connector script: called with the job phase as $1
DBPATH=/home/db2inst2/sqllib/bin
DBNAME=demodb
VGNAME=db2vg

case "$1" in
init)
    # Rejoin the split-off copy and wait until the VG is fully mirrored
    sudo /usr/sbin/joinvg "$VGNAME"
    while :
    do
        synccheck=$(/act/scripts/checkvg.sh "$VGNAME")
        if [ "$synccheck" = "Volume Group $VGNAME has 2 copies and is fully mirrored." ]
        then
            break
        fi
        echo "$synccheck"
        sleep 30
    done
    ;;
freeze)
    # Suspend DB2 writes so the snapshot is consistent
    "$DBPATH/db2" connect to "$DBNAME"
    "$DBPATH/db2" set write suspend for database
    ;;
thaw|abort)
    # Resume DB2 writes after the snapshot, or if the job is aborted
    "$DBPATH/db2" connect to "$DBNAME"
    "$DBPATH/db2" set write resume for database
    ;;
fini)
    # Split off copy 2 (the staging disk) so it can go stale until the next job
    echo "Splitting $VGNAME"
    sudo /usr/sbin/splitvg -c2 "$VGNAME"
    ;;
esac
exit 0

Hopefully this whole process is helpful whether you use Actifio or not.  Here is a small set of references which helped me with this:

Waldemar Mark Duszyk's Blog

Chris Gibson's Blog

IBM Technote

Posted in Actifio, AIX, IBM | 2 Comments

Knowledge sharing in IT

Hopefully most of you know what TED Talks are: a truly marvelous collection of inspiring videos, usually 20 minutes or less, that are nearly always worth watching.

I recently watched one by Stanley McChrystal, a former US Army general, which gave me a unique view on information sharing. At one point General McChrystal says something that strikes me as phenomenally appropriate to IT (even though he was talking about military secrets). He says:

“… as we passed that information around, suddenly you find that information is only of value if you give it to people who have the ability to do something with it. The fact that I know something has zero value if I’m not the person who can actually make something better because of it.”

 

So how does this apply to IT?

I have worked with support personnel who kept all of their secret commands in notepads concealed in their back pockets. Luckily, in many cases the UNIX history command let me learn all their secret incantations as soon as they were out of the room. I did work with one guy in remote support who would create a file with vi, populate it with his commands of power, make it executable and run it. He left nothing in the UNIX history but vi, chmod 755 and the name of his secret file. He was simultaneously a smart guy and a smart alec.


I have learnt that the motivation to keep commands secret often does not spring out of any misguided belief that they are keeping dangerous commands away from inexperienced people.  They are simply trying to make themselves indispensable.   Sadly in the meantime everyone else is left to reinvent the wheel, or wait for the right person to come online, plus relearn things that others already knew and repeat mistakes that others have already made.

This leads me to knowledge sharing.

There are three forms of knowledge sharing in the IT industry:

  1. Knowledge that vendors share with users
  2. Knowledge that users share with users
  3. Knowledge that users share with vendors

Now you may think that vendors sharing data with users is obvious, but three things stand in your way:

  • Portal walls. Vendors who guard their knowledge bases with portal walls are protecting their intellectual property from freeloaders and inquisitive competitors, while simultaneously forcing everyone to rely on their willingness to actually share, and denying us googleability.
  • Poor sharing practices, such as readme documents that are vague or incomplete (or even non-existent).
  • Fear of other vendors' marketing departments. This fear drives IT companies not to share information, not out of fear of what their users will say, but out of fear of what their competitors will use that information for.

The good news is that each and every one of us has information we can share with each other, whether in blogs, on social media platforms (like forums or Twitter) or just in handwritten notes stuck up on the notice board. It is in everybody's hands to share what they know. Find a forum, start a blog, send out emails. Just do it.

And if your IT vendors are worth their salt, they will listen in.

And always remember:

…the fact that I know something has zero value if I’m not the person who can actually make something better because of it.

 

 

Posted in Uncategorized | 2 Comments

Don’t always default to default

I once sat in a project meeting in which the Project Manager declared that:

Default settings are always the best settings, since they were the ones the vendor made default!

While you may think there is some logic in the statement, it is a flawed belief.

While it’s true that in many cases the default settings may cover the most common implementations, there is no guarantee of safety in leaving everything at defaults.
Equally there is great danger with monkeying with all the bells and whistles if you are not sure what they will do!

A classic example I keep seeing is AIX Fibre Channel HBA settings, in particular for error recovery and dynamic tracking.

AIX was in existence a long time before Fibre Channel came into common use. I/O in those days normally travelled down a single path to a single device, or via a common SCSI cable off which hung multiple devices like very large hard drives (well, physically large but logically small). So if there was a glitch on the link, it was better to wait a while for the link to come back than to declare the link dead, since there was no other way to get to those devices.

However once multipath Fibre Channel became common, it made sense to allow more control over this behaviour.

AIX has two settings that affect how link failures are handled (caused by an HBA failure, switch port failure, cable failure, someone disconnecting the wrong cable, etc).
Fast failure of an I/O path is controlled by an fscsi device attribute called fc_err_recov.
The default setting for this attribute is delayed_fail (which I call slow failure). You can instead set it to fast_fail. This setting influences what happens when the adapter driver receives a message from the Fibre Channel switch that there is a link event.

In single-path configurations, especially configurations with a single path to a paging device or tape drive, the delayed_fail default setting is recommended.
So paths to tape drives or to paging devices should use delayed_fail, while paths to everything else should use fast_fail.
With AIX, regardless of what multipathing software is in use, if a path fails there will most likely be a pause in I/O processing. This is because at the time of the path failure, some I/O has already been issued to the ‘bad’ path. With fast_fail, the path is failed after 15 seconds and that I/O is resent down a different path. With delayed_fail, this pause can be as long as 40 seconds.

What should you look for?   This is the default (normally less ideal) situation:

# lsattr -E -l fscsi0
attach        switch       How this adapter is CONNECTED         False
dyntrk        no           Dynamic Tracking of FC Devices        True
fc_err_recov  delayed_fail FC Fabric Event Error RECOVERY Policy True
scsi_id       0x630f00     Adapter SCSI ID                       False
sw_fc_class   3            FC Class for Fabric                   True

These are my recommended settings:

# lsattr -El fscsi0 
attach        switch    How this adapter is CONNECTED         False
dyntrk        yes       Dynamic Tracking of FC Devices        True
fc_err_recov  fast_fail FC Fabric Event Error RECOVERY Policy True
scsi_id       0x630f00  Adapter SCSI ID                       False
sw_fc_class   3         FC Class for Fabric                   True
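To spot adapters that are still on the defaults, you can parse lsattr output for the two attributes discussed here. This sketch (check_fc_attrs is an illustrative name) assumes the attribute name and value are the first two fields, as shown above, and returns 1 if anything differs from the recommended values. Remember that adapters with single paths to tape or paging devices should stay on delayed_fail.

```shell
# Read 'lsattr -El fscsiN' output on stdin and report dyntrk and
# fc_err_recov values that differ from the recommended settings.
check_fc_attrs() {
    awk '$1 == "dyntrk"       && $2 != "yes"       { print "dyntrk=" $2 " (recommend yes)"; bad = 1 }
         $1 == "fc_err_recov" && $2 != "fast_fail" { print "fc_err_recov=" $2 " (recommend fast_fail)"; bad = 1 }
         END { exit (bad ? 1 : 0) }'
}
```

Usage: lsattr -El fscsi0 | check_fc_attrs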

The ‘True’ at the end of the line means the value can be changed, but not necessarily while the device is in use.
So if you find you are not running with the ideal settings and it makes sense to change them, run these two commands against each relevant fscsi device. Because -P changes only the ODM, the new values take effect at the next reboot, which you can schedule at your leisure (unless you can unmount the affected file systems and vary off the affected VGs, in which case the change can be made without waiting for a reboot).

chdev -l fscsi0 -a dyntrk=yes -P
chdev -l fscsi0 -a fc_err_recov=fast_fail -P
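If a host has several adapters, a small loop saves some typing. This sketch defaults to a dry run that only prints the chdev commands (set DRYRUN=0 to execute them). set_fc_attrs is an illustrative name; you pass it only the fscsi devices you want changed (lsdev -C | grep fscsi will list them), leaving out any that carry the tape or paging paths mentioned above.

```shell
# Stage dyntrk=yes and fc_err_recov=fast_fail in the ODM (-P) for each
# fscsi device given as an argument. DRYRUN=1 (default) only prints.
set_fc_attrs() {
    for dev in "$@"; do
        for attr in dyntrk=yes fc_err_recov=fast_fail; do
            if [ "${DRYRUN:-1}" = 1 ]; then
                echo "chdev -l $dev -a $attr -P"
            else
                chdev -l "$dev" -a "$attr" -P
            fi
        done
    done
}
```

Usage: set_fc_attrs fscsi0 fscsi1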

Note we are also setting dynamic tracking to yes.  This setting allows AIX to learn that the fibre channel port ID of a device has changed on the fly.   This is handy if you need to move a cable to a different port or switch (where you are zoning by WWPN and you have a need to reconfigure on the fly).

The readme for AIX 5.2 (which applies equally to higher versions) explains all of this behaviour here:

http://www-1.ibm.com/support/docview.wss?uid=isg1520readmefb4520desr_lpp_bos

Posted in advice | 1 Comment

Thin Provisioning Buyer’s Guide

Storage space consumption is a perennial bone of contention in data centers. It seems 100 TB of new storage can fill up in the blink of an eye, and then you have to buy more. So what can you do? Let’s get under the covers to see what is happening.

When data is written to a volume (I am tempted to say disk, but since most disks are really virtual volumes that may never write to a spinning disk at all, I will stick with volume), it is written by a file system (or some other disk space manager, such as ASM) to logical block addresses, or LBAs, each addressing a 512-byte block. Space in a volume is addressed in LBAs, starting at zero and running up to the highest address the volume size allows (so a 5 TB volume has far more LBAs than a 5 GB volume).
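
The arithmetic behind that last point is simple division. Here is a small sketch that works out the LBA range of a volume, assuming the 512-byte LBAs described above and binary units (1 GB = 2^30 bytes, 1 TB = 2^40 bytes); the helper names are mine, purely for illustration.

```shell
#!/bin/sh
# LBA arithmetic for a volume, assuming 512-byte blocks and binary units.
# Needs a shell with 64-bit arithmetic (e.g. bash, or dash on a 64-bit system).
LBA_BYTES=512

lba_count()   { echo $(( $1 / LBA_BYTES )); }       # number of addressable LBAs
highest_lba() { echo $(( $1 / LBA_BYTES - 1 )); }   # LBAs run from 0 to this

GB=$(( 1 << 30 ))
TB=$(( 1 << 40 ))
echo "5 GB volume: LBA 0 to $(highest_lba $(( 5 * GB )))"   # LBA 0 to 10485759
echo "5 TB volume: LBA 0 to $(highest_lba $(( 5 * TB )))"   # LBA 0 to 10737418239
```

A 5 TB volume therefore has 1024 times as many LBAs as a 5 GB volume (10,737,418,240 versus 10,485,760).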

From the host server’s perspective, if a volume claims to have 5 TB of space available, then the server believes it has the right to write 5 TB. It is quite common for storage controllers to let storage administrators over-allocate space, meaning that from a fixed quantity of physical capacity (say 75 TB) you could allocate 150 TB of volumes. This over-allocation is only made possible by thin provisioning, sometimes combined with other space-saving methods such as compression and deduplication. Normally it is done by creating over-allocated storage pools.

In an over-allocated storage pool, the administrator can create virtual volumes whose combined size exceeds the available capacity of that pool. In other words, we can advertise more space than we actually have. This means the volumes in the pool had better be space efficient.
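
The over-allocation figure is just the sum of the advertised volume sizes divided by the physical capacity of the pool. A minimal sketch of that calculation follows; overalloc_pct is a hypothetical helper, and splitting the 150 TB into three 50 TB volumes is my own assumption for the example.

```shell
#!/bin/sh
# overalloc_pct PHYSICAL_TB VOLUME_TB... (hypothetical helper)
# Sums the advertised (virtual) volume sizes and prints them as a percentage
# of the pool's physical capacity; anything above 100 is over-allocated.
# Whole-TB integers only, to keep the sketch simple.
overalloc_pct() {
  physical=$1
  shift
  total=0
  for size in "$@"; do
    total=$(( total + size ))
  done
  echo $(( total * 100 / physical ))
}

# The example from the text: 150 TB of volumes carved from a 75 TB pool.
pct=$(overalloc_pct 75 50 50 50)
echo "pool allocated at ${pct}% of physical capacity"   # 200%
if [ "$pct" -gt 100 ]; then
  echo "over-allocated: the volumes had better be thin"
fi
```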

Genuinely space-efficient volume design should follow five principles:

  1. When data gets written to the volume, allocate as little space from the pool as possible to hold that data. In other words, if I write 100 KB to a volume, don’t allocate 100 GB from the pool to hold it.
  2. When zeros get written to the volume, allocate no space from the pool and, preferably, release the space occupied by those LBAs back to the pool. In other words, if I write 1 MB of zeros, don’t allocate 1 MB of pool space to hold those zeros. In fact, look at the LBAs I am writing to and, if they include address ranges already allocated to the volume, see whether those ranges can be de-allocated and the space returned to the pool.
  3. When allocated space is no longer needed, offer some way to release that space back to the pool (this sounds like #2 but is actually different). For example, if I delete a 1 GB file, that is 1 GB of volume space I no longer need. The file system knows this, but does the underlying disk controller?
  4. If space is running short in the pool, give me plenty of warning so I can do something about it before everything goes wrong.
  5. If data is being written in a thin fashion, it is likely not being written sequentially. Even when combined with other space-saving technologies, this should ideally not create performance issues.
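
To make the first four principles concrete, here is a toy model of a single thin volume, written as a POSIX shell sketch. It is not how any particular controller works: the function names (thin_write, thin_release), the 1 MB extent size, the 100-extent pool and the 80% warning threshold are all assumptions chosen for illustration, and principle 5 (performance) cannot be shown in a model this simple.

```shell
#!/bin/sh
# Toy model of ONE thin volume in a pool (illustrative, not a vendor design).
EXTENT=1048576      # allocation granularity in bytes (assumed 1 MB)
POOL_EXTENTS=100    # physical pool capacity in extents (assumed)
WARN_PCT=80         # principle 4: warn once usage reaches this % (assumed)
ALLOCATED=" "       # space-separated extent numbers currently backed by the pool
USED=0              # number of pool extents in use

is_allocated() {
  case $ALLOCATED in *" $1 "*) return 0 ;; esac
  return 1
}

# Principle 1: allocate only the extents a write actually touches.
thin_write() {  # thin_write OFFSET_BYTES LENGTH_BYTES
  [ "$2" -gt 0 ] || return 0
  e=$(( $1 / EXTENT ))
  last=$(( ($1 + $2 - 1) / EXTENT ))
  while [ "$e" -le "$last" ]; do
    if ! is_allocated "$e"; then
      if [ "$USED" -ge "$POOL_EXTENTS" ]; then
        echo "ERROR: pool out of space" >&2
        return 1
      fi
      ALLOCATED="$ALLOCATED$e "
      USED=$(( USED + 1 ))
    fi
    e=$(( e + 1 ))
  done
  pct=$(( USED * 100 / POOL_EXTENTS ))
  if [ "$pct" -ge "$WARN_PCT" ]; then
    echo "WARNING: pool ${pct}% used" >&2
  fi
}

# Principles 2 and 3: when the controller sees an all-zero write, or the host
# says a range is no longer needed (e.g. SCSI UNMAP), return every extent lying
# wholly inside the range to the pool. Partly covered extents are kept.
thin_release() {  # thin_release OFFSET_BYTES LENGTH_BYTES
  e=$(( ($1 + EXTENT - 1) / EXTENT ))
  last=$(( ($1 + $2) / EXTENT - 1 ))
  while [ "$e" -le "$last" ]; do
    if is_allocated "$e"; then
      before=${ALLOCATED%%" $e "*}
      after=${ALLOCATED#*" $e "}
      ALLOCATED="$before $after"
      USED=$(( USED - 1 ))
    fi
    e=$(( e + 1 ))
  done
}

thin_write 0 102400          # a 100 KB write claims one 1 MB extent, not 100 GB
echo "after 100 KB write: $USED extent(s) in use"   # 1
thin_release 0 "$EXTENT"     # the host frees (or zeroes) that first 1 MB
echo "after release: $USED extent(s) in use"        # 0
```

Releasing only whole extents is a deliberate simplification: a partly covered extent may still hold live data, so a cautious design keeps it.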

So how well does your storage system do in these areas? Over the next few posts I will explore each of these principles in greater depth. If I have missed any other characteristics, I am happy to add them.

Posted in advice, Uncategorized | 2 Comments

Don’t look back in anger

It seems fairly obvious that as you get older you have more and more memories to look back on. Some of these memories are happy… some less so. But seen through the golden haze of nostalgia, many things that happened in the past become far more glorious than they really were.

I was born and grew up in Perth, so my childhood memories are all from that city. I recently found a Facebook page called Lost Perth, clearly run by someone close to my age, as the photos posted there really appeal to my sense of nostalgia. Recently they posted a photo of Perth International Airport as it was back in the 1960s. I can remember being in this very hall, and it was a place of wonders, where people arrived from far, far away. It seemed so amazing to my child’s mind.

Old Perth Airport

Someone then immediately posted another photo of the same place. Can you spot the problem?

Packed Airport

Why didn’t my childhood memories contain images of the arrivals hall as an arrivals hell? Maybe I didn’t want to remember it that way?

It’s a bit like your memories of life at former employers. You can leave a company in anger, blaming terrible management, misguided market dominance plans or crazed short-term thinking… but it won’t do you any good. Choose instead to remember the golden years… and don’t look back in anger.

Posted in Uncategorized | 2 Comments