Why ALUA is a very cool acronym

For way more years than I want to accept, most operating systems were clueless about Fibre Channel networks.   Totally clueless.

Of course I am kinda talking ancient history… plus my expectations of now distant operating systems like Windows NT were not that high.  But I was still shocked the first time I presented a SAN LUN that had 4 paths….   and Windows NT declared it had found four disks rather than one (and then tried to write a signature on all of them!).  Sadly AIX was not much better (this was around the time of AIX 4.2/4.3).  These Operating Systems insisted on seeing each path as a different LUN…. which was a bit clueless.  It became rapidly apparent that two things were going on:

  1. Whatever SCSI standards existed to ensure uniform behaviour between hardware and software vendors were not being embraced.
  2. Vendor-unique multi-pathing solutions to manage these paths became routine practice.

For IBM this meant creating a piece of software called Data Path Optimiser or DPO.   IBM toyed with the idea of charging for it, but rapidly realised that doing so made no sense, so they renamed it Subsystem Device Driver (SDD) and made it available free of charge.   Other vendors came out with their own versions for their own hardware (think EMC PowerPath or Hitachi HDLM) while Veritas brought out a multi-vendor capable package called DMP (which made much more sense, but cost money and so did not have the success it deserved).

So the real requirement for the market was twofold:

  1. Operating system vendors needed to embrace multi-pathing as a native feature of their products.
  2. Hardware vendors needed to embrace SCSI standard compliant ways of indicating how multiple paths should be presented and used by those operating systems.

Fortunately, in both cases some common sense began to emerge from the fog. Operating system vendors added native MPIO capability. Microsoft started getting serious in Windows 2003 (with MPIO) and much more so in Windows 2008. IBM started with a fix level in AIX 5.2 (which added MPIO), and Sun kicked in with MPxIO. Linux added DM-Multipath (device-mapper multipathing), which was a great step as it saved IBM from having to recompile its closed-source SDD package every time a new Linux kernel came out.
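
If you have never poked at these native stacks, the path-listing commands are a good place to start. A quick sketch only: exact syntax varies by OS release, and hdisk2 is just a placeholder device name.

    multipath -ll        # Linux DM-Multipath: shows each mpath device, its path groups and priorities
    lspath -l hdisk2     # AIX MPIO: lists the state of every path to a given hdisk
    mpclaim.exe -s -d    # Windows Server 2008 R2+ MPIO: disks claimed by the Microsoft DSM and their paths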

From the hardware side, the SCSI-3 standards came up with ALUA (Asymmetric Logical Unit Access). In simple terms, ALUA allows a storage device to indicate to an operating system which paths are preferred, on both a port-by-port basis and a volume-by-volume basis. This is really important for storage products that are active/passive, either for a whole controller or on a volume-by-volume basis (e.g. indicating that Volume 1 should ideally only be accessed using ports on Controller A, while Volume 2 should ideally only be accessed using ports on Controller B).
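
If you want to see what this looks like from the host side: the mechanism behind ALUA is the SCSI REPORT TARGET PORT GROUPS command, and Linux DM-Multipath can consume the result directly. A rough sketch only, using sg3_utils and an illustrative multipath.conf device stanza (the device name and the vendor/product strings are examples, not a recommendation for any particular box):

    sg_rtpg -vv /dev/sdc    # sg3_utils: send REPORT TARGET PORT GROUPS and decode each port
                            # group's asymmetric access state (active/optimized, non-optimized, standby)

    # /etc/multipath.conf fragment
    devices {
        device {
            vendor                "IBM"
            product               "2145"            # e.g. an SVC / Storwize V7000 volume
            path_grouping_policy  group_by_prio     # group paths by the priority ALUA reports
            prio                  alua              # derive path priority from the ALUA access state
            failback              immediate         # move back once the preferred group recovers
        }
    }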

So the story gets better as time goes on. Hardware vendors have, for the most part, got on board with ALUA, but there are some hold-outs. This is why I was really pleased to see that the DS3500 and DCS3700 from IBM will now support ALUA (after a firmware update to version 07.83 or later, which should be available June 15, 2012). The announcement letter is here. This is a great step forward. In case you're wondering, IBM's DS8000, XIV, Storwize V7000 and SVC all support ALUA.

But sadly, while this is a positive step forward for IBM, there are still some simple problems in the industry that need to be fixed. First and foremost: vendors need to stop producing their own multi-pathing software and either stick to plugins for the operating system's own stack (such as DSMs for Windows or PCMs for AIX, maybe with some handy utilities to list path status) or, preferably, work with native MPIO “out of the box”. This means, for instance, switching from SDD to SDDDSM (Windows) or from SDD to SDDPCM (AIX). Ideally even these plugins should become redundant.
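
On AIX, for example, checking which driver owns a disk and handing SVC/Storwize LUNs back to the native PCM comes down to roughly this. A sketch only: option names vary by AIX level, hdisk4 is a placeholder, so check the IBM documentation before doing it in anger.

    lsattr -El hdisk4 -a PCM -a algorithm        # which path control module and load-balancing
                                                 # algorithm currently control this hdisk
    manage_disk_drivers -l                       # list storage families and the driver managing each
    manage_disk_drivers -d IBMSVC -o AIX_AAPCM   # hand SVC/Storwize LUNs to the native AIX PCM
                                                 # instead of SDDPCM (takes effect after a reboot)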

If a vendor's hardware is dependent on heavyweight host path-management software being installed, then they should change their hardware (which is not so simple with legacy designs). Having to install non-plugin, vendor-supplied MPIO software creates potential interoperability issues that can prevent clients from buying from multiple vendors. It blocks efficiency, it makes migrations harder and it creates uncertainty. Insisting that you will only support a client if they install your multi-pathing software, but then refusing to support the installation of another vendor's software, is equally unhelpful.

If I give a gold star out to any operating system vendor, it's VMware. Their ability to attach to storage from multiple vendors is simply fantastic. Migration with vMotion is a breeze, and I have happily moved clients between all sorts of storage platforms, all attached to the same ESXi server. Nice! Though I remain confused that EMC describe PowerPath/VE as superior to the native capabilities of VMware. If this is more than just because EMC kit needs special handling, then VMware customers everywhere should be up in arms that the parent company is withholding technology from their hypervisor (unless of course they expect VMware customers to only be using EMC kit!).
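
For what it's worth, the native NMP tuning most people actually need boils down to a couple of esxcli one-liners. A sketch against the ESXi 5.x command namespace; the device identifier is a placeholder.

    esxcli storage nmp satp set -s VMW_SATP_ALUA -P VMW_PSP_RR   # make Round Robin the default path
                                                                 # selection policy for ALUA-claimed arrays
    esxcli storage nmp device list -d naa.xxxxxxxxxxxx           # confirm which SATP/PSP claimed a given LUN
                                                                 # and which paths it is actually using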

So supporting ALUA is a critical step towards a more open world where multipathing just works ‘out-of-the-box’.

Want to read more about ALUA and MPIO?  Check out these very useful articles:

Microsoft MPIO

A great blog post on ALUA

Wikipedia on Multipath I/O

AIX Multi-path I/O

AIX SDDPCM vs AIXPCM


18 Responses to Why ALUA is a very cool acronym

  1. Pingback: Why ALUA is a very cool acronym « Storage CH Blog

  2. MrOdysseus says:

    Great post. Thank you Anthony!

  3. Hi Anthony,

    Great article, it’s great to see someone fully appreciating ALUA and not dismissing its worth.

    Regarding PowerPath/VE, it does some pretty cool stuff over and above what VMware Native Multi-Pathing (NMP) can do.

    There is nothing preventing an EMC array from behaving like any other array in a VMware environment; PP/VE just makes it even better (in my opinion).

    Before vSphere 4, ESX 3 only had basic NMP, failover only; if you had 4 HBAs in a host, only 1 HBA would be active and the remaining 3 would be busy twiddling their lil’ HBA thumbs until HBA 1, or anything along the path from it onwards, failed. Therefore it was all unbalanced; it was either Fixed Path or Most Recently Used (MRU).

    In comes ESX 3.5, which introduces Round Robin. That did make all the possible paths active, but it isn’t really intelligent; it will send some data down one path and some down another and so forth.

    However, Round Robin doesn’t know anything about what else is happening, such as HBA queue depth, IO size, changes to the network or activity on the storage array, and even less about what other traffic is out there (such as sending data down an already exhausted path just because it’s that path’s turn).

    FP, MRU and RR also don’t automatically restore to previously failed paths once they have been resuscitated.

    With vSphere 4.1 came Fixed Path with Array Preference (VMW_PSP_FIXED_AP), which queries the array for its preference: when no user-preferred path is specified, VMW_PSP_FIXED_AP selects the preferred path according to the array path preference and current path state, but it doesn’t consider what the HBA or other bits are doing. vSphere 5 doesn’t seem to have added much here.

    This is where PP/VE (< Childish I know, but I love calling it PP ;) ) has its place in this world; PowerPath/VE provides:
    * Dynamic load balancing – not just RR or Failover, it knows how busy a path is, #IO/QD/IO Sizes etc.
    * Auto restore of paths
    * Auto performance optimisation – will discover the array type (Symm, Clariion or supported third party) and set it to Symm Optimised (director), Clariion ALUA or Adaptive for the 3rd parties.
    * Dynamic path failover – when a path fails, it redistributes the workload to the surviving paths in accordance with their workload (RR won't care what a path is doing) and redistributes again when the path is restored.
    * Maintains Stats and alerts to issues.
    * Regular testing of paths so that it will know of a path issue before anything else does.

    Additionally, with a VNX, if a controller is inaccessible, even if it's because some muppet pulled the fibres out of that controller for a minute, the VNX in combo with PP can use the internal non-preferred path over the PCIe CMI backplane to go back into the original controller and provide data without incurring a controller/LUN fail-over.

    There is also a performance gain to be made by using PP/VE, because it knows what's busy and what's not and reduces wait times by choosing the least busy path, irrespective of HBA, switch or array.

    So I hope that helps to show its benefits.

    Oh, and thanks for the reminder on AIX – I can still determine bus, slot, HBA, HBA port, domain, switch port… etc. from a hardware path (in hex). HP-UX was just as bad/good with hardware paths as AIX, but I don't miss those days much, except when I get asked to do it again just this once. (Hardware paths had their pros and cons.)

    Best regards,

    Aus Storage Guy.

  4. Thanks for the great comment, as usual you give some great info. It still sounds to me that EMC have some technology that could benefit every VMware customer, both those who use EMC technology and those who do not. Stuff like auto-path restore and path testing would seem fairly generic, especially for truly SCSI-compliant technology.

  5. graeme says:

    Thanks for the informative post. We just got a notification from our vendor regarding the new firmware for our DS3500s. I’m a generalist sysadmin who has to dabble with storage from time to time, but I really don’t have enough time to understand the finer details as thoroughly as I’d like, so a lot of the multipath-related stuff tends to go over my head. You have managed to explain quite well why ALUA is a good thing.

    Now I just have to have a go at getting Ubuntu to do ALUA; unfortunately there are still some loads that we haven’t entrusted to VMware just yet. :-)

  6. Dag Kvello says:

    Any information on what SATP should be used against the DS3500 with the latest firmware?

    VMW_SATP_LSI VMW_PSP_RR Supports LSI and other arrays compatible with the SIS 6.10 in non-AVT mode

    or

    VMW_SATP_ALUA VMW_PSP_MRU Supports non-specific arrays that use the ALUA protocol

    or

    something else ?

  7. The DS3500 has not been updated yet on the VMware HCL…. Need to wait for some more info to come through…

  8. Dag Kvello says:

    I noticed (well hidden) a KB at VMware.
    http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2016753

    Default SATP Plug-in is changed for ALUA supported LSI Arrays in ESXi 5.0 Update 1
    Details
    On ESXi 5.0 hosts, the default Storage Array Type Plug-in (SATP) for LSI arrays was VMW_SATP_LSI, which did not support Asymmetric Logical Unit Access (ALUA) functionality.

    Starting with ESXi 5.0 Update 1, SATP for LSI arrays that supports ALUA is changed to VMW_SATP_ALUA, so that TPGS/ALUA arrays are automatically claimed by the default VMW_SATP_ALUA SATP.
    Solution
    The following storage arrays are claimed by VMW_SATP_ALUA:

    Vendor   Model          Description
    LSI      INF-01-00
    IBM      ^1814*         DS4000
    IBM      ^1818*         DS5100/DS5300
    IBM      ^1746*         IBM DS3512/DS3524
    DELL     MD32xx         Dell MD3200
    DELL     MD32xxi        Dell MD3200i
    DELL     MD36xxi        Dell MD3600i
    DELL     MD36xxf        Dell MD3600f
    SUN      LCSM100_F
    SUN      LCSM100_I
    SUN      LCSM100_S
    SUN      STK6580_6780   Sun StorageTek 6580/6780
    SUN      SUN_6180       Sun Storage 6180
    SGI      IS500          SGI InfiniteStorage 4000/4100
    SGI      IS600          SGI InfiniteStorage 4600

    For information on managing claim rules, see Managing Claim Rules in the vSphere 5 Command Line Documentation.

  9. Phil D. says:

    “With Vsphere 4.1 came Fixed path with Array Preference (VMW_PSP_FIXED_AP) which would query the array for it’s preference…”

    … unless the array is fronted by SVC. Unless something has changed recently, preferred paths set in the SAN Volume Controller are ignored.

  10. Phil D. says:

    Correction on my post: VMware 4.1 will notice that it’s SVC and use VMW_SATP_SVC. Still useless. For preferred pathing, you have to configure it manually. Otherwise it uses the first discovered working path on boot.

  11. Phil D. says:

    My storage admin has set up preferred pathing on the SVC. Is it not better to configure preferred paths than to “spin the wheel” that is round robin?

  12. Phil D. says:

    I’d be willing to bet that the machine can balance the traffic better than our storage admin can predict it. I’ll set RR up in TestDev. Thanks for your valuable input.


  14. Greg Curry says:

    Well, I know this is an old post; however, I just spent several hours trying to figure out why the performance of a Dell MD3200i was less than expected, and this is one of the articles that came up when searching for “Dell MD3200i” and either VMW_SATP_ALUA or VMW_PSP_RR. So I figured this was the best place to try and share what I have learned about the MD3200i just now.

    So we built this simple VMware cluster (ESXi 5.5) with two hosts, and a single MD3200i with 2 controllers. The hosts have 8 x 1 GbE network ports. The SAN has 4 x 1 GbE per controller. So we direct-wired, in a redundant fashion, the hosts to the SAN with no switch in the way. Two LUNs were set up, one preferring controller 0, the other controller 1. Test VMs were built on each host, and then we began running various disk benchmarking tools to test iSCSI SAN performance.

    At first, even after properly setting the hosts to use VMW_SATP_ALUA and VMW_PSP_RR for each LUN, iSCSI traffic would only flow over two links from each host at a time, the two links that were connected to the preferred controller for each LUN. Yes, fail-over did work, in that if we disconnected the active links to the preferred controller, it would start using the two links on the non-preferred controller. We expected to be able to use all four iSCSI paths at the same time, to achieve upwards of 500 MB/sec (minus overhead), and only have reduced performance in a fail-over scenario.

    After some digging we found that the cause is the VMW_SATP_ALUA driver obeying the preferred path info it receives from the MD3200i, and that there is a way to tell VMware to ignore this and use all four paths! So, to help others that might be working on a similar setup, here are the esxcli commands we used to tweak performance:

    esxcli storage nmp satp rule add -s VMW_SATP_ALUA -V DELL -M MD32xxi -c tpgs_on -P VMW_PSP_RR

    esxcli storage nmp psp roundrobin deviceconfig set -d {device_id} --iops 3 --type iops --useano 1

    That last part, "--useano 1", is what did the trick. The iops-based policy comes from performance tuning we have done with EqualLogic SANs and their tuning guide, along with 9000-byte MTUs, disabling DelayedACK and disabling LRO (large receive offload). We are now literally getting twice the performance from this setup, and the IO from this SAN is much more acceptable now.

    So, I hope this helps and is of some value to others out there working with the MD3200i series! I was not able to find this answer in any forum post and came to it on my own, after reading through every single setting available in ESXi 5.5 for the VMW_SATP_ALUA and VMW_PSP_RR options.
