VAAI issues with ESXi 5.0

For those of you planning to move to ESXi 5.0, IBM have found an annoying (but not show-stopping) issue with the way XCOPY is implemented in the VAAI driver. With ESX/ESXi 4.1, IBM supplied the VAAI driver, but with ESXi 5.0 this changed and VMware now manage this themselves. It has since emerged that the way VMware implemented XCOPY in this driver does not fully work with the way IBM implemented XCOPY in the XIV, Storwize V7000 and SVC.

This is the current situation with the first three VAAI primitives in ESXi 5.0:

Hardware accelerated locking:  Also known as Atomic Test and Set (ATS), this function works fine when ESXi 5.0 detects a volume from an  XIV, Storwize V7000 or SVC.  In fact the moment ESXi 5.0 detects a LUN from any of these products it uses ATS to confirm that VAAI is possible.  So this is goodness.

Hardware accelerated initialization:   Also known as write same, this function offloads almost all effort on the part of ESXi to write zeros across disks.  This function works fine when ESXi 5.0 works with  XIV, Storwize V7000 or SVC.   So this is also goodness.

Hardware accelerated move:   Also known as XCOPY, full copy or clone blocks, this function works fine with XIV, Storwize V7000 and SVC if you clone a virtual machine and place the new copy into the same datastore as the source. This means creating multiple clones of a VMDK inside the one datastore will still be accelerated by VAAI. So far so good, but unfortunately on XIV, if you place the clone in a different datastore on the same XIV, it will not be hardware accelerated. The clone is still created, but in the old-fashioned way (reading from the source and writing to the target). Storage vMotion likewise reverts to that old-fashioned read-and-write method rather than being hardware accelerated.
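If you want to confirm which of these primitives ESXi 5.0 believes a given device supports, you can query the VAAI status per LUN. A quick sketch of the check (the naa identifier and the statuses shown are just illustrative, substitute a device from your own host):

~ # esxcli storage core device vaai status get -d naa.6005076801234567890123456789012
naa.6005076801234567890123456789012
   VAAI Plugin Name:
   ATS Status: supported
   Clone Status: supported
   Zero Status: supported
   Delete Status: unsupported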

So to be clear, this issue with XCOPY:

  • Does not affect ESX/ESXi 4.1 at all.
  • Occurs no matter which VAAI-compliant XIV, Storwize V7000 or SVC code level you’re running, or which model of XIV (A14 or 114).
  • Does not affect Atomic Test and Set or Write Same.
  • Does not affect clone operations on an SVC or Storwize V7000.
  • Does not prevent you from using cloning on XIV; it just means that the task will not be hardware accelerated if the target datastore is different from the source.
  • Does not prevent you using storage vMotion, it just means that this task will not be hardware accelerated.

How will this be fixed? Well, right now it looks like it will be fixed in new firmware on the IBM hardware. Watch this space and I will update you as soon as I have more news to hand.
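In the meantime, if you want to check whether XCOPY offload is currently enabled on a host, the advanced option to inspect is /DataMover/HardwareAcceleratedMove (using the same esxcli syntax as the unmap commands further down):

~ # esxcli system settings advanced list -o /DataMover/HardwareAcceleratedMove | grep "Int Value"
   Int Value: 1
   Default Int Value: 1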

As for the fourth VAAI primitive, unmap, which is used for space reclamation on thin-provisioning-capable hardware, please also watch this space for news on when IBM hardware will support it… BUT in my opinion it does not matter right now, because this new unmap function in ESXi 5.0 can potentially cause issues. This is described here: http://kb.vmware.com/kb/2007427

So until VMware confirm a fix, I recommend that you run the following commands on all ESXi 5.0 boxes that connect to IBM Storage to disable unmap. I tested the following syntax to confirm it works:

First confirm the value of the unmap setting (1 means enabled, 0 means disabled):

~ # esxcli system settings advanced list -o /VMFS3/EnableBlockDelete | grep "Int Value" 
   Int Value: 1
   Default Int Value: 1

Then disable it:

~ # esxcli system settings advanced set --int-value 0 --option /VMFS3/EnableBlockDelete

Then confirm it is disabled:

~ # esxcli system settings advanced list -o /VMFS3/EnableBlockDelete | grep "Int Value"
   Int Value: 0
   Default Int Value: 1


21 Responses to VAAI issues with ESXi 5.0

  1. Pingback: VAAI issues with ESXi 5.0 « Storage CH Blog

  2. Duncan says:

    Not sure I understand this correctly…. But VAAI in vSphere 5 uses the T10 SCSI standard for XCOPY. This might not align with how IBM has implemented it or expected it, but not sure how VMware could fix that… I’ll see if I can dig up more details, but to me it feels like IBM should fix this and not VMware. (my opinion, not VMware’s statement)

    • avandewerdt says:

      Hi Duncan.

      Thanks for your comment.
      It appears the behaviour of ESXi 5.0 changed between the beta program and the final release.
      So while everything worked fine when IBM tested all VAAI functions during the beta testing, this problem appeared after VMware made the final release of ESXi 5.0.

      The final fix could be made in either place… either VMware release a fix or IBM release a fix.
      IBM are working with VMware to determine where the best place to make the fix will be.

  3. Misguided says:

    You are not making any sense. You are claiming this is an ESXi 5.0 issue, yet you are saying this will be “fixed” by IBM firmware. If the fix is in IBM firmware, how could it be an ESXi 5.0 issue? And why haven’t other storage companies run into this? I miss the quality IBM storage systems had 10 years ago.

    • avandewerdt says:

      Hi Misguided. Interesting name.

      As I replied to Duncan, this problem did NOT occur during testing with the beta version of ESXi 5.0.
      This issue is not the result of poor quality with IBM firmware or poor quality with IBM testing.

      Are you sure other companies have not struck this issue… or are they not being clear and open about it?

      • misguuded says:

        Well, I just saw VAAI demos at VMworld Las Vegas at various vendor booths and they didn’t bring up such an issue when I asked them about ESXi 5 support. I asked them about 5 because we are trying to make a buying decision ourselves. In all fairness the demos were based on 4.1, so all I have is their word. But I did check with 2 non-IBM vendors.

      • avandewerdt says:

        I am not surprised they did not know about it at Vegas, this is breaking news. As I said, 4.1 works perfectly, 5.0 works for nearly all use cases and the missing piece will be rectified as soon as the best solution is worked out between IBM and VMware.

  4. Michael says:

    VAAI was working for us with ESXi 5.0 (and the unmap feature disabled) until we upgraded our XIV from 10.2.4a to 10.2.4b…
    Hope they solve it in 10.2.4c?

  5. Michael says:

    Don’t know why, but it worked at least for half a day with 10.2.4a. While we were migrating datastores from VMFS3 to VMFS5 we did a couple of Storage vMotions and they were accelerated…. until 10.2.4b…

    PMR-NR: 55018.113.848

    • Curious. It appears from side discussions that ESXi 5.0 has the ability to choose not to use specific VAAI functions if it feels they are not offering better performance. So it may be timing, it may be workload, but once ESXi decides that XCOPY with VAAI is not giving it the boost it wants… it will stop using it and switch to the old-fashioned way.
      As I said… this issue is still under investigation and the final solution is not determined (as far as I know). When I learn more, I will let you know… but please also keep pushing via the PMR.
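      If you want to watch this behaviour for yourself, one way (a suggestion only, not an official diagnostic procedure) is the VAAI counter view in esxtop; if the clone counters stop moving during a clone or Storage vMotion, the host has fallen back to the plain software data mover:

      ~ # esxtop
      u   (switch to the disk device view)
      f   (toggle fields, then enable O: VAAISTATS to show CLONE_RD, CLONE_WR and related counters)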

  6. Claudio says:

    Hello Anthony,
    Do you have any update on this?
    I have to start a new deployment for a client and I’m not sure if I should go with ESXi 4 or 5 with a V7000 on 6.3

    Thanks

    • Hi Claudio.

      I have tested VAAI with ESX 4 and 5 on Storwize V7000 using 6.3 code and all functions work successfully.
      The only outstanding issue is we are still waiting on the SRA that works with 6.3 code, but if you are not doing SRM, it is not an issue.

  7. Phil D. says:

    Hi Anthony:

    How up-to-date is this info? When I look at the VMware compatibility guide, it would seem there are no current issues. Specifically, I’m planning to implement vSphere 5.1 with SVC firmware 6.3. Will I end up with fully functional VAAI?

  8. Wilson C. says:

    I’ve been told by our IBM support that vSphere 5.1 is only officially supported with the latest Storwize V7000 6.4 code. As I don’t really want to go to the 6.4 branch yet, I’m still not sure whether I should upgrade to 5.1 and risk problems with VAAI and losing IBM support.

  9. Wilson C. says:

    Is anyone running vSphere 5.x with 6.4.x code now? Do you see any issues?

  10. MBH says:

    We have a client with ESXi 5.0 and V7000 firmware 6.4, and Storage vMotions have been slow since firmware 6.3. Everything is crawling. Deploying a 70 GB template takes more than 30 minutes, and storage migrating a 23 GB VM took over 20 minutes!

    This happens regardless of the load balancing algorithm used (Fixed, MRU or Round Robin).

    We’ll try disabling hardware acceleration and if that doesn’t help, then I have no clue what the issue could be stemming from. We’re opening a PMR anyway & will see where that goes.

  11. MBH says:

    Hello again,

    I’ve confirmed that the bug is still there. The problem seems to be from ESXi 5.0, not the V7000 itself, and the slow Storage vMotion has been attributed to XCOPY handling.

    The option mentioned in this post (EnableBlockDelete) was already turned off, and that is its default state.

    The correct solution is to go to each ESXi host’s advanced settings (in the Configuration tab) -> DataMover -> set HardwareAcceleratedMove to zero (the default value is 1).

    A VM that took 45 minutes to move now takes only 3 minutes. To maximize performance, all volumes have their multipathing policy set to Round Robin, since the V7000 supports it. This has to be done on all hosts.

    While doing the migration, before disabling that option, the performance seen on the V7000 charts was about 75 MB/s. After disabling it, it goes up to 500 MB/s, and you will also see a lot of traffic on the FC interfaces, because now the host has to keep sending commands to the storage unit, as the operation is no longer handled by the storage itself.

    In the command line, you can change the option with:
    Disable: esxcfg-advcfg -s 0 /DataMover/HardwareAcceleratedMove
    Enable: esxcfg-advcfg -s 1 /DataMover/HardwareAcceleratedMove
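    For reference, the same change can be made with the esxcli syntax used earlier in this post, and the Round Robin policy mentioned above is set per device (the naa identifier below is just a placeholder, use your own device IDs):

    ~ # esxcli system settings advanced set --int-value 0 --option /DataMover/HardwareAcceleratedMove
    ~ # esxcli storage nmp device set --device naa.6005076801234567890123456789012 --psp VMW_PSP_RR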

  12. We have recently begun a 4.1 to 5.0 upgrade, and in our environment, an issue has come up. We are seeing “warm restarts” of SVC nodes and IBM has stated this is due to VAAI in 5.0. We have been advised to disable VAAI completely on SVC connected hosts until we can roll in 6.4.1.4 on SVC. We are currently on 6.4.0.4.
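    For anyone given the same advice, disabling VAAI completely on a host means turning off all three block primitives via their advanced options. A sketch of the commands (standard ESXi 5.x option names; set each back to 1 to re-enable):

    ~ # esxcli system settings advanced set --int-value 0 --option /DataMover/HardwareAcceleratedMove
    ~ # esxcli system settings advanced set --int-value 0 --option /DataMover/HardwareAcceleratedInit
    ~ # esxcli system settings advanced set --int-value 0 --option /VMFS3/HardwareAcceleratedLocking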
