IBM recently released a flash for DS3500 and DCS3700 entitled:
RETAIN tip H206954: Data corruption issue using XCOPY in VAAI
The contents of the flash are shown at the bottom of this post, so if you are a DS3500 user, check it out.
The issue got me thinking about how this scenario can play out:
- New DS3500s are now shipping with 7.83.xx firmware that is VAAI capable.
- It is not possible to downgrade these controllers to an older non-VAAI capable version (like 7.77 or 7.70).
- ESX 5.0 and 5.1 will attempt to use VAAI by default with any new volume that is presented by a storage controller. It does this by issuing an Atomic Test and Set command. If the command succeeds, Hardware Acceleration is set to Supported. I discussed this here.
- Any future storage vMotions, clones or snapshots against VMDKs on datastores hosted by those volumes will then be potentially hardware accelerated, meaning that VAAI capability will be used wherever possible.
To get some more background on how ESX decides when to use hardware acceleration, see Frank's excellent post here, which describes how hardware-accelerated copy commands are used when it makes sense and is possible.
The point is that you have no control over when VAAI XCOPY is used (unless you turn Hardware Accelerated Moves off). If your new storage has a bug like the one described below and you don’t pay attention, you are in trouble.
This brings me to something that I hope VAAI version 2 will change: VAAI controls for ESXi are universal. A given primitive is either on for every volume or off for every volume. So if you are using a DS3500 and an XIV at the same time, you will need to disable XCOPY universally, losing hardware acceleration on both products.
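To make the host-wide nature of that switch concrete, here is a minimal sketch that builds the esxcli command line for toggling XCOPY on an ESXi 5.x host. It assumes the advanced option behind Hardware Accelerated Moves is /DataMover/HardwareAcceleratedMove, as described in VMware's documentation on disabling VAAI. The helper name xcopy_setting_command is my own invention for illustration. Note that the command takes no volume or array argument, which is exactly the limitation described above.

```python
# Assumption: this advanced option controls Hardware Accelerated Move (XCOPY)
# on ESXi 5.x. Verify against VMware's documentation before relying on it.
XCOPY_OPTION = "/DataMover/HardwareAcceleratedMove"


def xcopy_setting_command(enabled: bool) -> list:
    """Return the esxcli argument list that turns XCOPY on or off host-wide.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    value = "1" if enabled else "0"
    return [
        "esxcli", "system", "settings", "advanced", "set",
        "--option", XCOPY_OPTION,
        "--int-value", value,
    ]


if __name__ == "__main__":
    # Print the command you would run on each host to disable XCOPY.
    print(" ".join(xcopy_setting_command(enabled=False)))
```

Because the setting is per host rather than per datastore, the same command would have to be run on every ESXi host that can see the affected array.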
As mentioned above, here are the contents of the tip:
Data corruption was reported when using VMware ESXi 5.xx with VAAI hardware accelerated VMFS data movement enabled on IBM System Storage DS3500 and DCS3700 Storage Controllers.
The issue occurs when XCOPY is utilized to improve performance on the following VMware operations:
– Storage vMotion
– VM cloning
– VM snapshots
Any of the affected models and code levels (shown below) running with VMware ESXi 5.xx hosts with VAAI enabled and performing Storage vMotion, VM cloning, or VMware snapshots are likely exposed to undetected data corruption during these operations.
The system may be any of the following IBM storage controllers:
– IBM System Storage DCS3700 Storage Subsystem, type 1818, any model
– IBM System Storage DS3512, type 1746, any model
– IBM System Storage DS3524, type 1746, any model
This tip is not option specific.
The 7.83.xx.xx firmware for the DS3500 and DCS3700 is affected.
This behavior has been corrected in the 7.83.27.00 release and later of the System Storage controller firmware.
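The version rule in the tip can be turned into a quick check. The sketch below is my own reading of the two statements above: firmware in the 7.83.xx.xx family is treated as exposed unless it is 7.83.27.00 or later. The function names are made up for illustration, and the check says nothing about whether VAAI is actually enabled on your hosts.

```python
# Assumption drawn from the tip: 7.83.xx.xx is affected, 7.83.27.00 and later is fixed.
AFFECTED_FAMILY = (7, 83)
FIRST_FIXED = (7, 83, 27, 0)


def parse_firmware(version: str) -> tuple:
    """Turn a dotted string such as '7.83.22.00' into a tuple of four integers."""
    parts = [int(p) for p in version.strip().split(".")]
    parts += [0] * (4 - len(parts))  # pad short strings like '7.83.27'
    return tuple(parts)


def is_affected(version: str) -> bool:
    """True if the firmware is in the 7.83 family and older than the fixed release."""
    parsed = parse_firmware(version)
    return parsed[:2] == AFFECTED_FAMILY and parsed < FIRST_FIXED


if __name__ == "__main__":
    # Illustrative version strings only, not a list of real releases.
    for fw in ("7.77.20.00", "7.83.22.00", "7.83.27.00"):
        print(fw, "affected" if is_affected(fw) else "not affected by this tip")
```

Older non-VAAI levels such as 7.77 come out as not affected here simply because they cannot be sent XCOPY by ESXi in the first place.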
The file is or will be available by selecting the appropriate Product Group, type of System, Product name, Product machine type, and Operating system on IBM Support’s Fix Central web page, at the following URL: