IBM Releases several Data Integrity Alerts for Storwize products

IBM recently released three significant alerts for Storwize products (V3500, V3700, V5000 and V7000).

I am reproducing the text from the emails I received. I tell you this because if IBM updates the website text, my blog post may not get updated.

1691 Error on Arrays When Using Multiple FlashCopies of The Same Source

http://www.ibm.com/support/docview.wss?uid=ssg1S1005288&myns=s028&mynp=OCSTHGUJ&mynp=OCSTLM5A&mynp=OCSTLM6B&mynp=OCST5Q4U&mynp=OCST3FR7&mynp=OCHW206&mync=E&cm_sp=s028-_-OCSTHGUJ-OCSTLM5A-OCSTLM6B-OCST5Q4U-OCST3FR7-OCHW206-_-E

ABSTRACT: There is an issue in the RAID software that calculates parity for systems that have multiple FlashCopies of the same source. This issue will cause the parity to be calculated incorrectly and may lead to the system logging a 1691 error and may eventually lead to an undetected data loss.

Affects: Storwize devices on 7.3 and 7.4 versions
Resolution: This issue is resolved in 7.4.0.5 and 7.5.0.0

Note that 7.5.0.0 is not the latest version – do not install that version!
At the time of writing, 7.5.0.2 is available. If you are on 7.3 or 7.4, then stick with 7.4.0.5.

Note also that the IBM link above says that the issue affects only V7000s, but this is because there are separate alerts and pages for each Storwize model.
If you are using Storwize products of any kind with FlashCopy you are affected.  If you are not using FlashCopy, read on!

Data Integrity Issue when Using Encrypted Arrays

http://www.ibm.com/support/docview.wss?uid=ssg1S1005296&myns=s028&mynp=OCST3FR7&mynp=OCHW206&mync=E&cm_sp=s028-_-OCST3FR7-OCHW206-_-E

ABSTRACT: IBM has identified an issue which can cause data to be written to the wrong location on the drive when using encrypted arrays on Storwize V7000 Gen2 systems. This will often result in systems logging 1691 and 1322 errors, and undetected data loss.
Affects: V7000s on 7.4 and 7.5 versions
Resolution: This issue is resolved by APAR HU00820 in releases 7.4.0.5 and 7.5.0.2.

This really does affect only V7000s, as other models don’t offer this software encryption feature. If you are not using Encryption, read on!

Data Integrity Issue when Drive Detects Unreadable Data

http://www.ibm.com/support/docview.wss?uid=ssg1S1005289&myns=s028&mynp=OCSTHGUJ&mynp=OCSTLM5A&mynp=OCSTLM6B&mynp=OCST5Q4U&mynp=OCST3FR7&mynp=OCHW206&mync=E&cm_sp=s028-_-OCSTHGUJ-OCSTLM5A-OCSTLM6B-OCST5Q4U-OCST3FR7-OCHW206-_-E

ABSTRACT: IBM has identified specific hard disk drive models supported by the Storwize family of products that may be exposed to possible undetected data corruption during a specific drive error recovery sequence. The corrupted data can eventually trigger the system to log a 1691 error. A firmware update that remediates against future occurrences of this issue is now available. IBM recommends that all customers with the affected drives apply these latest levels of code.

Note also that the IBM link above says that the issue affects only V7000s, but this is because there are separate alerts and pages for each Storwize model.
If you are using Storwize products of any kind with the listed Seagate disks then you are affected.

Now the website lists capacities… but again you might be fooled.
The capacities shown here are decimal, but the Storwize GUI and CLI always adhere to binary honesty (which I like). So don’t be fooled if the GUI tells you that you have 3.6 TB drives and they are not listed in the table below. They are 4 TB drives according to the label.
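If you want to check that arithmetic yourself, here is a minimal sketch to run on your own workstation (not on the Storwize CLI). It assumes the drive label counts decimal terabytes (10^12 bytes) and the GUI counts binary units (2^40 bytes). The name to_tib is a hypothetical helper, not a Storwize command.

```shell
#!/usr/bin/env bash
# to_tib: hypothetical helper that converts a label capacity in decimal TB
# (10^12 bytes) into binary units (2^40 bytes), rounded to two decimals.
to_tib() {
  awk -v tb="$1" 'BEGIN { printf "%.2f\n", tb * 1e12 / (1024 ^ 4) }'
}

to_tib 4   # prints 3.64
```

So a drive labelled 4 TB shows up as roughly 3.6 in the GUI, which matches what I saw.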

Product_id   Capacity   Minimum Firmware level containing fix 
ST300MM0006    300 GB   B56S
ST600MM0006    600 GB   B56S
ST900MM0006    900 GB   B56S
ST1200MM0007   1.2 TB   B57D
ST2000NM0023     2 TB   BC5G
ST3000NM0023     3 TB   BC5G
ST4000NM0023     4 TB   BC5G
ST6000NM0014     6 TB   BC75
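
If you have a lot of drives to check, the table above can be turned into a small lookup. This is only a sketch for your own workstation: min_fixed_fw is a hypothetical helper name, and the values are copied straight from the table, so re-check them against the IBM alert before relying on them.

```shell
#!/usr/bin/env bash
# min_fixed_fw: hypothetical helper that prints the minimum fixed firmware
# level for a product_id, using the values from the table in this post.
min_fixed_fw() {
  case "$1" in
    ST300MM0006|ST600MM0006|ST900MM0006)    echo "B56S" ;;
    ST1200MM0007)                           echo "B57D" ;;
    ST2000NM0023|ST3000NM0023|ST4000NM0023) echo "BC5G" ;;
    ST6000NM0014)                           echo "BC75" ;;
    *) return 1 ;;  # product_id is not in the table
  esac
}

min_fixed_fw ST4000NM0023   # prints BC5G
```

A non-zero return code means the product_id is not one of the listed drives.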

Also, in the GUI I found that the firmware version of my drives was not shown by default; I had to add it, as per the screen capture below. Here is a quiz question… does the screen capture show a potentially affected machine?

[Screen capture: 2015-07-26_16-38-29]

If you answered YES you would be correct!

To be sure, we can run the software upgrade tool, or dump the script below into a CLI window (paste the whole thing!):

svcinfo lsdrive -nohdr -delim , | while IFS="," read -ra drives; do svcinfo lsdrive -delim , ${drives[0]} | { while IFS="," read desc data ; do [[ $desc == "id" ]] && id=$data; [[ $desc == "product_id" ]] && product_id=$data; [[ $desc == "firmware_level" ]] && firmware_level=$data; done; printf "%5s%10s%10s \n" "$id " "$product_id" "$firmware_level"; }; done

The output will look like this (I showed the paste so you see what your entire PuTTY session would look like).    Again, is this an affected machine?

[Screen capture: 2015-07-27_11-36-20]

Yes, it is affected, as BC5C is below BC5G (G being later than C in the alphabet!).
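
The same comparison can be scripted. This is a minimal sketch for your own workstation, and it rests on an assumption: that firmware levels for a given drive model sort in plain character order, which is how I compared BC5C and BC5G above. The name fw_below is a hypothetical helper.

```shell
#!/usr/bin/env bash
LC_ALL=C  # compare strings by byte value, independent of the local locale

# fw_below: hypothetical helper, succeeds when level $1 sorts before level $2.
fw_below() {
  [[ "$1" < "$2" ]]
}

if fw_below BC5C BC5G; then
  echo "affected"   # this branch runs: BC5C sorts before BC5G
else
  echo "ok"
fi
```

A drive that is already at the minimum level (BC5G against BC5G) is not reported as below it.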

Once you know you are affected, you can follow the upgrade instructions in the IBM Alert. It is much easier to do this on 7.4 as you can upgrade your drives from the GUI instead of using the CLI.

 

 

 


About Anthony Vandewerdt

I am an IT Professional who lives and works in Melbourne Australia. This blog is totally my own work. It does not represent the views of any corporation. Constructive and useful comments are very very welcome.
This entry was posted in IBM Storage, Storwize V3700, Storwize V7000, Uncategorized.

6 Responses to IBM Releases several Data Integrity Alerts for Storwize products

  1. roger says:

    Please change this
    “Note that 7.5.0. is not the latest version – do not install that version!
    At time of writing 7.5.0.2 is available. If you are on 7.3 or 7.4 then stick with 7.4”
    To this
    “Note that 7.5.0.0. is not the latest version – do not install that version!
    At time of writing 7.5.0.2 is available. If you are on 7.3 or 7.4 then stick with 7.4.05”
    Ciao Roger

  2. Eugen Constantin says:

    Hello,
    I have version 7.4.0.3. If I’m not using FlashCopy, am I impacted by “1691 Error on Arrays When Using Multiple FlashCopies of The Same Source”? I’ve got “1691” errors on one of my 7.4.0.3 V7000 storage systems, but the issue seems to be an SSD drive with bad blocks. One of my databases was corrupted :(. The SSD was replaced and IBM in the U.S. is investigating that SSD.

    • Hi Eugen. I would rely on IBM to help you, but it may be you have some other issue. Be good to know what it is… keep pressuring IBM to give you a good RCA

      • Eugen Constantin says:

        IBM is investigating but they are not sure if my problem is fixed or not. IBM support said that I have to wait 7 days after replacement of SSD drive. At that time the scrubbing process will be completed for the entire mdisk and I shouldn’t see any new inconsistency. My trust in IBM mid-range storage products is decreasing day by day.

      • The scrubbing bit makes sense. Until the MDisk is scrubbed we don’t know if there are any other parity issues.
