Semmelweis could see the problem

Ignaz Semmelweis

I recently listened to another great podcast from the Freakonomics team in which they recounted the story of Doctor Ignaz Semmelweis. It inspired me to make a connection to something I see in my day-to-day job.

Doctor Semmelweis worked at the Vienna General Hospital in the 1840s, delivering babies, teaching students and performing autopsies.  Now while working there he realized there was something going horribly wrong at the hospital:  up to 1 in 6 of the women whose babies were delivered by the male doctors were dying either during or after childbirth.   This rate was far higher than the death rate for women whose babies were delivered by midwives and much higher even than the death rate for women who gave birth on the street!

Semmelweis studied this issue very closely and concluded (quite rightly)  that the issue was invisible cadaverous particles on the hands of the doctors.    The doctors were going straight from performing autopsies to delivering babies… and transmitting all sorts of foul material to the birthing mothers, killing some of them in the process.

His solution was simple:  He made the doctors wash their hands.

The result?   The rate of women dying after giving birth at that hospital went from a peak of 15% to less than 2%.

So you would like to think that this story ends with Semmelweis declared a hero and hospital hygiene achieving new heights.   Sadly it instead ends with Semmelweis being mostly ignored, going mad and dying from  injuries sustained from a beating he received in a mental asylum.   His discoveries only really began getting wider recognition after work by greats such as Louis Pasteur and Joseph Lister.

So what on earth does this have to do with Fibre Channel attached storage?

Well the answer is invisible dirt particles and their role in causing hard to explain issues (work with me here your honour, I will make my point).

Fibre optic cable relies on the exposed fibre being absolutely clean. The centre of the image below shows the light from a light source being used with a fibre microscope. While that lit spot looks large, it is actually only 62.5 microns (0.0625 mm) across, which is tiny.

62.5 micron

If you are using single-mode (9 micron) fibre, which is commonly used with long wave adapters, that lit spot is even smaller:

9 micron
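To put those two core sizes in perspective, here is a rough bit of arithmetic in Python. It is only a geometric sketch, not an optical loss model: it treats the core as a flat circle and a dirt particle as a round disc sitting entirely on the core. The 5 micron particle size is an assumption chosen purely for illustration.

```python
import math

MULTIMODE_CORE_UM = 62.5    # core diameter from the text, in microns
SINGLE_MODE_CORE_UM = 9.0   # core diameter from the text, in microns
PARTICLE_UM = 5.0           # assumed particle diameter, for illustration only


def core_area_um2(core_diameter_um: float) -> float:
    """Cross-sectional area of a circular fibre core, in square microns."""
    return math.pi * (core_diameter_um / 2) ** 2


def blocked_fraction(particle_diameter_um: float, core_diameter_um: float) -> float:
    """Fraction of the core area covered by one round particle (capped at 1.0)."""
    ratio = (particle_diameter_um / core_diameter_um) ** 2
    return min(ratio, 1.0)


area_ratio = core_area_um2(MULTIMODE_CORE_UM) / core_area_um2(SINGLE_MODE_CORE_UM)
print(f"The 62.5 micron core has {area_ratio:.1f} times the area of the 9 micron core")

for core in (MULTIMODE_CORE_UM, SINGLE_MODE_CORE_UM):
    covered = blocked_fraction(PARTICLE_UM, core)
    print(f"A {PARTICLE_UM} micron particle covers {covered:.1%} of a {core} micron core")
```

The multimode core works out to roughly 48 times the area of the single-mode core, so under these assumptions the same speck of dirt covers a far larger share of a single-mode fibre.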

So what does a dirty fibre look like?   How about this:

Contaminated error generating cable

What about a badly cleaned one?

Badly cleaned cable

Now these images are scary. Even worse, the contamination is invisible to the naked eye.  It is almost impossible to see dirt on your fibres (and staring at the end of a cable is not recommended anyway, regardless of what is at the other end).  So this leads to some obvious questions:

How can I keep my cables from getting dirty?  

Quite simply, don’t expose them to dirt. Always leave dust covers in place on the cable ends and in the SFPs until they need to be used. Don’t drag unprotected cables under the floor or leave them hanging in the racks. Don’t re-use cables without cleaning them. In fact, I recommend cleaning new cables before you start using them. Finally, your dust covers need to be protected from dust too. Store them in a sealed bag so that they are not contaminated when you re-use them.

How can I clean my cables?

Every site should have a cleaning kit on hand and always available (like hand sanitizer for doctors!). Google fibre optic cleaning kit for lots of products. I have used Cletops devices but there are plenty of other choices on the market.

Can I create images like the ones above?

You sure can. Google fibre microscopes for lots of products that can do the job for less than $500. There are plenty of choices on the market. Even if you are not willing to bear the expense yourself, make sure your cable provider has one available. If they are testing your cables with a flashlight, get another provider.

Can my SAN switch tell me I have dirty cables?

The two commands I use most often (on Brocade switches) are porterrshow and statsclear. If you see any values in the highlighted six columns of evil, you may have bad SFPs, damaged cabling or dirty cables. Just be careful that what you are seeing is not ancient history: the counters are cumulative, so an old problem can linger in the output. Clear the stats (with statsclear) and wait a decent interval before checking again with porterrshow.
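If you would rather compare two readings than clear the counters, the same "is this ancient history?" check can be sketched in Python. This is a minimal sketch under stated assumptions: it does not run or parse porterrshow, it simply compares two snapshots that you have already captured into dictionaries (by hand or with your own parser). The counter names `crc_err` and `enc_out` and all the numbers are placeholders for illustration.

```python
from typing import Dict

Counters = Dict[str, int]        # counter name -> value for one port
Snapshot = Dict[int, Counters]   # port number -> that port's counters


def ports_with_new_errors(before: Snapshot, after: Snapshot) -> Dict[int, Counters]:
    """Return, per port, only the counters that grew between two snapshots."""
    suspects: Dict[int, Counters] = {}
    for port, after_counters in after.items():
        before_counters = before.get(port, {})
        increased: Counters = {}
        for name, value in after_counters.items():
            previous = before_counters.get(name, 0)
            # If the value went backwards, assume the stats were cleared in
            # between and count everything seen since the clear as new.
            delta = value - previous if value >= previous else value
            if delta > 0:
                increased[name] = delta
        if increased:
            suspects[port] = increased
    return suspects


# Hypothetical snapshots taken some time apart (placeholder names and values).
before = {0: {"crc_err": 120, "enc_out": 3}, 1: {"crc_err": 0, "enc_out": 0}}
after = {0: {"crc_err": 120, "enc_out": 3}, 1: {"crc_err": 14, "enc_out": 0}}

print(ports_with_new_errors(before, after))
```

In this made-up example port 0 has large but unchanged counters (ancient history), while port 1 is the one still accumulating errors and is the cable worth putting under the microscope.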

I could talk in even more detail about monitoring at the switch, but I think that is a whole other blog post.

Feel free to share your horror stories.  Who knows, maybe dirty cables are causing your current horror story?

About Anthony Vandewerdt

I am an IT Professional who lives and works in Melbourne Australia. This blog is totally my own work. It does not represent the views of any corporation. Constructive and useful comments are very very welcome.

6 Responses to Semmelweis could see the problem

  1. Seamus says:

    I spent around 30 days diagnosing an issue with one of our XIV’s (with the help of IBM of course) when we began seeing scsi reservations appear out of nowhere for all LUNs presented to our VMware farm. Every LUN would become scsi-reserved at completely random times, and all VM’s would crash as a result.
    Although we ran dual, physically separate fabrics with multiple ISL’s, the issue continued. In the end it was in fact the most obvious of all things, a dodgy ISL to one of our BladeCenters (most likely caused by dust). Unfortunately it’s not all that simple to replace fibre that spans two buildings across two floors.

    Solution: move the BladeCenter to the rack directly next to the XIV, thereby replacing all the ISL’s, then pull the dodgy cable out and burn it with fire!

    • Wow… burned it with fire?
      Great story.
      In fact the truly ‘dirty’ image I showed was also from a ‘dodgy’ ISL that caused serious issues with cross site mirroring (not on XIV though).
      It was picked up with the fibre microscope (as were a great many other filthy cables at the same site)

  2. Jan Bartels says:

    Yes, this article is so right, and I know the pictures are taken with a fiber inspection kit. Nevertheless, even if people understand that cleaning the fiber is important, cleaning without inspection is like a char working in the dark.
    Furthermore, with respect to the error counters, I still blame most of the storage and HBA vendors for not making the CRC error counter and SFP diagnostics easy to access, as they should be (e.g. via SNMP). That would easily show you whether dirt could be the reason or not.

    Best Regards from 10 years of fibrechannel troubleshooting.

    • Hi Jan.
      I totally agree. Since we cannot confirm the results of our cleaning, it can be hit and miss. I have also seen some evidence that certain cleaning kits can leave contamination.
      I also agree that surfacing potential link related errors is not as well implemented as it could be.

      Thanks so much for your comments!

  3. Pingback: The importance of clean fibre optics | Storage & Beyond

