A dual vendor SAN policy?

A couple of years ago I went to a Juniper marketing event in Sydney. It was a very well run event with lots of good sessions. They talked about JUNOS, virtual chassis, East-West versus North-South network traffic, their ASICs… all good stuff. But the most interesting session was one presented by a client. I always like hearing clients speak at vendor events, provided they bring real-world backstory to the table. He spoke about how his company (a bank, if I recall correctly) brought Juniper switches into their Cisco-dominated networking department. His message was: “You can have a dual vendor networking policy. The sky won’t fall in.” I recalled this after reading Erwin’s blog post about Cisco vs Brocade and it got me thinking… could we do the same in Fibre Channel SANs?

Now it’s a well understood requirement: to have a highly available SAN you need dual independent fabrics. Even the smallest customer will normally buy two Fibre Channel switches and cable them up independently. The only things the two fabrics usually have in common are that hosts attach to both of them and that each contains switches from the same vendor, often identically configured.

It’s common sense.   If you have dual fabrics then you get genuine benefits:

  • If one switch or fabric fails, traffic routes through the other fabric.
  • Human error (like a bad zoning update) can be limited to one fabric.
  • Maintenance and upgrades can be done online, even if they are disruptive to the switch or fabric, because traffic can flow through the other fabric.

Examples of fabric names include: Fabric A and Fabric B; Fabric 1 and Fabric 2; the odd fabric and the even fabric; the red and the blue fabric. But how about taking it to a whole new level and having a B and a C fabric? Why B and C? Because one fabric would be exclusively from Brocade and the other exclusively from Cisco.

Two suppliers?  Could this be good for my company?

Well, I could describe at least three separate incidents I have personally been involved in where firmware errors occurred in both fabrics at the same time, bringing the customer crashing to their knees. Each one would have been avoided if each fabric had been supplied by a different vendor. In some cases having different firmware levels or different uptimes in each fabric would also have prevented the common event, but this is not always the case. Admittedly the root cause in each case relates to bugs long since fixed in newer microcode, but the events remain seared into my (and possibly the client’s) brain.

So why don’t people do this?

Well for starters:

  1. Your staff would need skills in two vendors.  Sadly some employers hesitate to train their staff on one vendor, let alone two.  Of course it will give your staff a wider scope of equipment to work with (which admittedly might make them more employable) and SAN concepts remain the same regardless of vendor.
  2. The switch sizes and speeds between vendors are not always equal.   For example Cisco don’t sell an 80 port switch but Brocade does.  Right now Cisco don’t have 16 Gbps FC switches.
  3. The embedded switches in your blade center chassis would also need to follow the dual vendor policy (presuming this is possible).
  4. You might hit interoperability issues where each fabric has conflicting requirements (such as minimum or maximum firmware levels on FC HBAs), but I doubt this would be a common issue.
  5. You may pay more.   Maybe.   But… you might also create competitive pressure and pay less.
  6. Physically swapping switches between fabrics would be harder to do. This is true, but how often does this really happen?
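
Point 4 is the kind of thing you could check mechanically before committing to the policy. Here is a minimal sketch, assuming each vendor's support matrix gives a minimum and maximum supported HBA firmware level as a dotted version string. The function names and every version number below are invented for illustration; none of them come from a real Brocade or Cisco interoperability matrix.

```python
from typing import Optional, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn a dotted version string such as '8.2.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def common_firmware_window(
    fabric_b: Tuple[str, str], fabric_c: Tuple[str, str]
) -> Optional[Tuple[str, str]]:
    """Return the (min, max) HBA firmware range both fabrics accept, or None."""
    # The shared window starts at the higher minimum and ends at the lower maximum.
    low = max(fabric_b[0], fabric_c[0], key=parse_version)
    high = min(fabric_b[1], fabric_c[1], key=parse_version)
    if parse_version(low) > parse_version(high):
        return None  # the two vendors' requirements conflict
    return (low, high)


# Hypothetical (min, max) HBA firmware ranges for the B and C fabrics.
overlap = common_firmware_window(("8.1.0", "8.4.2"), ("8.3.0", "9.0.0"))
conflict = common_firmware_window(("8.1.0", "8.2.9"), ("8.3.0", "9.0.0"))

print(overlap)
print(conflict)
```

Versions are compared as tuples of integers rather than as strings, so that a level like 8.10 sorts after 8.9. A result of `None` is the conflicting-requirements case described in point 4.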

To me the main advantage of doing this is that you take your desire for availability to a whole new level. Independent fabrics from different vendors would eliminate the risk of a common switch firmware bug hitting both fabrics, and would make human error harder to repeat, simply because you could not easily replicate a procedural error when working on both fabrics at the same time.
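
The availability argument can be put into rough numbers. This is a back-of-envelope sketch, not a measurement: it assumes each fabric fails on its own with probability `p_independent` over some period, and that a shared firmware bug takes out both fabrics at once with probability `p_common` when both run the same vendor's code. Both figures below are made up purely for illustration.

```python
def dual_fabric_outage_probability(p_independent: float, p_common: float) -> float:
    """Probability that both fabrics are down at once.

    Either a common-mode event (a shared firmware bug) hits both fabrics,
    or it does not and both fabrics happen to fail independently.
    """
    return p_common + (1.0 - p_common) * p_independent ** 2


# Illustrative, made-up figures (assumptions, not field data).
P_INDEPENDENT = 0.01
P_COMMON_SAME_VENDOR = 0.001

same_vendor = dual_fabric_outage_probability(P_INDEPENDENT, P_COMMON_SAME_VENDOR)
# Dual vendor: assume no firmware bug is shared between the two fabrics.
dual_vendor = dual_fabric_outage_probability(P_INDEPENDENT, 0.0)

print(f"same vendor: {same_vendor:.6f}")
print(f"dual vendor: {dual_vendor:.6f}")
```

With these invented inputs the shared-bug term dominates the same-vendor result, because the independent term becomes tiny once it is squared. Setting the common-mode term to zero for the dual vendor case mirrors the claim above and is the optimistic end of the range.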

It would be interesting to see a Request for Proposal (RFP) that specifies dual vendors for SAN equipment (though I have never seen this).

Of course, maybe I am just nuts for suggesting this?
How many people are doing this?
Are the results as I described?


About Anthony Vandewerdt

I am an IT Professional who lives and works in Melbourne Australia. This blog is totally my own work. It does not represent the views of any corporation. Constructive and useful comments are very very welcome.
This entry was posted in Brocade, Cisco, IBM Storage, SAN.

6 Responses to A dual vendor SAN policy?

  1. Chuck says:

    I’ve maintained hosts attached to independent different-vendor fabrics for a ~5-6 month period while we migrated from one vendor to the other, i.e. moved equipment from vendor X fabric 1 to vendor Y fabric 1 and shut down the old fabric, repeat with the second vendor X fabric.

    I could see this being a problem if you worry about compatibility matrices. Finding a combination of host OS patches, HBA firmware, fabric switch code levels, array firmware, etc. that are all compatible could be a nightmare if a critical patch for one of those vendors comes out and triggers a need to upgrade all the other components. Finding a common level to begin from during the migration above was tedious; I can’t imagine what it would be like on an ongoing operational basis. I have a feeling it’s something most teams wouldn’t stay in front of and would just get bitten in the ass down the road when they found an incompatibility caused by a component upgrade.

    Leveraging services like FCIP, iSCSI gateway functionality, tape acceleration, fabric-level encryption becomes a nightmare I think. It could also be tricky in an FCoE scenario since you have to factor in interop with the Ethernet switches at that point, too. Should you use different vendor CNAs in the hosts too? Brocade->Brocade + QLogic/Emulex->Cisco? Does the host multipathing support that?

    • Hi Chuck.

      Thanks for the well considered response. I totally agree that the moment we go beyond a simple SAN design then things become much more complex.
      Good to hear the real-world experience of someone who has done it though #:-)

  2. Hi Anthony!
    Interesting approach and a good mind game to get some fresh ideas. I think it could address some of the described problems. But from a troubleshooting point of view I agree with Chuck: if a problem grows complex (think about performance problems on hosts with bad RAS packages just pointing “to the storage”) it will be very hard to keep both vendors in the boat and avoid any finger-pointing, let alone the PD/PSI itself. Of course SAN switches should all do the same thing (transport frames from A to B), but the devil is in the details. All in all, and without making a comprehensive study to compare both approaches, I would recommend that my customers avoid such a strategy.
    Cheers seb

  3. Owen says:

    I notice your hypothetical dual vendor model consisted of a pair of fabrics, one from each vendor. I would suggest that as well as all the complexities you cite, it would also do little to add competitive pressure as you would be compelled to purchase from both vendors to keep the environment balanced.

    An alternative approach would be to have two pairs of fabrics: say fabrics 1 and 2 are Cisco and the other pair is Brocade. I have seen similar environments co-exist during migrations for extended periods. It allows competition, and you have the option of putting your servers and storage on either pair of fabrics without support matrix hell. If you have arrays with sufficient ports you can make them available from all fabrics, meaning your choice of switch provider is independent of your choice of array provider.

    If you add dual array vendors into the mix, you get a truly competitive environment. Of course, you need a certain scale to achieve this.

    Owen.

    • Thanks for your reply. I have also seen clients with 4 fabrics, again usually moving from say McData to Brocade. I agree scale here certainly helps (which I think was very true with that bank that split between Juniper and Cisco – they were big enough to need lots and lots of kit).
