Why does my enterprise storage need batteries?

There are two fundamental characteristics that every user expects from their Enterprise Storage:

  1. It should be highly available.
  2. It should never lose your data.

Neither of these things has anything to do with performance, snapshot features, or price. In fact, competitive comparisons usually presume that every vendor has got those two aspects right. They are presumed to be present in every product (and they need to be!).

You could argue that they are two ways to talk about the same thing, but I would argue that availability and data integrity need to be thought of separately.  Let me explain:

One aspect of protecting data integrity versus high availability that is often misunderstood is the difference between what looks like an internal UPS and what is in reality a cache battery. The IBM SVC and the IBM XIV are great examples of this. Both include off-the-shelf UPSs, and while these UPSs help with availability (by allowing these products to ride out brownouts or momentary building power loss), that is not their primary job.

IBM SVC Nodes each with a Battery Backup Unit

Their job is in fact to guarantee data integrity. This is because, despite the vast amount of money spent on building power, there is no way that an Enterprise Storage vendor can presume that any building’s power can be trusted implicitly. There is always the risk that power will fail, for all sorts of reasons, most of them human. I have seen UPS-protected data centers grind to a halt because the building manager forgot to check the diesel tank levels. I was in a major banking data center which went dark and stayed dark for many hours when a house electrician did something phenomenally stupid while switching between building power feeds. These things happen. You can engineer your building all you like, but Enterprise Storage vendors always need to follow the X-Files rule: Trust No One.

IBM XIV Gen3 UPS Units

But what does power loss have to do with data integrity? The answer is that practically every piece of Enterprise Storage out there uses DRAM to hold cache data: data that the client thinks has been written to disk (hardened, in engineer speak) but is in fact only present in server memory (normally in at least two places for redundancy). If the power goes away, that data needs to be stored somewhere safe or it will be lost… and we don’t want to lose data (especially data the client thinks is actually on disk!).
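To make the risk concrete, here is a minimal Python sketch of a write-back cache. It is a toy model, not any vendor's design: the class and method names (`CacheNode`, `WriteBackArray`, `unprotected_power_loss`) are invented for illustration, and a plain dictionary stands in for both DRAM and disk.

```python
from dataclasses import dataclass, field


@dataclass
class CacheNode:
    """One controller's volatile DRAM cache (hypothetical model)."""
    dram: dict = field(default_factory=dict)

    def power_loss(self) -> None:
        # DRAM is volatile: without a battery its contents vanish.
        self.dram.clear()


@dataclass
class WriteBackArray:
    """Toy two-node array that acknowledges writes from mirrored DRAM."""
    node_a: CacheNode = field(default_factory=CacheNode)
    node_b: CacheNode = field(default_factory=CacheNode)
    disk: dict = field(default_factory=dict)

    def write(self, block: int, data: bytes) -> str:
        # Mirror the write into both caches, then acknowledge straight away.
        # Nothing has reached disk yet.
        self.node_a.dram[block] = data
        self.node_b.dram[block] = data
        return "ack"

    def destage(self) -> None:
        # Harden the cached writes to disk, then release the cache copies.
        self.disk.update(self.node_a.dram)
        self.node_a.dram.clear()
        self.node_b.dram.clear()

    def unprotected_power_loss(self) -> None:
        # Both nodes lose power together, so the mirror copy does not help.
        self.node_a.power_loss()
        self.node_b.power_loss()

    def read_after_restart(self, block: int):
        # After a restart only hardened data can be read back.
        return self.disk.get(block)


array = WriteBackArray()
array.write(7, b"payroll")          # the client receives "ack"
array.unprotected_power_loss()      # no battery, no destage
lost = array.read_after_restart(7)  # None: the acknowledged write is gone
```

Calling `destage()` before the power loss is what keeps block 7 readable afterwards, and buying enough time for that destage is the job the battery does.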

So we protect that data using built-in battery units. Sometimes all you see is the battery, sometimes you see what looks like a UPS. Sometimes the necessary equipment is totally hidden from you, but I assure you it is there. In every case that battery protection is there to protect your data, to ensure it is never lost. And while those battery backup units may make your storage more available, that is not their first job in life.

What about other products?

  • The Storwize V7000 Control Enclosure has batteries that slot into each of the power supplies.  You cannot see them from the outside but here is a nice picture of where they hide:
    Storwize V7000 Battery Unit

    Storwize V7000 Battery Unit slotting into the power supply

  • The DS8000 has multiple redundant batteries located at the bottom left of the rack (from the front).  In this InfoCenter image they are labelled as item 4.

    DS8000 Battery Units are item 4

  • According to the EMC VNX data sheet found here, the VNX contains:  “Battery backup to allow for an orderly shutdown and cache de-staging to vault disks in the event of a power failure”.
  • According to the EMC VMAX data sheet found here, the VMAX contains batteries to guarantee data integrity.  The term data integrity is used several times in this respect.
  • The HP 3PAR has battery backup units described here whose function is to let the controllers “save cached write data in the event of a power failure”.   You can see them as item 3 in this HP diagram of a P10000:

    HP 3PAR P10000

One mistake that some vendors made in earlier designs was to use a battery backup unit to keep the data in memory active by effectively providing a second source of power to the cache.   In other words, rather than write cache data out to internal disk when the power went away, the data stayed alive in cache, powered by battery.   The IBM DS6800 used this method, which meant that once the batteries were fully discharged (which took about 3 days), the contents of cache were lost.  You didn’t want your data center to experience unplanned power loss and then remain without power for more than three days.  Of course by that point you had probably gone to your DR plan and that cached data was no longer needed, but it is a design aspect that is no longer favoured for this reason.

The good news is that the majority of vendors’ products that used this method have now been redesigned. In most cases a flash memory device is now used to hold the cache data indefinitely until power is restored. The batteries just need enough charge to get that cache data destaged. This means a data center can be without power for weeks or months and the cache data will be safely protected.
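To put rough numbers on the difference between the two designs, here is a minimal Python sketch. The 72-hour hold-up figure comes from the DS6800 example above (about 3 days); the cache size, flash write bandwidth and battery runtime are made-up values for illustration, not specifications of any product.

```python
def holdup_design_survives(outage_hours: float, battery_hours: float = 72.0) -> bool:
    """Older design: the battery keeps DRAM powered, so the cached data
    lives only as long as the charge lasts."""
    return outage_hours <= battery_hours


def destage_seconds(cache_gib: float, flash_write_mib_per_s: float) -> float:
    """Time to copy the cache to flash: size divided by write bandwidth."""
    return cache_gib * 1024 / flash_write_mib_per_s


def destage_design_survives(battery_seconds: float,
                            cache_gib: float,
                            flash_write_mib_per_s: float) -> bool:
    """Newer design: the battery only has to outlast the destage.
    The length of the outage itself no longer matters."""
    return battery_seconds >= destage_seconds(cache_gib, flash_write_mib_per_s)


# Hypothetical example: 16 GiB of cache, flash that sustains 200 MiB/s,
# and a battery good for 5 minutes of runtime.
needed = destage_seconds(16, 200)                 # 81.92 seconds
two_weeks = 14 * 24                               # outage length in hours
old_ok = holdup_design_survives(two_weeks)        # False: charge ran out after 72 hours
new_ok = destage_design_survives(300, 16, 200)    # True: the destage finished in time
```

The point of the comparison is that the hold-up design ties data survival to the length of the outage, while the destage design ties it only to how quickly the cache can be written out.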

My final comment is that I have seen data center managers get rather upset when they perceive their UPS infrastructure is being undermined by unnecessary batteries (or even worse, what look like UPSs) being added to their environment.   In each case it is important to understand that what they are seeing is a fundamental internal component, designed to protect data integrity.  It cannot be changed and it cannot be left out.   After all, regardless of what event occurs in the data center, to slightly misquote the US Marine Corps:       No Data should ever get left behind. 


About Anthony Vandewerdt

I am an IT Professional who lives and works in Melbourne Australia. This blog is totally my own work. It does not represent the views of any corporation. Constructive and useful comments are very very welcome.

15 Responses to Why does my enterprise storage need batteries?

  1. Pingback: Why does my enterprise storage need batteries? « Storage CH Blog

  2. Adam says:

    One vendor that comes to mind has used super capacitors in different ways over the years.
    Dothill has several patents around the technology http://investors.dothill.com/releasedetail.cfm?releaseid=456637 I think the HP P2000 uses the super capacitor to de-stage to compact flash,
    which equals a maintenance-free midrange system.

    I know LSI still use batteries for their RAID controllers, but I’ve noticed PMC/Adaptec has been shipping super capacitor BBWC controllers since 2009. Are super capacitor (maintenance-free) designs simply not suitable for enterprise arrays?

  3. Dean says:

    I may or may not have worked in a data centre that experienced a catastrophic complete power loss when a water pipe leaked onto the UPS, causing the whole site to basically short out. I was very thankful for the battery in both the IBM and HDS arrays at that time.

  4. Great blog. @Jeffoconnorau asked in twitter if Nutanix have batteries too. The answer is no. But allow me to elaborate.

    Imagine a 10,000-node Google File System cluster in which every node has local storage. Of course, Google can afford to put a UPS on each of them, but the point is they don’t have to, because data protection and high availability have been elevated from the hardware realm to the software realm. Not only do they not expect servers or other hardware to run without failing, they take it for granted that they WILL fail.

    This is why there is a saying that HW engineers will keep designing to make sure that something never fails, while software engineers design to ensure that things recover flawlessly WHEN they do fail. Google File system, AmazonDB, Microsoft Azure, all of them are designed primarily by SW engineers who know that depending on HW alone to provide data availability isn’t feasible anymore. Most of the storage systems you mentioned above were designed by very smart hardware engineers. (They do have amazing software functions in them like flashcopy, mirroring etc., but the core of HA depends heavily on hardware working flawlessly.)

    The military stopped going bigger in size, as with the MOAB, and moved to more efficient bombs (sort of like cluster bombs) for somewhat similar reasons. Designing ONE BIG thing isn’t really efficient or effective anymore. The same theme applies to storage. There is no point in building a very large storage controller with custom hardware anymore. But you will have to, if you don’t have scale-out clustering and you have to deal with just 2 controllers. That’s where most of them go wrong. (XIV is different since they do have scale-out.)

    What we did at Nutanix is to bring Google FS-like clustering technology, where we take the data, consistently write it to multiple PCI flash devices and scale that to 100s of nodes, but do it without adding too much latency, so that the system can still be used for primary, low-latency applications. That is one of our major pieces of intellectual property. This is why one has to design the system from the ground up, including the file system. This is why Nutanix is unique compared to others who claim to have converged appliances.

    The time for too much dependency on any single hardware component, where we need a wing, a prayer and some batteries, is over.

  5. nate says:

    One Q on those IBM SVC batteries – are they UPSs or are they external batteries? I have seen some facilities (co-lo at least) that don’t allow customer UPSs in them, partially because of EPO needs (well, the customer may be able to pay the big $$ to get an EPO hook-up, but it is probably not worth it in most cases for a small UPS), and in some other cases because daisy-chaining UPSs can be bad (some special electrical reasons why). They don’t seem to have problems with batteries in the systems though, I guess because in some way internal batteries behave differently than external UPSs? I used to have an Exanet NAS cluster which coincidentally used some IBM x86 servers to power it, but they too relied on an external (APC in their case) UPS instead of internal batteries. I was told recently that since Dell acquired those assets, the systems have been re-designed to have internal batteries.

    I’m sure some other high-end folks do it too, but something I thought was kind of wacky/cool about the 3PAR batteries in their higher-end boxes is that there are redundant batteries (for each controller) with staggered replacement cycles, which has some obvious advantages over having a single battery for each controller.

    That DS8000 rack is fascinating. It looks like it is extra wide, maybe an extra foot wider than a more “normal” sized rack (along with seemingly an entirely custom interior rather than just a wider standard cabinet – I like big racks for servers so there’s plenty of space for cabling).

    • The SVC physically has Eaton 5115 Powerware UPSs, but they are being used primarily as cache batteries.

      The colo question is a good one.
      You may get another objection because the co-lo manager is semi-aware of EN60950, a European safety standard which appears worldwide in various standards. It basically states that if a computer room gets powered off, then any batteries must cut power in less than five minutes. The text looks like this:

      Extract from EN60950: 3.4.11
      Battery circuits: For computer room applications, batteries integral to equipment shall incorporate a means for battery disconnect and a means for connection to the remote emergency power off circuit that disconnects the battery power source, except for battery circuits for which (1) the product of the open circuit voltage times the rating of the overcurrent protective device does not exceed 750 VA or (2) any resistive load cannot draw more than 750 VA for more than five minutes after the mains power is disconnected. If connection to the remote emergency power off circuit is required, batteries shall be disconnected within five minutes of activating the remote emergency power off circuit.

      The good news is that the Eaton 5115 UPS is rated for less than 750 VA so this rule does not apply.

      As for the DS8000 rack… it is a 26-inch rack… a bit wider than what we are used to.

      • nate says:

        Interesting, that is great info thanks. I had not heard of that rule before. I had no idea an “EPO action” could take as long as 5 minutes to complete!

      • It’s a little-known fact… co-lo managers don’t necessarily understand the rule.
        Note also that I quote from the EU rules… other countries have their own variations.
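The two exemptions in the EN60950 extract quoted in this thread can be written as a small check. This is only a sketch of one reading of the quoted text, not compliance guidance; the function name, its parameters and the example voltages and ratings are all hypothetical and do not describe the Eaton 5115 or any other product.

```python
VA_LIMIT = 750.0          # threshold quoted in the extract
EPO_WINDOW_MINUTES = 5.0  # time limit quoted in the extract


def epo_connection_required(open_circuit_volts: float,
                            overcurrent_rating_amps: float,
                            minutes_load_can_exceed_limit: float) -> bool:
    """Return True if, on this reading of the extract, the battery circuit
    needs a connection to the remote emergency power off (EPO) circuit."""
    # Exemption (1): open circuit voltage times the rating of the
    # overcurrent protective device does not exceed 750 VA.
    exempt_by_rating = open_circuit_volts * overcurrent_rating_amps <= VA_LIMIT
    # Exemption (2): no resistive load can draw more than 750 VA for more
    # than five minutes after the mains power is disconnected.
    exempt_by_load = minutes_load_can_exceed_limit <= EPO_WINDOW_MINUTES
    return not (exempt_by_rating or exempt_by_load)


# Hypothetical small battery: 24 V with a 30 A protective device = 720 VA.
small = epo_connection_required(24, 30, 20)   # False: exempt under (1)
# Hypothetical larger battery: 48 V x 40 A = 1920 VA, sustained for 20 minutes.
large = epo_connection_required(48, 40, 20)   # True: neither exemption applies
```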

  6. mahesh says:

    How many battery backup units can a DS8K have?

    • Varies based on model and which rack you are talking about.
      I am sure this is documented very clearly in one of the Redbooks… I would have to go and read them to check, but since you are asking the question, maybe you can take a look?
