There are two fundamental characteristics that every user expects from their Enterprise Storage:
- It should be highly available.
- It should never lose your data.
Neither of these things has anything to do with performance, or with snapshot features, or with price. In fact, competitive comparisons usually presume that every vendor has got those two aspects right. They are presumed to be present in every product (and they need to be!).
You could argue that these are two ways of describing the same thing, but I would counter that availability and data integrity need to be thought of separately. Let me explain:
One aspect of protecting data integrity versus high availability that is often misunderstood is the difference between what looks like an internal UPS and what is in reality a cache battery. The IBM SVC and the IBM XIV are great examples of this. Both include off-the-shelf UPSs, and while these UPSs help with availability (by allowing these products to ride out brownouts or momentary building power loss), that is not their primary job.
Their job is in fact to guarantee data integrity. This is because, despite the vast amount of money spent on building power, there is no way that an Enterprise storage vendor can presume that any building's power can be trusted implicitly. There is always the risk that power will fail, for all sorts of reasons, most of them human. I have seen UPS-protected data centers grind to a halt because the building manager forgot to check the diesel tank levels. I was in a major banking data center which went dark and stayed dark for many hours when a house electrician did something phenomenally stupid while switching between building power feeds. These things happen. You can engineer your building all you like; Enterprise Storage vendors still need to follow the X-Files rule: Trust No One.
But what does power loss have to do with data integrity? The answer is that practically every piece of Enterprise Storage out there uses DRAM to hold cache data; data that the client thinks has been written to disk ("hardened" in engineer speak) but is in fact only present in server memory (normally in at least two places for redundancy). If the power goes away, that data needs to be stored somewhere safe or it will be lost… and we don’t want to lose data (especially data the client thinks is actually on disk!).
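To make that concrete, here is a minimal sketch of write-back caching with a mirrored cache. All the names are hypothetical and no real array works exactly like this; the point is simply that the host's write is acknowledged while the data still lives only in volatile memory:

```python
class WriteBackCache:
    """Acknowledge writes once they sit in two independent cache copies."""

    def __init__(self):
        self.cache_a = {}   # DRAM copy in controller A
        self.cache_b = {}   # redundant DRAM copy in controller B
        self.disk = {}      # the backing disk (the "hardened" state)

    def write(self, block, data):
        # The host gets its acknowledgement here, *before* any disk I/O:
        # the data exists only in (two copies of) volatile DRAM.
        self.cache_a[block] = data
        self.cache_b[block] = data
        return "ack"        # client now believes the data is safe

    def destage(self):
        # Later, in the background, dirty cache data is hardened to disk.
        for block, data in self.cache_a.items():
            self.disk[block] = data
        self.cache_a.clear()
        self.cache_b.clear()


cache = WriteBackCache()
cache.write(42, b"payroll record")
assert 42 not in cache.disk          # acknowledged, but not yet on disk
cache.destage()
assert cache.disk[42] == b"payroll record"
```

If the power fails between the `write` acknowledgement and the background `destage`, that DRAM-only data is exactly what the battery exists to save.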
So we protect that data using built-in battery units. Sometimes all you see is the battery, sometimes you see what looks like a UPS. Sometimes the necessary equipment is totally hidden from you, but I assure you it is there. In every case that battery protection is there to protect your data, to ensure it is never lost. And while those battery backup units may make your storage more available, that is not their first job in life.
What about other products?
- The Storwize V7000 Control Enclosure has batteries that slot into each of the power supplies. You cannot see them from the outside but here is a nice picture of where they hide:
- The DS8000 has multiple redundant batteries located at the bottom left of the rack (from the front). In this InfoCenter image they are labelled as item 4.
- According to the EMC VNX data sheet found here, the VNX contains: “Battery backup to allow for an orderly shutdown and cache de-staging to vault disks in the event of a power failure”.
- According to the EMC VMAX data sheet found here, the VMAX contains batteries to guarantee data integrity. The term "data integrity" is used several times in this respect.
- The HP 3PAR has battery backup units described here whose function is to let the controllers “save cached write data in the event of a power failure”. You can see them as item 3 in this HP diagram of a P10000:
One mistake that some vendors made in earlier designs was to use a battery backup unit to keep the data in memory active by effectively providing a second source of power to the cache. In other words, rather than write cache data out to internal disk when the power went away, the data stayed alive in cache, powered by battery. The IBM DS6800 used this method, which meant that once the batteries were fully discharged (which took about three days), the contents of cache were lost. You didn’t want your data center to experience unplanned power loss and then remain without power for more than three days. Of course, by that point you had probably gone to your DR plan and that cached data was no longer needed, but it is a design aspect that is no longer favoured for this reason.
The good news is that the majority of vendors' products that used this method have now been redesigned. In most cases a flash memory device is now used to store the cache data indefinitely until power is restored; the batteries just need enough charge to get that cache data destaged. This means a data center can be without power for weeks or months and the cache data will be safely protected.
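The difference between the two designs can be sketched in a few lines. This is a toy model, not any vendor's firmware; the battery and outage durations are illustrative only:

```python
def powered_dram_strategy(cache, battery_hours, outage_hours):
    """Older design (like the DS6800): the battery keeps DRAM alive,
    so the cache survives only if power returns before the battery dies."""
    if outage_hours <= battery_hours:
        return dict(cache)   # power came back in time; cache intact
    return {}                # battery ran flat; cache contents lost


def destage_to_flash_strategy(cache, flash):
    """Modern design: the battery only needs to last long enough to copy
    the cache to non-volatile flash; after that, outage length is irrelevant."""
    flash.update(cache)      # destage to flash while on battery power
    return dict(flash)       # recoverable whenever power returns


cache = {7: b"dirty block"}

# A ~72-hour battery survives a short outage but not a week-long one.
assert powered_dram_strategy(cache, battery_hours=72, outage_hours=4) == cache
assert powered_dram_strategy(cache, battery_hours=72, outage_hours=168) == {}

# Flash destage survives an outage of any length.
flash = {}
assert destage_to_flash_strategy(cache, flash) == cache
```

The design trade is clear: instead of sizing batteries for the longest imaginable outage, you size them for the few minutes it takes to destage, and let the flash hold the data for as long as it takes.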
My final comment is that I have seen data center managers get rather upset when they perceive their UPS infrastructure is being undermined by unnecessary batteries (or even worse, what look like UPSs) being added to their environment. In each case it is important to understand that what they are seeing is a fundamental internal component, designed to protect data integrity. It cannot be changed and it cannot be left out. After all, regardless of what event occurs in the data center, to slightly misquote the US Marine Corps: No Data should ever get left behind.