Do a quick audit: how many USB keys have you got? Five? Ten? Twenty? Do you know what files are on each one? What if instead you had 50,000 USB keys? Sound like madness?
Now what if instead of USB keys, these were unmanaged LTO tapes? Nightmare time.
So what leads me to these crazy thoughts?
Well, despite the protestations of the tape-is-dead brigade, the simple reality is that tape still has a great role to play in long-term data retention. The advantages of tape are obvious:
- It’s cheaper than disk per TB over 5 years (TCO is everything here, although with de-duplication and compression, tape needs to keep leapfrogging to keep that advantage)
- It’s greener than disk (a tape in your hand consumes no power)
- It’s easier to offsite than disk (far fewer mechanical parts to damage)
- It scales very simply (just order more tapes)
With the introduction of the Linear Tape File System (LTFS), we also have the ability to access tape without any backup software. This to me is a great step forward, as it allows older tapes to be read without needing the software that originally compacted and organised their contents.
However, I met a client recently who got me thinking about this whole long-term storage thing. This customer is a collector and user of long-term scientific data. This customer has 10,000 tape cartridges on-site at their primary data center and 40,000 carts offsite. This customer will happily buy 5,000 tape cartridges a year. This customer’s data will never expire. Ever. Nothing gets thrown away.
So can LTFS help these guys? Quite possibly. But two constants are true regardless of what path they go down:
Firstly, when you have scientific data spread over 50,000 (and growing) cartridges, you need to have magnificent metadata management. You need metadata that describes your scientific data in such a way that scientists can request information with the greatest accuracy and the least movement of unwanted data (and tapes). Does LTFS fix this? In itself: no. You still need a very clever data management architecture. But LTFS could be a great basis to build that on, enabling a truly open architecture for metadata management.
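To make the "least movement of unwanted tapes" idea concrete, here is a minimal sketch of a catalogue that maps a scientist's request to the cartridges that actually need mounting. Everything in it is hypothetical: the `FileRecord` fields, the instrument names, the barcodes and the file paths are invented for illustration, and it does not use any LTFS API. It only assumes that each file lives at some path on some cartridge.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FileRecord:
    """One catalogue entry (all fields are illustrative assumptions)."""
    path: str        # path of the file on the cartridge
    instrument: str  # device that produced the data
    start: date      # first day covered by the file
    end: date        # last day covered by the file
    barcode: str     # cartridge holding the file
    location: str    # "onsite" or "offsite"


def find_files(catalog, instrument, start, end):
    """Return records for `instrument` whose date range overlaps [start, end]."""
    return [
        r for r in catalog
        if r.instrument == instrument and r.start <= end and r.end >= start
    ]


def recall_plan(records):
    """Group matching files by cartridge so each tape is mounted only once."""
    plan = {}
    for r in records:
        plan.setdefault(r.barcode, []).append(r.path)
    return plan


# Hypothetical sample catalogue.
CATALOG = [
    FileRecord("/seismic/2008-q4.dat", "seismometer-7",
               date(2008, 10, 1), date(2008, 12, 31), "SCI900L2", "offsite"),
    FileRecord("/seismic/2009-q1.dat", "seismometer-7",
               date(2009, 1, 1), date(2009, 3, 31), "SCI001L5", "onsite"),
    FileRecord("/seismic/2009-q2.dat", "seismometer-7",
               date(2009, 4, 1), date(2009, 6, 30), "SCI001L5", "onsite"),
    FileRecord("/ocean/2009-q1.dat", "buoy-12",
               date(2009, 1, 1), date(2009, 3, 31), "SCI002L5", "onsite"),
]

matches = find_files(CATALOG, "seismometer-7", date(2009, 3, 1), date(2009, 4, 15))
plan = recall_plan(matches)
print(plan)
```

In this toy example the request touches two files but only one cartridge, and the offsite LTO2 cart never has to move. A real catalogue would live in a database and carry far richer scientific metadata, but the principle is the same: answer the question from metadata first, then touch tape.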
The other problem is far more interesting: in 20 years’ time, will they still be able to recall a tape written today? The answer? Provided they don’t buy the cheapest possible media, I believe so! But there is a challenge: backwards read compatibility. From the LTO FAQ:
- An Ultrium drive can read data from a cartridge in its own generation and two prior generations.
- An Ultrium drive can write data to a cartridge in its own generation and to a cartridge from the immediate prior generation in the prior generation format.
Backwards write compatibility is not an issue for these guys, but read compatibility is. The newest drive that can read a tape in LTO2 format is LTO4, so to keep reading LTO2 they need to keep LTO4 (or older) drives. An LTO3 tape needs an LTO5 drive or older, and so on…
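The two FAQ rules quoted above are simple enough to sketch in code. This is only an illustration of those two rules as stated (read: own generation plus two prior; write: own generation plus one prior); the function names are my own, and it makes no claim about generations or exceptions the FAQ quote does not cover.

```python
def can_read(drive_gen, tape_gen):
    """A drive reads its own generation and the two prior generations."""
    return drive_gen - 2 <= tape_gen <= drive_gen


def can_write(drive_gen, tape_gen):
    """A drive writes its own generation and the immediately prior one."""
    return drive_gen - 1 <= tape_gen <= drive_gen


def unreadable_generations(tape_gens, drive_gens):
    """Return the tape generations that no drive in `drive_gens` can read."""
    return sorted(
        t for t in set(tape_gens)
        if not any(can_read(d, t) for d in drive_gens)
    )


# A library holding LTO2 to LTO5 carts, with only LTO5 drives installed:
print(unreadable_generations({2, 3, 4, 5}, {5}))     # [2]
# Keep some LTO4 drives and everything stays readable:
print(unreadable_generations({2, 3, 4, 5}, {4, 5}))  # []
```

Running a check like this against your own cartridge inventory and drive fleet shows which generations would be stranded before you retire a drive model.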
So the trick here is to keep older technology to read older carts. Right now most vendors are still supplying drives as far back as LTO3, so it is possible to buy new hardware today that will read any LTO cart you have ever written. I cannot imagine this situation will last forever though. There will come a point when the option to buy a new LTO3 tape drive will go away (it started shipping in 2005), followed by LTO4 (which started shipping in 2007), and so on. Remember that LTO6 is going to be available fairly soon.
The story for this client takes an interesting twist. The amount of data they need to store is increasing dramatically, as practically every scientific device that is instrumented to collect data is now able to collect more data per sample than ever before. This data is critical to our surviving on this planet, so moving to higher density technology is vital. Retaining old drives means giving precious tape library space to old technology. So going to LTO5 across the board is the plan, but this means retiring LTO2. Which in turn means they need to recall every LTO2 cart ever written so they can write the data from those carts onto an LTO5 cart. Sounds like a challenge, but it is proving two things:
- The client is finding they can stack the contents of 10 LTO2 carts onto one LTO5 cart, saving them long-term storage space.
- They have so far not seen media quality issues when reading these old tapes. So tape is proving itself as a long-term storage medium.
So what did I learn from this?
- LTFS is not a solution without a way to manage metadata. No one wants to open their desk drawer and find 50,000 unlabelled USB thumb drives. So let’s not talk only about hardware when we discuss LTFS; we still need a clever software solution.
- If you have LTO1 and LTO2 media, it is time to expire that data or move it onto new technology. The LTO roadmap looks very cool, but it won’t help you read that old media.
- Don’t believe the FUD. Tape is a reliable long-term storage medium.