No one wants 50000 USB keys in their desk drawer

Do a quick audit: how many USB keys have you got? Five? Ten? Twenty? Do you know what files are on each one? What if instead you had 50,000 USB keys? Sound like madness?

Now what if instead of USB keys, these were unmanaged LTO tapes?  Nightmare time.

So what leads me to these crazy thoughts?

Well, despite the protestations of the tape-is-dead brigade, the simple reality is that tape still has a great role to play in long-term data retention. The advantages of tape are obvious:

  • It’s cheaper than disk per TB over 5 years (TCO is everything here, although with de-duplication and compression on disk, tape needs to keep leapfrogging in density to hold that advantage)
  • It’s greener than disk (a tape in your hand consumes no power)
  • It’s easier to offsite than disk (far fewer mechanical parts to damage)
  • It scales very simply (just order more tapes)

With the introduction of the Linear Tape File System (LTFS) we can also access tape without any backup software. To me this is a great step forward, as it makes it possible to read older tapes without depending on the software that compacted and organised their contents.
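
Because an LTFS-mounted cartridge looks like an ordinary file system, even a first pass at cataloguing it needs nothing more than standard file calls. The Python sketch below assumes the cartridge is already mounted by LTFS; the mount point and the cartridge label passed in are hypothetical, and the function simply records each file's path and size against that label. It is an illustration, not a complete metadata system.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    cartridge: str   # cartridge label, supplied by whoever mounted the tape (assumption)
    path: str        # file path relative to the LTFS mount point
    size_bytes: int


def catalog_ltfs_volume(mount_point: str, cartridge: str) -> list[CatalogEntry]:
    """List every file on a mounted LTFS volume using ordinary file calls.

    Nothing here is tied to backup software: once LTFS has mounted the
    cartridge, os.walk and os.stat behave as they do on disk.
    """
    entries = []
    for root, _dirs, files in os.walk(mount_point):
        for name in files:
            full_path = os.path.join(root, name)
            entries.append(
                CatalogEntry(
                    cartridge=cartridge,
                    path=os.path.relpath(full_path, mount_point),
                    size_bytes=os.stat(full_path).st_size,
                )
            )
    # Sort so the catalog is stable regardless of directory listing order.
    return sorted(entries, key=lambda e: e.path)
```

A call such as `catalog_ltfs_volume("/mnt/ltfs", "A00001L5")` (both arguments hypothetical) would return one entry per file on that tape, ready to be loaded into whatever database holds the archive's metadata.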

However, I recently met a client who got me thinking about this whole long-term storage thing. This customer is a collector and user of long-term scientific data. This customer has 10,000 tape cartridges on-site at their primary data center and 40,000 carts offsite. This customer will happily buy 5,000 tape cartridges a year. This customer’s data will never expire. Ever. Nothing gets thrown away.

So can LTFS help these guys? Quite possibly. But two things hold true regardless of which path they go down:

Firstly, when you have scientific data spread over 50,000 (and growing) cartridges, you need magnificent metadata management. You need metadata that describes your scientific data in such a way that scientists can request information with the greatest accuracy and the least movement of unwanted data (and tapes). Does LTFS fix this? In itself: no. You still need a very clever data management architecture. But LTFS could be a great basis on which to build one, enabling a truly open architecture for metadata management.
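
To make "the least movement of tapes" concrete, here is a minimal sketch of a metadata query that returns matching datasets grouped by cartridge, so each tape is mounted once and tapes holding nothing relevant stay on the shelf. The fields `instrument` and `observed_on`, and the function name `plan_recall`, are hypothetical stand-ins for whatever descriptive metadata a real archive would hold.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Dataset:
    dataset_id: str
    instrument: str    # hypothetical descriptive field: which device collected it
    observed_on: date  # hypothetical descriptive field: when it was collected
    cartridge: str     # label of the tape cartridge holding the data


def plan_recall(catalog, instrument, start, end):
    """Return {cartridge: [dataset_id, ...]} for datasets matching the query.

    Grouping by cartridge means each tape needs to be mounted only once, and
    cartridges holding no matching data are never requested.
    """
    plan = defaultdict(list)
    for ds in catalog:
        if ds.instrument == instrument and start <= ds.observed_on <= end:
            plan[ds.cartridge].append(ds.dataset_id)
    # Sort by cartridge label so the recall list is predictable.
    return {cart: sorted(ids) for cart, ids in sorted(plan.items())}
```

Querying one instrument over one month then yields a short list of cartridges to recall rather than a sweep of the whole archive; the quality of that list depends entirely on how good the descriptive metadata is.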

The other problem is far more interesting: in 20 years’ time, will they still be able to recall a tape written today? The answer? Provided they don’t buy the cheapest possible media, I believe so! But there is a challenge: backwards read compatibility. From the LTO FAQ:

  • An Ultrium drive can read data from a cartridge in its own generation and two prior generations.
  • An Ultrium drive can write data to a cartridge in its own generation and to a cartridge from the immediate prior generation in the prior generation format.

Backwards write compatibility is not an issue for these guys, but read compatibility is. To read a tape written in LTO2 format they need to keep LTO4 or older drives. An LTO3 tape can be read by nothing newer than an LTO5 drive, and so on…
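
The read rule quoted from the LTO FAQ reduces to simple arithmetic on generation numbers. The sketch below encodes only that rule (a drive reads its own generation and the two before it); it is not an official LTO tool, the function names are my own, and it should not be assumed to hold for generations beyond those discussed here.

```python
READ_BACK = 2  # per the quoted rule: own generation plus two prior generations


def readable_generations(drive_gen: int) -> range:
    """Cartridge generations that an LTO drive of generation drive_gen can read."""
    return range(max(1, drive_gen - READ_BACK), drive_gen + 1)


def newest_drive_for(cartridge_gen: int) -> int:
    """Newest drive generation that can still read a cartridge_gen cartridge."""
    return cartridge_gen + READ_BACK


def unreadable_on(drive_gen: int, cartridge_gens) -> list[int]:
    """Cartridge generations in an archive that drive_gen drives cannot read."""
    readable = readable_generations(drive_gen)
    return sorted({g for g in cartridge_gens if g not in readable})


if __name__ == "__main__":
    print(newest_drive_for(2))                # 4: LTO2 tapes need LTO4 or older drives
    print(newest_drive_for(3))                # 5
    print(unreadable_on(5, [1, 2, 3, 4, 5]))  # [1, 2]: what an all-LTO5 library strands
```

Run against an archive's list of cartridge generations, `unreadable_on` shows which generations must be migrated before the library standardises on a newer drive.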

So the trick here is to keep older technology around to read older carts. Right now most vendors are still supplying drives as old as LTO3, so it is possible to buy new hardware today that will read any LTO cart you have ever written. I cannot imagine this situation will last forever though. There will come a point when the option to buy a new LTO3 tape drive goes away (it started shipping in 2005), followed by LTO4 (which started shipping in 2008), and so on. Remember that LTO6 is going to be available fairly soon.

The story for this client takes an interesting twist. The amount of data they need to store is increasing dramatically, as practically every instrumented scientific device can now collect more data per sample than ever before. This data is critical to our survival on this planet, so moving to higher-density technology is vital. Retaining old drives means giving precious tape library space to old technology. So the plan is to go to LTO5 across the board, but this means retiring LTO2. That in turn means recalling every LTO2 cart ever written and copying its data onto LTO5 carts. It sounds like a challenge, but it is proving two things:

  1. The client is finding they can stack the contents of 10 LTO2 carts onto one LTO5 cart, saving them long-term storage space.
  2. So far they have not seen any media quality issues when reading these old tapes, so tape is proving itself as a long-term storage medium.
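
Planning a migration like this is essentially a bin-packing problem: fit the contents of many old cartridges onto as few LTO5 cartridges as possible. The sketch below uses the first-fit decreasing heuristic, keeps each old cartridge's contents together on one new cart, and assumes uncompressed sizes measured against LTO5's native 1.5 TB capacity. The cartridge labels are hypothetical, and the ratio you actually achieve depends on how full the old carts were and on compression.

```python
LTO5_NATIVE_GB = 1500  # LTO5 native (uncompressed) capacity: 1.5 TB


def pack_onto_lto5(used_gb: dict[str, int], capacity_gb: int = LTO5_NATIVE_GB) -> list[list[str]]:
    """Assign old cartridges to new LTO5 cartridges by first-fit decreasing.

    used_gb maps an old cartridge's label to the GB of data it holds. Each old
    cartridge is copied whole onto a single new cartridge, so a later recall of
    that data still touches only one tape.
    """
    free: list[int] = []         # remaining GB on each new cartridge
    carts: list[list[str]] = []  # old labels assigned to each new cartridge
    # Largest first; ties broken by label so the plan is reproducible.
    for label, size in sorted(used_gb.items(), key=lambda kv: (-kv[1], kv[0])):
        if size > capacity_gb:
            raise ValueError(f"{label} holds more than one new cartridge can take")
        for i, remaining in enumerate(free):
            if size <= remaining:  # first new cartridge with room
                free[i] -= size
                carts[i].append(label)
                break
        else:  # no existing cartridge has room: start a new one
            free.append(capacity_gb - size)
            carts.append([label])
    return carts
```

For example, ten LTO2 carts each holding 150 GB fit on one LTO5 cart, while eight completely full LTO2 carts (200 GB native each) need two.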

So what did I learn from this?

  1. LTFS is not a solution without a way to manage metadata. No one wants to open their desk drawer and find 50,000 unlabelled USB thumb drives. So let’s not talk only about hardware when we discuss LTFS; we still need a clever software solution.
  2. If you have LTO1 or LTO2 media, it is time to expire that data or move it onto new technology. The LTO roadmap looks very cool, but it won’t help you read that old media.
  3. Don’t believe the FUD. Tape is a reliable long-term storage medium.

About Anthony Vandewerdt

I am an IT professional who lives and works in Melbourne, Australia. This blog is totally my own work. It does not represent the views of any corporation. Constructive and useful comments are very, very welcome.
This entry was posted in advice.

2 Responses to No one wants 50000 USB keys in their desk drawer

  1. Alex Sons says:

    IMHO, you forget to mention a few things.

    For one, transferring data from older generations of tape technology to newer ones reassures you that you still have all your data (I expect that any serious data is stored at least twice, i.e. on separate tapes). So if a read error occurs, and with 50,000 tapes that is not uncommon, the chances of data loss are still very small.

    Another one: once the data has been transferred a few times, the speed of recovery and of access to the data is greatly improved. You can read 800GB of data from LTO5 in a snap; you can grow a beard reading it from LTO1… you could make a business case on this fact alone…

    Getting warmed up: ever is a long time indeed. Somewhere in the next 20 years the tape media will deteriorate, of course, but the tape drive hardware is much more at risk of developing failures. Not to mention hardware and software incompatibilities, such as HVD/LVD SCSI cabling, PCI Express(!) SCSI adapters and drivers(!!!), to name just a few. A hundred years from now you won’t even want to think about having to cope with 1,000 LTO1 tapes.

    In the end, costs are really important. Nowadays, when you have to store a whopping 2TB of data, enterprise tape is a very costly solution. It is much cheaper, faster and more secure to store this amount on ten different RAID-1 (mirrored) sets of 3TB disks and shelve nine of the mirror sets. For long-term storage of 20TB of data there are still lots of deduplicating VTL solutions that might be more cost-effective than tape. But when you have to store 2PB of data, enterprise tape is arguably the most cost-effective solution. Period.

    So, for anyone backing up or archiving data somewhere between terabytes and petabytes, tape will be the technology of choice.

  2. Hi Alex.
    Great, well-thought-out comment. I agree with everything you say… transfer speeds and the availability of compatible attachment hardware are both key issues in long-term retention.
