ProtecTIER and Domino: How to get your de-dup back

The IBM ProtecTIER performs in-line de-duplication of your backup data, enabling much faster backups and much faster restore times. De-duplicating your backups allows you to store a lot “more on the floor”.

One of the advertised capabilities of ProtecTIER is that you can get a de-duplication ratio of up to 25 to 1. This sounds great, but advertising this sort of ratio is a blessing and a curse. On the one hand it shows the potential capability of the device, but it can also create very high expectations. In reality the ratio you achieve depends entirely on the type of data you back up (video versus database versus big empty files, etc.) and the way you back it up (full backups versus incremental backups). In my experience, somewhere between 8:1 and 16:1 is a realistic expectation. The reason is that for de-duplication to work, your backup data needs to actually contain duplicate data, that is, data the ProtecTIER has already stored in its repository. If every piece of data you back up is unique, encrypted or somehow obfuscated to appear different from the last backup, then no duplicate data will be detected. The result? Your de-dup ratio will be very low.
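To make the point about duplicate data concrete, here is a minimal sketch of how a de-dup ratio falls out of the data you send. It is a toy model only: it uses fixed-size chunks and SHA-256 hashes, which is not how ProtecTIER actually finds duplicates, and the chunk size, function names and backup sizes are all made up for illustration.

```python
import hashlib
import random

CHUNK_SIZE = 4096  # hypothetical chunk size, chosen only for this illustration


def dedup_ratio(backups):
    """Return nominal bytes received divided by bytes actually stored."""
    stored = set()   # digests of chunks already in the toy "repository"
    nominal = 0      # bytes sent by the backup application
    physical = 0     # bytes that had to be written because they were new
    for backup in backups:
        nominal += len(backup)
        for i in range(0, len(backup), CHUNK_SIZE):
            chunk = backup[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).digest()
            if digest not in stored:
                stored.add(digest)
                physical += len(chunk)
    return nominal / physical


def random_backup(rng, size):
    """Generate `size` bytes of pseudo-random (effectively unique) data."""
    return rng.getrandbits(8 * size).to_bytes(size, "big")


rng = random.Random(1)
size = 64 * CHUNK_SIZE

# Ten full backups of data that never changes: everything after the first is duplicate.
same_data = random_backup(rng, size)
repeated_fulls = dedup_ratio([same_data] * 10)

# Ten backups where every byte is new each time: nothing to de-duplicate.
all_unique = dedup_ratio([random_backup(rng, size) for _ in range(10)])

print(f"identical fulls: {repeated_fulls:.1f}:1")
print(f"all unique data: {all_unique:.1f}:1")
```

In this toy model the ten identical full backups come out at 10:1 and the ten unique ones at 1:1. Real backup streams sit somewhere in between, which is why the ratio depends so heavily on what you back up and how.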

Backing up Lotus Domino databases is a good case in point. When backing up your Lotus Notes databases you may only see a 2.5:1 de-dup ratio, which is clearly not a good result. The issue may well be with a function called compaction. Compaction re-arranges all of the data contained in the NSF (Notes Storage Facility) databases to reclaim space. While this function helps to reduce space utilization from the perspective of Lotus Domino, it also changes the layout and data pattern of every single NSF. So the next time ProtecTIER receives blocks from these databases, they all look unique, and the de-dup ratio naturally ends up being very low. However, running compaction is a best practice for Lotus Domino, so disabling it is not a solution.
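The effect of re-arranging data can be sketched with a toy example. This is an assumption-heavy illustration, not a model of Domino or ProtecTIER: a random shuffle of small records stands in for compaction, fixed-size chunk hashing stands in for duplicate detection, and the record and chunk sizes are hypothetical.

```python
import hashlib
import random

CHUNK_SIZE = 4096   # hypothetical chunk size used for duplicate detection
RECORD_SIZE = 100   # hypothetical record size, deliberately not chunk-aligned


def chunks(data):
    """Split data into fixed-size chunks."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


def matching_fraction(old, new):
    """Fraction of chunks in `new` whose hash was already seen in `old`."""
    known = {hashlib.sha256(c).digest() for c in chunks(old)}
    new_chunks = chunks(new)
    hits = sum(1 for c in new_chunks if hashlib.sha256(c).digest() in known)
    return hits / len(new_chunks)


rng = random.Random(42)
records = [
    rng.getrandbits(8 * RECORD_SIZE).to_bytes(RECORD_SIZE, "big")
    for _ in range(2000)
]
before = b"".join(records)

# Crude stand-in for compaction: same records, different layout on disk.
rearranged = records[:]
rng.shuffle(rearranged)
after = b"".join(rearranged)

unchanged = matching_fraction(before, before)  # file backed up again as-is
shuffled = matching_fraction(before, after)    # file backed up after re-arranging

print(f"unchanged file: {unchanged:.0%} of chunks already stored")
print(f"re-arranged file: {shuffled:.0%} of chunks already stored")
```

The content of the file has not changed at all, yet almost none of the re-arranged chunks line up with what was stored before. That is the same kind of mismatch described above, only in miniature.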

The solution involves using a tool called DAOS (Domino Attachment and Object Service), which removes all the email attachments from the NSF files and stores them separately. This not only provides substantial space savings for Domino (because it only stores each unique attachment once) but also means that compaction can still run on the NSF files (which are now attachment free). The result at one customer? The combined de-dup ratio went up to 8.5:1 (about 2.5:1 on the NSF files but almost 20:1 on the attachment files).
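It is worth seeing how two such different ratios combine into 8.5:1. The sketch below assumes the ratio is defined as nominal data divided by physical data stored, and treats "almost 20:1" as exactly 20:1, so the result is only a rough estimate. The function names are made up for this example.

```python
def combined_ratio(attachment_fraction, nsf_ratio, attachment_ratio):
    """Overall de-dup ratio when a given fraction of nominal data is attachments.

    Nominal data is normalised to 1, so the physical space used is the sum of
    each part divided by its own ratio.
    """
    physical = ((1 - attachment_fraction) / nsf_ratio
                + attachment_fraction / attachment_ratio)
    return 1 / physical


def attachment_fraction_for(target, nsf_ratio, attachment_ratio):
    """Invert combined_ratio: what share of nominal data must be attachments?"""
    return (1 / nsf_ratio - 1 / target) / (1 / nsf_ratio - 1 / attachment_ratio)


# Figures quoted above: 2.5:1 on the NSF files, roughly 20:1 on attachments.
share = attachment_fraction_for(8.5, nsf_ratio=2.5, attachment_ratio=20.0)
print(f"attachments as a share of backed-up data: {share:.0%}")
print(f"check: {combined_ratio(share, 2.5, 20.0):.1f}:1")
```

Under those assumptions, attachments would have to make up roughly 81% of the data being backed up. In other words the combined figure is not a simple average of the two ratios. It is weighted by how much of your mail store is attachments, so the more attachment-heavy the environment, the more DAOS helps.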

The only caveat is that Lotus Domino needs to be at version 8.5 or later to use DAOS. More information on performing backups with DAOS can be found here.

Thanks to Francois Morin for sharing this with me.

About Anthony Vandewerdt

I am an IT Professional who lives and works in Melbourne Australia. This blog is totally my own work. It does not represent the views of any corporation. Constructive and useful comments are very very welcome.
This entry was posted in IBM Storage.
