If you dedup empty space, is it unbelievable?

My colleague Micah Waldman recently pointed out this blog post from Veeam: “How to Get Unbelievable Deduplication Results with Windows Server 2012 and Veeam Backup & Replication!”

This post leads you to a YouTube video of the demonstration.

In this post, Veeam appear to achieve a data reduction of around 33 to 1 when doing backups, reducing a backup of 240 GB of VMs down to 7.2 GB using a combination of Veeam and Windows dedup. I was initially impressed with the result, but then noticed some curious things.

Firstly, they started with 4 VMs, each with a 60 GB volume (a total of 240 GB, a figure that gets mentioned several times). These are the provisioned sizes, but since they used thin provisioning, the actual used space totals 50.3 GB. Here’s a screenshot of these numbers (the VMs are ausrv06 to 09). I apologize for the truly awful quality – I genuinely tried to get a better image. So the curious thing here is that rather than referring to the actual ‘thin’ data size, the video keeps referring to the ‘full fat’ size.

[Screenshot: VM Size]

They then protected these 4 VMs using Veeam. Veeam has volume/job-level deduplication, which reduced the total storage for these VMs from 50.3 GB (not 240 GB, since these are thin provisioned VMs) to 34.1 GB (15.1 GB plus 17 GB). Here are screenshots of the backup job results (there are two jobs, each with 2 VMs):

[Screenshots: BackupJob2, Job1]
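To see why the scope of deduplication matters, here is a toy model in Python. It is not Veeam’s implementation; it assumes four VMs cloned from one template, fixed 256 KB blocks hashed with SHA-256, and made-up disk contents. The function names and job names are hypothetical. It compares dedup that keeps a separate index per backup job with dedup that keeps one index for the whole volume.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 256 * 1024  # assumption: fixed 256 KB blocks


def block_hashes(image: bytes) -> list[str]:
    """Split a disk image into fixed-size blocks and hash each block."""
    return [hashlib.sha256(image[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(image), BLOCK_SIZE)]


def job_scoped_blocks(jobs: dict[str, list[bytes]]) -> int:
    """Blocks stored when duplicates are only detected inside each job."""
    total = 0
    for images in jobs.values():
        seen: set[str] = set()  # a fresh index for every job
        for image in images:
            seen.update(block_hashes(image))
        total += len(seen)
    return total


def volume_wide_blocks(jobs: dict[str, list[bytes]]) -> int:
    """Blocks stored when one index covers every job on the volume."""
    seen: set[str] = set()
    for images in jobs.values():
        for image in images:
            seen.update(block_hashes(image))
    return len(seen)


def vm_image(unique_fills: list[int]) -> bytes:
    """Hypothetical VM disk: 8 template blocks shared by all VMs plus its own blocks."""
    template = [bytes([i]) * BLOCK_SIZE for i in range(8)]
    own = [bytes([f]) * BLOCK_SIZE for f in unique_fills]
    return b"".join(template + own)


# Four VMs from one template, split into two jobs of two VMs each.
jobs = {
    "job_a": [vm_image([100, 101]), vm_image([102, 103])],
    "job_b": [vm_image([104, 105]), vm_image([106, 107])],
}

raw = sum(len(block_hashes(img)) for imgs in jobs.values() for img in imgs)
print(raw, job_scoped_blocks(jobs), volume_wide_blocks(jobs))  # 40 24 16
```

In this model the template blocks are stored once per job rather than once overall, which is one plausible source of the leftover redundancy that a volume-wide dedup can still find afterwards.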

Microsoft have come out with a new deduplication feature in Windows Server 2012, which, in the example shown, Veeam are using as a post-backup reduction tool. They define a 100 GB Windows Server 2012 volume and enable deduplication on it. This deduplicates files on the volume as a post-processing step, for files that are older than X days. To applications it looks like a normal file system, but underneath, Windows Server stores files more efficiently, allowing you to store more data than the physical volume size.
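A rough sketch of what post-process deduplication means, in Python. This is a simplified model and not how Windows Server 2012 is actually built: it assumes fixed 64 KB chunks (the real feature uses variable-size chunks), SHA-256 chunk hashes, an in-memory chunk store and no garbage collection. The class name and file names are hypothetical, and `min_age_days` stands in for the “X days” threshold.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024  # assumption: fixed 64 KB chunks (the real feature uses variable-size chunks)
DAY = 24 * 60 * 60      # seconds in a day


@dataclass
class PostProcessDedupVolume:
    """Toy volume that deduplicates files only once they reach a minimum age."""
    min_age_days: float  # stands in for the "X days" threshold
    chunk_store: dict[str, bytes] = field(default_factory=dict)  # chunk hash -> chunk, kept once
    raw_files: dict[str, tuple[bytes, float]] = field(default_factory=dict)  # name -> (data, mtime)
    optimized: dict[str, list[str]] = field(default_factory=dict)  # name -> ordered chunk hashes

    def write(self, name: str, data: bytes, mtime: float) -> None:
        # Writes land as ordinary files; nothing is deduplicated inline.
        # (Old chunks are never garbage-collected in this sketch.)
        self.optimized.pop(name, None)
        self.raw_files[name] = (data, mtime)

    def run_optimization(self, now: float) -> None:
        # The post-processing pass: chunk, hash and dedup files older than the threshold.
        for name, (data, mtime) in list(self.raw_files.items()):
            if now - mtime < self.min_age_days * DAY:
                continue  # too new, leave it as a normal file for now
            hashes = []
            for i in range(0, len(data), CHUNK_SIZE):
                chunk = data[i:i + CHUNK_SIZE]
                digest = hashlib.sha256(chunk).hexdigest()
                self.chunk_store.setdefault(digest, chunk)  # store each unique chunk once
                hashes.append(digest)
            self.optimized[name] = hashes
            del self.raw_files[name]

    def read(self, name: str) -> bytes:
        # Applications get the same bytes back whether or not the file was optimized.
        if name in self.raw_files:
            return self.raw_files[name][0]
        return b"".join(self.chunk_store[h] for h in self.optimized[name])

    def physical_bytes(self) -> int:
        # Disk space actually used: unique chunks plus files not yet optimized.
        return (sum(len(c) for c in self.chunk_store.values())
                + sum(len(d) for d, _ in self.raw_files.values()))


now = 100 * DAY
backup = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))  # 256 KB, four distinct chunks
vol = PostProcessDedupVolume(min_age_days=3)
vol.write("backup_day1", backup, mtime=now - 10 * DAY)
vol.write("backup_day2", backup, mtime=now - 9 * DAY)
vol.write("backup_today", backup, mtime=now)
vol.run_optimization(now)
print(vol.physical_bytes())  # 524288: one deduped copy plus today's not-yet-optimized file
```

Three logical copies (768 KB) occupy 512 KB here: the two older backups share one set of chunks, while today’s backup stays a normal file until it ages past the threshold. That waiting period is why a post-process design needs extra landing space.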

Using this new deduplicated volume as storage for a Veeam backup repository, Windows Server 2012 was able to store the 34.1 GB of backup data in 7.2 GB of disk space, a reduction of roughly 5:1 (34.1 / 7.2 ≈ 4.7).

So what did we have here?

  • 240 GB of allocated VM storage, which in reality is:
  • 50.3 GB of used VM data, thanks to VMware thin provisioning. This gets reduced to:
  • 34 GB of backup data by Veeam volume-based dedup. This gets reduced to:
  • 7 GB of storage space by Windows Server 2012 dedup.

Overall, that is a 7:1 real dedup (from 50 GB to 7 GB), which is not too bad. But most of it came from Windows Server 2012. In reality, out of the 233 GB saved (from 240 GB down to 7 GB), less than 7% (16 GB) came from Veeam, while 81% came from VMware thin provisioning and close to 12% came from Microsoft! The fact that Windows dedup was so effective even after Veeam’s dedup just goes to show how ineffective job-based 256KB block dedup really is (though I understand you can tune this by selecting the ‘right’ VMs to group together – which sounds like admin overhead).
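The percentages above can be checked with a few lines of Python, using only the sizes reported in the post (240 GB provisioned, 50.3 GB used, 34.1 GB after Veeam, 7.2 GB after Windows). The stage labels and the function name are made up for illustration.

```python
from __future__ import annotations

# Sizes in GB as reported in the post.
STAGES = [
    ("provisioned", 240.0),
    ("thin provisioning", 50.3),
    ("Veeam dedup", 34.1),
    ("Windows Server 2012 dedup", 7.2),
]


def savings_breakdown(stages: list[tuple[str, float]]):
    """Return total GB saved and, per stage, (name, ratio, GB saved, % of total saved)."""
    total_saved = stages[0][1] - stages[-1][1]
    rows = []
    for (_, before), (name, after) in zip(stages, stages[1:]):
        saved = before - after
        rows.append((name, before / after, saved, 100 * saved / total_saved))
    return total_saved, rows


total_saved, rows = savings_breakdown(STAGES)
print(f"total saved: {total_saved:.1f} GB")
for name, ratio, saved, share in rows:
    print(f"{name:26s} {ratio:4.2f}:1  {saved:6.1f} GB  {share:5.2f}% of savings")
print(f"headline ratio {STAGES[0][1] / STAGES[-1][1]:.1f}:1, "
      f"real ratio {STAGES[1][1] / STAGES[-1][1]:.1f}:1")
```

Each row attributes a stage’s saving to the step that produced it, so the three shares always add up to 100% of the 232.8 GB saved.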

One other technical side note: Veeam had to disable their compression and use only their dedup, so as not to nullify the effect of Windows Server dedup, meaning they lose one space-saving technology to take advantage of another. I believe there is some tuning that can be done there too.
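A small Python experiment illustrates why compressing before a dedup pass hurts. It is a stand-in, not Veeam’s or Windows’ actual algorithms: zlib plays the backup compressor, fixed 4 KB chunks play the dedup engine, and the “backups” are synthetic text in which the second copy differs from the first only in its first 100 bytes. The word list and helper names are hypothetical.

```python
from __future__ import annotations

import hashlib
import random
import zlib

CHUNK = 4096  # assumption: fixed 4 KB dedup chunks
WORDS = [b"backup", b"veeam", b"server", b"volume", b"block", b"dedup", b"windows", b"disk"]


def make_backup(rng: random.Random, size: int = 16 * CHUNK) -> bytes:
    """Synthetic, fairly compressible 'backup file' built from random words."""
    out = bytearray()
    while len(out) < size:
        out += rng.choice(WORDS) + b" "
    return bytes(out[:size])


def chunk_hashes(data: bytes) -> set[str]:
    """Hashes of the fixed-size chunks a dedup engine would store."""
    return {hashlib.sha256(data[i:i + CHUNK]).hexdigest() for i in range(0, len(data), CHUNK)}


def shared_chunks(a: bytes, b: bytes) -> int:
    """Number of distinct chunks the two files have in common."""
    return len(chunk_hashes(a) & chunk_hashes(b))


original = make_backup(random.Random(42))  # fixed seed for a repeatable run
# Second backup: same length, only the first 100 bytes differ.
changed = b"0123456789" * 10 + original[100:]

raw_shared = shared_chunks(original, changed)
compressed_shared = shared_chunks(zlib.compress(original), zlib.compress(changed))
print(f"uncompressed: {raw_shared} of {len(chunk_hashes(original))} chunks shared")
print(f"compressed:   {compressed_shared} chunks shared")
```

Uncompressed, the second backup shares 15 of its 16 chunks with the first. After compression, a change near the start typically alters the rest of the compressed stream, so few or no chunks line up for the dedup engine to reuse.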


About Anthony Vandewerdt

I am an IT Professional who lives and works in Melbourne Australia. This blog is totally my own work. It does not represent the views of any corporation. Constructive and useful comments are very very welcome.
This entry was posted in Uncategorized.

14 Responses to If you dedup empty space, is it unbelievable?

  1. Pingback: If you dedup empty space, is it unbelievable? | I Love My Storage

  2. Paul says:

    “you can tune this by selecting the ‘right’ VMs to group together – which sounds like admin overhead”
    Not creating your VMs from the same template sounds like way bigger “admin overhead” really ;)

  3. I think the question is, on all of these technologies, what runs out of CPU? And when it does, how catastrophic will that be?

  4. Disclaimer: I represent Veeam

    Thin provisioning or not wouldn’t really matter. Where you do have a point is that this is indeed not 250 GB of actual DATA. The 50GB is the actual data within the VMs. However… if these were thick provisioned disks, which I can’t see in the video, they would take 250GB of storage if you moved them untouched. And if they were thin provisioned in the first place, they would inflate to their full size when offloading them from VMFS. Although this might seem misleading at first glance, it does not have to be wrong.

    “Veeam had to disable their compression and only use their dedup so as not to nullify the effect of Windows Server dedup, meaning they lose one space-saving technology to take advantage of another one” – this is a common mistake. Wherever you are doing deduplication, it always takes resources (CPU/MEM). So if you have any dedupe appliance as a target, it’s in the best interest of the user to disable/soften the dedupe within the backup process. This will decrease the backup window. Enabling dedupe to its fullest within the Veeam job would also go a lot further, but then the Windows Server would not have so much room for deduping anymore. There is only a certain amount of squeezing you can do.

    The entire blogpost is meant to point at this really neat feature in the new Server 2012 that just adds more flexibility in choices of designing the Veeam distributed architecture.

    • I don’t disagree that the combination works. Clearly, compressing before you give it to Windows would make Windows’ dedup job almost impossible, so deciding where to put the work in terms of timing and effort makes perfect sense. Post-process dedup also requires more space (especially as the Windows 2012 dedup runs after a day). My main concern was that the video does talk about 240 GB of VMware data being reduced, and I just didn’t see that. I present on the benefits of dedup/compression on an almost daily basis, so being exact about the from and to numbers is very important to me.

  5. sivaram says:

    With a more complex setup, this becomes more difficult to manage. To align everything and save backup storage, the VM admin, server admin and backup admin all need to work together to make this happen. Simply a headache and more time at work for everyone!
    Why can’t this all be done by a single solution?

  6. Pingback: Online backups and other such things: | David Emeron: Sonnet Blog

  7. Paul Sustman says:

    So basically, Microsoft’s primary storage dedup doesn’t really add much value. Remember the post-process disk compression that used to be in NTFS? How well did that work out? I wouldn’t want to trust my critical data to this disk compression feature replacement.

    I guess that’s why backup admins use dedup technology built into enterprise backup products to handle large volumes and different types of application data. Why anyone would use post-process dedupe technology on a file system designed to hold your Office documents to store enterprise backup data is beyond me.

  8. Sean says:

    One of my favorite, “hey wait a minute upon closer inspection I disagree with your spin” blog posts was this one. http://www.mrvray.com/2011/12/no-point-in-locking-the-door-when-walls-have-fallen/

  9. Pingback: Online backups and other such things: | David Emeron: Sonnet Blog

  10. Matt says:

    This is an older article, but I have to agree with the author, in that a lot of the stats I see from them are misleading… The [storage-related] stats are misleading to the point that I bailed out on a Veeam eval after just 1 week. Their deduplication capabilities are horrible, and they consider skipping unused space in a virtual disk to be “compression”. I reached out to our account exec for some references, and all I hear is crickets.

    I was anxious to replace Backup Exec, but after Veeam’s colossal failure I’ll keep shopping.

  11. Pingback: Online backups and other such things: | David Emeron: Sonnets
