My colleague Micah Waldman recently pointed out this blog post from Veeam: How to Get Unbelievable Deduplication Results with Windows Server 2012 and Veeam Backup & Replication! | Veeam Software …
This post leads you to a YouTube video that you can find here:
In this post, Veeam appear to get around 33 to 1 data reduction when doing backups, by reducing a backup of 240 GB of VMs down to 7.2 GB using a combination of Veeam and Windows dedup. I was initially impressed with the result, but then noticed some curious things.
Firstly, they started with 4 VMs, each with a 60 GB volume (a total of 240 GB, a figure that gets mentioned several times). These are the provisioned sizes, but since the VMs are thin provisioned, the space actually used totals 50.3 GB. Here’s a screenshot of these numbers (the VMs are ausrv06 to 09). I apologize for the truly awful quality; I genuinely tried to get a better image. The curious thing here is that rather than refer to the actual ‘thin’ data size, the video keeps referring to the ‘full fat’ size.
They then protected these 4 VMs using Veeam. Veeam has volume/job level deduplication, which reduced the total storage for these VMs from 50.3 GB (not 240 GB since these are thin provisioned VMs) to 34.1 GB (15.1 GB plus 17 GB). Here are screenshots of the backup job results (there are two jobs each with 2 VMs):
Microsoft have introduced a new deduplication feature in Windows Server 2012, which Veeam use in this example as a post-backup reduction tool. They define a 100 GB Windows Server 2012 volume and enable deduplication on it. The deduplication runs as a post-process on files on the volume that are older than X days. To applications the volume looks like a normal file system, but underneath Windows Server stores the files more efficiently, allowing you to store more data than the physical volume size.
Using this deduplicated volume as storage for a Veeam backup repository, Windows Server 2012 was able to store the 34.1 GB of backup data in 7.2 GB of disk space, a reduction of a little under 5:1.
So what did we have here?
- 240 GB of allocated VM storage. In reality this is:
- 50.3 GB of used VM data, thanks to VMware thin provisioning. This gets reduced to:
- 34 GB of backup data by Veeam volume-based dedup. This gets reduced to:
- 7 GB of storage space by Windows Server 2012 dedup.
Overall, that is a real dedup ratio of 7:1 (from 50 GB to 7 GB), which is not too bad. But most of it came from Windows Server 2012. Of the 233 GB saved (going from 240 GB down to 7 GB), less than 7% came from Veeam (16 GB), while 81% came from VMware thin provisioning and close to 12% came from Microsoft! The fact that Windows dedup was so effective even after Veeam’s dedup just goes to show how ineffective job-based 256 KB block dedup really is (though I understand you can tune this by selecting the ‘right’ VMs to group together, which sounds like admin overhead).
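For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the split from the sizes quoted above. The stage labels and the `savings_breakdown` helper are my own names for illustration; only the four GB figures come from the video.

```python
# Sizes in GB at each stage, as quoted in this post.
stages = [
    ("provisioned VM storage", 240.0),
    ("after VMware thin provisioning", 50.3),
    ("after Veeam dedup", 34.1),
    ("after Windows Server 2012 dedup", 7.2),
]


def savings_breakdown(stages):
    """Return (label, GB saved, share of total savings) for each step."""
    total_saved = stages[0][1] - stages[-1][1]
    rows = []
    # Pair each stage with the one that follows it.
    for (_, before), (label, after) in zip(stages, stages[1:]):
        saved = before - after
        rows.append((label, saved, saved / total_saved))
    return rows


if __name__ == "__main__":
    for label, saved, share in savings_breakdown(stages):
        print(f"{label:34s} saved {saved:6.1f} GB ({share:5.1%})")
    print(f"ratio vs provisioned size: {stages[0][1] / stages[-1][1]:.1f}:1")
    print(f"ratio vs data in use:      {stages[1][1] / stages[-1][1]:.1f}:1")
```

The last two lines show the gap between the headline ratio, measured against provisioned size, and the ratio measured against the data actually in use.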
One other technical side note: Veeam had to disable their compression and only use their dedup so as not to nullify the effect of Windows Server dedup, meaning they lose one space-saving technology to take advantage of another one. I believe there is some tuning that can be done there too.
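To illustrate one common reason compressed data tends to deduplicate poorly, here is a toy Python sketch. It is not Veeam's or Windows Server's algorithm: the 4 KB fixed block size, the synthetic "backups" and the use of `zlib` stream compression are all assumptions chosen to keep the example small. Two buffers that differ only in their first block share almost every block when stored raw, but once each is compressed as a stream, the small change ripples through the rest of the output and the blocks no longer line up.

```python
import hashlib
import random
import zlib

BLOCK = 4096  # toy fixed block size (assumption, not a real product setting)


def block_hashes(data: bytes) -> set:
    """Hash every fixed-size block of the input."""
    return {
        hashlib.sha256(data[i:i + BLOCK]).digest()
        for i in range(0, len(data), BLOCK)
    }


def shared_blocks(a: bytes, b: bytes) -> int:
    """Count distinct blocks present in both inputs (what dedup could store once)."""
    return len(block_hashes(a) & block_hashes(b))


def make_backups(n_blocks: int = 64, seed: int = 1):
    """Build two compressible buffers that differ only in their first block."""
    rng = random.Random(seed)
    first = bytes(rng.choices(b"ACGT", k=n_blocks * BLOCK))
    changed = bytes(rng.choices(b"ACGT", k=BLOCK))
    second = changed + first[BLOCK:]
    return first, second


if __name__ == "__main__":
    first, second = make_backups()
    print("shared blocks, uncompressed:", shared_blocks(first, second))
    print("shared blocks, compressed:  ",
          shared_blocks(zlib.compress(first), zlib.compress(second)))
```

Real products chunk and compress differently, so treat this only as intuition for why one space-saving layer can get in the way of the next.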