In-file Delta Backups - Taking Incremental Backup to the Next Level

Incremental backup methods copy only the files that have changed since they were last backed up. This saves substantial time and backup media, because on any given day only a fraction of the data on a system changes. Traditional incremental backup software determines which files have changed by using one or more of the following techniques:
  • Archive flag setting
  • Time stamp comparison
  • CRC comparison
  • File size comparison
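The checks in the list above can be combined roughly as follows. This is a minimal sketch, assuming a hypothetical backup catalog that remembers each file's size, modification time, and CRC-32 from the last backup (`FileRecord`, `snapshot`, and `has_changed` are names invented for illustration). The archive flag is left out because it is a Windows-specific file attribute.

```python
import os
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """What a (hypothetical) backup catalog remembers about a file."""
    mtime_ns: int
    size: int
    crc32: int


def file_crc32(path, chunk_size=1 << 20):
    """Compute the CRC-32 of a file, reading it in 1 MiB chunks."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc


def snapshot(path):
    """Record timestamp, size and CRC for a file at backup time."""
    st = os.stat(path)
    return FileRecord(st.st_mtime_ns, st.st_size, file_crc32(path))


def has_changed(path, previous):
    """Return True if the file differs from its last backed-up record."""
    if previous is None:
        return True  # never backed up before
    st = os.stat(path)
    if st.st_size != previous.size:
        return True  # cheapest check first: file size comparison
    if st.st_mtime_ns != previous.mtime_ns:
        # Timestamp moved; confirm with a CRC so a file that was merely
        # "touched" is not copied again.
        return file_crc32(path) != previous.crc32
    return False  # same size and timestamp: assume unchanged
```

Note that whichever check fires, a traditional incremental backup still copies the whole file.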

Once it has been determined that a file has changed since it was last backed up, the entire file must be backed up again. While this is much more efficient than backing up the entire disk, incremental backup still copies vast amounts of data that have not changed. My Outlook .PST file, for example, is about 2GB, contains many thousands of saved emails, and changes every day. I may save only 10 new emails in it on a typical day, but an incremental backup must copy the entire file to keep a current backup copy. Effectively, I would be backing up many thousands of emails just to have a backup copy of the last 10.

Delta file backup methods improve on traditional incremental backup by backing up only the parts of files that have changed. In my case, delta file technology backs up only about 20K of my .PST file on a typical day. Delta technology examines a file in chunks called blocks. A CRC value is computed for each block of the file on disk and compared to the CRC value of the corresponding block in the copy on the backup system. Each block that differs is backed up. Blocks that have not changed are skipped, because an exact copy of them is already stored on the backup system. If the file needs to be restored, the blocks stored on the backup system are used to reconstruct the file, including all of the most recent updates.
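The block-comparison scheme just described can be sketched in a few functions. This is an illustration, not the algorithm any particular backup product uses: it assumes fixed 4 KiB blocks, CRC-32 checksums from Python's `zlib`, and a backup system that keeps the per-block CRCs of its stored copy. `block_crcs`, `compute_delta`, and `apply_delta` are hypothetical names.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size; real products choose their own


def block_crcs(data, block_size=BLOCK_SIZE):
    """Split data into fixed-size blocks and return the CRC-32 of each block."""
    return [zlib.crc32(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def compute_delta(new_data, old_crcs, block_size=BLOCK_SIZE):
    """Compare the file on disk against the stored block CRCs.

    Returns (new_length, {block_index: block_bytes}); only the blocks in
    the dict need to be sent to the backup system.
    """
    changed = {}
    for index, crc in enumerate(block_crcs(new_data, block_size)):
        # A block is new if the stored copy has no block at this index,
        # or changed if its CRC differs from the stored one.
        if index >= len(old_crcs) or crc != old_crcs[index]:
            start = index * block_size
            changed[index] = new_data[start:start + block_size]
    return len(new_data), changed


def apply_delta(old_data, delta, block_size=BLOCK_SIZE):
    """Rebuild the current file from the stored copy plus the changed blocks."""
    new_length, changed = delta
    block_count = -(-new_length // block_size)  # ceiling division
    parts = []
    for index in range(block_count):
        if index in changed:
            parts.append(changed[index])  # updated block from the delta
        else:
            start = index * block_size    # unchanged block from the stored copy
            parts.append(old_data[start:start + block_size])
    # Defensive trim so the result is exactly new_length bytes.
    return b"".join(parts)[:new_length]


if __name__ == "__main__":
    old = bytes(range(256)) * 64          # 16 KiB stored copy (4 blocks)
    new = bytearray(old)
    new[5000:5010] = b"NEW EMAIL!"        # small edit inside block 1
    new += b"appended"                    # file grows into a 5th block
    delta = compute_delta(bytes(new), block_crcs(old))
    print(sorted(delta[1]))               # prints [1, 4]
    assert apply_delta(old, delta) == bytes(new)
```

Two caveats apply to this simple version. Because blocks sit at fixed offsets, inserting a few bytes near the start of a file shifts every later block, so all of them would look changed; rsync-style rolling checksums avoid this by finding matching blocks at any offset. And CRC-32 is not collision-resistant, so rsync, for example, confirms each weak-checksum match with a stronger hash.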

When I first looked at delta backup methods, I had a certain amount of apprehension. I feared that the blocks would get out of sync, or that for some other reason the file would not be the same once it had been taken apart and put back together. I spent quite a bit of time studying the algorithms and testing actual backups and restores that used delta technology, and I was quite surprised at the results. Delta file technology is solid and mature; in my testing I did not see a single case where file integrity was violated. I have come to the conclusion that the algorithms and implementations are very good and leave little to chance. Testing software that implements delta file technology is also fairly straightforward, because a restored file can simply be compared with the original, which makes the process verifiable. I liken delta file technology to SSL encryption or ZIP compression in its ability to transform files into another form and back again. It simply works, every time.
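One simple way to verify a delta restore is to record a cryptographic digest of the file at backup time and compare it with the digest of the restored file. The sketch below assumes SHA-256 from Python's `hashlib`; `sha256_of` and `verify_restore` are hypothetical helper names, not part of any backup product.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so a multi-gigabyte file never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(expected_digest, restored_path):
    """Return True if the restored file matches the digest recorded at backup time."""
    return sha256_of(restored_path) == expected_digest
```

In use, you would call `sha256_of` on the .PST file when it is backed up, restore it from its blocks to another location, and check that `verify_restore` returns True. A single flipped byte anywhere in the file changes the digest.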

In-file delta technology is especially useful in online backup systems. The better online backup systems are designed to use bandwidth efficiently by avoiding moving data that is already at the destination. In my case, my .PST file is backed up daily to an offsite backup service in a few seconds. It would take at least 2 hours without delta file technology. Online backup to offsite storage is great technology, but it is in-file delta technology that makes it practical to back up large files over the Internet.
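The numbers above can be checked with some rough arithmetic, assuming 2GB means 2×10^9 bytes and 20K means 2×10^4 bytes. Uploading the whole file in 2 hours (7200 s) implies an upload rate of roughly

```latex
r \approx \frac{2\times10^{9}\ \text{B}}{7200\ \text{s}} \approx 2.8\times10^{5}\ \text{B/s} \approx 2.2\ \text{Mbit/s},
\qquad
t_{\Delta} \approx \frac{2\times10^{4}\ \text{B}}{2.8\times10^{5}\ \text{B/s}} \approx 0.07\ \text{s},
\qquad
\frac{2\times10^{9}}{2\times10^{4}} = 10^{5}.
```

So the delta is about one hundred-thousandth of the file, and sending it takes a fraction of a second at that rate. Most of the "few seconds" is presumably spent reading and checksumming the 2GB file locally, since every block has to be examined to find the ones that changed.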
