Restore Capacity of a Network Backup System

A reader of this blog who is setting up his own online backup system posted a comment about handling multiple large restores simultaneously. In the event of a major disaster, such as Katrina or 9/11, there may be many computers in multiple locations needing to restore data. If the data is backed up to an offsite backup facility over a network or Internet, how do you plan for restore capacity?

The simultaneous restore capacity is mostly a function of the bandwidth available at the backup site. The offsite data backup facility should be located in a data center with significant bandwidth capacity. Most commercial online data backup service providers house their equipment in high-bandwidth co-location facilities. The end-user locations where the data is being restored will typically have less bandwidth than the offsite backup facility. A business office or temporary location may have a T1, DSL, or cable modem connection, all of which have far less capacity than the offsite backup location.

For example, suppose the offsite backup location has 100 Mb/s of bandwidth capacity and each end-user location has a 3 Mb/s connection. We can roughly estimate that it would take 33 end-users to saturate the bandwidth at the data center. If all 33 end-users were actively transferring data, each would be able to transfer about 22.5 MB per minute. More than 33 end-users could restore data simultaneously, but each would see transfer rates below 22.5 MB per minute. This example does not take into account transmission overhead and network congestion, which would lower each end-user's restore performance and, in doing so, leave room for additional simultaneous users.
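The arithmetic in the example above can be sketched in a few lines of Python; the bandwidth figures are the same illustrative assumptions used in the text, not measured values:

```python
# Back-of-the-envelope restore capacity math (illustrative figures).
DATACENTER_MBPS = 100   # offsite backup site bandwidth, megabits/s
END_USER_MBPS = 3       # per-location restore bandwidth, megabits/s

# How many end-users it takes to saturate the data center's uplink.
users_to_saturate = DATACENTER_MBPS // END_USER_MBPS

# Per-user transfer rate in megabytes per minute (8 bits per byte).
mb_per_minute = END_USER_MBPS / 8 * 60

print(users_to_saturate)   # 33
print(mb_per_minute)       # 22.5
```

Swapping in your own site's numbers gives a quick first estimate before accounting for protocol overhead and congestion.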

The number and speed of the simultaneous transfers is probably not the most important question to answer. There are other ways to look at this that may make more sense. After a significant disaster, if multiple users needed to restore data, the fact is that a data center with 100 Mb/s of bandwidth capacity could serve roughly a terabyte in 24 hours. That terabyte per day could go to one user or to 500 users. So if the backup server contained a terabyte of data that needed to be restored, it would take approximately 24 hours for all of the users to restore all of their data.
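The terabyte-per-day figure follows directly from the link speed; a minimal check, assuming the link runs fully saturated for 24 hours:

```python
# How much data a saturated 100 Mb/s link can serve in one day.
LINK_MBPS = 100
seconds_per_day = 24 * 60 * 60

megabits_per_day = LINK_MBPS * seconds_per_day
terabytes_per_day = megabits_per_day / 8 / 1_000_000  # bits->bytes, MB->TB

print(round(terabytes_per_day, 2))  # 1.08
```

In practice overhead and congestion will shave this down, but it confirms the rough one-terabyte-per-day ceiling quoted above.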

The disk storage systems at the data center are rarely the limiting factor in backing up or restoring large amounts of data over a wide area network. A RAID 5 array attached to modern servers can easily keep pace with the amount of data being transmitted over a wide area network. It is possible, however, that server resources such as CPU and memory could be over-utilized. Secure Sockets Layer (SSL) encryption can significantly raise the CPU requirements for processing large amounts of data. If the CPU or memory of the servers is over-utilized, data will be served at a slower pace, reducing the total amount of data that can be restored in a given period of time.

When planning an offsite data storage facility that will be used over a network, available bandwidth is the primary resource to plan for. Disk space capacity can be determined by analyzing the amount of data to be backed up and the retention requirements. It should be noted that redundant disk configurations are essential to a good disk-based backup system. The actual configuration of the disk storage system can influence performance. Server CPU and memory requirements will be a function of the server software, but should not be a limiting factor with modern hardware.
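One way to turn "data to be backed up plus retention requirements" into a disk-capacity number is a simple full-plus-incremental model. Everything here is a hypothetical assumption for illustration: a weekly full backup, daily incrementals sized by a change rate, and a four-week retention window:

```python
# Hypothetical disk-capacity estimate for an offsite backup store.
# Assumes weekly fulls plus daily incrementals; all figures are examples.
full_backup_gb = 500        # size of one full backup (assumption)
daily_change_rate = 0.05    # 5% of data changes per day (assumption)
retention_weeks = 4         # keep four weekly backup cycles (assumption)

incremental_gb = full_backup_gb * daily_change_rate
weekly_cycle_gb = full_backup_gb + 6 * incremental_gb  # 1 full + 6 incrementals
total_gb = weekly_cycle_gb * retention_weeks

print(total_gb)  # 2600.0
```

A real plan would also budget headroom for growth and for the redundant disk configuration (RAID parity or mirroring) noted above.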
