Unreliable Data Backup

It seems kind of ironic that tape backup is the most popular form of data backup among businesses and also the most unreliable. It is not even close, so I will say it again: tapes are absolutely the most unreliable backup media. It's not necessarily the tape's fault. The tape drives, tape libraries, and software may be just as guilty as the media itself. In defense of tape backup technology, it is complex stuff, and all of it must work together to produce good backups. Humans also contribute to the problems with tape backup: tapes are finicky and must be handled with care. Most backup tape media have environmental requirements for storage and even a prescribed way they can be stacked or shelved. The procedures that keep a tape backup system working must be followed and monitored diligently, or the system breaks down.

Not only are tape backup systems unreliable, they are also some of the most expensive to purchase and own. Modern high-capacity tape media costs well over $100 per unit. The cost per GB is usually higher for tape media than for a hard disk drive. Tape hardware is much more costly to purchase and maintain than disk-based equipment, and the staff time required to operate and maintain tape-based backup systems is also a major factor.

Okay, if tape is the most unreliable backup media and also the most expensive, then why is it so widely used? I think there is one simple explanation: tape backup is the oldest form of backup, and a lot of people haven't switched to disk-based or online backup systems yet. Many companies have invested heavily in their tape-based systems and can't swallow the thought of scrapping that investment, so they continue to pay the maintenance and upgrade costs year after year.

If you are still using a tape backup system, you would be well served to look at disk-based storage systems. Disk-based storage is not just onsite storage. Most modern disk-based systems have removable hard drives that can be transported offsite just as easily as tape media. Better yet, just use your network to ship your data offsite. In most cases, bandwidth and online/offsite backup storage are more cost-effective than tape backup, and they are much more reliable and easier to use and maintain.
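Before shipping backups over the network, it helps to check whether your connection can move them in a reasonable window. The sketch below is a back-of-the-envelope estimate, not a sizing tool: the 80% link efficiency and the 50 GB / 100 Mbps example figures are assumptions for illustration, and real throughput also depends on compression, deduplication, and the backup service you use.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to send data_gb gigabytes over a link_mbps link.

    efficiency is an ASSUMED fraction of the raw link rate actually achieved
    (protocol overhead, other traffic); 0.8 is a placeholder, not a measurement.
    """
    bits = data_gb * 8 * 1000**3                  # decimal gigabytes -> bits
    bits_per_second = link_mbps * 1000**2 * efficiency
    return bits / bits_per_second / 3600          # seconds -> hours


# Hypothetical example: a 50 GB nightly incremental over a 100 Mbps uplink.
print(f"{transfer_hours(50, 100):.2f} hours")
```

If the estimate fits comfortably inside your nightly backup window, online/offsite backup is likely practical; if not, incremental backups or removable disks may be needed for the initial full copy.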

Another Reason to Back Up Your RAID Storage

I think most of us feel pretty good about our data that is stored on RAID devices with fault-tolerant configurations. Most of my servers are not only fault tolerant but also have hot spares configured, so a single failed drive will not result in lost data. With RAID 5 or above, it would take at least two drives failing at once (a second failure before the array finishes rebuilding from the first) to cause data loss. I have drives fail occasionally, and I replace them as they fail. I have never had two drives fail in the same array simultaneously.
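Here's why a parity-based array like RAID 5 survives one failed drive but not two. This is a simplified sketch assuming plain XOR parity over a single stripe of three data blocks; real RAID 5 rotates the parity block across the drives, and the block contents below are made up.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (the parity calculation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Lose the drive holding data[1]: XOR of everything that survived rebuilds it.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'BBBB'
```

If two blocks are missing, XOR of the survivors only yields the combination of the two lost blocks, and there is no way to separate them, which is exactly the "two failed drives" case.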

When your disks are configured in a RAID array at fault-tolerant levels, failed drives are no longer the biggest risk. A failed RAID controller can cause much more pain. Your drives may be perfectly healthy, but if your controller dies, your data is just as inaccessible as if the drives themselves had failed. If you are thinking that replacing the RAID controller is going to solve the problem, you may be in for a big disappointment. RAID controllers don't encode and store your data in a standard way: the array configuration and data layout they write to the disks are often specific to the vendor or even the model. You can purchase the latest and greatest RAID controller, plug in your perfectly good drives from your previous array, and still have no data.

RAID controllers are not obligated to recognize arrays, logical drives, and data that were created by another RAID controller. The best way to ensure that your data can be recovered is to replace the dead controller with exactly the same model, running the same firmware and hardware revision. This is not always possible, because manufacturers update their hardware frequently.
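A toy model shows why a different controller can read garbage from perfectly good disks. The sketch below stripes data with one chunk size and reads it back assuming another; the 4- and 8-byte chunk sizes and the `stripe`/`unstripe` helpers are hypothetical, and real controllers also keep their own on-disk metadata, which this sketch ignores entirely.

```python
def stripe(data: bytes, disks: int, chunk: int) -> list[bytearray]:
    """Write data round-robin across disks in chunk-sized pieces (RAID 0 style)."""
    members = [bytearray() for _ in range(disks)]
    for i in range(0, len(data), chunk):
        members[(i // chunk) % disks] += data[i:i + chunk]
    return members


def unstripe(members: list[bytearray], chunk: int) -> bytes:
    """Read the disks back, ASSUMING a given chunk size and disk order."""
    out = bytearray()
    offsets = [0] * len(members)
    disk = 0
    while any(off < len(m) for off, m in zip(offsets, members)):
        out += members[disk][offsets[disk]:offsets[disk] + chunk]
        offsets[disk] += chunk
        disk = (disk + 1) % len(members)
    return bytes(out)


original = b"ABCDEFGHIJKLMNOP"
members = stripe(original, disks=2, chunk=4)  # the "old" controller's layout
print(unstripe(members, 4))  # same assumptions: data comes back intact
print(unstripe(members, 8))  # different chunk size: data is scrambled
```

Swapping the disk order (`unstripe(members[::-1], 4)`) scrambles the data the same way. A replacement controller has to agree with the old one on every one of these details before your data makes sense again.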

As usual, I will end this post by encouraging you to back up your data and keep backup copies offsite. Fault-tolerant RAID vastly reduces your chances of losing data due to hard drive failure, but other events can also cause your data to be lost. If you value your data enough to store it on fault-tolerant systems, then it is probably valuable enough to back up and store offsite.

RAID 0 and Stripe Sets Are Risky

RAID 0, or striping, allows two or more physical disk drives to be logically combined so that they appear as a single large drive to the operating system. When disk drives are configured as a stripe set or RAID 0, data is split into fixed-size chunks (stripe units) that are distributed across the available disks in turn. This results in enhanced read performance. The speed at which a single physical disk drive can retrieve and deliver data to the computer is limited by mechanical factors such as how fast the disk spins and how fast the heads can move (seek time). When the data is striped across multiple physical disk drives, the controller can request that each drive return its part of the data simultaneously. The controller can then assemble the parts and deliver the data to the requester faster than it could have been retrieved from a single drive.
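The way a striped array spreads one request across several drives can be sketched in a few lines. This assumes a stripe unit of one block and simple round-robin placement, the textbook RAID 0 layout; real controllers use larger stripe units, and `locate` and `plan_read` are hypothetical names used only for illustration.

```python
def locate(logical_block: int, disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk)
    for a round-robin RAID 0 layout with a one-block stripe unit."""
    return logical_block % disks, logical_block // disks


def plan_read(first_block: int, count: int, disks: int) -> dict[int, list[int]]:
    """Group the blocks of one read request by the disk that holds them."""
    per_disk: dict[int, list[int]] = {}
    for block in range(first_block, first_block + count):
        disk, offset = locate(block, disks)
        per_disk.setdefault(disk, []).append(offset)
    return per_disk


print(plan_read(0, 8, 4))  # {0: [0, 1], 1: [0, 1], 2: [0, 1], 3: [0, 1]}
```

For an 8-block read on 4 drives, each drive serves only 2 blocks, so if the drives work in parallel the request finishes in roughly the time one drive needs for 2 blocks instead of 8.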

RAID 0 is attractive because of its performance advantages and also because it has no disk space overhead, unlike other RAID levels such as RAID 1 and RAID 5. However, RAID 0 offers none of the fault tolerance found in RAID 1 and RAID 5. RAID 1 and RAID 5 require extra physical disk drives to store redundant information (a mirrored copy in RAID 1, parity in RAID 5) so that if one drive fails, the data remains available.
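The disk space overhead mentioned above is easy to quantify. This is a simplified sketch assuming identical disks and ignoring hot spares and controller metadata; RAID 1 is modeled here as every disk holding a full copy of the data, and `usable_capacity` is a hypothetical helper.

```python
def usable_capacity(level: int, disks: int, disk_tb: float) -> float:
    """Usable capacity in TB for `disks` identical drives (simplified model)."""
    if level == 0:
        return disks * disk_tb            # no redundancy, every byte is usable
    if level == 1:
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_tb                    # every disk is a full mirror copy
    if level == 5:
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_tb      # one disk's worth of space holds parity
    raise ValueError(f"RAID level {level} is not modeled in this sketch")


for level in (0, 1, 5):
    print(f"RAID {level}: {usable_capacity(level, 4, 2.0)} TB usable of 8.0 TB raw")
```

The numbers make the trade-off plain: RAID 0 gives you all of the raw space precisely because it stores nothing redundant.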

Configuration guides and manuals make it clear that RAID 0 does not offer fault tolerance, but they don't make it clear that your data is significantly MORE VULNERABLE on a RAID 0 or stripe set configuration than on a single drive. The reason for this increased vulnerability is that if any one of the drives in the array fails, all of the data on all of the drives will be lost. Let's say a particular disk drive has a 1 in 10 probability of failing in the next 12 months, and that drives fail independently of one another. If your data depends on two of those drives, your risk of data loss rises to nearly 2 in 10 (19%, because the array survives only if both drives survive: 1 - 0.9 × 0.9 = 0.19). If your data depends on three of those drives, the risk rises to roughly 3 in 10 (27.1%). So the more drives you have in your stripe set or RAID 0 configuration, the higher the probability that your data will be lost. Once again, because the data is striped across all of the drives, the loss will encompass all of the data on all of the drives.
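The risk figures in the example above can be computed exactly, assuming each drive fails independently with the same probability p over the period. The array survives only if every drive survives, so the chance of losing an n-drive RAID 0 set is 1 - (1 - p)^n. Simply adding up the per-drive probabilities (n × p) is a close approximation for small p but slightly overstates the risk, and for large arrays it would even exceed 100%.

```python
def p_array_loss(p_drive: float, drives: int) -> float:
    """Probability that at least one of `drives` independent drives fails,
    i.e. that a RAID 0 / stripe set loses its data.
    ASSUMES independent failures with identical probability p_drive."""
    return 1 - (1 - p_drive) ** drives


# Same 1-in-10 per-drive failure probability as the example in the text.
for n in (1, 2, 3, 4):
    print(f"{n} drive(s): {p_array_loss(0.1, n):.1%} (additive estimate {n * 0.1:.0%})")
```

Real drives in the same array are often the same model, the same age, and in the same enclosure, so their failures may not be independent; that tends to make the picture worse, not better.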

There are some good reasons to configure RAID 0 and simple stripe sets, but never for data that cannot be easily and quickly recreated. Temporary files generated and used by applications, databases, and operating systems are usually good choices for RAID 0 and simple stripe set configurations. Any data that needs to be retained or would be costly to regenerate should not be stored on a RAID 0 or stripe set configuration. And although fault-tolerant RAID levels significantly reduce the chances of losing data due to hard disk failure, it is still prudent to back up important data and store it offsite.