Guidelines for disaster-recovery planning

A solid data-protection strategy involves various backup schemes, media rotation, media management, and choosing the right equipment.

By Michael Chamberlain

When organizations develop procedures to restore inoperable computer systems and quickly resume computer-based operations after a data or system loss, they are engaged in disaster-recovery planning (DRP). Growth in enterprise computing has made DRP a critical process for preventing loss of revenue after such an event.

Creating a disaster-recovery plan for your organization can be an intricate and involved process. It begins with developing a solid data-protection strategy. An organization's continuous improvement practices should include reviewing this strategy to ensure it reflects the most current needs of the organization. This guide is an overview of steps to assist you in assembling your organization's data-protection strategy.

Business's growing reliance on enterprise networks and the mass storage that accompanies them has both empowered organizations and put them at risk. The volume of data and information stored electronically will continue to grow exponentially. According to the International Data Corporation, for hard disks alone, 5,988,000TB will be created in 2001, which works out to annual growth of 110%. Any number of environmental disasters such as floods, hurricanes, fires, power failures, or tornadoes can have a devastating impact on locations worldwide.

Shaded squares indicate previous backups. White squares indicate the most recent backups. Weekly backups are completed on Fridays.

More prevalent than environmental disasters are everyday problems such as equipment failures, computer hackers, malicious viruses, and human error. These threats consume both time and assets. The cost of such a disaster ranges from thousands of dollars to millions.

To determine whether your organization should move forward with a disaster-recovery plan, ask yourself the following:

  • How much downtime can the organization afford? Loss of revenue, increased expenses, and customer loss are just a few of the factors affected by even a short business shutdown.
  • Are you obligated to have a disaster-recovery plan? Contractual business relationships often involve a clause holding suppliers responsible for delivering products regardless of hardships.
  • Does the organization's survival depend on a fast data recovery? Any organization without critical application data faces a sizable dilemma, one that may even be large enough to be the difference between everyday business and no business at all.
  • Is the organization a public company, bank, utility, or government agency? In most cases, the law requires a disaster-recovery plan.

Developing a data-protection strategy

Developing, testing, and refining your data-protection procedures are the most important elements in the success of any data-protection strategy. Organizations with successful data-protection procedures test and refine them over time. This process should be scheduled at regular intervals deemed appropriate by the organization to ensure consistent refining of the data-protection strategy. The reliability of restore procedures can be greatly enhanced by establishing a strict rotation plan with many sets of tapes. Multiple copies and storing a copy off-site are almost a necessity for critical data.

The backup process can be organized in two ways: distributed or centralized. Distributed backups tend to rely on multiple people and various devices. With that many people and that much variety in equipment, the success rate of distributed backups in a real crisis is extremely low. The changing network environment and properly managed LANs have brought about a centralized backup process, in which the amount of control is much greater. Data-protection duties usually fall on one individual or one team for the entire organization, and all physical backups are controlled from one central computer with a smaller number of devices. A centralized backup procedure also makes it possible to automate many of the day-to-day backup tasks and provides a central location for reporting, to ensure that both servers and network-attached devices are backed up.

Steps toward a solid data-protection strategy

The steps involved in the initial data-protection strategy plan are key to avoiding data loss and being able to execute recovery should data loss occur. A basic data-protection strategy should begin by answering the following questions:

  • How much data can the organization afford to lose?
  • How much downtime can the organization afford during the restore process?
  • How often should your backups be scheduled?
  • In what time frame will the backups be done?
  • How long will your retention period be?
  • What is the safest media for your schedule and retention period?
  • How will the success of each backup be verified?
  • Does your equipment allow for not only timely backups but timely restores?
  • Is off-site storage a must for your data and if so, where will it be stored?
  • What people will be responsible for managing the data-protection strategy?
  • In the event of a disaster, what trained persons will be in charge of carrying out the backup/restore procedures? How will these individuals be contacted at any time of day?

These questions need to be answered and re-evaluated at each scheduled review to meet the specific data needs of your organization. A data-protection strategy must be reviewed regularly to ensure that if a data loss occurs, a solid platform exists from which to rebuild.

Various backup types and levels

The method and level of your backups are another large component of a data-protection strategy. Three backup levels (full, incremental, and differential) are used in conjunction with various media rotation models. Individual file or complete disk imaging can be used with all three backup levels.

Full backups usually include all of your essential data. Weekly, monthly, and quarterly backups should all be full backups. The weekly full backup, in which all the files you want included are copied, is usually made on Friday or over the weekend. Between full backups, your Monday through Thursday backups can be either incremental or differential to save time and media. It is recommended that full backups be made at least weekly.

Incremental backups contain only the data that has changed since the previous full or incremental backup. Incremental backups on average take less time because there are fewer files backed up. Restoring data backed up incrementally may take longer because the data is needed from both the last full backup and each incremental backup after that full backup.

Differential or extended incremental backups contain every file that has changed since the last full backup. Restoring a differential backup is faster because only the full and the latest differential must be restored. Differential backup is increasing in overall usage because it traps files at points in time (e.g., before virus corruption or hardware failure). If backup window constraints prevent weekly full backups, daily backups should be differential to prevent the necessity of using several tapes for a restore.
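The restore trade-off between incremental and differential backups can be sketched in a few lines of Python. This is a minimal illustration, not the logic of any particular backup product: the function name restore_chain and the day labels are hypothetical, and it assumes that a given week uses either incremental or differential daily backups, not a mix.

```python
def restore_chain(history):
    """Return the backups to read, in order, to restore the latest state.

    history is a chronological list of (label, level) pairs, where level is
    "full", "incremental", or "differential". Names here are illustrative.
    """
    full_positions = [i for i, (_, level) in enumerate(history) if level == "full"]
    if not full_positions:
        raise ValueError("no full backup to restore from")
    start = full_positions[-1]
    full_label = history[start][0]
    chain = [full_label]
    for label, level in history[start + 1:]:
        if level == "incremental":
            # Each incremental holds only changes since the previous backup,
            # so every one of them is needed.
            chain.append(label)
        elif level == "differential":
            # A differential holds all changes since the full backup,
            # so only the newest one is needed.
            chain = [full_label, label]
        else:
            raise ValueError(f"unknown backup level: {level}")
    return chain


incremental_week = [("Fri", "full"), ("Mon", "incremental"),
                    ("Tue", "incremental"), ("Wed", "incremental")]
differential_week = [("Fri", "full"), ("Mon", "differential"),
                     ("Tue", "differential"), ("Wed", "differential")]

print(restore_chain(incremental_week))   # ['Fri', 'Mon', 'Tue', 'Wed']
print(restore_chain(differential_week))  # ['Fri', 'Wed']
```

A Wednesday restore needs four backups under the incremental scheme but only two under the differential scheme, which is why differentials reduce the number of tapes handled during a restore.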

Disk image backups contain a snapshot of your disk sent to tape as an entire volume. The process is nearly seamless, allowing the tape drive to stream at maximum performance. Image backups provide a fast, entire-system restore. Many image backups also allow individual file restoration.

Media rotation models

The media rotation models recommended are those that possess multiple media sets and a depth of file versions to allow restoration of a file at a point in time. Grandfather-father-son (GFS) and Tower of Hanoi are two solid rotation schedules providing a long and varied history of file versions. Both provide extensive recovery options. Organizations using a tape-a-day process are overwriting the last backup and destroying multiple versions of files. This is an inadequate method for proper recovery.

Grandfather-father-son is a three-tiered approach. "Son" is the incremental or differential daily backup, "father" is the full weekly backup, and "grandfather" is the monthly full backup. A total of 12 media sets are required for this basic rotation scheme (four daily, Monday through Thursday; five weekly, Friday week one through five; and three monthly, month one through three). In this scenario, a tape is not reused until the next month, week, or day matching its label. It is not reused sooner because, even though the backup window will have expired, the write window will not have begun.
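As a rough sketch of how the 12 media sets might be assigned to calendar days, consider the Python below. The labels and the function names gfs_media_set and sets_used are hypothetical. The sketch also makes two assumptions the scheme itself does not dictate: the monthly full backup runs on the last day of each month in place of that day's regular backup, and no backup runs on an ordinary weekend day.

```python
import calendar
from datetime import date, timedelta

DAILY_NAMES = ("Mon", "Tue", "Wed", "Thu")


def gfs_media_set(day):
    """Return the label of the media set to write on `day`, or None."""
    last_day_of_month = calendar.monthrange(day.year, day.month)[1]
    if day.day == last_day_of_month:
        # Assumption: the monthly ("grandfather") full backup replaces
        # whatever would otherwise run on the last day of the month.
        return f"monthly-{(day.month - 1) % 3 + 1}"
    weekday = day.weekday()  # Monday is 0, Friday is 4
    if weekday == 4:
        # Weekly ("father") full backup, numbered by the Friday's position
        # in the month (first Friday, second Friday, and so on).
        return f"weekly-{(day.day - 1) // 7 + 1}"
    if weekday < 4:
        # Daily ("son") incremental or differential backup.
        return f"daily-{DAILY_NAMES[weekday]}"
    return None  # Assumption: no backup on an ordinary weekend day.


def sets_used(year):
    """Collect every media-set label the schedule touches in one year."""
    labels = set()
    day = date(year, 1, 1)
    while day.year == year:
        label = gfs_media_set(day)
        if label:
            labels.add(label)
        day += timedelta(days=1)
    return labels


print(gfs_media_set(date(2002, 3, 1)))  # weekly-1 (a Friday)
print(len(sets_used(2002)))             # 12
```

Under these assumptions, a full year of the schedule touches exactly the 12 sets described: four daily, five weekly, and three monthly.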

The Tower of Hanoi model is based on the recursive solution to a mathematical puzzle of the same name, published by the French mathematician Édouard Lucas in 1883. The general idea of the puzzle is to move a stack of disks from one peg to another, where a smaller disk can only be placed on a larger one. Similar to the puzzle, multiple media sets are rotated through incremental and full backups. It uses more media sets than grandfather-father-son for increased safety.
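The article does not spell out the rotation order, so the Python sketch below shows one commonly described form of the schedule, offered as an assumption: media sets are lettered A, B, C, and so on, and the set used in each backup session is chosen by counting how many times 2 divides the session number. The function name hanoi_set is hypothetical.

```python
def hanoi_set(session, num_sets):
    """Return the letter of the media set to use for a backup session.

    session is counted from 1. Set A is used every 2nd session, set B
    every 4th, set C every 8th, and so on; the last set catches every
    session that would otherwise call for a set that does not exist.
    """
    if session < 1 or not 1 <= num_sets <= 26:
        raise ValueError("session must be >= 1 and num_sets between 1 and 26")
    # Number of trailing zero bits, i.e. how many times 2 divides session.
    trailing_zeros = (session & -session).bit_length() - 1
    return chr(ord("A") + min(trailing_zeros, num_sets - 1))


print("".join(hanoi_set(s, 4) for s in range(1, 17)))  # ABACABADABACABAD
```

In this form, the most frequently used set is overwritten every other session, while the least frequently used sets keep much older versions of files.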

Media management

Because media is reused, a media-retirement plan and multiple copies are a necessity. Each of the media sets in the rotation models discussed may consist of multiple tapes, depending upon the amount of data backed up.

Retirement plans can be based on a time schedule, on tape errors exceeding reasonable limits, or on the number of times a tape has been used. Archive tapes should be periodically pulled from the rotation scheme and retired to build a longer history for quarterly, yearly, and other important information such as engineering plans or financial data.
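A retirement check along these lines could look like the Python sketch below. The Tape record, the function name retirement_reasons, and all three thresholds are hypothetical placeholders; appropriate limits depend on the media type and the manufacturer's guidance.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Tape:
    label: str
    first_used: date
    use_count: int
    error_count: int


def retirement_reasons(tape, today, max_age_days=730, max_uses=100, max_errors=10):
    """Return the list of reasons a tape should be retired (empty if none).

    The default thresholds are illustrative placeholders only.
    """
    reasons = []
    if (today - tape.first_used).days > max_age_days:
        reasons.append("age")
    if tape.use_count >= max_uses:
        reasons.append("use count")
    if tape.error_count > max_errors:
        reasons.append("errors")
    return reasons


tape = Tape("daily-Mon", first_used=date(2001, 1, 8), use_count=120, error_count=2)
print(retirement_reasons(tape, today=date(2002, 3, 1)))  # ['use count']
```

Returning the reasons rather than a plain yes or no makes it easier to report why a tape was pulled from rotation.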

Multiple copies are vital to recovery efforts. It is a good idea to have two backup copies of the same data. In the event of a problem with one of the backups or pieces of media, another exists for additional protection. Using three copies of pertinent data is also a viable and widely used option. The third copy can be archived off-site for disaster-recovery purposes. If your only copy of the backup media goes bad, your data is lost forever. An old copy is better than no copy.

Equipment selection process

Selecting the proper hardware to perform crucial daily backup tasks is one of the most important decisions in a data-protection strategy. Performance is one of the most important aspects of this decision. The overall performance of a data-protection system is calculated by how well the backup software, physical tape drives, the media, and LAN/WAN data stream work together. Selecting cost-effective, high-transfer-rate performance tape drives with high-capacity media cartridges can be a very difficult decision.

Tape drive comparisons can be challenging because of the sheer amount of equipment available. Comparisons should be made on the basis of performance ranges of both native and compressed options. The tape drive should have built-in hardware compression to help the overall performance of your backup and reduce media consumption. Stable, high-performance equipment, like a good data-protection strategy, increases overall reliability when it comes time to perform restores.
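When comparing drives, a quick estimate of how long a backup would take at native and compressed transfer rates can help check it against the backup window. The Python sketch below is back-of-the-envelope arithmetic only. The function name backup_hours and the example figures (500GB of data, a 10MB/s native rate, 2:1 compression) are hypothetical, and real throughput also depends on the backup software, the media, and the LAN/WAN data stream.

```python
def backup_hours(data_gb, native_mb_per_s, compression_ratio=1.0):
    """Estimate hours to write data_gb to a drive at a sustained rate.

    compression_ratio is the assumed effective gain from hardware
    compression (2.0 means 2:1); 1.0 means native, uncompressed speed.
    """
    if data_gb < 0 or native_mb_per_s <= 0 or compression_ratio <= 0:
        raise ValueError("sizes and rates must be positive")
    effective_mb_per_s = native_mb_per_s * compression_ratio
    seconds = data_gb * 1024 / effective_mb_per_s
    return seconds / 3600


native = backup_hours(500, 10)           # hypothetical drive, native rate
compressed = backup_hours(500, 10, 2.0)  # same drive, assumed 2:1 compression
print(f"native: {native:.1f} h, compressed: {compressed:.1f} h")
# native: 14.2 h, compressed: 7.1 h
```

Running the same estimate for restores is just as important, since the restore has to fit within the downtime the organization can afford.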

Average hourly cost of downtime

E-commerce/financial industries            $6.4 million
Point-of-sale backbones                    $2.6 million
Mail order industry                        $90K
Package shipping/transportation industry   $28K
Mainframe-based organizations              $75K
Small to mid-sized business LAN            $18K

Average hourly cost to re-create data      $50K

Source: Contingency Planning Association Research, Strategic Research

Another consideration when looking at equipment is your need for automation and unattended operation. Automation increases storage capacity, growth path, and the length of time backups can proceed unattended. This reduces human intervention, helps solve shrinking backup window problems, and improves data security and application availability. Automation also allows easy management of media rotation and seamless system recovery. One key thing to remember in choosing equipment is to leave plenty of room for growth.

The other key aspects in your decision should center on the equipment's reliability and maturity in the market. Devices on the leading edge of technology may not have a track record or backward compatibility with older legacy devices. Purchasing an immature technology could greatly add to the risk of data loss.

Once you have a well-documented data-protection strategy in place and daily backups are being carried out, performing documented testing of the system at regular intervals is the next step.

Benefits of automating software and hardware

Storing information and facilitating problem solving are the foremost goals of an IT system, and faster execution of IT system functions plays a key role in achieving them. Automated backup jobs take much less time than the same jobs would if run manually. For example, if you have 19 clients that need to be backed up, you can script them into one job instead of running 19 separate jobs.
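The idea of scripting many clients into one job can be sketched in Python as follows. Everything here is hypothetical: run_backup_job is an illustrative name, the client names are invented, and fake_backup stands in for whatever command or API your backup software actually provides.

```python
def run_backup_job(clients, backup_one):
    """Back up every client in one job and return a per-client report.

    backup_one is a caller-supplied function that backs up a single
    client and raises an exception on failure.
    """
    report = {}
    for client in clients:
        try:
            backup_one(client)
            report[client] = "ok"
        except Exception as exc:
            # Record the failure and continue, so one bad client does not
            # stop the rest of the unattended run.
            report[client] = f"failed: {exc}"
    return report


def fake_backup(client):
    """Stand-in for a real backup call; fails for one client on purpose."""
    if client == "client07":
        raise RuntimeError("host unreachable")


clients = [f"client{n:02d}" for n in range(1, 20)]  # 19 hypothetical clients
report = run_backup_job(clients, fake_backup)
failed = [name for name, status in report.items() if status != "ok"]
print(len(report), failed)  # 19 ['client07']
```

A single report covering all clients is what lets an administrator review the previous night's results in one place.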

Unattended execution of backup jobs enables valuable personnel to complete other projects and brings more efficiency to the IT workload. The ability to schedule jobs in advance to run on downtime, when no one is around, gives an organization more flexibility to have tasks completed by a single administrator. Furthermore, reducing the number of mundane tasks allows higher-level tasks to be completed by qualified personnel.

Much of the equipment used for data protection can serve other purposes, enabling better use of company resources. Automation and scheduling jobs to run at night can enable the machines that run backups to be used for production purposes during normal business hours without a loss of CPU time or bandwidth.

A lower operating cost results from the smaller number of manual tasks that have to be completed. For example, automating jobs to run after hours eliminates the need to have employees watch each backup run. Rather than taking up a whole day with the sole task of running and watching backup jobs, a single person or small team can block a much smaller period of time to review the reports from the previous night.

Michael Chamberlain is a professional services engineer at Workstation Solutions (www.worksta.com) in Amherst, NH.



This article was originally published on March 01, 2002