A new model for backup capacity planning
By Steve Richardson
The recent explosion in capacity requirements has fundamentally changed the way users store and protect critical data. With that growth driven by a new generation of multimedia applications and massive downloads from the Internet, automated tape storage solutions have become critical components of overall data management schemes.
But the vast amount of information at risk threatens to overwhelm single-cartridge tape solutions--even those boasting dozens of gigabytes of capacity.
As the move to automated tape storage has accelerated, users have been left to select the appropriate device and capacity on their own. Should library capacity match system disk capacity (the 1:1 model)? Would it be better to have a backup device with twice the disk capacity of the network? Does it make sense to have 5 to 10 times more capacity to ensure a sufficient growth path? The challenge is to define a storage model that meets each site's data storage requirements.
Many users still adhere to the straight 1:1 backup philosophy, operating on the principle that the total capacity of the backup device, whether an autoloader or a single-cartridge unit, only needs to equal total disk capacity. Following this reasoning, users with 15GB or 20GB of data to secure may be tempted to choose drive-only solutions, such as DLT or AIT, which offer 25GB (uncompressed) or more per cartridge. Given the need to improve operator efficiency and accommodate future system growth, this may be a mistake.
In some cases, the 1:1 backup model no longer applies. New models exist that provide users with guidelines to effectively match their capacity needs to the capacity of tape libraries.
In this new era of automated backup, there is no one-size-fits-all ratio. An alternative is to set minimum and maximum guidelines: the minimum ratio is three times the total backup data set, and the maximum is six times. Anything less than the minimum may limit the benefits of an automated solution and lead to premature hardware replacement, while capacity beyond the maximum is likely to be cost effective only for specialized applications.
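The minimum and maximum guideline reduces to a simple range check. The Python sketch below assumes capacities are expressed in native (uncompressed) gigabytes; the names `capacity_band` and `assess` are illustrative only, not part of any product.

```python
# Library-to-data ratios suggested by the minimum/maximum guideline.
MIN_RATIO = 3.0
MAX_RATIO = 6.0


def capacity_band(data_gb: float) -> tuple[float, float]:
    """Return the (minimum, maximum) suggested library capacity in GB."""
    return data_gb * MIN_RATIO, data_gb * MAX_RATIO


def assess(library_gb: float, data_gb: float) -> str:
    """Classify a proposed library capacity against the 3x-6x guideline."""
    low, high = capacity_band(data_gb)
    if library_gb < low:
        return "below minimum"
    if library_gb > high:
        return "above maximum"
    return "within guideline"


if __name__ == "__main__":
    data_gb = 20.0  # data to protect, in native GB
    for library_gb in (25.0, 80.0, 200.0):
        print(f"{library_gb:.0f}GB library: {assess(library_gb, data_gb)}")
```

A 25GB single-cartridge drive protecting 20GB of data falls below the 60GB minimum, which is exactly the 1:1 trap described above.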
These new capacity ratio models reflect changes in typical tape usage patterns and advances in backup technology. Instead of a straight dump of the complete data set to tape, improved backup and data management software enables more sophisticated practices: it can automate the backup of selected files and allow administrators to prioritize data sets based on a variety of parameters, such as the cost of reconstructing the data. For example, accounting and customer order data may be backed up in full every day, while less critical information is included in incremental backups only when it changes and is backed up in full once a week or less often. In addition, each data set may have a different retention time, that is, the length of time its backups are kept before the tapes are recycled and made available for new backups.
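To see how per-data-set policies and retention drive capacity, the sketch below totals the data held in the library. The `Policy` class and every size, schedule, and retention period in it are hypothetical (daily fulls are modeled as five weekday fulls), and it assumes a steady state in which each week's backups stay in the library for the full retention period before the tapes are recycled.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Backup policy for one data set (all values below are hypothetical)."""
    name: str
    size_gb: float                # size of one full backup
    fulls_per_week: int           # full backups per week
    incrementals_per_week: int    # incremental backups per week
    incremental_fraction: float   # each incremental as a fraction of size_gb
    retention_weeks: int          # weeks kept before tapes are recycled

    def weekly_gb(self) -> float:
        """Data written to tape per week for this data set."""
        fulls = self.fulls_per_week * self.size_gb
        incrementals = (self.incrementals_per_week
                        * self.incremental_fraction * self.size_gb)
        return fulls + incrementals

    def held_gb(self) -> float:
        """Data held in the library at steady state, given retention."""
        return self.weekly_gb() * self.retention_weeks


# Hypothetical mix: critical data in full every weekday, kept two weeks;
# bulk data in full weekly with four incrementals, kept one week.
POLICIES = [
    Policy("accounting", 2.0, 5, 0, 0.0, 2),
    Policy("file server", 18.0, 1, 4, 0.10, 1),
]

if __name__ == "__main__":
    for p in POLICIES:
        print(f"{p.name}: {p.held_gb():.1f}GB held in library")
    print(f"total: {sum(p.held_gb() for p in POLICIES):.1f}GB")
```

In this made-up mix, 20GB of primary data translates into about 45GB held in the library, which suggests why retention policy matters as much as raw disk capacity.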
Five factors help determine how much library capacity is required for a given disk capacity; each is weighted to arrive at the optimal library capacity:
- The size of a full backup of the data set
- The size and frequency of incremental backups
- The number of backup cycles maintained within the library
- The methods used (if any) to maximize the efficiency of tape cartridge usage
- The growth rate of data in primary storage
Factors 1 and 2 (full and incremental backups) are straightforward and derive from standard weekly backup routines. In a common rotation, a full backup of the complete data set, including server data and data stored on connected workstations, is performed once a week. The incremental factor is based on daily backups of a subset of the total data load, typically only files that have changed since the last backup. The minimum recommended ratio assumes four incremental backups per week, each 10% of the data set: incrementals run Monday through Thursday, and the full backup runs on Friday.
The maximum ratio assumes a more demanding 24x7 environment with six incremental backups per week, each 20% of the data set, in addition to the weekly full backup.
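Factors 1 and 2 together set how much tape one weekly cycle consumes, expressed as a multiple of the data set. This minimal sketch simply adds the full backup to the incrementals; `weekly_ratio` is a hypothetical helper, and it ignores compression and per-file overhead.

```python
def weekly_ratio(incrementals: int, incremental_fraction: float,
                 fulls: int = 1) -> float:
    """Tape volume written in one weekly cycle, as a multiple of the data set.

    Each full backup copies the whole data set (1.0x); each incremental
    copies incremental_fraction of it.
    """
    return fulls + incrementals * incremental_fraction


if __name__ == "__main__":
    minimum = weekly_ratio(incrementals=4, incremental_fraction=0.10)
    maximum = weekly_ratio(incrementals=6, incremental_fraction=0.20)
    print(f"minimum rotation: {minimum:.1f}x per week")
    print(f"maximum rotation: {maximum:.1f}x per week")
```

With these parameters, the minimum rotation writes 1.4 times the data set per week and the maximum rotation 2.2 times.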
As for the third factor, typically only one backup cycle is kept in the library for ready access. However, in some applications, such as engineering, administrators may need to refer to previous versions of files over several weeks or months. In such cases, backup tapes must be held longer before they are recycled, and library capacity must grow accordingly.
The fourth factor, tape cartridge efficiency, draws little attention, but inefficient cartridge usage can significantly affect library capacity. According to a study of multi-gigabyte tape systems by Strategic Research Corp., a market research firm in Santa Barbara, CA, the average tape volume size is just 255MB.
Seventy-nine percent of the tape sets were under 400MB. On 20GB (native) DLT4000 cartridges, that means most tapes use 2% or less of their available capacity. Strategic Research attributes this to the longstanding practice of storing only a single save-set per tape (1:1 backup).
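The utilization figures follow directly from the study numbers. This sketch divides save-set size by the DLT4000's 20GB native capacity, assuming decimal units (1GB = 1,000MB) and one save-set per cartridge; `utilization` is an illustrative name.

```python
NATIVE_CAPACITY_MB = 20_000  # DLT4000 native capacity, assuming 1GB = 1,000MB


def utilization(save_set_mb: float,
                capacity_mb: float = NATIVE_CAPACITY_MB) -> float:
    """Fraction of a cartridge filled when it holds a single save-set."""
    return save_set_mb / capacity_mb


if __name__ == "__main__":
    for label, size_mb in (("average save-set", 255), ("400MB save-set", 400)):
        print(f"{label}: {utilization(size_mb):.1%} of a cartridge")
```

A 255MB save-set fills about 1.3% of a cartridge, and a 400MB save-set fills 2%.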
But today's automated tape libraries and loaders can hold at least a week of backup sessions, and in some cases much more, without operator intervention. Sophisticated backup software places multiple backup sessions on a single tape without affecting the ability to recover and restore data. In this automated environment, it makes little sense to store only a single backup session on a tape.
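The effect of stacking sessions can be sketched by counting cartridges under both practices. The helper names and the workload are hypothetical: ten 255MB save-sets (the study's average size) per night for five nights, appended in the order they run to 20GB cartridges. The sketch assumes the software does not split a session across cartridges; software that spans sessions would need no more tapes than this.

```python
CARTRIDGE_MB = 20_000  # native capacity per cartridge, assumed 20GB


def tapes_one_per_session(sessions_mb: list[float]) -> int:
    """Legacy practice: every save-set gets its own cartridge."""
    return len(sessions_mb)


def tapes_appended(sessions_mb: list[float],
                   capacity_mb: float = CARTRIDGE_MB) -> int:
    """Append sessions in the order they run; a session that does not fit
    in the space left on the current cartridge starts a new one."""
    tapes = 0
    remaining = 0.0
    for size in sessions_mb:
        if size > capacity_mb:
            raise ValueError(f"{size}MB session exceeds cartridge capacity")
        if size > remaining:
            tapes += 1              # load a fresh cartridge
            remaining = capacity_mb
        remaining -= size
    return tapes


if __name__ == "__main__":
    week = [255.0] * (10 * 5)  # hypothetical: 10 save-sets a night, 5 nights
    print("one session per tape:", tapes_one_per_session(week))
    print("sessions appended:   ", tapes_appended(week))
```

For this workload, one session per tape consumes 50 cartridges, while appending fits the week's 12,750MB onto a single cartridge.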
The fifth factor is future growth. The minimum growth ratio is 1.0x; that is, the initial autoloader purchase should allow headroom for 100% data growth. This may sound high, but rapid data growth is an increasingly common problem. The maximum recommended growth ratio is 2.0x, which allows organizations to triple their data loads on current hardware.
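The five factors are weighted, but the text does not spell out exactly how they combine. The sketch below assumes one plausible combination, offered as an assumption rather than the author's formula: per-cycle volume (factors 1 and 2) times cycles retained (factor 3), divided by cartridge utilization (factor 4), times one plus the growth headroom (factor 5). `library_ratio` is a hypothetical name, and a utilization of 1.0 means sessions are stacked so cartridges are filled.

```python
def library_ratio(incrementals: int, incremental_fraction: float,
                  cycles: int = 1, utilization: float = 1.0,
                  growth: float = 1.0) -> float:
    """Library capacity as a multiple of the current data set.

    ASSUMED combination of the five factors (not given in the article):
      (1 full + incrementals)      factors 1 and 2: volume per weekly cycle
      x cycles                     factor 3: cycles kept in the library
      / utilization                factor 4: fraction of each cartridge filled
      x (1 + growth)               factor 5: headroom for data growth
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    per_cycle = 1.0 + incrementals * incremental_fraction
    return per_cycle * cycles / utilization * (1.0 + growth)


if __name__ == "__main__":
    minimum = library_ratio(incrementals=4, incremental_fraction=0.10, growth=1.0)
    maximum = library_ratio(incrementals=6, incremental_fraction=0.20, growth=2.0)
    print(f"minimum: {minimum:.1f}x  maximum: {maximum:.1f}x")
```

Under this assumption, the minimum parameters give 2.8x and the maximum parameters 6.6x, close to the suggested 3x and 6x; the author's actual weighting may differ.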
These capacity planning guidelines are suggested parameters, not fixed rules. Using them as a framework, organizations can adapt the basic principles for determining backup capacity to their own specific requirements.
The new model at work
A user has a 20GB data set. According to the model outlined above, the user needs an automated backup solution with 60GB to 120GB of capacity. Note, too, that the general guideline of three to six times the backup data set may be too low for users who need to retain more backup cycles in the library. Because no single-cartridge tape drive offers capacity in this range, the solution, even for a 20GB data set, is an autoloader.
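Translating the 20GB example into cartridge slots is a matter of division. This sketch assumes 20GB native cartridges (the DLT4000 capacity cited earlier) and, for sites that keep extra cycles, simply scales capacity linearly with the number of cycles; both are simplifying assumptions, and `cartridges_needed` is a hypothetical helper.

```python
import math


def cartridges_needed(data_gb: float, ratio: float,
                      cartridge_gb: float = 20.0, cycles: int = 1) -> int:
    """Cartridge slots needed to give data_gb the library ratio requested.

    cycles > 1 scales capacity linearly for sites that keep extra backup
    cycles in the library (a simplifying assumption).
    """
    capacity_gb = data_gb * ratio * cycles
    return math.ceil(capacity_gb / cartridge_gb)


if __name__ == "__main__":
    for ratio in (3.0, 6.0):
        print(f"{ratio:.0f}x: {cartridges_needed(20.0, ratio)} cartridges")
    print("3x, four cycles kept:", cartridges_needed(20.0, 3.0, cycles=4))
```

At 3x and 6x, the 20GB data set needs 3 and 6 cartridges respectively; keeping four cycles at the 3x ratio raises that to 12.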
Users also gain labor cost savings from automating the backup process. These savings, even for a small network, can easily exceed $20,000 a year, several times the price of an autoloader. The savings come from automated network backup and media rotation and from user-initiated file restorations. What's more, an autoloader with data management software can help users make more efficient use of tape cartridges while still rotating tapes securely.
Steve Richardson is vice president of marketing at Overland Data, in San Diego, CA.