Using disk drives to supplement or replace tape as a backup target has been gaining in popularity over the past decade. However, in many cases removing or reducing the use of tape can have significant impact on operations and disaster recovery processes.
This article examines the business, application and budgetary impacts that using a disk-based backup system can have. We also provide a set of evaluation points and strategies that an IT organization of any size can use to choose an optimal architecture and integrate disk-based backup into its environment.
Backup, recovery, archiving and replication are different issues. However, in many cases, the terms are used interchangeably even though they are each designed to play a different role. There are multiple approaches to data protection, and which data protection method to use is largely dependent upon business requirements. Administrators should ask several questions:
- Why am I doing backups? Is it for long-term disaster recovery (DR) purposes, or to protect against short-term data loss, or both?
- Should I make regular copies of data that do not change or are not accessed frequently?
- Do I have data on primary storage that should be archived?
Data protection entails making one or more copies of data. In order to qualify as part of a DR plan, data must be copied and stored off site, and it typically does not require rapid access. Disasters are rare, and represent a small portion of the need for data copies.
Most data recovery requests occur due to non-disasters, such as human errors or other inadvertent loss of data. In these cases, local copies of data are the preferable source for restoring data. Due to the faster, random access capability of disk storage, recovery from local disk is orders of magnitude faster than recovery from tape. Thus, recovery from most problems is best accomplished with a local copy of data from disk.
There are two key metrics used to describe data protection: recovery time objectives (RTO) and recovery point objectives (RPO). RTO is the amount of time it takes to restore data (e.g., one minute, 15 minutes or several hours). RPO is the point in time to which a recovery operation restores data (e.g., five minutes ago, one hour ago or one day ago).
In order to provide DR and deliver low RTO and RPO times (high levels of service), companies typically use a mix of on-site and off-site data storage. Generally, the shorter the RTO or RPO, the more the solution costs. Disk-based replication products can provide RTO and RPO of less than a second, but are relatively expensive. At the other end of the spectrum is backup to tape, with off-site vaulting. This is a relatively low-cost option, with the tradeoff being longer RTO and RPO times (lower levels of service).
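To make the two metrics concrete, the following Python sketch checks a protection scheme against stated objectives. The class, its fields and the sample numbers are illustrative assumptions rather than figures for any product, and the worst-case data loss is assumed to equal the interval between copies.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ProtectionScheme:
    """Hypothetical description of one data protection method."""
    name: str
    copy_interval: timedelta      # time between successive copies
    restore_duration: timedelta   # estimated time needed to restore data


def meets_objectives(scheme, rto, rpo):
    """Return True if the scheme satisfies both objectives.

    Worst-case data loss is assumed to equal the copy interval,
    because a failure may occur just before the next copy is made.
    """
    return scheme.restore_duration <= rto and scheme.copy_interval <= rpo


# Illustrative numbers only.
nightly_tape = ProtectionScheme(
    "nightly tape", timedelta(hours=24), timedelta(hours=8))
hourly_disk = ProtectionScheme(
    "hourly disk copy", timedelta(hours=1), timedelta(minutes=15))

target_rto = timedelta(hours=1)
target_rpo = timedelta(hours=4)

for scheme in (nightly_tape, hourly_disk):
    print(scheme.name, meets_objectives(scheme, target_rto, target_rpo))
```

With these sample values, only the hourly disk copy satisfies a one-hour RTO and a four-hour RPO. Tightening either objective narrows the set of acceptable schemes, which is where cost begins to rise.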
There are typically four reasons for accessing a copy of data:
- Recovery from local data loss, where low RPO and RTO are critical. This scenario is common, and disk is typically used.
- Regulatory compliance, e-discovery or other requests for access to data. This is less common, and is often based on archive-to-disk architectures.
- Access to old data for reference. This is relatively uncommon, and archiving to tape is recommended.
- Recovery from a disaster, where RPO and RTO are important. This is relatively rare, and is often based on offsite disk or tape.
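The four scenarios above can be captured as a simple lookup, for example in a runbook or request-handling script. The following Python sketch only restates the list; the key names and the function are hypothetical.

```python
# Access reasons mapped to how often they occur and the storage
# approach named in the list above. Key names are illustrative.
ACCESS_SCENARIOS = {
    "local_data_loss": {
        "frequency": "common",
        "approach": "local disk",
    },
    "compliance_or_ediscovery": {
        "frequency": "less common",
        "approach": "archive to disk",
    },
    "reference_to_old_data": {
        "frequency": "relatively uncommon",
        "approach": "archive to tape",
    },
    "disaster": {
        "frequency": "relatively rare",
        "approach": "off-site disk or tape",
    },
}


def recommended_approach(reason):
    """Return the storage approach for a given access reason."""
    try:
        return ACCESS_SCENARIOS[reason]["approach"]
    except KeyError:
        raise ValueError(f"unknown access reason: {reason!r}") from None


print(recommended_approach("local_data_loss"))
```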
Table 1 summarizes data protection methods recommended in various circumstances.
There are multiple approaches to data protection, and which architecture to invest in is largely dependent upon corporate and IT requirements. Data protection typically entails making one or more copies of data. In order to qualify as part of a disaster recovery (DR) plan, data must be copied and stored off site, in an alternate location. For recovery from non-disasters, such as human errors or other inadvertent loss of data, local copies are the preferable source for restoring data.
Backup To Tape
The ultimate goal of backing up is to have a copy of data when it is needed. Tape backup has been a useful tool for decades, but RTO and RPO windows have decreased while tape media management has become increasingly complicated. As a result, restoring data from tapes is cumbersome and time consuming, and in many cases fails to meet service level needs.
Tape backups are often misused as long-term, off-site data archives. Tapes have implicitly become the archive, and the backup data on them is seldom read. However, tapes remain a viable part of the overall data protection solution because of low cost and media transportability.
The Forgotten Archive
Before examining how data is protected using backups, it is important to examine how your organization uses archiving. The lines between on-line, near-line, archive, and backup can sometimes be blurry, although archives tend to have certain characteristics. It is important to understand the role that each type of storage should play. Most archive data is file- or object-based, and is needed to meet regulatory or corporate data preservation requirements.
The concept of the archive has been around for as long as the idea of backup. Since their inception, these closely related concepts have often been confused. Additionally, as the cost of managing data has risen faster than the cost of storing data, many companies have ignored archiving. These companies have chosen never to archive data rather than deal with the management and other issues associated with archiving. Consequently, infrequently used data has become the leading use of storage systems.
Historical evidence and recent research show that more than 80% of data is rarely accessed after it is 30 days old. By archiving this data and removing it from the primary backup set, backup operations can be streamlined regardless of the target.
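As a rough illustration of how such data might be identified, the following Python sketch lists files whose last access time is older than a cutoff. The function name and the 30-day default are assumptions for this example. It relies on file access times, which may not be updated on file systems mounted with access-time tracking disabled, so results should be verified before anything is moved.

```python
import os
import time

SECONDS_PER_DAY = 86400


def archive_candidates(root, max_idle_days=30, now=None):
    """Yield paths under root not accessed within max_idle_days.

    Uses st_atime, which can be stale if the file system does not
    record access times (an assumption to check locally).
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.stat(path).st_atime < cutoff:
                yield path
```

Calling `list(archive_candidates("/some/share"))` (a placeholder path) returns the files that would leave the primary backup set if they were moved to an archive.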
Using backup to tape as an archive is an ineffective method of archiving data. Unless data is explicitly moved to a separate archive, it remains on primary storage and is subject to continual backup and management. Using off-line storage for archiving is also ineffective for e-discovery and other compliance requirements. Thus, many companies are using tape implicitly to archive data, without gaining the advantages of archiving.
Benefits Of Using Disk
Disk storage offers better performance and reliability than tape in many cases. This is true for both disk targets and virtual tape library (VTL) systems that utilize disk. Another advantage of using disk storage as a backup medium is the ability to easily track only differential changes to data. Although this is possible with tapes, the random access of disks allows delta changes to store multiple points in time efficiently while delivering high service levels.
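One way to picture differential tracking is to compare fixed-size blocks of two versions of the same data and keep only the blocks that changed. The Python sketch below is a simplified illustration under that assumption; the block size and function names are hypothetical, and real products may track changes quite differently.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size in bytes


def block_hashes(data, block_size=BLOCK_SIZE):
    """Return a SHA-256 digest for each fixed-size block of data."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(previous, current, block_size=BLOCK_SIZE):
    """Return indexes of blocks in current that differ from previous.

    Blocks beyond the end of the previous version count as changed.
    """
    old = block_hashes(previous, block_size)
    new = block_hashes(current, block_size)
    return [
        index for index, digest in enumerate(new)
        if index >= len(old) or digest != old[index]
    ]


base = bytes(4 * BLOCK_SIZE)                       # four zero-filled blocks
modified = base[:BLOCK_SIZE] + b"\x01" * BLOCK_SIZE + base[2 * BLOCK_SIZE:]
print(changed_blocks(base, modified))              # prints [1]
```

Only the second block would need to be stored for the new point in time. Random access lets a disk target read or write that single block directly, whereas tape must be positioned sequentially.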
Both disk-to-disk (D2D) backup targets and VTLs provide similar advantages over tape systems. D2D backup targets can be any type of storage, including Fibre Channel or iSCSI-connected block storage, or NFS or CIFS-connected file storage.
There are several concerns that affect adoption of D2D as a backup mechanism, including:
- The higher cost per GB of disk vs. tape storage
- DR changes required to protect disk data through replication or other means
- The power requirements to maintain data on disk rather than tape
- High costs associated with replication and networking for DR
D2D vs. VTL
One of the early deployments of D2D backup was using a VTL, which uses disks to emulate tape. While this provides the ability to easily integrate into existing environments, many of the capabilities of disks are diminished by hiding their characteristics behind the tape interface of a VTL system.
Using D2D does not mean that tape has no role. Tape storage can still play an important role in the overall data protection strategy. Typically, tape is used as the secondary target for backup data in a disk-to-disk-to-tape (D2D2T) configuration, or for cases when off-site data protection is important for DR purposes.
A disadvantage with disk as a backup target in D2D or D2D2T scenarios is that backup applications are not able to use disk in the same manner as tapes. Many backup applications have only recently added support for disk as a backup target, with some charging an extra licensing fee. Additionally, many procedures that have become automated may need to be reviewed and revised, adding to the operational costs associated with this option.
Both D2D and VTL systems often include replication and data deduplication capabilities. Thus, in many cases, the only differences between a D2D target and a VTL are the operational impact and the interface used for storing backup copies of data.
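For readers unfamiliar with data deduplication, the following Python sketch shows the basic idea with a toy in-memory store that keeps each unique fixed-size chunk once. The class, its method names and the chunk size are assumptions for illustration only; commercial D2D and VTL products use their own, typically more sophisticated, techniques.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: each unique chunk is kept once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes

    def write(self, data):
        """Store data and return its recipe (ordered chunk digests)."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # keep first copy only
            recipe.append(digest)
        return recipe

    def read(self, recipe):
        """Rebuild the original data from a recipe."""
        return b"".join(self.chunks[digest] for digest in recipe)

    def stored_bytes(self):
        """Total bytes physically held by the store."""
        return sum(len(chunk) for chunk in self.chunks.values())


store = DedupStore()
backup = bytes(8192) + b"a" * 4096   # 12 KB of sample data
first = store.write(backup)
second = store.write(backup)         # a repeat backup adds no new chunks
print(store.stored_bytes())          # prints 8192
```

Two 12 KB backups occupy 8 KB in the store because the two zero-filled chunks are identical and the second backup repeats the first.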
Recently, several vendors have begun to offer specialized products for use as backup or D2D targets. One of the first vendors to create specialized systems was Data Domain. Other vendors, including Copan, NetApp and Overland, have released D2D backup target devices. More recently, HP and Dell have delivered systems designed as D2D backup devices. Systems designed as D2D targets typically support NAS interfaces. (See Table 2 for representative D2D products.)
Environments that have limited amounts of data, but want the convenience, speed and reliability of disk with the transportability of tape, should consider the new generation of removable disk drive products. The RDX line is the best known, available through several vendors and resellers, although there are other options as well. Another newer option for somewhat larger data sets is to utilize a disk library that uses removable drives. ProStor Systems, for example, has a line of disk libraries that uses RDX removable drives rather than tape drives.
The Future Of D2D
Point-in-time copies and disk replication emerged around the same time as mainframe VTL systems. For open system environments, disk-based copies and replication were the only method of providing disk-based protection. As VTLs became available for open systems, many companies looked to use these systems as a lower cost alternative to disk copy and protection.
Backup software applications often did not support the use of disk systems, which also contributed to the rise of VTL systems. With disk targets widely supported, D2D is now a viable alternative to VTLs for providing higher performance levels than tape, at a lower cost than disk replication options. In mainframe environments, VTLs will likely remain popular for many years, due to the embedded tape handling mechanisms within mainframe OSes and applications.
Small to mid-size IT departments that do not require as much integration with physical tape are likely to implement D2D systems at a higher rate than VTL systems for the next several years. Over time, VTLs will be used predominantly in larger companies due to the requirement for operational consistency and the need for close integration with physical tape.
As disk archiving, backup, replication, deduplication and drive spin-down technologies advance, vendors will look to integrate these features into existing products rather than supply standalone solutions. Creating an on-disk delta-set snapshot, which is then expanded and sent to a backup target where it is deduplicated, and then later reconstituted and sent to an online archive or tape device, is an inefficient process.
Over time, the multitude of point products for disk backup and archive will likely begin to merge into a single product type. For mid-size and large environments, standalone point products may be preferred. In smaller environments, users will look for devices that can provide several capabilities, such as data deduplication, backup to disk, replication, online archiving and power savings.
There is no one correct answer for every environment, although disk should be a part of every IT organization's data protection scheme. In general, disk should always be the first line of defense in protecting against loss of data from error or corruption. Tape, when used, should be the last line of defense, offering protection for DR and long-term archiving.
Disk-based point-in-time (PIT) copies are the best choice as the initial data protection mechanism. Specialized disk archive systems are also the appropriate place for on-line archive data, using self-protecting systems outside of primary data. For mid-sized and large environments, the second line of defense may be either a D2D target or a VTL.
Smaller companies, or those with less than 10TB of data, should investigate using disk as a backup target. Nearly all backup applications targeted for SMB environments have supported disk targets for several years. Using a network-based disk target will provide sufficient performance for nearly all small environments. For DR, off-site storage of data on tape or removable disks may be a good option. Another potential option for smaller companies is the use of a cloud storage backup service provider for off-site storage.
Mid-sized organizations with 100 to 1,000 people, which typically have less than 250TB of data, should utilize disk to augment their data protection policies. Many mid-sized companies that have older tape equipment may benefit from moving to disk exclusively. In particular, environments with existing network connectivity between two or more sites may be able to use the replication options of a D2D system to fulfill their DR requirements. Power savings may be a concern, and the cost of powering disk drives should be considered in the cost of a solution. Archiving data can help reduce the backup size, and lower power costs, by storing data on static media.
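A first-order estimate of the power component can be worked out from drive count, power draw and electricity price. The Python sketch below is a simple illustration: the function name, the cooling overhead factor and all sample inputs are placeholder assumptions, not measured or vendor-supplied values.

```python
HOURS_PER_YEAR = 24 * 365


def annual_power_cost(drive_count, watts_per_drive, price_per_kwh,
                      cooling_overhead=0.5):
    """Estimate yearly power and cooling cost for always-on drives.

    cooling_overhead is the extra energy spent on cooling per unit of
    drive energy; 0.5 is a placeholder, not a measured value.
    """
    kwh_per_year = drive_count * watts_per_drive * HOURS_PER_YEAR / 1000
    return kwh_per_year * price_per_kwh * (1 + cooling_overhead)


# Placeholder inputs: 100 drives at 10 W each, 0.10 per kWh.
print(round(annual_power_cost(100, 10, 0.10), 2))
```

Running the same calculation with the drive count that remains after archiving, or with spin-down reflected as a lower average wattage, gives a quick way to compare options.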
Large organizations, typically having 1PB or more of storage, should investigate using multiple technologies. Using disk or tape exclusively is unlikely to be the best answer for these environments due to the complex regulatory and business performance needs, combined with the cost associated with storing large amounts of data on disk, and the power and cooling associated with large data sets. Power use of archive and long-term backup sets is a significant issue for these environments. In many cases, the power use of maintaining archive data in on-line, primary storage can result in yearly power and cooling costs equivalent to the equipment cost.
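The size-based guidance in this section can be summarized as a small decision helper. The Python sketch below restates the thresholds given above; the function name is hypothetical, 1PB is taken as 1,000TB, and the range between 250TB and 1PB is returned as unclassified because the guidance above does not cover it.

```python
def suggest_strategy(data_tb):
    """Map total data size in TB to the guidance in this section."""
    if data_tb < 0:
        raise ValueError("data size cannot be negative")
    if data_tb < 10:
        return ("small: disk backup target; tape, removable disk or "
                "a cloud backup service for off-site copies")
    if data_tb < 250:
        return ("mid-sized: disk to augment data protection; consider "
                "D2D replication between sites for DR")
    if data_tb >= 1000:  # 1PB assumed to be 1,000TB
        return ("large: multiple technologies; neither disk nor tape "
                "exclusively")
    return "unclassified: evaluate case by case"


for size_tb in (5, 100, 500, 2000):
    print(size_tb, "TB ->", suggest_strategy(size_tb))
```

The thresholds are rules of thumb, so the output is best treated as a starting point for evaluation rather than a decision.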