An evaluation of various data-protection solutions shows that IT organizations should consider a combination of technologies.
By Sean Derrington and Phil Goodwin
Part one of this two-part series differentiates replication technologies and disk-to-disk-to-tape (D2D2T) technologies. When speaking to storage managers about D2D2T products, we're often asked, "Can't I just use the replication software that I already own?" In fact, D2D2T and replication software solve two entirely different problems.
This article will classify and clarify various data-protection and recovery options and highlight where the various approaches can be used. In part two, we'll delve into more specifics about various D2D2T vendors and products.
Determining the problem
At first glance, it would appear that D2D2T technologies are merely a different form of data replication. However, these technologies really solve a different problem from replication software. (Note: We group host-based replication products, such as Veritas' Volume Replicator, in the same category as disk controller-based products because they perform the same function, albeit using different methodologies.)
Replication technologies fundamentally provide a solution for business continuance. To that extent, replication solves the "recovery" portion of the backup-and-recovery issue. Systems protected by a continuum of replication technologies (described below) can recover from almost any data loss nearly 10 times faster than with a tape-based recovery approach. However, best practices still dictate that organizations back up their data to tape for off-site vaulting and long-term retention. In this case, replication does not improve backup performance, although it does reduce reliance on tape for recovery. A backup appliance, in contrast, can help ensure backup success and reduce the labor-intensive nature of backup by nearly 30%.
IT organizations would be hard-pressed to identify a storage vendor that does not offer some type of data-recovery product; it's even offered as an option with some operating systems (e.g., Microsoft's Windows Storage Server 2003). No matter what marketing name is given to the product, the underlying concept is the same: Create a copy of the data, apply some type of time stamp, and move it from one location to another.
Choosing the most appropriate data-protection technology, or technologies, depends largely on an IT organization's recovery point objective (RPO) and recovery time objective (RTO).
The key is to view this as an information protection continuum (see figure) that solves data recovery requirements with a combination of the technologies outlined in this article. IT organizations can "package" technologies to support the tiered storage services delivered to applications and the business.
The following descriptions are a starting point and are admittedly generalizations. It's important to note that product capabilities vary widely, which we'll address in part two of this series.
We classify replication software into three categories: snapshots, local mirrors, and remote mirrors. Regardless of where the replication "intelligence" resides (e.g., in the server, on an intelligent switch, on a dedicated appliance in the data path, or in the storage subsystem), the goals are the same.
Snapshots can be taken within a storage subsystem or between subsystems. Snapshots use a copy-on-write technique: at a specified time ("checkpoint"), a point-in-time image of the application's volumes is created, and from that point forward any block about to be updated is first copied to separate LUNs. The snapshot thus preserves the checkpoint image while consuming capacity only for changed blocks.
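As a rough illustration, the copy-on-write mechanism can be sketched in a few lines of Python. This is a toy model, not any vendor's implementation; the `Volume` class and its methods are hypothetical.

```python
# Toy copy-on-write snapshot: the "snapshot area" stores only blocks
# that change after the checkpoint, so unchanged data is never copied.

class Volume:
    def __init__(self, blocks):
        self.blocks = list(blocks)      # primary LUN contents
        self.snapshot = None            # block index -> checkpoint data

    def take_snapshot(self):
        self.snapshot = {}              # checkpoint: empty snapshot area

    def write(self, index, data):
        # Copy-on-write: preserve the checkpoint copy of a block the
        # first time it is overwritten after the snapshot was taken.
        if self.snapshot is not None and index not in self.snapshot:
            self.snapshot[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, index):
        # Snapshot view: changed blocks come from the snapshot area,
        # unchanged blocks come straight from the primary volume.
        if self.snapshot is not None and index in self.snapshot:
            return self.snapshot[index]
        return self.blocks[index]

vol = Volume(["a", "b", "c"])
vol.take_snapshot()
vol.write(1, "B")                       # only block 1 is copied aside
print(vol.blocks)                                # ['a', 'B', 'c']
print([vol.read_snapshot(i) for i in range(3)])  # ['a', 'b', 'c']
```

Note that the snapshot area holds a single block here, which is why snapshots consume far less capacity than a full-volume mirror.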
Local mirrors are physical full-volume copies of the data within a storage subsystem. Unlike snapshots (which consume less than 100% of the source capacity), local mirrors carry a capacity overhead of 100% per copy. For example, a 2TB database protected by RAID 1 with one local mirror requires a total of 6TB of storage: 2TB for the original image, 2TB for the RAID-1 copy, and another 2TB for the local mirror.
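The capacity arithmetic generalizes directly; here is a quick sketch using the article's 2TB RAID-1 example (the `mirror_capacity_tb` helper is hypothetical, for illustration only):

```python
# Back-of-the-envelope raw-capacity math for full-volume copies:
# original + optional RAID-1 copy + any number of local mirrors.

def mirror_capacity_tb(data_tb, raid1=True, local_mirrors=1):
    """Total raw capacity in TB for one protected volume."""
    copies = 1 + (1 if raid1 else 0) + local_mirrors
    return data_tb * copies

# 2TB database, RAID 1, one local mirror -> 6TB total
print(mirror_capacity_tb(2))   # 6
```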
Remote mirrors are also physical full-volume copies of data, but they offer the option of increased distance (snapshots can also be remote). Remote mirrors can span anywhere from two meters to thousands of kilometers. Moreover, remote mirrors can support synchronous replication (writes are guaranteed at both sites) and asynchronous replication (some data loss is possible in the event of a failure), as well as combinations in between (e.g., time-delayed or semi-synchronous replication).
Generally, storage subsystem-based replication is the most robust approach and typically is used to support an IT organization's most critical applications. Appliance-based solutions and replication solutions that reside at the host/application level are typically used for less-critical applications where tight integration with server clustering software, backup/recovery software, rapid synchronized fail-over, and rapid fail-back are less important. Remote mirroring provides the greatest protection against site failures (incorporating business continuity and disaster-recovery requirements), but may not provide the quickest restore in the event of a local hardware failure (unless a local mirror or snapshot is used in addition to the remote mirror).
Remote distances are also an important consideration when evaluating synchronous versus asynchronous solutions. There is a tradeoff between performance and acceptable data loss, and no vendor can avoid the limitations of physics. Light travels 186 miles/ms in a vacuum (for planning purposes, use 125 miles/ms in fiber); mechanical disk seek time is typically 5ms; and round-trip synchronous replication beyond 100 miles will most likely hinder application performance significantly (or strain the corporate checkbook).
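The latency penalty is easy to estimate from the planning figure above. A minimal sketch, using only propagation delay (real links add switch, protocol, and disk latencies, so actual numbers will be worse):

```python
# Propagation-only round-trip delay for a synchronous write acknowledgment,
# using the article's planning figure of 125 miles/ms for light in fiber.

FIBER_MILES_PER_MS = 125.0

def round_trip_ms(distance_miles):
    """Round-trip propagation delay in milliseconds."""
    return 2 * distance_miles / FIBER_MILES_PER_MS

for miles in (10, 100, 1000):
    print(f"{miles:>5} miles: {round_trip_ms(miles):.2f} ms round trip")
# At 100 miles, propagation alone (1.6ms) is already a sizable
# fraction of a typical 5ms disk seek; at 1,000 miles (16ms) a
# synchronous write waits longer on the wire than on the disk.
```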
Continuous data-protection software
Continuous data protection (CDP) is a relatively new approach that uses an appliance that "sits next to" the primary storage, on the same storage network, and is the target for snapshots of an application. The CDP approach provides continuous protection for an application by allowing for greater retention and continuity (i.e., providing recovery to any given second over a seven-day period) than traditional snapshots or local mirrors. The goal of CDP is to minimize both the RPO and the RTO.
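The "recovery to any given second" property follows from journaling every write with a timestamp and replaying the journal up to the requested recovery point. A toy sketch of that idea, assuming nothing about any vendor's implementation (the `CDPJournal` class is hypothetical):

```python
# Toy CDP journal: every write is logged with a timestamp, so the volume
# can be reconstructed as of any prior moment within the retention window.

import bisect

class CDPJournal:
    def __init__(self, initial_blocks):
        self.initial = dict(initial_blocks)   # baseline image
        self.log = []                         # (ts, index, data), appended in time order

    def record_write(self, timestamp, index, data):
        self.log.append((timestamp, index, data))

    def restore_as_of(self, timestamp):
        """Replay journaled writes up to the requested recovery point."""
        blocks = dict(self.initial)
        # Find how many log entries have ts <= timestamp.
        pos = bisect.bisect_right([t for t, _, _ in self.log], timestamp)
        for _, index, data in self.log[:pos]:
            blocks[index] = data
        return blocks

journal = CDPJournal({0: "a"})
journal.record_write(1, 0, "b")
journal.record_write(3, 0, "c")
print(journal.restore_as_of(2))   # {0: 'b'}  -- state between the two writes
```

The price of this continuity is journal capacity, which is why CDP appliances pair large (often Serial ATA) disk pools with a bounded retention window such as the seven days mentioned above.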
Although CDP solutions include disk storage (often Serial ATA) they are not designed to be primary disk storage for applications (often due to host/device interconnect bandwidth limitations and limited storage management capabilities). Although a CDP solution should be viewed as a premium service (e.g., tier 1), it does not alleviate the requirement for tape-based backup/recovery services (you still have to consider data archival), and it may be implemented in addition to a remote mirroring data-protection scheme (protection against site failure).
Key evaluation criteria for CDP solutions should include integration with primary storage devices and backup/recovery hardware and software, use of host-based agents, resynchronization/fail-back capabilities (i.e., delta block comparisons vs. full volume rewrites), operating systems supported, application performance during a recovery/resynchronization, and ease of management.
Most backup/recovery software vendors now offer the option to use a combination of disk and tape to help IT organizations reduce their exposure when applications fail. With this option, an application is backed up to a higher-performing intermediate disk subsystem; once that backup is complete, the same data is backed up to a tape library. This second data movement does not require application/server resources.
In addition to reducing the backup time, IT organizations can perform restores from the intermediate disk using the backup/recovery software. The ability to recover an application from the intermediate disk is dependent upon the intermediate retention period.
The question for IT organizations is not if backup/recovery software should be used, but rather, whether disk should be used in addition to tape to accelerate application restores, and what the archival policies should be for the storage tiers.
Disk libraries are a new product category, created largely by EMC with its introduction of the CLARiiON Disk Library; competing products (e.g., Overland Storage's REO 1000/4000) share many similarities but are marketed differently. What distinguishes a disk library from other D2D2T solutions is tape emulation software: the backup/recovery software "thinks" it is writing information from the application server to a tape library, but in reality it is writing to a disk subsystem. The information is then replicated from the disk library to a traditional tape library for long-term archival.
Disk libraries typically use Serial ATA drives and provide a lower-cost option (compared to Fibre Channel) and greater capacity for maintaining historical application information for long periods of time. Similar to CDP solutions, disk libraries cannot be used as primary storage for applications (because they communicate via tape emulation and not traditional SCSI-3 commands), but disk libraries still provide a relatively seamless solution for D2D2T backup and recovery.
Secondary ('nearline') storage
Secondary, or "nearline," storage has seen significant acceptance over the past two years. These devices typically use either ATA or Serial ATA (SATA) disk drives. Examples include Data Domain's DD 200 Restorer, Network Appliance's NearStor, and StorageTek's BladeStor.
Although the I/O performance of ATA/SATA drives is less than that of Fibre Channel drives, capacity per drive is greater and prices are dramatically lower. A primary difference between secondary disk storage and disk libraries is the lack of tape emulation capabilities.
Secondary storage devices can effectively be used as replication targets (either locally or remotely) and provide a reliable level of protection for applications. One successful use of secondary storage devices is implementing remote snapshot software (e.g., Network Appliance's SnapMirror) for remote locations (where there is minimal or no IT staff) to a central location. The secondary disk storage (the targets from the remote sites) is then backed up to tape. While this may solve backup/restore staffing problems at remote locations, the challenge is in the recovery (either partial or full) at remote locations. It's also important to note that secondary storage need not use ATA/SATA disks; more-expensive primary storage devices can be used as secondary storage targets for high-speed recovery (and often are required for remote mirroring solutions).
One notable difference between some nearline solutions and secondary disk storage solutions is that nearline solutions typically use a hierarchical storage management (HSM) approach. HSM software moves data (rather than creating a copy) from primary storage to secondary storage, and then possibly to tertiary storage (i.e., tape), based on policies and thresholds (e.g., age of last access). Nearline solutions should not be viewed as a component within the data-protection continuum, but rather as a primary storage alternative aimed at reducing up-front expenses (and possibly ongoing operational expenses).
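An HSM migration policy of the kind described above can be sketched in a few lines. This is a simplified model with a hypothetical `migrate` helper: real HSM products track access times in the file system and leave stubs behind, which this sketch omits.

```python
# Age-of-access HSM policy sketch: files untouched for longer than the
# threshold are MOVED from primary to secondary storage, not copied.

def migrate(files, age_threshold_days, now):
    """files: {name: last_access_epoch_seconds}.
    Returns (kept_on_primary, moved_to_secondary) as lists of names."""
    cutoff = now - age_threshold_days * 86400
    kept = [name for name, ts in files.items() if ts >= cutoff]
    moved = [name for name, ts in files.items() if ts < cutoff]
    return kept, moved

files = {"old_report.dat": 0, "active_db.dat": 900_000}
kept, moved = migrate(files, age_threshold_days=7, now=1_000_000)
print(kept)    # ['active_db.dat']
print(moved)   # ['old_report.dat']
```

Because the data is moved rather than copied, a policy like this reduces primary-storage spend but provides no additional protection, which is the article's point about keeping HSM out of the data-protection continuum.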
Tape libraries use sequential-access devices and can offer high capacity and fast transfer rates, but they cannot compete with the random-access capabilities of disk arrays. Even enterprise-class tape transports (e.g., IBM 3592, STK 9840/9940) can't match the restore times of disk-based arrays. Where tape libraries do compete, and will continue to, is cost per gigabyte.
Tape is typically a fraction of the cost of disk-based storage solutions. Tape libraries are a critical component for any enterprise data-protection scheme, particularly for backup/recovery operations that involve off-site storage.
Virtual tape libraries
Virtual tape libraries (VTLs) are yet another market segmentation that uses a combination of disk and tape. VTLs have been around for decades in mainframe environments, but more recently were introduced for non-mainframe environments. VTLs use tape emulation software for backup to disk, and then later to tape.
On mainframes, VTLs were designed to increase tape utilization and reduce the number of tape devices required. They operated under the fundamental premise that tape devices are accessed sequentially in a start/stop mode, whereas backup/recovery is a streaming operation. We recommend that IT organizations avoid this hybrid approach in non-mainframe environments because a combination of replication and traditional backup/recovery provides more-comprehensive data protection.
Most organizations will discover that replication products and backup/recovery devices, regardless of what they are called, are "and" technologies rather than "or" technologies. It is mainly a matter of applying the appropriate solution to a specific problem. With regard to backup/recovery devices, the decision is then between tape emulation and appliances. Here again, the solution depends upon the problem, and that's the topic of the next article in this series.
Phil Goodwin is president of Diogenes Analytical Laboratories (www.diogeneslab.com), an IT buyer's service that provides product evaluations free from vendor influence. Sean Derrington was formerly a principal at Derrington Consulting and now works at Veritas.