The devilish details of disk-to-disk backup

Posted on April 01, 2005

Extensive testing of a variety of disk-based backup/recovery products reveals that ‘to D2D or to D2D2T’ is not the only question.

By Phil Goodwin

Disk-to-disk (D2D) technology offers tremendous promise for improving backup-and-recovery operations. D2D appliances can reduce reliance on the backup window, significantly increase storage administrators’ productivity, and reduce the risks associated with failed backup jobs.

In this article, we group D2D and D2D2T (disk-to-disk-to-tape) together, even though there are technical differences. Specifically, D2D appliances have no mechanism to separately move data from the D2D device onto tape. With D2D, data movement to tape is an entirely separate operation using the backup server and the tape device with no D2D device involvement. (Of course, it is also possible for the backup software to use the D2D appliance as the data source where tape is the target.)

D2D2T products, in contrast, have “back-end” connections (usually Fibre Channel) to a SAN or directly to tape drive/library devices. Data can be moved from the D2D2T device either under backup software control or via utilities provided by the D2D2T vendor. When an all-encompassing term is sufficient, we will use D2D in this article. When referring to D2D2T devices specifically, we will use that term.

Despite the significant potential benefits of D2D, no one will be surprised to hear that it’s not a panacea for all backup issues. IT buyers will hear a variety of vendor claims regarding lower costs, faster recovery, the elimination of tape, and more. Often, there is truth to the claims, yet the benefits may be more diluted than thought or come with tradeoffs that must be considered.

Recently, Diogenes Analytical Laboratories tested eight different D2D and D2D2T devices from seven manufacturers. We were (pleasantly) surprised at the significant differences between the products and methodologies, because such diversity provides IT buyers with many choices. The downside of this diverse selection is a market full of complexity, claims, and counter-claims.

Whenever a vendor claims to offer “lower TCO,” the immediate rejoinder should be, “Relative to what?” Certainly, in most cases “do nothing” (a viable option) yields the lowest cost of ownership of all. There are no acquisition, training, or professional services costs when you “do nothing.” Of course, “do nothing” may result in higher risk to the organization or higher operating costs, but those are off-budget or at least non-capital items. In many cases, the lower TCO is relative to other similar devices, alternative technologies, or perhaps a “worst-practice” organization. In other words, the savings are usually a slower rate of growth rather than a reduction in absolute dollars.

In the case of D2D, a claim of lower TCO is difficult to support in absolute dollars. Indeed, we believe that D2D is an “and” technology, not an “or” technology. It does not truly replace tape, although it may eliminate the need for daily incremental tapes that remain on-site. Most medium- and enterprise-class data centers will spend hundreds of thousands of dollars as an entry point to D2D. Certainly, D2D2T (e.g., virtual tape libraries) can significantly improve tape-packing rates (i.e., the percent of a tape cartridge filled with data). This improvement can result in far fewer tapes to manage, but probably will not offset the cost of the D2D2T system. Also, adding such devices to a SAN will require more Fibre Channel ports, cabling, etc. D2D is an important technology that all data centers should at least consider, but don’t expect your IT budget to go down when you implement D2D.
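To make the tape-packing point concrete, here is a minimal sketch of the arithmetic. The function name, data volume, cartridge size, and packing percentages are illustrative assumptions, not figures from our testing.

```python
import math


def tapes_needed(data_gb: float, cartridge_gb: float, packing_pct: int) -> int:
    """Cartridges required when each tape is filled to packing_pct percent."""
    if not 0 < packing_pct <= 100:
        raise ValueError("packing_pct must be between 1 and 100")
    usable_gb = cartridge_gb * packing_pct / 100  # capacity actually used per tape
    return math.ceil(data_gb / usable_gb)


# Hypothetical example: 10,000 GB of backup data on 200 GB cartridges.
partly_filled = tapes_needed(10_000, 200, 40)  # tapes written directly, 40% full
well_packed = tapes_needed(10_000, 200, 90)    # tapes filled from disk, 90% full
print(partly_filled, well_packed)
```

Under these assumed numbers the cartridge count drops by more than half, which is the kind of saving that reduces tape handling but does not by itself pay for the D2D2T system.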

We have seen credible benchmark data indicating that D2D backups can be 10 times faster than tape. Taking this claim at face value, however, would lead the IT buyer to erroneously conclude that such speeds are a by-product of the technology. Certainly, comparing D2D to a poorly tuned or inadequate tape system could yield an astonishingly favorable comparison. However, IT organizations would be well advised to look much deeper into the performance issue.

During our testing of various D2D products, benchmarking performance was not part of the evaluation. Benchmarks rarely represent real-world applications. Nevertheless, we occasionally checked system throughput. In one instance, we found that the data was being transferred at 35MBps. That doesn’t sound bad until you realize that it is about the same speed as a single LTO-2 tape drive.

The lesson to be learned from this example is that tuning a D2D environment is similar to, and as important as, tuning a tape environment. Storage administrators still need to optimize each link in the backup chain. These links include:

  • Backup/media server: If the processor consumption routinely exceeds 80% during backup operations, then the processor/memory should be upgraded.
  • Network bandwidth: The network link (whether LAN or SAN) must be able to sustain throughput equal to the necessary data rate.
  • Target device channel throughput: The target device (whether disk or tape) must be able to accept data at the rate it is fed.
  • Source device output: The source array must be able to feed data to the backup server at a speed equal to the other components.

These four elements are the most common sources of backup performance problems. If any one of them is inadequate, it becomes the bottleneck for the entire backup chain. In our case, we determined that the problem was often our backup server performance, not the D2D devices.
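The weakest-link reasoning can be sketched in a few lines. This is a simplified model that assumes end-to-end throughput equals the slowest link's sustained rate. Only the 35MBps figure comes from our testing. The other rates, the 500GB data volume, and the function names are hypothetical.

```python
def find_bottleneck(rates_mb_per_s: dict[str, float]) -> tuple[str, float]:
    """Return the slowest link and its rate; the chain runs no faster than this."""
    name = min(rates_mb_per_s, key=rates_mb_per_s.get)
    return name, rates_mb_per_s[name]


def hours_to_move(data_gb: float, rate_mb_per_s: float) -> float:
    """Elapsed hours to move data_gb at a sustained rate (1 GB taken as 1,000 MB)."""
    return data_gb * 1000 / rate_mb_per_s / 3600


# Hypothetical sustained rates for each link in the backup chain.
chain = {
    "source array": 120.0,
    "network link": 100.0,
    "backup/media server": 35.0,  # the rate we observed in one test
    "target device": 150.0,
}

link, rate = find_bottleneck(chain)
print(f"Bottleneck: {link} at {rate} MB/s")
print(f"Hours to move 500 GB: {hours_to_move(500, rate):.1f}")
```

Upgrading any link other than the slowest one leaves the result unchanged, which is why a fast D2D target alone does not guarantee faster backups.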

In our tests, even where system throughput was only equal to tape performance, we still realized benefits from D2D technology. For example, single file restores were almost instantaneous because there was no need to find the tape, load the tape, or scan the tape. Throughput is mainly an issue when moving large volumes of data, such as restoring a database or entire system. Moreover, D2D does not suffer from the “shoe-shining” or “back-hitch” problems that tape devices can experience.

Pros and cons of D2D

D2D technology is usually implemented using appliances. These appliances are simply RAID arrays using lower-cost ATA or Serial ATA (SATA) disk drives. In most cases, the appliance does not emulate a tape device. The appliances have specialized software for managing the system and moving data. These appliances almost universally support off-the-shelf backup/recovery software applications to accept data from the primary storage source. In some cases, the devices also have replication software to move data from appliance to appliance, including local and remote copies.

Because D2D appliances serve as a target for standard backup/recovery jobs, they are generally considered “non-disruptive.” This is mostly, but not entirely, true. Certainly, scheduling a backup/recovery job to a D2D device is no more difficult than scheduling one to tape. However, storage administrators must schedule duplicate jobs to the D2D and tape devices if they want to move a tape copy off-site. In any event, all copies of data created using backup/recovery applications are tracked and managed in the backup/recovery catalog, greatly simplifying the task of managing many data images. In fact, the daily “care and feeding” of D2D devices is minimal once the initial setup and configuration are complete.

It is the duplicate job setup and execution that is the main downside to D2D backup. IT organizations have the option of either writing duplicate data streams to disk and tape simultaneously, or first writing to disk and subsequently to tape. If the data stream is duplicated, then the backup/media server must be sized to handle both jobs. Additionally, twice as much bandwidth will be needed at peak operation. If the disk and tape copies are created sequentially, then the processing window is doubled and the off-site tape movement is delayed for some period of time, somewhat increasing the risk of data loss.
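The tradeoff between the two approaches can be expressed as simple arithmetic. The sketch below assumes, for simplicity, that the disk and tape copies are written at the same stream rate. The class, function names, and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CopyPlan:
    peak_rate_mb_per_s: float  # bandwidth the backup/media server must sustain
    window_hours: float        # elapsed time until both copies exist


def _hours(data_gb: float, rate_mb_per_s: float) -> float:
    return data_gb * 1000 / rate_mb_per_s / 3600


def simultaneous(data_gb: float, stream_rate: float) -> CopyPlan:
    """Disk and tape written at once: double the peak bandwidth, one window."""
    return CopyPlan(2 * stream_rate, _hours(data_gb, stream_rate))


def sequential(data_gb: float, stream_rate: float) -> CopyPlan:
    """Disk first, then tape: normal bandwidth, double the window."""
    return CopyPlan(stream_rate, 2 * _hours(data_gb, stream_rate))


# Hypothetical example: 360 GB backed up at 50 MB/s per stream.
print(simultaneous(360, 50.0))
print(sequential(360, 50.0))
```

Neither option is free: one pays in server and network capacity, the other in elapsed time before the tape copy can leave the site.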

Pros and cons of D2D2T

Given that D2D2T includes the ability to independently move data to tape, D2D2T offers a number of alternatives not available with D2D alone. Moreover, nearly all D2D2T devices emulate tape devices at both the robot (library) level and the tape drive format level. As a result, the most basic operation of D2D2T (moving data to the device) is as non-disruptive as D2D backup. In fact, the tape emulators make the backup/recovery application “think” that it’s accessing a normal tape device.

To some extent, the pros and cons of D2D2T are the converse of those of D2D. Having a separate mechanism and process for moving data from a disk device to tape eliminates any additional load on the backup/media server. In some cases, data can begin spooling off to tape from the D2D2T device even before the entire backup set has been received from the primary source. As a result, the latency between disk copy completion and tape copy completion is minimized, again without impact to the server or network. In addition, D2D2T devices do allow backup/recovery applications to control all data movement and manage all data images.

Of course, the additional functionality of D2D2T devices comes at a price, and that price is primarily complexity. If the storage administrator chooses to implement some of the advanced functionality of D2D2T systems, this functionality is often outside the control of the backup/recovery application. For example, when the device moves data to tape using its own utilities, the backup/recovery application has no knowledge of this secondary data image. Risk increases because, should the D2D2T device fail, the image on the device is unavailable, and the backup/recovery software has no knowledge of, and therefore cannot access, the version on tape. As a result, IT organizations are well advised to incorporate redundancy in the D2D2T layer to ensure data availability.

Even in cases where the backup/recovery software is aware of both the disk and tape copies, problems can arise. Some D2D2T devices support virtual bar codes, which can be either a 1:1 correspondence with the physical tape bar code or a map of logical to physical bar codes. In the first scenario, problems can occur if the physical tape breaks. In this case, the operator must create an entirely new image on both systems, or else peel the bar-code label from the bad tape and put it onto a new tape. In the bar-code mapping scenario, the organization is again at risk if the D2D2T device fails, because the backup/recovery software will not be able to find the mapped physical bar code.
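The bar-code mapping risk is easier to see in a toy model. The sketch below is not any vendor's implementation. The class, method names, and bar-code values are hypothetical, and it assumes the logical-to-physical map lives only on the appliance.

```python
class VirtualTapeAppliance:
    """Hypothetical D2D2T device holding the logical-to-physical bar-code map."""

    def __init__(self) -> None:
        self._barcode_map: dict[str, str] = {}
        self.online = True

    def export_to_tape(self, virtual_barcode: str, physical_barcode: str) -> None:
        # Record which physical cartridge received the virtual tape's data.
        self._barcode_map[virtual_barcode] = physical_barcode

    def resolve(self, virtual_barcode: str) -> str:
        if not self.online:
            raise RuntimeError("appliance unavailable: bar-code map cannot be read")
        return self._barcode_map[virtual_barcode]


# The backup catalog knows only the virtual bar code.
catalog = {"nightly-full": "V00001"}

appliance = VirtualTapeAppliance()
appliance.export_to_tape("V00001", "P00417")
physical = appliance.resolve(catalog["nightly-full"])  # works while the device is up

appliance.online = False  # simulate a device failure
try:
    appliance.resolve(catalog["nightly-full"])
    lookup_failed = False
except RuntimeError:
    lookup_failed = True  # the physical tape exists, but nothing can locate it
```

In this model the data on the physical tape is intact after the failure, yet the catalog has no way to name the cartridge, which is the exposure described above.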

The final issue surrounding D2D2T is the potential necessity of restoring data from tape to the D2D2T device before it can be restored to primary disk. Again, this creates a point of failure and can increase the time needed to complete a restore operation. It is possible to re-cable around the failed device, but this takes time and can be fraught with error.

Conclusion

There is no “right” answer to the question of whether D2D or D2D2T is “better.” The correct solution depends on the requirements of the organization and how well a particular solution matches those requirements. Companies considering a disk-based backup solution will be pleased to discover a surprisingly wide range of products and capabilities. D2D backup/recovery generally offers greater simplicity and certainty, but at the cost of functional flexibility. D2D2T backup may offer a more diverse set of functionality, but at the cost of additional complexity and management overhead. The choice is yours.

Phil Goodwin is president of Diogenes Analytical Laboratories (www.diogeneslab.com), an IT consulting and product evaluation firm in Boulder, CO.

This article is based on extensive testing of eight D2D and D2D2T products from seven vendors (ADIC, Alacritus, Data Domain, Diligent, EMC, Neartek, and Overland Storage), conducted by Diogenes Analytical Laboratories in Feb. and March. For the full report, which includes comparative analysis, go to www.diogeneslab.com/.D2Dbuyers.htm.

