Data protection still comes first

By Steve Kenniston

[Editor's Note]

The recent blackout that blanketed parts of the Northeast and Canada highlights just how important it is to have a tested business continuity/disaster-recovery plan in place.

In this second part of our special series on business continuity, InfoStor takes a look at the practical side of planning for unknown disasters as well as at some of the specific technologies that can be implemented to improve your overall chances of fully and quickly restoring data in the event of an outage or a system failure.

The two articles that follow are written by two well-known industry analysts with specific insight into the current state of disaster preparedness among IT organizations and the technologies that can be implemented to help safeguard your future.

As always, we encourage you to e-mail us at heidib@pennwell.com to share your successes and/or failures as they relate to business continuity. Perhaps you have firsthand experience as a result of the blackout?

In the next issue, we'll continue with the series with advice from two integrators on how to map storage technologies to specific business needs and how to choose the business continuity plan that's right for your organization.

We hope you continue to find the content of this special series informative and instructional. – Heidi Biggar

The overriding sentiment among end users at this spring's Storage Networking World conference was delight that the vendor community hadn't introduced any new acronyms since the fall conference. After 18 months of a seemingly unending barrage of acronyms (SRM, SNM, ARM, ADM, etc.), IT professionals only wanted to hear about one thing: data protection.

The reality is that the number-one issue plaguing IT administrators today isn't automation or virtualization or iSCSI, but rather good old data protection. IT administrators want to know how to best align their data-protection processes to their business continuity objectives, and they want to know what their specific options are.

There are a number of technologies—some old, some new—that can be used to help meet your data-protection requirements, including disk-to-disk backup, virtual tape, snapshot, continuous data capture, and replication.

Backup is one thing; proving it is another

One of the main reasons backup is an issue is that there has been no reliable way to prove that data was captured the way we intended. The only way to verify a backup was to perform a recovery, which is too time-consuming and impractical for busy, shrinking IT staffs. Today, companies such as Aptare, Bocada, Tek-Tools, and Vyant offer reporting tools to verify that the data that was supposed to be backed up actually was. These tools work primarily with traditional backup products, monitor backup jobs, and report on successes and failures.

There is no point in protecting your data if you can't verify that you're protecting it successfully. These technologies also work in heterogeneous backup environments: while many shops are trying to consolidate backup software, most still run a good deal of legacy software, and reporting tools can manage backup reporting across most of the environment. One could also argue that there is no point in spending money on new backup products, or developing new processes, until you know where your problems are. Once you have identified them, you can correct them and possibly enhance your processes with some of the new technologies on the market.

Is disk the answer?

Disk-based backup is arguably one of the industry's fastest-growing technologies. In fact, many see it, or an evolution of it, as the next era of data protection. The latest discussion centers on how these technologies will evolve into recovery-based (or recovery-focused) technologies over the next few years.

Driving this growth is the emergence and commoditization of new interfaces such as ATA-based disk arrays. The cost per megabyte of low-end ATA arrays is quickly approaching that of tape, and as it does, it will open new doors to faster, more-reliable data protection. In fact, IT administrators are already discussing the notion of using both disk and tape for data-protection purposes, and some have even implemented both.

Knowing which technology to use—and where—requires a closer look at the pros and cons of each. The primary issues IT administrators should look at are the cost of managing backups versus the importance of better data management and higher data availability. And the bottom-line question is, "Which disk-based data-protection schema best fits the business continuity objectives of your organization?"

Traditional backup/recovery technologies

What are the traditional vendors doing? Nearly all of the leading backup software vendors have stated that they support backup to disk, which basically means that IT administrators can now use ATA arrays as caching devices in front of their tape libraries.

However, whether you're moving data directly to tape or from disk to tape, you're still talking about backup in its purest form. Traditional backup applications are still responsible for moving the data to disk.

Using disk as a staging device has helped address some common IT issues (e.g., shrinking backup windows [performance] and data recovery), but only slightly. Data growth is still outpacing backup windows. The only answer: The paradigm of backup has to be changed! Over the past 18 months we have seen more and more intelligence move from the backup software into either the fabric or into arrays used for backup/recovery.

Virtual tape libraries

Virtual tape library (VTL) technologies have gotten a lot of ink lately from vendors such as Alacritus, BakBone, Diligent, IBM, Quantum, StorageTek, and SANgate. VTL software is designed to make a disk array look like a tape library and can be delivered as pure software (e.g., BakBone) or as an appliance with built-in software (e.g., Quantum).

The idea behind VTL is to help ease IT administrators over the psychological hurdle of moving from a tape-based to a disk-based backup environment. These appliances fit seamlessly into existing backup environments, and they require few, if any, changes to the end-user backup infrastructure. Just redirect the backup application to the IP address of the array (rather than the tape library) and you're off and running.

Data written to disk appears exactly as if it had been written to tape, so it can be easily and quickly cloned or moved to tape, without affecting media management schemas—which is crucial when it comes to recovery.
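The tape-emulation idea can be illustrated with a minimal sketch. The class and method names below are hypothetical, and a real VTL works at the SCSI command level; this toy version only shows the essential trick: exposing the sequential write/rewind/read semantics a backup application expects from a tape drive, while keeping the records on disk with their boundaries intact, so the image can later be cloned record-for-record to physical tape.

```python
class VirtualTape:
    """Toy virtual tape: sequential record semantics backed by disk
    (an in-memory list here, for brevity)."""

    def __init__(self):
        self.records = []   # tape image: ordered backup records
        self.position = 0   # read-head position

    def write_record(self, data: bytes):
        self.records.append(data)       # tape is append-only

    def rewind(self):
        self.position = 0               # move head to beginning of tape

    def read_record(self):
        if self.position >= len(self.records):
            return None                 # end of tape
        rec = self.records[self.position]
        self.position += 1
        return rec
```

Because the disk image preserves record order and boundaries exactly as a tape would, copying it to real tape doesn't disturb the backup application's media-management bookkeeping.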

Disk backup/recovery targets

As you continue to move up the data-protection continuum, there are a couple of schools of thought on disk-based backup and recovery. The first treats disk-based backup as cache and sees no inherent value in putting the data on disk in a tape-like format. After all, what would be the point? Doing so would cost money (you would need to buy some type of VTL software) and you would still have to move that same amount of data off to tape.

Additionally, depending on the backup software that you're running, you might not be able to clone the data off to tape. For example, data that has been written to a 120GB disk drive using backup software cannot be cloned to 60GB tapes. This can also cause problems with media management schemas.

Because of these issues, some IT administrators only care that their primary storage is up and running; they don't really care if it takes all day to back up their ATA arrays to tape. That said, a variety of ATA products can act as a target for backup, including products from EMC, Iomega, Network Appliance, Nexsan, StorageTek, and Snap.

The other school of thought approaches the problem from the recovery angle. This is not to say that backing up to disk won't increase recovery speed, because it will. However, recovery speed is limited by how fast the backup application can recover data from disk. The following technologies can help improve your data-protection index (see figure):

Snapshots—Snapshot technologies have been around for a long time; however, they are only now becoming mainstream. As an example, Network Appliance has integrated its SnapVault technology into its NearStore family of ATA arrays, as well as its filers. NetApp customers can now snap data off their existing filers onto NearStore or filer arrays for near-instantaneous data recovery. NetApp also offers technology for moving snapshots off heterogeneous open systems storage onto a NearStore or other filer. It is also worth noting that EMC has just announced snapshot capabilities on its new DMX line.

The software allows users to make up to 255 copies of this data available, which means flexibility in developing policies for rollback and data retrieval.

Additionally, a number of next-generation NAS appliances, which can increase volumes and file systems on-demand, have similar snapshot capabilities.
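What makes snapshots near-instantaneous is that taking one copies no data at all; old block contents are preserved only when they are later overwritten. The sketch below is a simplified, hypothetical illustration of that copy-on-write idea (real implementations work at the volume or file-system layer), showing why a snapshot is cheap to take and why rolling back restores an exact point in time.

```python
class SnapshotVolume:
    """Toy copy-on-write volume: a snapshot records only blocks that
    change after it is taken."""

    def __init__(self):
        self.blocks = {}     # live data: block number -> bytes
        self.snapshots = []  # per snapshot: {block number -> prior bytes or None}

    def snapshot(self):
        self.snapshots.append({})         # nothing copied yet: instantaneous
        return len(self.snapshots) - 1    # snapshot id

    def write(self, block_no, data):
        # Copy-on-write: save the pre-write contents into every open
        # snapshot that hasn't already preserved this block.
        for snap in self.snapshots:
            if block_no not in snap:
                snap[block_no] = self.blocks.get(block_no)
        self.blocks[block_no] = data

    def rollback(self, snap_id):
        # Undo changes recorded since the snapshot, newest first.
        for snap in reversed(self.snapshots[snap_id:]):
            for block_no, old in snap.items():
                if old is None:
                    self.blocks.pop(block_no, None)  # block didn't exist yet
                else:
                    self.blocks[block_no] = old
        del self.snapshots[snap_id:]
```

Note that the snapshot's changed-block map is the same size as the writes made since it was taken, not the size of the volume, which is why many snapshots can be kept at low cost.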

However, while snapshots are great for recovery (and, most importantly, for managing recovery), not too much has been done to educate end users about how they fit into the overall backup process. While you may think that all you need to do is back up the snapshot, there are other things that should be considered (e.g., the time it will take you to back up the data set—a data set that is often the same size as the original data set). So, in some cases, you gain availability, but you may still be left grappling with backup window issues.

It is also worth mentioning that some traditional backup vendors' software can be used to create the snapshots; the two types of products are not competitive.

Next-generation snapshots—Technologies in this emerging area (e.g., FilesX's XpressRestore) capture snapshots at the block and file level, which means that data is captured underneath the database and is then moved to some type of inexpensive disk array or to an existing storage device. These types of snapshots can be mounted to recover data.

Because the technology allows you to capture multiple snapshots, you can move back and forth in time to mount the appropriate volume. Like traditional snapshot technologies, next-generation snapshots don't compete with traditional backup/recovery software products (see figure).

Next-generation backup/recovery—So, what about new backup-and-recovery technologies? Here we look at products from vendors such as Avamar, Storactive, and DataDomain. Some of these products are intended to displace existing backup products while others act as targets to existing backup applications.

Regardless, these technologies add a new level of intelligence to the backup/restore process. These products look at data as it passes through their systems. For example, if a particular "block" of data already lives within the system, the product determines that there is no need to write that block a second time.

Some vendors claim that as much as 80% of the data in a typical customer environment is static or unchanged. These types of products can potentially save a great deal of time, space, and management load by minimizing the amount of duplicate data that is potentially backed up.
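The "don't write the same block twice" idea described above can be sketched in a few lines. This is a generic illustration of block-level deduplication, not any particular vendor's method: each fixed-size block is keyed by a cryptographic hash, and a block whose hash is already in the store is recorded as a reference rather than written again. The function names and the 4KB block size are assumptions for the example.

```python
import hashlib

def dedup_store(stream: bytes, block_size: int = 4096):
    """Split a byte stream into fixed-size blocks and store each unique
    block once, keyed by its SHA-256 digest. Returns (store, recipe):
    store maps digest -> block; recipe lists digests in stream order."""
    store, recipe = {}, []
    for i in range(0, len(stream), block_size):
        block = stream[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:      # block not yet in the system?
            store[digest] = block    # write it once
        recipe.append(digest)        # either way, record a reference
    return store, recipe

def restore(store, recipe) -> bytes:
    """Reassemble the original stream from the stored blocks."""
    return b"".join(store[d] for d in recipe)
```

If 80% of the blocks in a backup stream are unchanged copies, the store holds roughly one-fifth of the raw data while the recipe still reconstructs every byte.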

Continuous capture—Continuous capture technology rates high on the data-protection index. While this is still an emerging market, a number of vendors (e.g., Revivio, StorageTek, TimeSpring, and Vyant) have already announced products in this space and more technologies are in development. Continuous capture technologies capture not only the data but also how the data was created and written to disk. As writes are saved to primary storage, a second write is "replicated" to a secondary target. In the event of a failure, end users "roll back" to data written prior to the failure and resume the application.

These types of technologies allow users to roll back databases second by second and then recover data near-instantaneously using a process that is similar to the "undo" key in Microsoft Word.
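A minimal way to picture the rollback mechanism is a time-stamped write journal: every write to primary storage is also appended to a log, and recovery replays the log only up to the chosen instant. The sketch below is a simplified, hypothetical model (real products journal at the block layer and handle far more state), but it shows why any earlier second can be reconstructed.

```python
import time

class WriteJournal:
    """Toy continuous-capture log: each write is recorded with a
    timestamp so the volume can be rebuilt as of any earlier moment."""

    def __init__(self):
        self.log = []   # (timestamp, block number, data), in time order

    def write(self, block_no, data, ts=None):
        # In a real system this happens alongside the primary write.
        self.log.append((time.time() if ts is None else ts, block_no, data))

    def state_at(self, ts):
        """Replay journaled writes up to and including time ts,
        yielding the volume contents at that instant."""
        blocks = {}
        for when, block_no, data in self.log:
            if when > ts:
                break               # everything after the failure is ignored
            blocks[block_no] = data
        return blocks
```

Rolling back is then just choosing a timestamp shortly before the failure and replaying to it, much like repeatedly pressing "undo."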

Some technologies in this space are pure software; others are hardware/software combinations. Some require agents on primary servers; others do not. These technologies are good for environments where recovery time has to be instantaneous. But they don't really offer a long-term archival component, which means you may still need to archive data to tape.

These technologies have been tested with traditional backup/restore products to support the archival process.

It's not about backup; it's about restore

Well, it's really about information life-cycle management, data protection, and business continuity all rolled into one.

ATA disk options and tight IT budgets are driving vendors to develop new, more-efficient (and, in some cases, cheaper) methods of protecting data.

Because the most important thing in the enterprise today is being able to recover data quickly when it is lost, IT is also looking for simpler ways to move data throughout the environment. As this paradigm continues to move forward, tape will begin to be used for what it was originally intended: archiving.

IT is now looking for tools that will help lead them down a better data-protection road—one that begins with identifying the types of data in the environment. Once this has been accomplished and the value of the data has been determined, IT will be able to implement the proper data-protection tools.

Steve Kenniston is a senior analyst with the Enterprise Storage Group (www.enterprisestoragegroup.com) in Milford, MA.

ESG is conducting a study on the evolution of data protection. If you are a storage end user who would like to share your experiences, or a vendor interested in participating in the study, please contact John McKnight, senior research analyst, at johnm@enterprisestoragegroup.com.

This article was originally published on September 01, 2003