New recovery techniques put the emphasis on restore, as opposed to backup, operations.
By Ron Levine
Your enterprise-wide transactional database has suffered a catastrophic event and you're now facing a 17- to 25-hour system rebuild process from the backup tapes. It's a nightmare scenario that would make any storage administrator in an uptime-critical processing environment cringe. After all, this kind of downtime could put the entire business at risk.
You've done everything right: You have a state-of-the-art storage system, dedicated backup devices, and automated backup software, and you've followed all the best data-protection practices. So why does the company find itself in this business continuance dilemma? Because for years data-protection solutions have focused on the backup side of the equation, while largely treating the recovery phase as an afterthought. That may be changing, thanks to new backup/restore software and technologies built on disk-based, rather than tape-based, recovery systems.
These technologies focus on the recovery phase and enable rapid, verified restoration of backup data in minutes, regardless of the size of the database or file system. Whether the restore involves a terabyte or a petabyte of data, recovery can take less than 20 minutes. In critical environments such as transaction processing, the difference between 20 minutes and 25 hours could mean the survival of the business.
(Editor's note: The Enterprise Storage Group consulting firm, in Milford, MA, categorizes these new approaches to database recovery as "disk-based recovery technologies" and includes vendors such as Revivio and Vyant Technologies in the category. Other vendors are expected to enter the market over the next few months. For more information, see "Understanding disk-to-disk backup," InfoStor, February 2003, p. 26.)
Some disk-based database recovery technologies can enable a fresh start from any previous point-in-time within a few minutes. This is possible because the restore technology does not rebuild full volumes of data, nor does it apply application logs for data reconstruction. Instead, unlike traditional backup tools or snapshots that rebuild data to pre-defined points-in-time, it employs an "undo" procedure to simply "rewind" rather than rebuild the affected data. The benefit is that, following a data corruption event or full-scale disaster, it takes only a few mouse clicks to "roll back" the affected data to any point-in-time before the loss or corruption occurred; operations can continue from that point forward. Any application can be automatically restarted with this process within a few minutes, regardless of application size.
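To make the "rewind rather than rebuild" idea concrete, here is a minimal sketch in Python. It is not any vendor's implementation: the UndoJournal class, its field names, and the integer timestamps are all assumptions made for illustration. It keeps a before-image of every block it overwrites, then undoes the newest writes first until it reaches the requested point in time.

```python
from dataclasses import dataclass, field


@dataclass
class UndoJournal:
    """Toy block store (hypothetical) that saves a before-image for every write."""
    blocks: dict = field(default_factory=dict)   # block number -> current contents
    journal: list = field(default_factory=list)  # (timestamp, block, before-image)

    def write(self, timestamp, block, data):
        # Record the old contents (None if the block is new) before overwriting.
        # Writes are assumed to arrive in timestamp order.
        self.journal.append((timestamp, block, self.blocks.get(block)))
        self.blocks[block] = data

    def rewind(self, target_time):
        """Undo, newest first, every write made after target_time."""
        while self.journal and self.journal[-1][0] > target_time:
            _, block, before = self.journal.pop()
            if before is None:
                self.blocks.pop(block, None)   # block did not exist yet
            else:
                self.blocks[block] = before


store = UndoJournal()
store.write(100, 0, b"balance=50")
store.write(200, 1, b"orders=7")
store.write(300, 0, b"balance=#corrupt#")  # the corruption event
store.rewind(250)                          # roll back to just before it
print(store.blocks)
```

Only the blocks written after the target time are touched, which is why the work in this sketch depends on how much changed since the rollback point and not on the total size of the data set.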
This approach combines fast, reliable backup with quick and reliable recovery—a combination that has been missing from some traditional backup solutions.
How does it work? In the case of Vyant's technology, the process runs across two hosts (an application server and a backup server) and includes four agent components (intercept, backup, archive, and recovery), which control data capture, movement, archiving, and restores. (For details, see sidebar, "How it works," p. 36.)
This approach addresses time-critical recovery needs. It is a continuous, non-intrusive hot backup process that produces a constant running record of application activity in real time, without degrading the running applications. Recovery is not dependent upon a point-in-time snapshot, but on a continuously rolling "movie" of data that leaves no gaps of unsaved information. Tags applied during backup make it possible to choose the target "rollback point" by date and time. The user specifies how much online storage is set aside for the saved data.
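The two ideas in this paragraph, time tags and a user-set storage allowance, can be sketched as follows. This is an illustration only: the TaggedJournal class, its method names, and the choice to move the oldest entries to an "archived" list when the allowance is exceeded are assumptions, not details taken from the product.

```python
from bisect import bisect_right
from collections import deque
from datetime import datetime


class TaggedJournal:
    """Hypothetical journal: time-tagged entries kept within a byte budget."""

    def __init__(self, budget_bytes):
        self.budget_bytes = budget_bytes
        self.used_bytes = 0
        self.entries = deque()  # (tag, payload) pairs, oldest first
        self.archived = []      # stand-in for entries moved off online storage

    def append(self, tag, payload):
        # Tags are assumed to arrive in ascending time order.
        self.entries.append((tag, payload))
        self.used_bytes += len(payload)
        # Over budget: migrate the oldest entries out of online storage.
        while self.used_bytes > self.budget_bytes and len(self.entries) > 1:
            old = self.entries.popleft()
            self.used_bytes -= len(old[1])
            self.archived.append(old)

    def entries_after(self, rollback_point):
        """Return the online entries that a rollback to rollback_point must undo."""
        tags = [tag for tag, _ in self.entries]
        first = bisect_right(tags, rollback_point)
        return list(self.entries)[first:]


journal = TaggedJournal(budget_bytes=16)
journal.append(datetime(2003, 5, 1, 9, 0), b"AAAAAAAA")
journal.append(datetime(2003, 5, 1, 9, 5), b"BBBBBBBB")
journal.append(datetime(2003, 5, 1, 9, 10), b"CCCCCCCC")  # pushes out the 9:00 entry
to_undo = journal.entries_after(datetime(2003, 5, 1, 9, 7))
print(to_undo)
```

Because the tags are kept in time order, a binary search finds the rollback point directly from a date and time. The dates and payloads here are invented for the example.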
When data recovery is necessary, the user simply selects the point-in-time for the restore and the software transparently pulls a subset of data off disk or tape as needed to reconstruct the point-in-time. The automatic verification process built into the data-recovery phase ensures valid data. Data recovery from system crashes, data corruption, or any type of data failure requires minutes, as opposed to hours with traditional restore techniques.
The recovery process is straightforward: Administrators can move incrementally through data "time" and simply "undo" the data movie back to a point before the corruption occurred, eliminating the need to rebuild the data structure. If the initial "point-of-awareness" is estimated incorrectly, the user can change the target rollback point to an earlier time and, in only a few minutes, "undo" once again.
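The "try a point, verify, and step back further if needed" loop described above might look like the following sketch. The function name find_rollback_point and its parameters are hypothetical, and the validity check is a placeholder for whatever verification an administrator or tool would actually run.

```python
def find_rollback_point(candidate_times, state_at, is_valid):
    """Walk backward through candidate times until the data verifies.

    candidate_times: ascending list of points in time to try
    state_at:        function returning the data as of a given time
    is_valid:        function returning True if that data is uncorrupted
    """
    for t in reversed(candidate_times):
        if is_valid(state_at(t)):
            return t
    return None  # no valid point found in the retained history


# Illustrative history: the value goes bad at time 30.
history = {10: "ok-a", 20: "ok-b", 30: "BAD", 40: "BAD"}
point = find_rollback_point(
    sorted(history),
    state_at=history.get,
    is_valid=lambda data: data != "BAD",
)
print(point)  # 20
```

The sketch steps back one candidate at a time, mirroring the incremental movement the article describes. It returns the latest point that passes the check, so as little good work as possible is discarded.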
This process, which can run with existing backup software applications, may shift the focus from backup to data recovery and business continuance.
Ron Levine is a freelance writer in Carpinteria, CA.
How it works
1. In the case of Vyant's technology, an intercept agent provides real-time continuous backup: it captures incremental block-level data changes synchronously while they are in transit to storage and packages them for asynchronous transport to the backup server. Because the agent sits in the I/O stream, it captures data on its way to the logical or physical device, which makes it independent of application and storage. It also supports raw devices.
2. The backup agent handles data journaling: it records a history of the data set before applying changes to the replicated copy of the data.
3. An archive agent records data blocks as forward and reverse increments, providing the base for recovery and facilitating restoration to any point in time, independent of storage device or application. The archive agent maintains a journal online and queues it for immediate recovery. This technique eliminates the need to restore entire volumes and significantly reduces time and bandwidth requirements for data recovery.
4. A recovery agent determines the required forward and reverse increments for use during recovery and applies the reverse increments to a virtual image of the application data or directly to production storage, allowing restoration to be completed in just minutes.
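The forward and reverse increments described in steps 3 and 4 can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the product's format: the Increment record, the move_image function, and the integer timestamps are invented for the example. Each logged write carries both the old contents (the reverse increment) and the new contents (the forward increment), so an image can be moved in either direction.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Increment:
    time: int
    block: int
    before: Optional[bytes]  # reverse increment: contents prior to the write
    after: bytes             # forward increment: contents written


def move_image(image, image_time, target_time, increments):
    """Return a copy of image moved from image_time to target_time.

    increments must be sorted by time, oldest first.
    """
    result = dict(image)  # work on a virtual image, not the original
    if target_time < image_time:
        # Rewind: apply reverse increments, newest first.
        for inc in reversed(increments):
            if target_time < inc.time <= image_time:
                if inc.before is None:
                    result.pop(inc.block, None)
                else:
                    result[inc.block] = inc.before
    else:
        # Roll forward: apply forward increments, oldest first.
        for inc in increments:
            if image_time < inc.time <= target_time:
                result[inc.block] = inc.after
    return result


log = [
    Increment(1, 0, None, b"a1"),
    Increment(2, 1, None, b"b1"),
    Increment(3, 0, b"a1", b"a2"),
]
current = {0: b"a2", 1: b"b1"}            # state as of time 3
at_1 = move_image(current, 3, 1, log)     # rewind to time 1
at_2 = move_image(at_1, 1, 2, log)        # roll forward again to time 2
print(at_1, at_2)
```

Keeping both directions means a rollback that overshoots can be corrected by rolling forward, without starting again from a full volume restore.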
The recovery agent facilitates data recovery (either full or partial) or data repair by analyzing the data set on the backup server record-by-record to determine the valid, general restore point. The same rollback sequence is applied to the application server to resynchronize it at the same restore point as the backup version. Affected tables and records are extracted and then inserted into the running application, which remains operational for tasks not accessing the affected records.
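A partial repair of this kind can be pictured with the sketch below, which copies only the affected records from a restored image into the live data and leaves everything else alone. The repair_records function and the dictionary-based "tables" are assumptions for illustration; a real system would work against database records, not Python dictionaries.

```python
def repair_records(live, restored, affected_keys):
    """Copy only the affected records from the restored image into live data."""
    for key in affected_keys:
        if key in restored:
            live[key] = restored[key]
        else:
            live.pop(key, None)  # record did not exist at the restore point
    return live


live = {"acct-1": 50, "acct-2": -999, "acct-3": 75, "acct-4": 0}
restored = {"acct-1": 40, "acct-2": 120, "acct-3": 70}  # image at the restore point
repair_records(live, restored, affected_keys=["acct-2", "acct-4"])
print(live)
```

Records outside the affected set keep their current values, which corresponds to the application staying available for work that does not touch the damaged data.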