EDV Solves Offsite Data Backup Problems
Electronic data vaulting may offer a less costly and less cumbersome way to back up data at offsite locations.
By Richard Ohran and Wally Marsden
Commonly used methods of moving data from a company's primary operating facility to a secure offsite location may no longer adequately protect a company's data. In many cases, critical corporate data has become the lifeblood of the business. A week's worth of data can take months to recover, while a complete data loss can have catastrophic results. According to a recent University of Texas study, only 6% of companies that experience a catastrophic data loss survive; 43% never reopen, and 51% close within two years.
With such odds, nightly tape backups are no longer adequate, leaving as much as an entire day's worth of data transactions at risk. Because tape backup is often not automated, it is prone to corruption through human error. And to ensure the integrity of the data, every backup must still be verified.
And, lastly, in the event of a disaster (either physical or logical), the consequences may be even worse if conscientious procedures for moving backup copies to an offsite or remote vault have not been followed: copies of data that are not moved offsite may be destroyed along with the originals.
So, how can you protect your company's data? The ideal situation would be to continuously capture data at the primary operating site and to automatically move it to a remote site. The data image should be taken and transferred in such a way that there is a complete audit trail of all updates.
In other words, as soon as new data is generated locally within a company, a copy of the data should be electronically transmitted--without overwriting previously collected data--to a safe remote site. As the data is received, it should be stored in a "tagged" format, which preserves the data, and stamped with the time of creation. Because updates never overwrite previous data contents, the data can be recreated on demand for any point in time.
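The tagged, timestamped storage described above amounts to an append-only journal. The following is a minimal sketch of the idea; the `Journal` class and its field names are illustrative assumptions, not an actual vaulting format:

```python
class Journal:
    """Append-only journal: each update is tagged with a timestamp and
    never overwrites earlier entries, so the data set can be rebuilt
    as it stood at any point in time."""

    def __init__(self):
        self.entries = []  # (timestamp, key, value), in arrival order

    def record(self, timestamp, key, value):
        self.entries.append((timestamp, key, value))  # never overwrite

    def state_at(self, timestamp):
        """Replay all updates up to `timestamp` to reconstruct the data."""
        state = {}
        for ts, key, value in self.entries:
            if ts <= timestamp:
                state[key] = value
        return state

j = Journal()
j.record(1, "balance", 100)
j.record(2, "balance", 80)
j.record(3, "balance", 95)
print(j.state_at(2))  # {'balance': 80}: the data as it stood at time 2
```

Because nothing is ever overwritten, the journal itself is the complete audit trail the article calls for.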
This approach would enable businesses to resume operations with a "known good" data set. Any image of the company`s data could be reconstructed--and because the data is stored at a remote site, it would not be subject to any of the potential dangers at the primary site.
The value of the above system is the way in which it deals with the problems of recovering data when the primary data set fails or is damaged. Because updates typically occur as a coordinated sequence of data changes, it is often not possible to simply repair or restart a system and continue with the last data written when an interruption occurs at the primary data center.
In such situations, only the first part of a sequence of updates occurs, which corrupts the data, leaving it in an unusable state. When faced with data corruption of this sort, businesses are forced to resume data transactions from some point in the past when the data set was known to be good.
The ideal solution would allow a company to revert to the most recent point in time (perhaps just seconds before the crash) when the data was good--rather than to the previous night's backup tape. If a known good data set had not been saved, the company would be forced to manually back out transactions and re-enter transactions from a "guessed good" data set.
As an example, consider the following common event: a file is being extended to hold additional information. To perform this operation, a free sector is chosen from the list of unused sectors on a disk, and its address is linked to those associated with the file that is being extended. Then, in a separate but coordinated step, the unused sector list is overwritten to reflect the removal of the selected sector. If the first step (the linking of the unused sector to a file) is performed and the second step (the removal of the sector from the unused sector list) is not, the same free sector may be given to another file the next time the operation is performed. These files are "cross-linked" and are subject to corruption.
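The cross-linking failure above can be simulated in a few lines. This sketch (the function name and the `crash_between_steps` flag are invented for illustration) shows how an interruption between the two coordinated steps hands the same sector to two files:

```python
def extend_file(files, free_list, name, crash_between_steps=False):
    """Extend `name` by one sector: step 1 links a free sector to the
    file; step 2 removes that sector from the free list. A crash between
    the steps leaves the sector both allocated and 'free'."""
    sector = free_list[0]
    files[name].append(sector)     # step 1: link sector to the file
    if crash_between_steps:
        return                     # interruption: step 2 never runs
    free_list.remove(sector)       # step 2: drop it from the free list

files = {"a": [], "b": []}
free_list = [7, 8, 9]
extend_file(files, free_list, "a", crash_between_steps=True)
extend_file(files, free_list, "b")  # sector 7 is handed out again
print(files)  # {'a': [7], 'b': [7]} -- 'a' and 'b' are cross-linked
```

Writes to either file now clobber the other, which is exactly the corruption the article describes.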
When this occurs, the data may be in such a state of corruption that it is entirely unusable, and a restoration operation is required to recover data from a previous backup. Such an operation restores the data to the state of the last backup, which may have taken place days or even weeks before. Because the transactions performed since the last backup often contain vital information, that data must be recovered as well. In many cases, however, recovering several hours' worth of lost transactions requires far greater effort than restoring the previous night's data. Hence, a solution that minimizes the amount of lost data is invaluable.
Ideally, in the event of a system crash, a recorded sequence of data updates would be regressed one at a time until the corrupted data is reconstructed to the most recent coherent or "known good" state. Then, steps would be taken to properly handle the transactions that are in process at the time that the system failed. This process would reduce the amount of data recovery to seconds rather than hours or days.
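Regressing recorded updates one at a time is essentially an undo log. Here is a hedged sketch of that recovery loop, assuming each log entry stores a key and its previous value and that some coherence check exists (the bank-transfer check below is a made-up example):

```python
def regress_to_known_good(state, undo_log, is_coherent):
    """Undo recorded updates, newest first, until the data set passes
    a coherence check. Each undo entry is (key, previous_value)."""
    while undo_log and not is_coherent(state):
        key, previous = undo_log.pop()
        if previous is None:
            del state[key]           # key did not exist before the update
        else:
            state[key] = previous    # roll the value back
    return state

# A transfer debited one account, then crashed before crediting the other.
state = {"acct_a": 50, "acct_b": 100}   # 50 debited, never credited
undo_log = [("acct_a", 100)]            # value before the debit
balanced = lambda s: s["acct_a"] + s["acct_b"] == 200
print(regress_to_known_good(state, undo_log, balanced))
# {'acct_a': 100, 'acct_b': 100}: the last coherent state
```

The half-finished transfer is backed out in one step rather than by restoring a day-old tape, which is the seconds-versus-days difference the article claims.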
However . . .
This solution is neither possible nor economical to implement with today's technology. Electronically moving every new piece of data to a remotely located storage site the instant it is created is impractical: the communication lines required are either too expensive to implement or too slow.
And, even if all data could be moved to a remote site, the volume would quickly overwhelm available storage media, and the logistics of tracking the data states would be cumbersome. Basic economics dictates a need to reduce the requirements for communication rates and storage space.
There is, however, a technique that reduces the amount of data that must be transmitted and stored, while providing the desired level of backup protection. This technique is sometimes referred to as data vaulting. If every update is transmitted to and stored at the remote site, the backup system can arbitrarily revert to any state of data at the primary site. While this may be desirable in theory, in practice you could get along with much less stringent requirements.
For example, a company may be adequately protected if its backup system can revert to data states at quarter-hour intervals (or other intervals as deemed necessary), as long as the data captured at each interval is reliably coherent.
These coherent states are called "known good" states. If you can capture the data in "known good" states that represent checkpoints, you can electronically transmit "known good" data to the offsite location.
If the system crashed, only a quarter-hour`s worth of transactions would need to be reconstructed. Accepting this limitation, the backup system could ignore all but the last update of any changed data during any interval, which would substantially reduce the volume of data associated with that backup.
Experience shows that computer systems overwrite the same data repeatedly, and thus the technique of updating at intervals rather than continuously considerably reduces the amount of transmitted backup data. Tests conducted by Vinca Corp. show that this technique reduces data quantities by at least an order of magnitude. Such a reduction in data traffic minimizes electronic communication bandwidth and reduces storage requirements at the offsite data vault location. However, the approach also complicates the backup procedure.
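The reduction comes from coalescing: within an interval, only the final value written to each block needs to travel. A minimal sketch (block numbers and payload strings are invented for illustration):

```python
def coalesce(interval_updates):
    """Keep only the final value written to each block during an
    interval; earlier overwrites of the same block are never sent."""
    last = {}
    for block, data in interval_updates:  # chronological order
        last[block] = data                # later writes replace earlier
    return last

# Nine writes during the interval, but only three distinct blocks changed.
updates = [(0, "a1"), (1, "b1"), (0, "a2"), (0, "a3"),
           (2, "c1"), (1, "b2"), (0, "a4"), (2, "c2"), (0, "a5")]
print(coalesce(updates))  # {0: 'a5', 1: 'b2', 2: 'c2'}
```

Nine writes collapse to three transmitted blocks here; on real workloads that repeatedly overwrite the same hot blocks, the savings are what produce the order-of-magnitude reduction the article cites.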
- First, the interval backup updates are only usable as a whole. Partially received updates are unusable because they lack chronological consistency.
- Second, the capture of an interval update at a "known good," or stable, data state is an absolute requirement since the chronological inconsistency of the data captured at "unknown states" prevents regression to the last coherent or "known good" state.
- And, lastly, the capture of all changed data associated with an interval must take place almost instantly in order not to interfere with ongoing processing.
In practice, it is not possible to instantly transfer a large amount of data. Instead, the data is tagged and an intercept is set up so that subsequent operations that involve this data are preceded by protective measures, which preserve the older data needed for the backup.
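The tag-and-intercept scheme is a copy-on-write snapshot: before new data lands on a block, the old contents are preserved so the backup still sees a frozen image. A minimal sketch, assuming an in-memory block map (the class and method names are illustrative):

```python
class CopyOnWriteSnapshot:
    """Intercept writes after a snapshot is taken: a block's old
    contents are preserved the first time it is overwritten, so the
    snapshot stays readable while the live volume keeps changing."""

    def __init__(self, volume):
        self.volume = volume    # live data, modified in place
        self.preserved = {}     # old contents, saved on first overwrite

    def write(self, block, data):
        if block not in self.preserved:
            self.preserved[block] = self.volume[block]  # protect old data
        self.volume[block] = data

    def snapshot_read(self, block):
        # Snapshot view: preserved copy if the block changed, else live.
        return self.preserved.get(block, self.volume[block])

snap = CopyOnWriteSnapshot(volume={0: "x", 1: "y"})
snap.write(0, "x2")                            # live volume moves on...
print(snap.snapshot_read(0), snap.volume[0])   # x x2
```

Only blocks that are actually overwritten are copied, which is why the snapshot appears to be captured "almost instantly" without stalling ongoing processing.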
In other words, the transfer of an image of "known good" data is marked as a "work in progress" at the offsite data vaulting location until the entire transfer is complete and the files are marked as such. At that time, the offsite data center can be sure that the transfer is complete and can save the transfer as a "known good" state.
To implement this solution, the software must capture "known good" states and hold these states as "virtual snapshots" of the data images on the primary data set. The snapshots are the checkpoints of the data set and represent the primary data set at "known good" times.
Software determines the actual data set changes that have taken place since a snapshot was taken and maintains these changes as "delta areas" from the "known good" states or snapshots. When subsequent snapshots are taken and compared to earlier snapshots, the deltas between the snapshot intervals are compared and new delta areas are determined. Only the delta areas need to be electronically transmitted from the primary data set to the offsite data vault. The software marks the progress of the data being transmitted so that the receiving data set knows when a particular delta transmittal is complete. Once the entire transmission is complete, the receiving (offsite) data center can commit the entire backup step.
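The delta-and-commit cycle above can be sketched end to end. Everything here (the `delta` helper, the `Vault` class) is an illustrative assumption about how such software might work, not Vinca's implementation:

```python
def delta(previous_snapshot, current_snapshot):
    """Blocks that changed (or appeared) since the previous snapshot."""
    return {block: data for block, data in current_snapshot.items()
            if previous_snapshot.get(block) != data}

class Vault:
    """Receiving side: a delta is held as work-in-progress until the
    sender marks it complete, then committed as a 'known good' state."""

    def __init__(self, baseline):
        self.known_good = dict(baseline)  # last committed checkpoint
        self.in_progress = {}             # partial transfer, not yet usable

    def receive(self, block, data):
        self.in_progress[block] = data

    def commit(self):  # sender signals the transfer is complete
        self.known_good.update(self.in_progress)
        self.in_progress = {}

snap1 = {0: "a", 1: "b", 2: "c"}
snap2 = {0: "a", 1: "b2", 2: "c", 3: "d"}
vault = Vault(baseline=snap1)
for block, data in delta(snap1, snap2).items():  # only 2 blocks travel
    vault.receive(block, data)
vault.commit()
print(vault.known_good == snap2)  # True
```

If the line drops mid-transfer, `in_progress` is simply discarded and the vault still holds the previous "known good" state, satisfying the chronological-consistency requirement from the bulleted list above.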
With such a system, businesses rely on keeping "known good" data sets at offsite data facilities and on being able to rapidly recover any lost or damaged data in a disaster. To restore their operating data in a timely manner, businesses must implement automatic systems, which allow easy access to data and enable data to be automatically verified for completeness and integrity. Without this access, most businesses cannot operate.
Remote Backup--The Wrong Way
A large financial institution in New York performs a tedious and difficult process to ensure reliable offsite duplication of its critical data. To satisfy both internal and external regulatory measures, data must be kept in a secure location separate from the company's operating location. The data must also be checked to ensure that nothing is corrupted or lost and that all transactions are intact and complete. The company's complete data set must be able to be rebuilt from the data stored at the offsite location.
To accomplish this, a tape backup of each server is performed each night, when network traffic is at its lightest and the majority of network clients are not using the system.
When the tape backups are finished (which usually takes six to eight hours), the tapes are sent via armored car to a ferry that transports them to an offsite facility on Staten Island. When the tapes arrive, they are immediately taken to the offsite data center and loaded into tape devices that can read the data. The data is then transferred from tape to hard disk in servers, a process that usually takes four to 10 hours.
After the data is transferred, it is checked to ensure it is complete--a process that entails simulating the database running at the company`s headquarters and accessing all records that were updated the previous day.
The entire process of transferring the data from the company's operating center to the offsite data center and then verifying the data to ensure complete data integrity takes two to four days. During this time, the company is especially vulnerable since the data has neither been restored to backup servers nor verified to be intact and complete. The goal of this company and other organizations that need to preserve their data offsite is to minimize the data transfer time and to verify the data integrity.
Electronic data vaulting captures changed data at intervals and backs up the data to a remote site.
Richard Ohran is chief technical officer and Wally Marsden is director of product development, both at Vinca Corp. in Orem, UT.