Window-less backup + replication = the answer?
By Stephen Zalewski
Getting backups done quickly and without application downtime is a big headache for IT administrators—and it is getting bigger. IT managers are looking to the promise of "window-less" backup to lessen the impact of the backup process on business operations.
Although the term "window-less backup" is not new, confusion still surrounds its exact meaning, its implementation, and the technology that is involved.
In reality, a variety of backup implementations (e.g., offline, online, and "fuzzy" backup) have evolved to manage the backup window problem—options that are often coupled with replication.
The problem with some of these options is that they rest on a tradeoff: to shorten the time it takes to back up data without actually stopping the application, you have to gamble on data coherency. A true "window-less backup" implementation does not force administrators to take that risk, because data coherency is never jeopardized.
Before implementing a backup architecture, IT administrators must consider all options, carefully weighing each against the organization's risk tolerance and application availability requirements.
YOUR BACKUP OPTIONS
Offline backup. The safest way to back up data is to shut down specific applications for the duration of the backup. Bringing the application down or offline is generally acknowledged as a way of ensuring that application data is in a defined state for the duration of the backup. The length of time that the application is down varies depending on the amount of data that must be copied and the I/O throughput capacity of the backup system.
While this approach is generally the easiest option to implement, it may not be a viable option for many organizations whose profitability depends on constant application availability. A reservation system at a hotel or the operations of an Internet-based business, for example, cannot be shut down for backup.
Even for IT administrators who have the luxury of bringing down an application for a fixed period of time, offline backup can still become a problem once the volume of data grows too large to copy within that period, or once backup demands exceed the bandwidth of the I/O subsystem.
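As a rough illustration of why the offline window grows with capacity, the downtime is approximately the amount of data divided by the sustained throughput of the backup path. The sketch below assumes a constant throughput and 1 GB = 1,024 MB; the function names, data volumes, 40 MB/s throughput, and 4-hour window are hypothetical figures, not measurements.

```python
def offline_window_hours(data_gb: float, throughput_mb_s: float) -> float:
    """Estimate how long the application must stay down while data_gb is copied.

    Assumes a constant sustained throughput in MB/s and 1 GB = 1024 MB.
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = data_gb * 1024 / throughput_mb_s
    return seconds / 3600


def fits_window(data_gb: float, throughput_mb_s: float, allowed_hours: float) -> bool:
    """True if the offline copy finishes inside the downtime the business allows."""
    return offline_window_hours(data_gb, throughput_mb_s) <= allowed_hours


# Hypothetical figures: a 4-hour nightly window and a 40 MB/s backup path.
for size_gb in (500, 2000):
    hours = offline_window_hours(size_gb, 40)
    print(f"{size_gb} GB -> {hours:.1f} h, fits: {fits_window(size_gb, 40, 4)}")
```

With these assumed numbers, 500 GB takes about 3.6 hours and fits, while 2,000 GB takes about 14.2 hours and does not. Growth in data alone is enough to break an offline schedule that once worked.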
Online backup. Because data backup often has to be performed while an application is online, most commercial database products include database managers that keep applications up and running during the backup process. These online products ensure data integrity for "point-in-time" backups, which are taken while the application continues to run (see Figure 1).
Figure 1: The two types of replication technology in use today for backups are mirroring and snapshots. Shown here are synchronous and asynchronous mirroring for local and remote RAID arrays.
A point-in-time backup requires the data repository to be placed in a "paused" (or "quiesced" or "synchronized") state. While the repository is quiesced, database managers generally allow read transactions to occur but maintain a journal of write transactions, which are replayed after the backup is complete and the repository is "unsynchronized." This ensures application availability.
On the downside, online backup is only an option for applications that can be quiesced, and those that can may suffer performance degradation during the often-lengthy period it takes for the data to be streamed to the backup media. Also, the database manager must eventually replay all journal entries captured during the online backup before the database can be synchronized back to a real-time state. Until that time, the database is effectively working with a time delay.
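The quiesce, journal, and replay cycle described above can be sketched with an in-memory key-value store. This is a toy model, not any database manager's actual interface: `QuiescableStore` and its methods are hypothetical, and letting reads consult the pending journal (so the application sees its own writes) is an assumption.

```python
class QuiescableStore:
    """Toy repository that journals writes while quiesced (hypothetical model)."""

    def __init__(self):
        self._data = {}          # the stable image that a backup copies
        self._journal = []       # (key, value) writes captured while quiesced
        self._quiesced = False

    def quiesce(self):
        self._quiesced = True

    def write(self, key, value):
        if self._quiesced:
            self._journal.append((key, value))  # defer: keep the image frozen
        else:
            self._data[key] = value

    def read(self, key):
        # Assumption: the newest journaled value wins, so readers see their own writes.
        for k, v in reversed(self._journal):
            if k == key:
                return v
        return self._data.get(key)

    def backup(self):
        if not self._quiesced:
            raise RuntimeError("quiesce the repository before a point-in-time backup")
        return dict(self._data)  # consistent image as of quiesce()

    def unquiesce(self):
        # Replay journaled writes in order; until this runs, the image lags the application.
        for key, value in self._journal:
            self._data[key] = value
        self._journal.clear()
        self._quiesced = False


store = QuiescableStore()
store.write("room-101", "free")
store.quiesce()
store.write("room-101", "booked")   # the application keeps running
image = store.backup()              # {'room-101': 'free'}
store.unquiesce()
print(image, store.read("room-101"))
```

The backup image reflects the moment of quiescing, while the booking made during the backup is applied only when the journal is replayed. That replay is the time delay the paragraph above describes.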
"Fuzzy" backup. This approach is suitable for some network-based file systems. "Fuzzy" backup is so named because data is in a somewhat undefined, or nebulous, state during the actual backup process. Applications are neither stopped nor quiesced during the backup. Instead, files are left open, which means they can potentially be written to while the backup is occurring.
The resulting backup, therefore, captures each file's content only as of the moment that particular file was copied, not as of a single point in time for the whole backup. A valid copy of a file is not guaranteed, since very few applications today can build a journal of the changes that occur during the backup.
For file servers, fuzzy backup is often sufficient: IT administrators avoid shutting down servers and disrupting the user community. Restores are typically limited to individual files, and the likelihood that any individual file was captured coherently is very high. This approach, however, is generally not accepted for application servers, because an application's files must be restored together as a coherent dataset, and a fuzzy backup is unlikely to have captured them in a mutually consistent state.
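The toy model below illustrates why fuzzy backup suits individual files but not application datasets. Two hypothetical files hold running totals that an application always updates together; the "backup" copies one file at a time while the application keeps writing. The file names, values, and functions are illustrative assumptions.

```python
def apply_transaction(files, amount):
    # The application keeps these two hypothetical files in step.
    files["ledger.dat"] += amount
    files["orders.dat"] += amount


def fuzzy_backup(files, writes_during_backup):
    """Copy files one by one without pausing the application.

    writes_during_backup[i], if present, runs right after the i-th file is copied,
    standing in for application writes that land mid-backup.
    """
    backup = {}
    for i, name in enumerate(sorted(files)):
        backup[name] = files[name]  # each file is copied whole
        if i < len(writes_during_backup):
            writes_during_backup[i](files)
    return backup


live = {"ledger.dat": 100, "orders.dat": 100}
backup = fuzzy_backup(live, [lambda f: apply_transaction(f, 25)])
print(backup)  # ledger copied before the update, orders after it
print(live)
```

Each copied value is intact on its own, so restoring either file alone gives a valid if slightly stale file. Restoring both together, however, produces totals (100 and 125) that never coexisted on the live system.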
Replication backup. Replication technologies address many of the shortcomings of offline, online, and fuzzy backup implementations and are currently among the most promising developments for ensuring a quick, full, and nondisruptive backup.
Used in conjunction with either an online or offline backup solution, replication significantly reduces the window of time that data is unavailable. Without replication it is extremely difficult to achieve true window-less backup.
For offline backup solutions, replication minimizes the time that the application is stopped. And for online backup solutions, it shortens the time that database managers must synchronize or quiesce the database.
Replication backup creates a point-in-time copy of the data that can then be used as the source for the backup. The objective of replication is to create a copy of the data in a way that is not significantly disruptive to the application.
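The sequence that replication makes possible can be sketched as a small orchestration function. The four callables are hypothetical placeholders for whatever stops or quiesces the application, creates the mirror split or snapshot, resumes the application, and streams the copy to backup media. The point of the sketch is only the ordering, which keeps the lengthy streaming step outside the pause.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def replication_backup(
    pause_app: Callable[[], None],
    create_copy: Callable[[], T],
    resume_app: Callable[[], None],
    stream_to_media: Callable[[T], None],
) -> T:
    """Pause only long enough to create a point-in-time copy, then back up from it."""
    pause_app()
    try:
        copy = create_copy()      # fast: split a mirror or take a snapshot
    finally:
        resume_app()              # the application resumes even if copy creation fails
    stream_to_media(copy)         # slow: runs while the application is live again
    return copy


events = []


def take_copy():
    # Hypothetical stand-in for creating a point-in-time copy.
    events.append("copy")
    return "pit-copy-1"


replication_backup(
    pause_app=lambda: events.append("pause"),
    create_copy=take_copy,
    resume_app=lambda: events.append("resume"),
    stream_to_media=lambda c: events.append(f"stream {c}"),
)
print(events)  # ['pause', 'copy', 'resume', 'stream pit-copy-1']
```

The same shape applies to offline and online backups alike: only the "pause" step changes (stopping the application versus quiescing the database), and in both cases its duration is bounded by copy creation rather than by the full transfer to tape.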
A point-in-time copy can exist as a physical or a logical copy of a disk or volume, and the distinction is important. With physical replication, the I/O needed to populate the replicated volume is done before the point-in-time copy is created. With logical replication, the actual copying of data to a new volume is deferred until some point after the point-in-time copy is created.
The two types of replication technology in use today for backups are mirroring and snapshots.
- Mirroring. Mirroring is the process of copying data continuously and in real time to create a physical copy of a volume. It continues until it is explicitly stopped. Mirroring can be done either synchronously or asynchronously. Synchronous mirroring updates the source and target volumes simultaneously, and control passes back to the application only when both volumes (or their caches) have been updated. The result is multiple disks that are exact duplicates, or mirrors, of each other. Because every write must wait for the target volume to acknowledge it, synchronous mirroring limits the distance that can separate mirror volumes.
Asynchronous mirroring updates the source volume and the target volume(s) serially: control passes back to the application as soon as the source (or its cache) is updated, and the target is updated afterward. Asynchronous mirrors can therefore be deployed over long distances, commonly via TCP/IP. Because updates are applied to the target in the order they were written, the mirror copy is always coherent, but it lags behind the source.
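The difference between the two modes lies in when control returns to the application. In the toy model below (hypothetical classes, with dictionaries standing in for volumes and no real network), a synchronous write is acknowledged only after both copies change, while the asynchronous mirror acknowledges immediately and replays queued writes to the target in order. Because order is preserved, the lagging target always equals some earlier state of the source.

```python
from collections import deque


class SyncMirror:
    """Synchronous mirror: the write is acknowledged only after both copies change."""

    def __init__(self):
        self.source, self.target = {}, {}

    def write(self, block, data):
        self.source[block] = data
        self.target[block] = data   # in practice this waits on the remote round trip
        return "ack"


class AsyncMirror:
    """Asynchronous mirror: acknowledge after the source, ship updates later in order."""

    def __init__(self):
        self.source, self.target = {}, {}
        self._pending = deque()     # writes not yet applied to the target, oldest first

    def write(self, block, data):
        self.source[block] = data
        self._pending.append((block, data))
        return "ack"                # the target has not seen this write yet

    def replicate(self, max_writes=None):
        """Apply up to max_writes queued writes to the target, preserving write order."""
        count = len(self._pending) if max_writes is None else min(max_writes, len(self._pending))
        for _ in range(count):
            block, data = self._pending.popleft()
            self.target[block] = data


sync = SyncMirror()
sync.write(0, "A")

mirror = AsyncMirror()
for block, data in [(0, "A"), (1, "B"), (0, "C")]:
    mirror.write(block, data)
mirror.replicate(max_writes=2)
lagging = dict(mirror.target)   # {0: 'A', 1: 'B'}: the source as it stood after the second write
mirror.replicate()              # drain the queue; target now matches source
print(lagging, mirror.target == mirror.source)
```

If the queued writes were applied out of order, the target could hold a combination of blocks that never existed on the source; applying them in order is what keeps the delayed copy coherent.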
Mirroring cannot replace backups. Although mirroring protects against physical errors (e.g., head crashes and mechanical failures), it does not protect against logical errors (e.g., corrupted data and deleted files), because a logical error on the source is copied to the mirror just like any other write. Mirroring can therefore be used to create a window-less backup process, but the mirror itself should not be treated as the backup.
By mirroring data, either synchronously or asynchronously, in conjunction with offline or online backups, IT administrators can minimize application downtime. However, mirroring increases the required storage capacity, since each mirror is a full physical copy of the volume. This should be taken into consideration when evaluating it as an option.
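A short toy model makes the distinction concrete. The `MirroredVolume` class and the file name are hypothetical: a physical failure of the source leaves the mirror intact, but a mistaken delete is propagated to the mirror exactly like a legitimate write, so only a separate point-in-time copy taken earlier still holds the data.

```python
class MirroredVolume:
    """Hypothetical synchronous mirror: every change, good or bad, reaches both copies."""

    def __init__(self):
        self.source, self.target = {}, {}

    def write(self, name, data):
        self.source[name] = data
        self.target[name] = data

    def delete(self, name):
        # A mistaken delete is a logical error, and it is mirrored like any other write.
        self.source.pop(name, None)
        self.target.pop(name, None)

    def fail_source(self):
        # Physical failure of the primary disk: the mirror copy is unaffected.
        self.source = None


vol = MirroredVolume()
vol.write("payroll.db", "v1")
backup = dict(vol.target)           # separate point-in-time copy taken from the mirror
vol.delete("payroll.db")            # logical error: gone from both copies
print("payroll.db" in vol.target)   # False
print(backup["payroll.db"])         # v1, still recoverable from the backup

vol2 = MirroredVolume()
vol2.write("payroll.db", "v1")
vol2.fail_source()                  # head crash on the primary
print(vol2.target["payroll.db"])    # v1: the mirror survives a physical failure
```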
- Snapshots. Snapshot technologies provide logical point-in-time copies of volumes or files. Snapshot-capable volume controllers or file systems configure a new volume that points to the same physical data as the original. No data is moved, so a copy is created within seconds. The point-in-time copy can then be used as the source of a backup to tape or retained in its current form as a disk backup.
Figure 2: When a copy of data is requested using the copy-on-write technique, the storage subsystem sets up a snapshot index and represents it as a new copy.
Figure 2 illustrates the snapshot process using the copy-on-write technique. When a copy of data is requested, the storage subsystem simply sets up a second pointer, called a snapshot index, and presents it as a new copy. A physical copy of a block of original data is created in the snapshot index only when that block in the base volume is first updated; unchanged blocks are still read from the base volume.
Because snapshots are quicker and less resource-intensive to create than mirrors, IT administrators may find it possible to make frequent window-less backups, which provide more recent restore points and quicker full restores.
However, snapshots, unlike mirroring, do not protect against physical storage failures. Since multiple logical snapshots can point at the same physical data blocks, a single bad block invalidates every snapshot that shares it. This means that while snapshot technologies, like mirroring, can be used to deploy a window-less backup implementation successfully, they are not a replacement for more traditional backup processes.
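The copy-on-write behavior described above can be sketched at block level. The class and method names are hypothetical, a Python list stands in for the base volume, and each snapshot index is a dictionary that receives a block's original contents only on that block's first update after the snapshot was taken.

```python
class CowVolume:
    """Toy base volume with copy-on-write snapshots (hypothetical model)."""

    def __init__(self, blocks):
        self.base = list(blocks)
        self.snapshots = []          # one index per snapshot: block number -> original data

    def snapshot(self):
        # Creating a snapshot copies no data; it only sets up an empty index.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, block, data):
        # Before overwriting, preserve the original block in every snapshot
        # that has not already saved its own copy of it.
        for index in self.snapshots:
            if block not in index:
                index[block] = self.base[block]
        self.base[block] = data

    def read_snapshot(self, snap_id, block):
        # Changed blocks come from the snapshot index; unchanged ones are shared with the base.
        index = self.snapshots[snap_id]
        return index[block] if block in index else self.base[block]


vol = CowVolume(["a", "b", "c"])
snap = vol.snapshot()
vol.write(1, "B")
vol.write(1, "B2")                   # second update: nothing more is copied
print([vol.read_snapshot(snap, i) for i in range(3)])  # ['a', 'b', 'c']
print(vol.base)                      # ['a', 'B2', 'c']
print(vol.snapshots[snap])           # {1: 'b'}
```

Only one block is ever physically copied here, no matter how often it is rewritten, which is why snapshots are cheap to create and maintain compared with a full mirror.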
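The shared-block risk follows directly from the copy-on-write bookkeeping described for Figure 2. In this sketch (hypothetical names and data), each snapshot index holds only the blocks that changed after it was taken; every other block is read from the single physical copy in the base volume, so one bad base block breaks every snapshot that has not preserved its own copy of it.

```python
def snapshots_hit_by_bad_block(snapshot_indexes, bad_block):
    """Return the snapshots that still read bad_block from the shared base volume."""
    return [name for name, index in snapshot_indexes.items() if bad_block not in index]


# Hypothetical nightly snapshots of one volume; only block 7 was rewritten
# after Monday's snapshot (and before Tuesday's).
snapshot_indexes = {
    "mon": {7: b"old-7"},
    "tue": {},
    "wed": {},
}

print(snapshots_hit_by_bad_block(snapshot_indexes, 3))  # ['mon', 'tue', 'wed']
print(snapshots_hit_by_bad_block(snapshot_indexes, 7))  # ['tue', 'wed']
```

A mirror, by contrast, keeps a second physical copy of block 3, which is why the two technologies protect against different failures.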
DEPLOYING BACKUP TECHNOLOGIES
When deployed with a suitable backup architecture, replication can reduce the backup window. But how do you implement the technology?
Implementation can be done at three logical levels:
- Host-based—This approach relies on file-system or volume-manager software, installed on one or more host systems, to do the asynchronous mirroring or take snapshots.
A key advantage of this approach is that it is the easiest and least costly method to implement, because no additional hardware is required. On the downside, it is potentially less scalable and lower performing than the other approaches, because its control functions run on the host and consume host processing cycles.
- SAN-based—This approach implements the replication functionality within equipment in the SAN (e.g., appliances, switches, or routers). This solution offers replication through synchronous and asynchronous n-way mirrors and snapshots. It is the most flexible solution and offloads the host from implementing and controlling these functions.
One disadvantage is that it may require the purchase of additional hardware to deploy the SAN appliances.
- Storage controller-based—This approach relies on the storage subsystem to provide synchronous and asynchronous n-way mirrors and snapshot functionality.
There are two advantages to this approach. Because it is implemented within the storage system and tuned to the specifics of that system, it can enable optimal performance. It is also easier to manage because everything is done transparently within the storage system.
Disadvantages include the possibility of having to supplement it with third-party software for added replication functionality. Also, it may not work well in heterogeneous SAN environments and could lead to single-vendor product lock-in.
While replication technology is key to a complete window-less backup solution, the advantages and disadvantages of taking snapshots and mirroring data must be evaluated in much the same way as the various backup architectures they support: with careful consideration of the specific needs of the organization.
Today, the best window-less backup implementations employ a backup approach and replication technology that not only balance the length of the backup window against the probability of data recovery, but also take into account the business environment and recovery requirements.
Stephen Zalewski is a senior architect at Fujitsu Softek (www.fujitsusoftek.com) in Sunnyvale, CA.