Continuous data protection: Is it about time?

With the emphasis rapidly shifting from backup to restore, CDP, whether “periodic” or “constant,” can meet your RTO and RPO goals.

By Mark Ferelli and David Freund

The IT marketplace is awash with buzzwords, acronyms, and rallying cries, with storage vendors playing dual roles as both players and cheerleaders. Continuous data protection (CDP) is one of the more recent buzzwords, or “buzzphrases.”

To define and understand CDP, one adjustment to the IT mindset is in order. CDP embraces the backup, restore, and archival functions within a data center. Although the fundamental reason for backup (protection against data loss) seems intuitive, that reason no longer dominates. That mindset has always centered on a worst-case scenario of catastrophic failure, where the method of protection has been incremental and periodic backups.

No more. Backup is no longer needed only for protection against disasters and other failures; it is also needed for active data availability and rapid restores. Backup is insurance, but now the emphasis is on restore. Archives are no longer passive vaults of information, but are actively accessed for a variety of business and legal purposes.

One caveat: Many vendors, both large and small, want to put their products under the CDP umbrella. Continuous protection, after all, sounds very appealing. So it’s essential to scale the hype heap and make selections based on business-oriented criteria for performance, cost, ROI, and TCO.

Desperately seeking definition

CDP uses disk technology to continuously capture updates to data in real-time or near real-time. The primary result is that backup windows become irrelevant, because backup is occurring all the time. The secondary result is that files are available at disk speeds.

Traditional backup typically captures data changes made over a day or more. CDP promises access to point-in-time copies of data for essentially any point in time. Reducing recovery time to near zero, also known as “bridging the protection gap,” is the real driver for adopting CDP. As with any new technology, it is important to understand what CDP does, how it works, and where it fits within an overall data management and protection strategy.

At this early stage, it would be unwise to treat continuous backup as a replacement for traditional backup. Traditional backup applications move, manage, and catalog data of all shapes and sizes from a large number of sources. These established products are used to maintain and store data for months and years at a time. CDP products will eventually improve the backup process by reducing dependence on nightly backups.

With full-scale CDP, there are no backup schedules. When data is written to disk, it is also recorded in a second location, usually via another computer or appliance over the network. This introduces some overhead to disk-write operations, but eliminates the need for nightly scheduled backups. It also makes it possible to recover an up-to-the-minute copy of a file or record.
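To make the write-splitting idea concrete, here is a minimal sketch, with a plain dictionary standing in for the protection server; any real CDP product would ship each change over the network to a second computer or appliance, as described above. All names here are illustrative, not any vendor's API.

```python
# Hypothetical sketch of CDP-style write splitting: every write to the
# primary copy is also recorded in a second location as it happens.

primary = {}        # block number -> contents on the production disk
protection = {}     # copy maintained by a protection server (simulated)

def cdp_write(block, data):
    """Apply a write locally and duplicate it to the protection copy."""
    primary[block] = data       # normal write path
    protection[block] = data    # the extra hop: CDP's write overhead

cdp_write(0, "boot record")
cdp_write(7, "customer table, v1")
cdp_write(7, "customer table, v2")

# The protection copy is always current, so an up-to-the-minute restore
# needs no nightly backup run.
assert protection == primary
```

The cost of the second write on every operation is exactly the “some overhead to disk-write operations” the article mentions; the payoff is that no scheduled backup window is ever needed.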

CDP alone doesn’t address disaster recovery, meaning it does not necessarily provide protection in the event of the loss of an entire site. To address this need, CDP can be supplemented either with remote replication or traditional backup with off-site tape protection.

Bear in mind also that CDP isn’t RAID, replication, or mirroring. Each of these technologies has its place in the data life cycle, but copy-based technologies only allow you to recover the most recent copy of data. CDP allows you to restore previous versions as well.

Classifying CDP

Definitions of CDP vary from vendor to vendor, but most would agree that time-stamping and the ability to roll back to any point in time are CDP’s distinguishing attributes. Whereas snapshots can capture data at set intervals, CDP software can record every change as it occurs, employing time-stamps to enable users to restore from given points in time.

To draw out the important distinctions, then, we must create two classes of CDP products: periodic and constant CDP. The distinction relates to the granularity of the time-stamp.

Periodic CDP products are more or less locked into a coarse granularity. Backups occur at pre-defined, fixed intervals: 10 minutes, 1 hour, 24 hours, etc. This approach is already in use with existing backup and snapshot products, a more recent example being Veritas’ “Panther.”

In June, Veritas released a beta of Panther, a new add-on to Backup Exec that Veritas calls a CDP product. Panther is essentially a re-packaging of remote file-system replication functionality from Veritas’ Replication Exec, with tighter integration with Backup Exec using Windows 2003 Volume Shadow Copy Service (VSS). A Continuous Protection Agent, installed on each server, continuously transmits protected-file changes as they occur to a Continuous Protection Server. (This is called “dynamic replication” in Replication Exec terminology.) As with Replication Exec, when a file is changed the agent copies only the changed data, not the entire file.

On the Protection Server system, Windows VSS is used to take periodic snapshots of agent-replicated data. It’s these snapshots that are made available to users for restoring and for eventual backup to other Backup Exec-controlled media. Coarse granularity is a valid CDP option and will be sufficient for many users. Panther will be especially useful to companies that require a CDP product that has tight integration with Backup Exec.

Constant CDP

Constant CDP takes the next step. To fully understand the difference, consider the following four metrics:

Recovery time objective (RTO): How quickly can data be recovered? This includes the time to restore from the backup media plus all additional time required for data integrity validation, system or application preparation, rolling forward data, etc. Operational, accessible copies of data, using technologies such as mirroring (and other forms of RAID) or clustering, for example, provide the fastest (and most expensive) RTO. Magnetic media stored off-site provide the slowest.

Recovery point objective (RPO): How close to the last possible instant prior to a failure can applications, systems, and/or data be restored? A backup performed nightly would represent an RPO of about 24 hours, because the worst-case scenario would be an outage during the backup.

Recovery object granularity: What size objects need to be recovered? An entire system, disk volume, file, database, etc.?

Recovery time granularity: How finely can one “turn back the clock” to recover an asset? Must it be restored to the way it was at the end of a given business day? At a given hour, minute, or second? Are quantities other than time, such as transactions, needed? A nightly backup, for example, offers 24-hour recovery time granularity.
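The RPO metric above lends itself to a back-of-the-envelope calculation: the worst-case RPO of a periodic scheme equals its backup interval, and the data exposed to loss is the change rate multiplied by that interval. The figures below are assumptions for illustration, not from any product.

```python
# Worst-case data at risk = change rate x worst-case RPO (the backup
# interval). Change rate of 2 GB/hour is an assumed figure.

def data_at_risk_gb(change_rate_gb_per_hour, rpo_hours):
    """GB of changes that could be lost in the worst case."""
    return change_rate_gb_per_hour * rpo_hours

rate = 2.0                               # assumption: 2 GB changed per hour
print(data_at_risk_gb(rate, 24))         # nightly backup: 48.0 GB exposed
print(data_at_risk_gb(rate, 1))          # hourly periodic CDP: 2.0 GB
print(data_at_risk_gb(rate, 0))          # constant CDP: 0.0 GB
```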

Constant CDP allows users to recover from any arbitrary point in time, rather than a pre-identified, fixed interval. This is the famous “time-machine” option that many promote as the essential kernel of the technology.

Conceptually, a constant CDP solution comprises a foundational image or mirror of an existing volume, directory, or file, plus a time-stamped log of each subsequent write operation. To drill down a bit, in most solutions the mirror is updated with each write transaction. The log maintains a history of the modified data blocks, not unlike the copy-on-write technique common to snapshots. Many CDP products can present a virtualized volume representing the contents as they existed, I/O by I/O, at any point in time from the beginning of the log.
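The base-image-plus-journal concept can be sketched in a few lines. This is a simplified model under stated assumptions (a time-ordered log of block writes), not any vendor's implementation; replaying the log up to an arbitrary timestamp yields the virtualized point-in-time view described above.

```python
# Constant-CDP journal sketch: a base image plus a time-stamped log of
# every block write. Replaying the log up to time t presents the volume
# as it existed at that instant.

base_image = {0: "A0", 1: "B0"}   # mirror contents at the start of the log
write_log = [                     # (timestamp, block number, new contents)
    (10, 1, "B1"),
    (20, 0, "A1"),
    (30, 1, "B2"),
]

def view_at(t):
    """Reconstruct the volume as it existed at time t."""
    volume = dict(base_image)
    for ts, block, data in write_log:
        if ts > t:
            break                 # log is time-ordered; stop at time t
        volume[block] = data
    return volume

assert view_at(5)  == {0: "A0", 1: "B0"}   # before any logged write
assert view_at(25) == {0: "A1", 1: "B1"}   # an arbitrary mid-stream instant
assert view_at(99) == {0: "A1", 1: "B2"}   # the most recent state
```

Because any timestamp in the log's span is a valid recovery point, the “time-machine” granularity is limited only by how long the log is retained.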

The effect is to provide much finer recovery-time granularity than split mirroring, snapshots, or other periodic solutions. It’s possible to view a volume, file, or database as it existed minutes, or even seconds, earlier, and to recover it almost instantly. More important, it enables viewing an object at any arbitrary point in time, as far back as the constant CDP product’s recording goes. This dramatically reduces both RPO and RTO when restoring a complex system to a previous consistent state.

Disk capacity requirements for this approach are typically lower than those of periodic split-mirror approaches. Each member of a mirrored volume that is “split” out of a mirror set holds an image of the data as of the point in time it was removed from the mirror set, and must be a full volume in size.


Because each point-in-time copy requires a full volume’s worth of space, a protection mechanism that splits a mirror every four hours would require six times the original volume’s capacity to hold just one day’s worth of data.

In contrast, a CDP solution requires a base image plus enough space to maintain the change log, typically making the additional space required a small fraction of the original volume’s capacity for the same 24-hour period (depending on the amount of data being changed). And recovery-time granularity makes a quantum leap from four-hour intervals to any desired point in time! Additionally, adopters of this technology may choose to retain enough log information to enable rollback over greater periods of time, up to several days or even weeks, depending on the rate of change and available capacity.
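The capacity comparison above works out as follows; the volume size and daily change rate are assumed figures chosen only to illustrate the arithmetic.

```python
# Split-mirror vs. CDP capacity for one day of protection.
# Assumptions: a 500 GB volume, a mirror split every 4 hours, and 5% of
# the volume changing per day (the CDP change-log growth).

volume_gb = 500
splits_per_day = 24 // 4                   # six split mirrors per day
split_mirror_gb = splits_per_day * volume_gb

daily_change_rate = 0.05                   # assumption: 5% churn per day
cdp_gb = volume_gb + volume_gb * daily_change_rate   # base image + log

print(split_mirror_gb)   # 3000 GB, with only 4-hour recovery points
print(cdp_gb)            # 525.0 GB, with any-point-in-time recovery
```

Retaining the log longer (days or weeks of rollback) simply grows the second term in proportion to the change rate, which is why retention becomes a trade-off between rollback depth and available capacity.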

Wouldn’t this be handy for managing and recovering e-mails in a data center where practically every mail message and its replies are considered business records as defined by various regulatory statutes?

There are numerous players and products in the CDP space. Some are software-based; others offer an appliance that works with customers’ existing storage. We’ll examine one example of each approach.

Revivio’s Continuous Protection System (CPS) is a SAN-based appliance designed to operate in parallel with an application’s primary data storage. Designed to meet enterprise-class requirements, it incorporates high levels of fault tolerance, including redundant, hot-swappable components and fault-tolerant cache. It also doesn’t require software applications, agents, or drivers to be installed on host servers. CPS presents storage volumes (LUNs) to host servers and leverages existing host-based volume management software (such as Veritas’ Volume Manager).

As data is written to protected volumes, the CPS system stores the new contents for each block and keeps a time-stamped record of the old contents in its TimeStore database. CPS can re-create the contents of a disk volume from any previous point in time, either directly on the production environment disks or by presenting a new virtual volume, called a TimeImage, which can be mounted on any server.

Like snapshots, these virtual volumes can be used for backup or archiving data to tape (with no backup window), testing new software on production data without needing to replicate it, and more. Revivio’s vision is to provide CDP for enterprise-class applications running on databases that have significant RPO and RTO restrictions.

An example of a software-based constant CDP solution, as opposed to an appliance, is XOsoft’s Data Rewinder, a “lite” version of the company’s WANsync software. The core of Data Rewinder is a filtering file-system layer, dubbed XOFS, which sits directly on top of the standard OS file system in the host it protects. File creations, deletions, writes, and other changes pass through XOFS, which records a “counter-event,” a command that would “undo” the change being made, to a journal that XOsoft recommends be located on separate storage volumes.

If data corruption occurs, or a file needs unearthing for governance or legal reasons, an administrator selects the files or databases to be recovered from a list in the management tool and chooses a point in time to “rewind” the data. The recorded undo operations are executed sequentially to return the data to its exact state at that point in time. Restore time will vary, depending on how far back in time you need to go, but it will be much quicker than restoring from a backup. XOsoft’s Data Rewinder can protect arbitrary files (which can be grouped as desired), as well as specific applications such as Microsoft Exchange, Microsoft SQL Server, and Oracle’s DBMS, for which it also records known consistency points.
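A counter-event journal of this kind can be sketched as follows. This is an illustrative model of the undo-log idea, not XOsoft’s actual format: each change records a command that reverses it, and rewinding applies those commands newest-first back to the chosen point in time.

```python
# Undo-journal ("counter-event") sketch: every change appends a command
# that reverses it; rewinding pops and executes those commands, newest
# first, until the target timestamp is reached.

files = {"report.doc": "v1"}
undo_log = []   # list of (timestamp, undo function), oldest first

def change_file(ts, name, new_contents):
    """Apply a change and journal its counter-event."""
    old = files.get(name)
    if old is None:
        undo_log.append((ts, lambda: files.pop(name)))           # undo a create
    else:
        undo_log.append((ts, lambda: files.update({name: old}))) # undo an edit
    files[name] = new_contents

def rewind_to(t):
    """Execute recorded undo operations back to time t."""
    while undo_log and undo_log[-1][0] > t:
        _, undo = undo_log.pop()
        undo()

change_file(10, "report.doc", "v2")
change_file(20, "budget.xls", "draft")
change_file(30, "report.doc", "v3")

rewind_to(15)   # turn the clock back to just after the first change
assert files == {"report.doc": "v2"}
```

Note that rewind cost grows with how far back you go (one undo per recorded change), which matches the article's observation that restore time varies with the rewind distance but avoids a full restore from backup media.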

The appeal of eliminating backup windows and providing nearly instant restores is obvious. However, storage managers need to decide where CDP fits within the current IT strategy.

The basic questions to be asked in evaluating CDP (or any other new technology) are the following:

  • What problems can it solve today?
  • What tools or technology does it enhance or replace?
  • What is the value proposition?

Remember that CDP is not yet a replacement for traditional backup. Also bear in mind that CDP alone doesn’t provide disaster recovery; these tools do not provide protection in the event of site loss. To address this need, CDP can be supplemented with replication or with traditional backup, using a second production site or an off-site vault.

At this time, the area that CDP best addresses is the backup and recovery of data associated with a particular set of applications. The characteristics of these applications are the following:

  • They usually run continuously;
  • Their data changes frequently;
  • The data is stored in large containers, making activities like backup difficult, disruptive, and time-consuming; and
  • There is a significant impact on the organization when they go down.

Even the best-planned traditional backup applications tend to handle these problems inadequately. CDP provides many of the same benefits as split mirroring and snapshot technologies. Can it replace these solutions? In many cases the answer is yes, particularly where there is a requirement for improved RPO or recovery-time granularity.

When evaluating a CDP solution versus disk mirroring or basic snapshot products, it’s necessary to consider which features are most important for your needs. Some products, regardless of whether they use split-mirror, snapshot, or CDP technologies, are application-agnostic. A number of CDP products, however, are application-specific: they have enough knowledge of an application’s internal operation to mark certain points in a journal, or certain periodic snapshots, as points at which the application is guaranteed to be safely recoverable. Such specificity can mean one solution may not be readily applicable, if applicable at all, to other systems needing protection. The same applies to traditional backup products. The real difference CDP affords is extremely fast and fine-grained recovery of applications and data to their prior state at an arbitrary point in time.

Ultimately, the question of value will determine if, and where, a CDP tool makes sense. Its cost and benefits must be evaluated against alternatives such as mirroring, replication, and snapshots.

CDP will continue evolving to provide stronger integration with data-intensive applications. The Storage Networking Industry Association’s Data Management Forum has established a CDP Special Interest Group to explore definitions and identify key issues. But the core issue to be resolved is the “must-versus-maybe” consideration. How much of an organization’s information is so urgent or valuable that it requires CDP?

Meeting the requirements of HIPAA, etc., is important, but these regulations may apply to only a small percentage of an organization’s total data.

But there are reasons beyond regulatory compliance to implement a solution that so radically improves restore time and flexibility. Strategic business decisions may depend on historical information from an active archive. Corporate governance issues might be resolved by a version of a business record that existed for a limited span of time. The ongoing specter of litigation support is a non-trivial incentive for using CDP, especially in the enterprise data center. Another consideration is whether CDP has a place in an IT organization’s ILM strategy.

Mark Ferelli and David Freund are analysts at the Illuminata research and consulting firm (www.illuminata.com) in Nashua, NH.

This article was originally published on October 01, 2005