Making sense of CDP

Does CDP stand for continuous data protection, or confusion delays proliferation?

By Michele Hope

The core premise behind continuous data protection (CDP) is relatively simple: Users run a CDP solution non-disruptively in the background and move on to other IT tasks, all while the solution captures and records each block-level or file-level write operation, in real time, onto a secondary CDP repository. Then, when users need to restore an application, volume, logical unit number (LUN), or file, the technology offers the chance to rapidly “rewind” or “roll back” to any earlier point in time, even up to the minute or second before the system disruption occurred.
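To make the premise concrete, here is a minimal sketch of the idea in Python. It is an illustration only, not any vendor's implementation: the WriteJournal class, its method names, and the sample timestamps are all hypothetical. Every write is appended to a time-ordered journal kept apart from the primary data, and a restore simply replays the journal up to the chosen moment.

```python
from dataclasses import dataclass, field


@dataclass
class WriteJournal:
    """Toy CDP repository: an append-only, time-ordered log of block writes.

    Hypothetical illustration; real products also handle ordering across
    hosts, retention, and storage efficiency.
    """
    entries: list = field(default_factory=list)  # (timestamp, block_no, data)

    def record(self, timestamp, block_no, data):
        # Assumes writes arrive in nondecreasing timestamp order.
        self.entries.append((timestamp, block_no, data))

    def restore(self, as_of):
        """Rebuild the volume image as it existed at time `as_of` (inclusive)."""
        volume = {}
        for timestamp, block_no, data in self.entries:
            if timestamp > as_of:
                break  # everything after this point is "rewound" away
            volume[block_no] = data
        return volume


journal = WriteJournal()
journal.record(100.0, 0, b"boot")
journal.record(101.5, 1, b"good data")
journal.record(103.2, 1, b"corrupted")  # the bad write

# Roll back to just before the bad write landed.
before_corruption = journal.restore(as_of=103.1)
latest = journal.restore(as_of=104.0)
```

Because the recovery time is just an argument to restore, no recovery points have to be scheduled in advance.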

Simple, right? It should be. But the concept remains blurry for some IT users who are unsure of exactly what CDP is, and isn’t. This confusion has been exacerbated largely by the wide range of vendors touting CDP and “near-CDP” capabilities.

In a recent survey of storage professionals conducted by TheInfoPro research firm, plans to adopt CDP technology were relatively high among Fortune 1000 and medium-sized enterprises (see figure). Yet comments from study participants demonstrate the fuzziness of their understanding about what CDP is.

When asked about perceptions of CDP and adoption plans, one survey participant responded by saying, “This is snap-shotting, right?” Another said, “I consider CDP to be cloning and synchronous mirroring.” TheInfoPro report concluded that “it appears that some storage professionals are blurring the concepts of CDP, snapshots, archiving, and thin provisioning.”

Similarly, an InfoStor QuickVote reader poll illustrates both the interest and ambivalence currently surrounding CDP technology. For example, 15% of the respondents already use some type of CDP and another 19% plan to deploy it later this year. Yet despite this interest, 37% of the users have no plans for CDP, and 11% don’t even know what it is (see figure, below).

“CDP is a transformational technology,” says Arun Taneja, founder and consulting analyst with the Taneja Group, “and I don’t think users fully understand the ramifications of CDP and what it’s going to do over time to various other technologies they’re using today.”

Despite claims about CDP’s ultimate reach in the enterprise, some IT organizations are still unsure about how and where the technology is best incorporated.

Further confusing the issue, pricing for CDP and near-CDP products varies widely, from $995 per server (or $35 per desktop) for products such as IBM/Tivoli’s CDP for Files to somewhere between $150,000 and $200,000 for products such as Revivio’s CPS1200 CDP appliance. (Revivio also has an entry-level appliance priced below $50,000.) Revivio’s high-end CDP solutions time-stamp and record each block-level write operation performed by large enterprise-class applications, such as a 20TB Oracle system that may span multiple servers and operating systems.

Despite the high price tag, CDP systems may be a less risky way to protect critical systems than the alternative: the high cost of downtime and business disruption that might result. Beyond just providing a copy of the data, Revivio’s CDP systems offer a “cross-server, cross-application, cross-storage group view of your data at exactly the same moment in time across all those entities. It’s basically your entire operating environment exactly as it existed at one point in time,” says Kirby Wadsworth, Revivio’s VP of marketing and business development.

The confusion surrounding what CDP is or isn’t is due in part to vendors, many of which have come up with their own variations on the original theme. According to Enterprise Strategy Group (ESG) analyst Heidi Biggar, vendors that remain true to the original CDP premise of allowing restores from any recovery point include Asempra, Revivio, Mendocino (which OEMs its CDP engine to both EMC and Hewlett-Packard), Storactive (now part of Atempo), TimeSpring, and XOSoft. More recent entrants to the “true-CDP” arena also include FalconStor Software, IBM/Tivoli, InMage, and Quantum.

But variations on the CDP theme can be readily found in the market as well. These products use everything from sophisticated application journal-logging or log-shipping functionality to frequent disk-based snapshots of data. While they do not necessarily guarantee the ability to recover data from any point in time (such as the millisecond before a virus hit the system), many of these products still offer disk-based recovery of data up to the last minute or two, or the last hour, before a significant corruption event occurred.

Examples of these types of “near-CDP” solutions include Mimosa Systems’ NearPoint for Microsoft Exchange Server, Symantec Backup Exec 10d for Windows Server with Continuous Protection Server, and Microsoft’s Data Protection Manager (DPM). “Put simply, CDP is transaction granular and near-CDP is snapshot granular,” says ESG’s Biggar.

Other CDP variations may follow the original “any-point-in-time” recovery premise, but embed CDP functionality as one component within a broader remote replication or mirroring solution. Examples include Availl’s WAFS, Kashya’s KBX5000, and XOSoft’s WANSyncHA.

The line in the sand between what constitutes “true-CDP” and “near-CDP” functionality comes from a definition of CDP prepared by the Storage Networking Industry Association’s (SNIA) Data Management Forum CDP special interest group.

In the first iteration of its CDP Buyer’s Guide in July 2005, the SNIA CDP SIG defined CDP as “a methodology that continuously captures or tracks data modifications and stores changes independent of the primary data, enabling recovery from any point in the past. CDP systems may be block-, file-, or application-based and can provide fine granularities of restorable objects to infinitely variable recovery points.”

The SNIA Buyer’s Guide further identifies three characteristics that a true CDP solution should have: Data changes are continuously captured or tracked; all data changes are stored in a separate location from the primary storage; and recovery point objectives (RPOs) are arbitrary and need not be defined in advance.

By this definition, disk-based snapshot technologies, including those fueled by Microsoft’s Volume Shadow Copy Service, however granular, would not qualify as true CDP, since their RPOs are typically predetermined and they do not allow recovery of data from any point in the past.
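The practical difference between a predetermined RPO and an arbitrary one can be worked through with a little date arithmetic. The sketch below is a hypothetical illustration: the function names, the hourly snapshot schedule, and the sample write times are assumptions, not figures from the SNIA guide. It compares how much data is lost when the most recent usable copy is the last scheduled snapshot versus the last recorded write.

```python
from datetime import datetime, timedelta


def last_snapshot_before(event, first_snapshot, interval):
    """Latest scheduled snapshot at or before `event`.

    Assumes snapshots run every `interval` starting at `first_snapshot`
    and that `event` is not earlier than `first_snapshot`.
    """
    completed = (event - first_snapshot) // interval  # whole intervals elapsed
    return first_snapshot + completed * interval


def last_write_before(event, write_times):
    """Latest journaled write strictly before `event`, or None if there is none."""
    earlier = [t for t in write_times if t < event]
    return max(earlier) if earlier else None


corruption = datetime(2006, 4, 1, 16, 55, 0)

# Snapshot-based protection: hourly copies starting at midnight (assumed schedule).
snapshot_point = last_snapshot_before(
    corruption, datetime(2006, 4, 1, 0, 0, 0), timedelta(hours=1)
)
snapshot_loss = corruption - snapshot_point

# CDP: every write is journaled, so the newest write before the event is usable.
writes = [datetime(2006, 4, 1, 16, 30, 0), datetime(2006, 4, 1, 16, 54, 59)]
cdp_point = last_write_before(corruption, writes)
cdp_loss = corruption - cdp_point

print("snapshot exposure:", snapshot_loss)
print("CDP exposure:", cdp_loss)
```

With these assumed inputs the snapshot approach falls back 55 minutes, while the journal falls back one second. Shortening the snapshot interval narrows the gap but never removes the need to pick the interval ahead of time.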

Even among true CDP candidates, there are differences in how and where the technology is implemented. Some come as out-of-band appliances; others are host-based. Some require agents on the client or host; others do not. Some even come as part of a larger backup application package. Eventually, some CDP applications may end up as network-level services offered through some type of switch.

According to George Symons, CTO of EMC’s information management software group, EMC’s RecoverPoint CDP product (which is based on technology from Mendocino) will likely be available as a packaged appliance that includes its current Replication Manager CDP management interface in conjunction with a Clariion Serial ATA disk platform. Further down the road, Symons says the solution will be more fully integrated with Legato NetWorker as one of many backup-and-recovery options the software will track and manage.

Many vendors are pursuing deeper integration of CDP functionality at the application level, and either offer it now or plan to offer it in the near future. The primary block-level CDP providers, such as Mendocino (and OEMs EMC and HP), Kashya, and Revivio, have increasingly begun to focus on doing more than capturing data at any point in time (APIT). Many of these vendors are now focused on marking application-significant events as part of the metadata they collect on their data capture timelines (see figure, below).

EMC’s RecoverPoint is a block-based CDP solution that allows administrators to choose from virtually any recovery point up to the instant prior to corruption, or from a recovery point marked by an application-related event.

The reasoning for this added layer of CDP intelligence goes something like this: Let’s say a database application crashes at 4:55 PM and was in the middle of a number of transactions when the crash occurred. Although the CDP solution allows administrators to restore data from 4:54:59, would they really want to do that when the database may have been in such an inconsistent state? What if, instead, the administrator rolled back the CDP system to an earlier time, say 4:45, to select a data capture point when the application completed its last database checkpoint?

Responding to what users would like to see in these products, many CDP vendors have begun to capture or append more application contextual details within the CDP repository to minimize recovery steps even further.

Rick Walsworth, Kashya’s VP of marketing, describes a few of the other Oracle-related application I/O patterns the Kashya system marks as events in its own CDP repository. These include checkpoints the database system performs, when a transaction has been committed and the I/O is complete, when an application has been quiesced or put into hot backup or standby mode, etc.
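The value of such event markers is easy to see in a small sketch. The code below is a hypothetical illustration, not Kashya's or any other vendor's design: the Marker type, the event kind names, and the sample timeline are assumptions. Given a timeline of marked events, it picks the most recent marker of an acceptable kind before the crash, rather than the last raw write.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Marker:
    """An application-significant event recorded on the CDP timeline."""
    timestamp: datetime
    kind: str  # hypothetical labels, e.g. "checkpoint", "commit", "hot_backup"


def consistent_recovery_point(markers, crash_time, kinds=("checkpoint",)):
    """Return the latest marker of an accepted kind before `crash_time`.

    Returns None when no such marker exists, in which case an administrator
    would have to fall back to an arbitrary point in time.
    """
    eligible = [
        m for m in markers if m.kind in kinds and m.timestamp < crash_time
    ]
    return max(eligible, key=lambda m: m.timestamp, default=None)


timeline = [
    Marker(datetime(2006, 4, 1, 16, 30), "checkpoint"),
    Marker(datetime(2006, 4, 1, 16, 45), "checkpoint"),
    Marker(datetime(2006, 4, 1, 16, 50), "commit"),
]
crash = datetime(2006, 4, 1, 16, 55)

checkpoint_point = consistent_recovery_point(timeline, crash)
commit_point = consistent_recovery_point(
    timeline, crash, kinds=("checkpoint", "commit")
)
```

Here the checkpoint-only policy selects 4:45, matching the scenario described earlier, while a policy that also trusts commit markers can recover to 4:50. Which kinds count as safe is an application-level decision.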

When asked how these types of application integration efforts will ultimately play out, Taneja sees a future for CDP that includes embedding the functionality into critical production applications themselves. He sees CDP eventually becoming another example of the move toward more-autonomous, self-healing applications.

“The application will have direct hooks all the way through the database into the storage system,” says Taneja. “When certain things go out of boundary or when there’s a complete disruption, the application will know what to trigger, including triggering mechanisms that bring up the storage volume as of a particular point in time. Storage processes that happen manually now will then happen automatically.”

Not all vendors and analysts share this vision. Instead, they may prefer to see CDP making a more horizontal impact across the organization, as part of a separate backup-type application suite that covers all of the company’s critical assets, regardless of the application in use. Still others discuss the prospect of using the CDP repository as a backup engine from which many other uses flow. These include the addition of policy-based classification and migration functionality that would allow data time-stamped in the repository to be classified and moved to secondary or archival storage devices, as part of a larger initiative toward information lifecycle management (ILM).

Revivio’s Wadsworth sees it this way: “CDP basically becomes an entry point for all secondary storage. All creation occurs through the CDP engine. You can do interesting things with the data later that can only occur offline. …It’s a powerful entry point into ILM or recovery lifecycle management, storing the ability to get at information. Basically, we see it as a short-term caching system that allows you to access copies of data and perform all kinds of intelligent operations against it without impacting production.”

Stephen Terlizzi, VP of global marketing at Atempo, also sees the LiveServ CDP technology that Atempo recently acquired from Storactive as a foundation for achieving the company’s plans to offer a “trusted ILM” platform of products.

EMC’s Symons sees the use of CDP technology going much farther than mere backup and recovery as well. “With Replication Manager, the primary-use cases are not for backup and recovery. Most of our customers actually use it for repurposing to create a test environment, data mining, and data analysis, or in some cases sending the snapshot to a backup process,” says Symons.

Regardless of what the future holds for CDP, there’s no doubt IT organizations are seeing benefits from its relatively hassle-free operation, compared to the tedious and time-consuming practice of scheduling and monitoring backup jobs.

Chris Stakutis, CTO, IBM Tivoli CDP for Files, states it this way: “Backup used to be a discrete operation. You used to have to run backups. In contrast, CDP is hands-off, transparent, and continuous. File-level CDP products lead to labor-savings benefits.” CDP for Files uses a client-only architecture that records changes on the local machine first, whether or not a server or separately attached device has been configured to receive those changes later, when the user’s machine is connected.
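A local-first design of this general kind can be sketched in a few lines. This is an assumption-laden illustration and not IBM's implementation: the LocalFirstRecorder class, its methods, and the use of a plain list to stand in for the remote target are all hypothetical. Changes are always captured locally, and forwarded only when a target is configured and the machine is connected.

```python
class LocalFirstRecorder:
    """Toy client-side recorder: capture locally first, forward later."""

    def __init__(self):
        self.local_log = []  # every change, always kept on the local machine
        self.unsent = []     # changes not yet forwarded to a remote target

    def record(self, path, content):
        change = (path, content)
        self.local_log.append(change)
        self.unsent.append(change)

    def forward(self, target, connected):
        """Send queued changes to `target` when possible; return the count sent.

        `target` is a list standing in for a server or attached device;
        None means no target has been configured.
        """
        if target is None or not connected:
            return 0  # nothing is lost; changes stay queued locally
        sent = len(self.unsent)
        target.extend(self.unsent)
        self.unsent.clear()
        return sent


recorder = LocalFirstRecorder()
server = []

recorder.record("report.doc", b"draft 1")
offline_sent = recorder.forward(server, connected=False)  # laptop on the road

recorder.record("report.doc", b"draft 2")
online_sent = recorder.forward(server, connected=True)    # back on the network
```

The point of the design is that protection does not depend on connectivity: the user can still roll back from the local log while offline, and the remote copy catches up later.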

Regardless of the implementation or CDP definition each vendor supports, FalconStor VP of marketing Don Mead brings the issue back into perspective for users: “It’s about having a data-protection solution for all your storage area needs, from mobile users with laptops to the data center with disaster recovery. Don’t be so focused on the technology. Just look at what your needs really are.” He says that a careful analysis of your RPO and RTO should ultimately lead to the right product selection.

Michele Hope is a freelance writer. She can be reached at mhope@thestoragewriter.com.

CDP case studies

Keith Richardt, owner of IT services provider K-Star, knows what it’s like to worry about whether customer data is properly protected. When he heard about TimeSpring’s TimeData CDP software, he knew immediately it could help remove his concerns about how well he was protecting about 600GB of client data housed at his Atlanta colocation facility.

“For the past five years, we’d gone through the nightmare of inconsistent backups and data-recovery needs that weren’t covered,” says Richardt.

Backing up that much data in a five-hour nightly backup window was not going to happen, says Richardt. That’s why he got excited when he first heard about how TimeData could sit in the background and collect ongoing changes made to both NTFS files and SQL Server database applications. “It was an easy setup,” says Richardt. “You install TimeData on the server you’re protecting and then set it up on another box called a repository, which collects the I/O calls or delta file changes as they are transpiring on your server.” Richardt also has plans to implement TimeData for Exchange.

Some other CDP users came to the table with a healthy dose of skepticism when it came to believing the products could actually deliver on their promises to eliminate the backup window, restore from any point in time, and do it all without the need to halt applications. One of these users was Manny Singh, director of IT at Prairie Packaging, which relies on keeping data available for its mission-critical supply chain management applications and Oracle-based ERP system. Having already experienced storage and management challenges trying to bend snapshot technology to his company’s need for virtually instant restore points, Singh was open to giving Revivio’s CPS1200 appliance a test run.

“When they first told us about how it could back up our system without having to quiesce the database, we thought either they’re crazy or they’re geniuses,” says Singh, noting that keeping the systems live while backups were performed was a key criterion for whatever solution they chose. “We were so pessimistic when we wrote the PO that we had a clause in there for them to take it back and refund our money if it didn’t work.” After running the system through its paces, however, Singh said his team became sold on the concept of CDP.

Another early CDP adopter was David Kronick, CIO of CD&L, a transportation and logistics company that services the New York and New Jersey area. CD&L implemented Asempra’s CDP-based Business Continuity Server after a six-month pilot phase. “What intrigued me was that they seemed to have an original approach in that it wasn’t full-file protection, but down to the bit level and essentially real-time,” says Kronick. “It was kind of like TIVO for data recovery. You can actually take the slider and roll it back 10 minutes, then start using that file immediately.”

“Our courier and trucking company is essentially a 24x7 shop with a lot of transactions that happen very quickly,” says Kronick. “An order comes in and it needs to be processed, picked up, and delivered, in some cases within 90 minutes. In New York City alone, there are 4,000 to 5,000 of these a day. It seemed like a really good idea to be able to roll back to a couple of minutes prior to a file getting corrupted or deleted.”

Ben Weinberger is the IT director for the Florida-based law firm, Ruden McCloskey. With hurricane season a reality that could not be avoided for the company’s 10 offices spread throughout the state, Weinberger knew the firm needed a way to gain consistent, real-time access to data outside of Florida. After reviewing a number of replication products, he chose XOSoft’s WANSyncHA to help the firm perform real-time replication to a hot data center in Chicago. According to Weinberger, XOSoft’s Rewinder CDP feature made the product stand out from competitors’. “It seemed like a natural add-on to replication. When you’re replicating in real-time, what happens if you have a corruption on one side or someone deletes a file? You’ve just replicated the corruption from one server to another,” adds Weinberger. “XOSoft’s Rewinder allows you to back out of that and fix the problems. You don’t want to take snapshots all day long to protect yourself from that.”

For some users, CDP provides that extra cushion of insurance in case the unexpected happens. For example, Jason Hamlett, IT manager for Fulcrum Pharma, implemented an Availl WAFS solution to allow all of the company’s worldwide offices to share files and allow the files to be updated in real-time. The CDP functionality that came with the solution also allowed Fulcrum Pharma to consolidate remote office backups to a central location. “CDP for us really comes into its own with our small branch offices like the one in Tokyo. These offices don’t really need access to our WAFS network, but I still need to make sure their data is being backed up,” says Hamlett.

“I could write procedures and policies for the staff to write and retire tapes, but there’s always an element of error involved in that,” Hamlett explains. “CDP lets me consolidate all my backups to a central location. Any changes to data are backed up to the CDP vault, and then it’s backed up to the central location. The Availl software sends only the changes made to previous files.”
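Sending only the changes, rather than whole files, is a general technique that can be sketched independently of any product. The example below is a hypothetical illustration and makes no claim about how Availl's software works: the function names and the tiny block size are assumptions chosen for readability. It compares two versions of a file block by block and ships only the blocks that differ.

```python
def changed_blocks(old, new, block_size=4096):
    """Return {block_index: new_block_bytes} for every block that differs."""
    delta = {}
    n_blocks = (len(new) + block_size - 1) // block_size  # ceiling division
    for i in range(n_blocks):
        start = i * block_size
        new_block = new[start:start + block_size]
        if old[start:start + block_size] != new_block:
            delta[i] = new_block
    return delta


def apply_delta(old, delta, new_length, block_size=4096):
    """Rebuild the new version from the old copy plus the changed blocks."""
    # Trim or zero-pad the old copy to the new length, then patch it.
    buf = bytearray(old[:new_length].ljust(new_length, b"\0"))
    for i, block in delta.items():
        start = i * block_size
        buf[start:start + len(block)] = block
    return bytes(buf)


old_version = b"AAAABBBBCCCC"
new_version = b"AAAAXXXXCCCCDD"

delta = changed_blocks(old_version, new_version, block_size=4)
rebuilt = apply_delta(old_version, delta, len(new_version), block_size=4)
```

In this example only two of the four blocks cross the wire, which is what makes consolidating branch-office backups over a WAN link practical.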

While Hamlett hasn’t yet had to fall back on CDP, he prefers to have it just in case. “I’m an IT manager who likes to have more than one copy of everything. Before, a number of users would come to me [so I could help] restore a file they deleted an hour ago, and if you only backed up at 10 PM that evening you don’t have access to that file. Now, with our CDP product, we do.”

This article was originally published on April 01, 2006