After a slew of acquisitions, the continuous data-protection (CDP) market is maturing, and the debate between "true CDP" and "near CDP" is no longer an issue for most IT organizations.
By Michele Hope
A few years back, a host of hungry and outspoken start-ups defined the emerging market for the newest data-protection panacea: continuous data protection (CDP). Today, in the wake of a number of acquisitions by the likes of EMC, CA, Symantec, and other large players, the landscape for the now-maturing world of CDP looks decidedly different. The players may have changed, but the original value premise behind CDP remains the same.
In an IT world that became largely resigned to going the route of either expensive high-end replication or slower recovery from inexpensive backup tapes, CDP offered a renewed volley in the push for disk-to-disk backup: the equivalent of what some analysts have referred to as "snapshots on steroids."
With CDP, companies now had a new option for applications that couldn't withstand the loss of even a few minutes' worth of data: the ability to automatically record every change made to files, blocks, or specific applications. Storing recorded changes on a separate storage system from that being protected, most CDP products now offer users the option to roll back data sets to any point in time (APIT) as well as to a prior, application-specific marker on the timeline.
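The record-every-change idea can be pictured with a minimal Python sketch. It is an illustration only, not any vendor's implementation: the `ChangeJournal` class, its method names, the integer timestamps, and the block IDs are all hypothetical, and a real product would keep the journal on a separate storage system rather than in memory.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class ChangeJournal:
    """Toy CDP journal: every write is kept along with its timestamp."""
    times: list = field(default_factory=list)   # write timestamps, in order
    writes: list = field(default_factory=list)  # (block_id, data) pairs

    def record(self, timestamp, block_id, data):
        # Assumption: writes arrive in non-decreasing timestamp order.
        self.times.append(timestamp)
        self.writes.append((block_id, data))

    def image_at(self, timestamp):
        """Rebuild the block map as it stood at `timestamp` (inclusive)."""
        image = {}
        upto = bisect_right(self.times, timestamp)
        for block_id, data in self.writes[:upto]:
            image[block_id] = data  # later writes overwrite earlier ones
        return image


journal = ChangeJournal()
journal.record(100, "blk0", "invoice v1")
journal.record(105, "blk1", "ledger v1")
journal.record(110, "blk0", "invoice v2 (corrupt)")

# Roll back to just before the bad write at t=110.
before_corruption = journal.image_at(109)
```

Because nothing in the journal is overwritten, any timestamp is a valid recovery target, which is what separates APIT recovery from a fixed set of snapshots.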
Integration with applications such as Microsoft Exchange, SQL Server, Oracle, SAP, and others has become an increasingly important distinction for CDP products. Such integration and application-intelligent markers assist in the recovery of an application from a "transaction-consistent" state versus just a "crash-consistent" state, the latter of which might require more manual effort and time to fully restore a specific application.
Enterprise Strategy Group (ESG) analyst Lauren Whitehouse explains. "If an application or database is interrupted before all transactions have successfully been written to disk, then recovery of data from this crash-consistent image will take longer. When all transactions are written to disk before shutting down, recovery is from transaction-consistent images," she says. "Block-based CDP solutions typically produce crash-consistent images. To speed recovery, these solutions can mark consistency points for applications at various points during the day; however, the process of check-pointing is disruptive to normal processes as transactions are held during the process."
In contrast, says Whitehouse, application-based CDP technologies solve this problem by mapping their block structure to a transaction, thereby creating consistency points without having to interrupt the application. InMage president and CEO John Ferraro sees the application-focused CDP functionality in the company's DR-Scout disaster-recovery solution as "point-in-time" data recovery that is both business event-aware and application-aware.
Citing the ability for customers to "bookmark" specific events themselves on the recovery timeline, as well as the ability to recover down to the second for key events pertaining to SQL Server, Exchange, Oracle, SharePoint, SAP, and others, Ferraro offers several examples to illustrate the power such application awareness can bring to users. For instance, users can recover to a point-in-time right before a disaster or when a design milestone is achieved. They can roll back and recover to exactly when a million-dollar transaction occurred or to coincide with end-of-month account reconciliation.
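The bookmarking idea can be sketched in a few lines of Python. This is a hedged illustration rather than a description of how DR-Scout works: the `RecoveryTimeline` class, its methods, and the timestamps and labels are hypothetical names chosen for the example.

```python
from bisect import bisect_left


class RecoveryTimeline:
    """Toy timeline of named recovery markers (bookmarks)."""

    def __init__(self):
        self._marks = []  # (timestamp, label) pairs, kept sorted

    def bookmark(self, timestamp, label):
        self._marks.append((timestamp, label))
        self._marks.sort()

    def latest_before(self, timestamp):
        """Return the most recent bookmark strictly before `timestamp`."""
        times = [t for t, _ in self._marks]
        idx = bisect_left(times, timestamp)
        return self._marks[idx - 1] if idx > 0 else None


timeline = RecoveryTimeline()
timeline.bookmark(900, "end-of-month reconciliation")
timeline.bookmark(1200, "large transaction committed")

# A disaster at t=1500 rolls back to the last marked business event.
target = timeline.latest_before(1500)
```

In this sketch a bookmark is only a label on the timeline. Whether the data at that point is transaction-consistent depends on how the product coordinates with the application, which is the question Poelker raises below.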
FalconStor Software's Chris Poelker, VP of enterprise solutions, advises users to look carefully at how the CDP technology applies such application-specific markers. "You can do markers from the CDP side that either happen automatically or from an agent working with the application," he says. Poelker cautions about the complexity of enforcing such markers, however, when faced with a three-tier SAP application, for example, that may cross multiple storage arrays, platforms, and servers.
Poelker says FalconStor's IPStor implementation of CDP offers the technology as one of a set of data-protection services that also include encryption, replication, and data de-duplication that are available from within the storage network fabric via one or more unified software/hardware Platform for Optimized Data Services (PODS). Poelker maintains that more complex three-tier application environments can benefit from IPStor's ability to perform the actual function call at the fabric level where the servers are forced to flush their systems down to disk. He also postulates that such complex environments may lend themselves to more of a snapshot approach that preserves consistency groups of data, but that CDP remains valuable for offering the smallest possible recovery point objective.
Near, true, or snapshot?
At the start of the CDP debate, vendors used to fall clearly into one camp or the other: those offering "true" CDP, which most closely matched the definition of CDP as originally outlined by the Storage Networking Industry Association (SNIA), and those offering what many analysts referred to at the time as "near-CDP"—often solutions that offered log-tracking or highly granular time-based snapshot functionality but which might not always be able to roll back to any point in time. Over time, such distinctions have become less of a religious war, according to Rick Walsworth, director of product marketing at EMC, which recently announced Release 3.0 of its RecoverPoint appliance that incorporates much of the CDP technology from EMC's acquisition of Kashya. "At the end of the day, it's about levels of granularity. How granular do you need to be with your recovery, and how efficient is the system at achieving it?" says Walsworth.
While some analysts, vendors, and end users might beg to differ, Taneja Group analyst Eric Burgener says the debate between true and near CDP now amounts to, basically, yesterday's news. "The distinction between true and near CDP has become irrelevant in the last few years," says Burgener, adding that most end users seem more interested now in gaining transaction-consistency rather than in capturing absolutely any and every point in time.
That may be true, but more than one CDP user we came across wouldn't think of parting with APIT functionality. Contrasting the merits of using CDP technology versus less granular point-in-time snapshots, CIO Matt Reynolds is quick to claim that taking disk-based snapshots is simply not good enough for the needs of the three most critical applications at his San Francisco law firm, Howard Rice Nemerovski Canady Falk & Rabkin. Reynolds uses InMage's DR-Scout to protect what he identified as the firm's three "level-one" systems, which need to be accessible within 15 minutes of any disaster: the company's Exchange e-mail system, the lawyers' work product systems for research and drafting documents, and the firm's Adorant CMS time and billing system. Wanting real-time replication combined with the ability to roll back to any point, Reynolds also noted the need for a solution that would require minimal manual effort to either fail over or fail back in the wake of disaster.
"For our needs, snapshots are not good enough," says Reynolds. "It could be as simple as a document in our system, the final deal on an SEC trade, or an e-mail that had to be sent before midnight. These are critical to capture and restore." With snapshots, depending on the time of disaster and when the last snapshot was taken, these items might slip through the cracks.
Enterprise Management Associates' senior analyst Mike Karp offers some insight to help users differentiate CDP technology from its snapshot counterparts (see "What CDP isn't," above). Among the distinctions is his view that CDP is more of an event-based approach, whereas snapshots are more time- (or schedule-) based. Enterprise Management Associates also offers a free IT Solutions Center on CDP with analyst input related to many of the CDP offerings on the market.
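A quick worked example shows why a schedule-based approach can let recent work slip through the cracks. The hourly snapshot interval and the failure time below are hypothetical figures chosen for illustration, and the function name is likewise an assumption.

```python
def snapshot_data_loss(failure_time, interval, first_snapshot=0):
    """Seconds of changes lost when restoring from the last scheduled snapshot.

    Assumes snapshots fire at first_snapshot, first_snapshot + interval, ...
    and that the failure happens at or after the first snapshot.
    """
    if failure_time < first_snapshot:
        raise ValueError("failure occurred before the first snapshot")
    return (failure_time - first_snapshot) % interval


HOUR = 3600
# Hypothetical case: hourly snapshots, failure 3 h 59 min into the day.
lost_seconds = snapshot_data_loss(failure_time=3 * HOUR + 59 * 60, interval=HOUR)
print(lost_seconds // 60)  # 59 minutes of changes are gone
```

With an event-based journal that records every write, the equivalent loss window shrinks toward the last write that reached the journal, which is why users such as Reynolds treat the two approaches differently.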
With or without replication
Many vendors that succeeded early on with CDP also tied CDP functionality to the ability to replicate CDP's recorded writes to a remote location. A long-time vulnerability of traditional remote replication is also its strength: the ability to replicate everything, even corruption or viruses, to the remote locale. In contrast, CDP's rollback capability, when replicated remotely, ensures rollback to the latest "good" data point prior to corruption or disaster. "Integration with replication provides both operational recovery [local CDP] and disaster recovery [remote replica]," explains ESG's Whitehouse.
XOSoft, with its early WANSync and WANSyncHA products, was one example of a vendor that offered the ability to automatically fail over to the remote site, and then fail back once the primary site was back online. (In the wake of XOSoft's acquisition by CA, WANSync technology has since become integrated into CA's Recovery Management suite, which includes CA's ARCserve Backup r12 software. WANSync solutions were also recently renamed CA XOSoft Replication and CA XOSoft High Availability.)
ESG's Whitehouse says that vendors such as SteelEye and NeverFail are other examples of vendors that combine CDP plus replication and high availability. These solutions maintain something akin to a heartbeat between the primary and secondary site, automatically failing over as well as failing back. Teneros also offers a dual-site heartbeat and auto-fail-over functionality, according to Whitehouse.
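The heartbeat mechanism can be pictured with a small Python sketch. It is a simplification under stated assumptions, not a description of any of the products named above: the `HeartbeatMonitor` class and its timeout are hypothetical, and real products would resynchronize data before failing back rather than switching on the first returning heartbeat.

```python
class HeartbeatMonitor:
    """Toy fail-over logic driven by heartbeats from the primary site."""

    def __init__(self, timeout):
        self.timeout = timeout    # seconds of silence tolerated
        self.last_beat = None     # time of the most recent heartbeat
        self.active = "primary"   # site currently serving users

    def beat(self, now):
        """Record a heartbeat received from the primary site."""
        self.last_beat = now
        if self.active == "secondary":
            # Simplification: fail back as soon as the primary is heard again.
            self.active = "primary"

    def check(self, now):
        """Fail over if the primary has been silent longer than the timeout."""
        silent = self.last_beat is not None and now - self.last_beat > self.timeout
        if self.active == "primary" and silent:
            self.active = "secondary"
        return self.active


monitor = HeartbeatMonitor(timeout=5)
monitor.beat(now=0)
state_ok = monitor.check(now=3)       # still within the timeout
state_failed = monitor.check(now=6)   # primary silent for too long
monitor.beat(now=10)                  # primary returns
state_back = monitor.check(now=11)
```

The point of the sketch is that neither transition needs an operator, which is the property users in this article single out.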
As it turns out, simplicity and automation in the fail-over and fail-back process are an increasingly important part of some users' CDP implementations. One who knows only too well the importance of automation in disaster-recovery planning is Peter Haas, IT director at the New Orleans-based Louisiana Supreme Court. In the wake of Hurricane Katrina, Haas found himself physically loading the court's critical servers into the back of a van for transport to the court's makeshift headquarters up north. Noting the critical role Louisiana's Supreme Court judges and staff served in the crisis to help lower courts maintain some semblance of order yet adjust to their new realities, Haas and his team had to scramble to get critical e-mail and court document systems up and running in 96 hours.
Not wanting to repeat the experience, Haas began looking at ways to replicate the court's data in real time to another locale. After putting multiple solutions through the court's test environment, he decided on CA's XOSoft solution. Even though all solutions he tested performed as advertised, his ultimate decision was simple. "XOSoft was loaded up and we had an image of our production e-mail server in our test environment. That [image] crashed that night, and this thing failed over itself. We didn't have to do anything. That was a big selling point," says Haas.
Failback was equally smooth. "After Katrina, we learned that recovery actually has a different meaning. There's recovery, then there's return. Recovery is how quickly we can get the system back up if we lose it. Then, when we come back, it's how quickly we can get back to the production systems back home," says Haas.
Sitting on reports and memoranda that are often the result of weeks or months of legal research, Haas realized the loss of even 15 minutes of data would be unacceptable. Today, he credits the CA XOSoft solution with being the backbone of the court's disaster-recovery processes. He even uses it to help perform planned server maintenance, knowing the fail-over and fail-back process amounts to just a few clicks and address changes to keep users in operation.
CDP into data protection
To further blur the lines of today's CDP technology, many software application vendors now offer CDP as part of a suite of unified data-protection options where users can find CDP's APIT functionality for applications with the strictest recovery point objectives, along with pre-scheduled snapshot capabilities for applications that can handle a longer recovery point.
Such a unified data-protection suite—with the ability to achieve various backup-and-recovery service levels "under the covers"—seems a common trend. Taneja Group's Burgener sees this as a natural progression of CDP. "Where it's headed is that it will be an infrastructure offering. Pretty soon, people won't be mentioning it. There won't be CDP products out there anymore. Your choice will be on the network, in an array, or in the backup software," says Burgener.
Today, many vendors are integrating a range of recovery technologies under one unified management umbrella: For example, CA's XOSoft modules tie into ARCserve, EMC's RecoverPoint ties into NetWorker and Replication Manager, and CommVault has its own CDP modules. Even Symantec, which has long had CDP-like functionality with its Continuous Protection Server integrated with Backup Exec, is planning to introduce CDP integration and support in the coming months based on a phased rollout tied to the next release of NetBackup. This will incorporate much of the fundamental CDP technology Symantec acquired when it bought Revivio over a year ago.
Companies such as FalconStor and InMage also continue to expand their breadth of functionality. InMage's Ferraro claims DR-Scout competes with both CDP implementations and the more tried-and-true backup software providers, given its ability to fill the bill as a disaster-recovery solution. "We sell disaster recovery. CDP is a technology," Ferraro says. "Our strategy is to provide a solution that can be extended as customers move more and more to disk-to-disk backup platforms. Because we do local and remote replication simultaneously, we're positioned to replace legacy backup systems."
Whether one vendor can ultimately suit everyone's needs for local and remote disk-based data protection, however, remains to be seen. Much of it may come down to a few age-old equalizers in IT spending: pricing and support. CDP users we talked with spent as much time talking about how well the company supported them and their product as they did about the sticker price and various licensing options they examined.
CDP licensing costs
Licensing and pricing for CDP technology still vary significantly. Some vendors offer base application licenses that include CDP at no extra charge, but may also require customers to purchase individual licenses per agent or server being protected. Application-specific integration may also cost extra on a per-agent basis. EMC's Walsworth says RecoverPoint's pricing is based on a capacity-based software model, where RecoverPoint is delivered as an appliance or hardware component with its own base license. On top of that, the solution is licensed based on the amount of replicated capacity.
Given the variety of options in the market for CDP (from software-only implementations to hardware, appliances, or other network fabric-based approaches), Walsworth admits that host-based software with CDP can be less expensive, especially for environments needing to protect a handful of servers. For environments with 50 to 100 or more servers, however, the cost of an appliance or network-based solution starts to look like a better bet, he says. Still, at a price tag that starts at $60,000 to $80,000, he's quick to admit that smaller businesses might well find the cost prohibitive.
So, too, apparently can smaller city governments. Derek Kruger is the IT and communications supervisor for the City of Safford, just outside of Phoenix, AZ. After experiencing a multi-day restore of the city's Microsoft Exchange systems and struggling for some time under a less-than-comprehensive backup process, Kruger decided to start investigating CDP technology. He began with IBM Tivoli solutions and liked what he saw, but the starting price was more than he could afford. As he delved into licensing for other products, he noticed that while the base product might not cost so much, the add-ons began to add up. "A lot of vendors offered an a la carte licensing policy. You could buy the backup product but you also had to buy the Exchange plug-ins and the SQL plug-ins. So, you could start out inexpensively but by the time you add all the agents, you're spending a pretty penny," says Kruger.
He eventually opted for Asempra's Business Continuity Server, in part because of a licensing scheme that is based on capacity being protected, not additional agents. Noting the solution fell "right in the middle" when it came to price, Kruger is happy with the investment. Since implementing the solution, he now has four servers being backed up via CDP, including a file server, Exchange server, and two Microsoft SQL Server systems.
Kruger's found several occasions to roll back—more than he expected. Today, restoring the Exchange Server also takes literally less than a minute, he says. After editing a file all morning, one employee asked if she could see the original document, which Kruger was able to restore back to the time she started work that morning. The administrator of the city's GIS database also needed the system restored several times after changes caused corruption. "If the problem happens at 9 o'clock, I just go back and grab the data from 8:55 PM," he says. Kruger was even able to use Asempra's BCS to restore a good version of a now-corrupt database to his test machine, then walk through the original steps with the user to pinpoint exactly how the corruption occurred.
CDP adoption rises
Recent surveys by TheInfoPro and the Enterprise Strategy Group still point to a relatively small subset of users who currently deploy CDP (see figures on p. 22) in both the enterprise and mid-market space. CDP tends to be more popular among larger enterprises. But many companies also have plans to adopt CDP in some form within the next 12 to 24 months. Comments from respondents to TheInfoPro's Wave 10 storage survey who had not yet made the leap to CDP indicate some reticence on the part of a few users about jumping into CDP until they see the various vendors' product suites mature more than they have thus far. Still, about half the respondents surveyed in both reports have no plans to adopt CDP.
Whether CDP can ultimately supplant traditional backup/recovery software methods also remains an open question. "That's the $64,000 question," says EMC's Walsworth, who still notes there will always be a need for long-term archiving, despite CDP's ability to roll back over a period of three, five, or seven days.
One impediment EMA's Karp sees to CDP adoption remains the fact that it requires a whole new way of looking at data protection and disaster recovery. "A lot of IT managers aren't comfortable with it and still believe that, if it 'ain't' broke, don't fix it," he says.
For those who want to take the plunge, however, Karp offers guidelines regarding which applications might be best-suited for pairing with CDP (see "Which applications benefit most from CDP?" p. 22). These include write-intensive applications, those with high transaction processing needs, and applications such as e-mail and databases with very aggressive RTO and RPO requirements. Karp's views are reflected by other analysts and vendors, although many also believe CDP will ultimately make its way into less critical systems due to its ease of use and simplicity of recovery.
Michele Hope is a freelance writer specializing in enterprise storage issues. She can be reached at email@example.com.