By Jeff Boles
We've come a long way over the past five years in fundamentally changing the way data is protected. After having put up with unreliable methods of backup to and recovery from tape, we entered the era of disk-based backup. Most mid-size and larger companies today have augmented their data protection environments with disk-based backup, at least for their most mission-critical environments. They are now comfortable keeping several weeks' to several months' worth of data on disk (for recovery purposes) and older backups on tape for regulatory or business compliance purposes. Recovery time and recovery point objectives (RTO and RPO) have improved considerably because disk-based technologies are more reliable and deliver more predictable results than tape.
But most of the technologies absorbed by IT so far are still point-in-time (PIT) based. Not that this is surprising: You have to walk before you run. The appeal of virtual tape libraries (VTLs), for instance, is driven by the fact that you can decrease the amount of time it takes to do the backup while at the same time improving backup reliability and recovery. And you can do all this without changing your backup procedures. You can still maintain the weekly full/daily incremental routine and simply gain faster, more reliable backups and recovery. In addition, you can continue doing disaster recovery (DR) the same way as before.
But while these technologies have the advantage of familiarity, they still don't ultimately address the core problems of PIT backup: operations that impact production applications, and data loss on recovery. The good news is that several vendors have developed products that address those issues.
PIT backup drawbacks
Historically, data protection has involved periodically making copies of production data and storing them separately from the production servers so that data could be recovered if it became lost, corrupted, or otherwise unavailable. The emergence of 24x7 operations drove backups from an offline to an online operation, but the performance of production applications was impacted, sometimes significantly, during the backup process. All of the evolutions in data protection technology over the past 20 years -- including incremental and differential backups, multiplexing, faster tape technologies, disk-based backup, snapshots, VTLs and data deduplication -- have improved the data protection process, but they didn't change the PIT orientation of backups. PIT backups lead inevitably to data loss on recovery, with the amount of data loss determined by backup frequency.
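That last relationship is simple arithmetic. As a minimal sketch (the function name and the steady-write-rate assumption are ours, not from any product):

```python
def pit_data_loss(backup_interval_hours):
    """Exposure window for periodic point-in-time backups.

    Worst case: a failure strikes just before the next scheduled backup,
    losing everything written since the last one. With failures equally
    likely at any moment, the expected loss is half the interval.
    """
    worst = backup_interval_hours
    average = backup_interval_hours / 2.0
    return worst, average

# Nightly fulls: up to 24 hours of writes lost, 12 on average.
# Four-hour incrementals: up to 4 hours lost, 2 on average.
```

The only lever a PIT scheme offers is shrinking the interval, and each shrink multiplies the number of backup operations the production systems must absorb.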
The four key backup problems have been backup windows, recovery point objective (RPO), recovery time objective (RTO), and recovery reliability. Developments in data protection have lessened the impact of these issues, but they all still exist.
The concept of continuous data protection (CDP) is simple: After establishing a baseline copy of a data set and storing it on disk, CDP captures every write the application makes and keeps each one, along with its associated metadata (time stamp, volume association, etc.), in the disk repository. From this repository, users can recreate on demand an image of the volume at any point in time (APIT). Think of CDP as TiVo for your data.
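To make the model concrete, here is a minimal sketch in Python. The class and method names are our own invention, not any vendor's API, and real products journal at the block or volume-driver level rather than in a Python dictionary:

```python
import time

class CDPJournal:
    """Illustrative sketch of the CDP model: baseline plus write journal.

    A baseline image is stored once; every subsequent write is journaled
    with its metadata (timestamp, block address). Any-point-in-time (APIT)
    recovery replays the journal over the baseline up to the chosen moment.
    """

    def __init__(self, baseline):
        self.baseline = dict(baseline)  # block number -> contents at baseline time
        self.journal = []               # (timestamp, block number, contents)

    def record_write(self, block, data, ts=None):
        """Capture one application write along with its timestamp."""
        self.journal.append((time.time() if ts is None else ts, block, data))

    def image_at(self, ts):
        """Reconstruct the volume image as it existed at time `ts`."""
        image = dict(self.baseline)
        for when, block, data in self.journal:
            if when <= ts:
                image[block] = data
        return image
```

Recovering to the instant before a corruption event is then just a call such as `image_at(t_corruption)` with a slightly earlier timestamp -- no restore from last night's backup, and no data written before that moment is lost.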
CDP technology transforms the data protection landscape. Because data is collected continuously, it becomes recoverable as soon as it is created -- not only after a backup has been taken. CDP also significantly lowers instantaneous resource utilization, not only on the production servers but also on networks: Instead of packaging up all changes and pushing them down the network at once, writes flow across the network in real time as they are created.
While the concept of CDP is simple, there is a reason why solutions only became feasible a few years ago: It is a hard problem to solve. A handful of vendors are now offering CDP products. But the potential of CDP technology is far greater, and some innovative vendors have developed comprehensive recovery solutions that go far beyond what base CDP offers. These vendors are poised to change data protection in fundamental ways.
Fine-grained data capture combined with APIT image creation has become an extremely useful tool for disk-based recovery of relatively recent data. By letting the IT manager select, construct, and present any historical image (a "recovery image") back to the application, CDP transforms the way enterprises think about data protection.
In particular, we already see the worlds of database and email data protection going through rapid adoption of CDP. The key breakthrough is the ability to automatically align any subset of an application's data with the precise moment of desired recovery, create the appropriate image based on its time-aware metadata, and present that zero-data-loss image up to the application. An awareness of how CDP solutions can form the foundation of advanced recovery management for enterprises of all sizes is beginning to take hold, and a number of powerful use cases have emerged, including:
Inexpensive DR: In many cases, CDP solutions can either capture data from local clients into CDP targets at remote sites, or continuously replicate data asynchronously between local and remote targets. This can provide disaster recovery -- even for heterogeneous environments at both sites -- with significantly less complexity than array-based replication, which may have to span multiple arrays while carefully marking and managing a single known-good point in time for recovery. Moreover, CDP-based DR offers much greater tolerance for data corruption than replication solutions that do not preserve historical data points.
Local backup and recovery with complete DR: In this case, source servers are configured to capture their data changes to a local CDP target, and the data is simultaneously replicated (either by the host or by the CDP target) to another CDP system at a remote site. This provides an onsite repository for data restoration or testing, as well as a remote site for site-level recovery or offsite testing, with full sets of data at both locations. Compared to remote repositories that may need full sets of bandwidth-consuming PIT backup data replicated to them (and that involve complex recovery operations through data protection software), leveraging CDP for localized backup and DR can be much simpler.
Integrated backup and DR with compliance requirements: Using the same architecture as the solution above, a CDP solution can effectively front-end tape storage. All near-term recovery operations are serviced directly from disk using the CDP solution, and data is periodically dumped off to tape directly from the CDP repository, either by synthetically recreating client data or by writing a block-level image of the CDP repository. In this way, for CDP-attached hosts, all tape interactions can be eliminated and the supporting tape infrastructure significantly reduced, except for the tape infrastructure attached to the CDP system itself. These tapes can then be stored locally or remotely to meet compliance requirements.
Taneja Group opinion
It has taken much longer than expected for IT to reach a point where it understands the power of CDP. Of course, the entry of all the major players into the fray over the past few years has helped make the technology mainstream. Today, there is a long list of players, including Atempo, BakBone, DataCore, EMC, FalconStor, IBM, InMage and Symantec, to name just a few. Some dipped their toes into the CDP pond and have only recently added replication. Some focused on the DR side first and have now extended their solutions to local backup and recovery. We see all of them eventually moving toward more sophisticated continuous data technologies (CDT), whether they know it or not (and whether they call it CDT or not).
Such is the power of CDT. Once you generalize the model and can generate images at any historical point in time, everything about copy creation changes. CDT turns data protection on its ear: it eliminates backup windows and provides rapid, reliable local recovery with near-zero data loss; it enables DR for the masses; it enhances the benefits that accrue from virtualizing servers; and it enables new applications that have typically sat outside the data protection realm. Most importantly for IT managers, it simplifies their environment by replacing multiple products and processes with a single solution that serves multiple purposes.
JEFF BOLES is a senior analyst and director of validation services with the Taneja Group research and consulting firm.