Eight years ago, most IT organizations’ idea of affordable remote disaster recovery (DR) consisted of tape backups stored off-site, plus the potential use of a third-party colocation facility if systems had to be recovered remotely from tape. While still one of the most affordable options today, this method for remote DR comes with trade-offs: primarily, a gap of several hours to several days in both time to recover and the amount of data lost since the last backup to tape. Back in the “old days,” only the upper echelon of enterprises (such as many companies in the financial and insurance sectors) could justify investments in expensive frame-to-frame remote replication technologies from vendors such as EMC, Hitachi Data Systems, and IBM that could satisfy their very short recovery time objective (RTO) and recovery point objective (RPO) requirements.
Today, a different story has emerged. Although it is still a significant investment for any company, remote replication to an off-site disaster-recovery facility has become more of a de facto practice among a growing number of enterprises and midrange companies, due in large part to a mix of technologies that are driving down the cost required to replicate and restore data off-site. From data de-duplication and wide area data services (WDS) to server and storage virtualization, remote DR has begun to come into its own.
Forrester Research’s report, “Maximizing Data Center Investments for Disaster Recovery and Business Resiliency,” showed that 65% of respondents had at least one alternate backup data center, with an additional 10% planning to have one in the next 12 months (see figure, p. 22). Of that number, 46% said they owned the recovery site themselves rather than using a service provider’s facility or a colocation site.
From D2D To De-dupe
Someone well-acquainted with the use of a mix of technologies for backup and off-site DR is Jon Robitzsch, senior manager of infrastructure systems at Atlanta-based Interactive Communications (InComm), which pioneered the use of pre-pay technologies, pre-paid gift cards, and solutions for a growing range of resale merchants.
When Robitzsch and his team set out to overhaul InComm’s backup and remote DR infrastructure, Robitzsch opted for a combination of disk-based backup, replication, and data de-duplication technologies to protect InComm’s growing data stores more affordably and to advance the company’s longer-range off-site DR plans.
Robitzsch estimates it would have cost about $1.8 to $1.9 million over the next three years to expand the company’s legacy infrastructure consisting of a Nexsan SATABeast disk array, an Overland Storage tape library, and Symantec’s NetBackup software. In contrast, the solution set he ultimately chose to replace it came to about $1.1 million over three years.
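Put as a number, the difference is substantial. The short calculation below works out the three-year savings from the estimates Robitzsch gives; the function name is ours, and the inputs are only his projections.

```python
# Three-year cost estimates quoted by Robitzsch, in US dollars.
LEGACY_ESTIMATES = (1_800_000, 1_900_000)  # expanding the legacy infrastructure
REPLACEMENT = 1_100_000                    # disk, replication, and de-dupe solution

def savings(legacy: int, replacement: int) -> tuple:
    """Return (dollars saved, savings as a percentage of the legacy cost)."""
    saved = legacy - replacement
    return saved, 100.0 * saved / legacy

for legacy in LEGACY_ESTIMATES:
    saved, pct = savings(legacy, REPLACEMENT)
    print(f"vs ${legacy:,}: ${saved:,} saved ({pct:.1f}%)")
```

That works out to roughly $700,000 to $800,000, or about 39% to 42% of the projected legacy cost.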
Now Robitzsch uses CommVault’s Galaxy Data Protection 6.1 software to back up approximately 800 servers and 250 workstation clients to a Data Domain DDX array with DD580 controllers that perform global compression and data de-duplication. (He also used CommVault software to replace the old tape-based backups being performed at InComm’s six remote offices with centralized, disk-based backup to the company’s DD580 system at its corporate headquarters.)
According to Robitzsch, InComm’s Data Domain infrastructure has led to significant reductions in the backup space required, especially for his company’s several hundred virtual servers running VMware’s VI3 software. “We back up roughly 450 of our virtual servers as flat files. Because a lot of these files are very similar and often have a similar format, the de-duplication savings are enormous,” he says. “We’re seeing 30:1 or 40:1 de-duplication ratios and have only been doing this for a few months.”
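Why do near-identical VM flat files de-duplicate so well? The sketch below is a simplified, hypothetical model, not Data Domain’s method: it splits each file into fixed 4KB chunks, keys each chunk by its SHA-256 hash, and stores each distinct chunk only once. Commercial systems typically use variable-length segmentation and other refinements, and the “VM images” here are synthetic (a shared random base plus a small unique tail per image).

```python
import hashlib
import os

def dedupe_stats(files, chunk_size=4096):
    """Fixed-size chunk de-duplication: store each distinct chunk once,
    keyed by its SHA-256 digest. Returns (logical_bytes, stored_bytes)."""
    store = {}      # digest -> chunk, standing in for the array's chunk store
    logical = 0
    for data in files:
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            logical += len(chunk)
            store.setdefault(hashlib.sha256(chunk).digest(), chunk)
    stored = sum(len(chunk) for chunk in store.values())
    return logical, stored

# Synthetic "VM images": a shared 1MB base plus an 8KB unique tail each.
base = os.urandom(1024 * 1024)
vm_images = [base + os.urandom(8192) for _ in range(40)]

logical, stored = dedupe_stats(vm_images)
print(f"{logical:,} logical bytes -> {stored:,} stored, ratio {logical / stored:.1f}:1")
```

With 40 synthetic images that differ only in an 8KB tail, the ratio comes out to about 30.7:1, because the shared base is stored once while each image adds only its unique chunks. Real ratios depend entirely on how much data actually repeats.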
Looking at all of his backup data, Robitzsch estimates he can now store about 17TB of backup data in just 1TB to 2TB of space. He currently puts the average de-duplication ratio at 12.5:1 and growing, and he’s confident it will rise over the next few months to 20:1 or 22:1, enabling InComm to store as much as a year’s worth of backup data (about 250TB) in a storage footprint of only 20TB to 30TB.
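The footprint figures follow from dividing logical backup data by the de-duplication ratio. The minimal sketch below (function name hypothetical) reproduces the current numbers: 17TB at 12.5:1 needs about 1.36TB, inside the 1TB-to-2TB range Robitzsch cites. For 250TB at 20:1 or 22:1, the same division gives roughly 11TB to 13TB, so the 20TB-to-30TB footprint he quotes presumably builds in headroom for growth or lower-than-hoped ratios; the article doesn’t say.

```python
def footprint_tb(logical_tb: float, ratio: float) -> float:
    """Physical capacity needed to hold logical_tb of backup data at an
    N:1 de-duplication ratio (pass N as `ratio`)."""
    return logical_tb / ratio

print(f"17TB at 12.5:1 -> {footprint_tb(17, 12.5):.2f}TB")
for ratio in (20, 22):
    print(f"250TB at {ratio}:1 -> {footprint_tb(250, ratio):.1f}TB")
```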
Although InComm still performs monthly full backups to tape, which are then shipped to the company’s out-of-region DR facility, Robitzsch plans to acquire two more Data Domain DD580 systems next year for use at his remote DR site. “When we purchase our third and fourth Data Domain systems next year, we plan to replicate the data here locally just to get the initial data transferred across. Then we’ll ship them out to our disaster-recovery site and start collecting only incremental changes to bring both sides in sync,” says Robitzsch. He plans to use Data Domain’s Replicator software to replicate changes between the two sites over a WAN connection.
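Seeding the new systems locally before shipping them avoids pushing the entire initial data set over the WAN. The back-of-the-envelope sketch below shows why; the 20TB seed, 100Mbps link, 80% usable throughput, and 1% daily change rate are hypothetical inputs, not InComm’s figures.

```python
SECONDS_PER_DAY = 86_400

def transfer_days(data_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Days needed to move data_tb (decimal terabytes) over a WAN link rated
    at link_mbps, assuming only `efficiency` of the rated speed is usable."""
    bits = data_tb * 1e12 * 8
    usable_bps = link_mbps * 1e6 * efficiency
    return bits / usable_bps / SECONDS_PER_DAY

seed_tb = 20.0        # hypothetical initial data set
daily_change = 0.01   # hypothetical: 1% of the data changes per day
link_mbps = 100.0     # hypothetical WAN link

print(f"initial seed: {transfer_days(seed_tb, link_mbps):.1f} days")
print(f"daily changes: {transfer_days(seed_tb * daily_change, link_mbps) * 24:.1f} hours")
```

Under those assumptions, the initial seed would tie up the link for about 23 days, while each day’s 0.2TB of changes would take under six hours.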
De-Duping For Remote DR
Data Domain’s Replicator software allows data compression and de-duplication to be performed at both locations, which can significantly reduce the WAN bandwidth required for replication. “Your WAN bandwidth goes down to 1% per day. That’s 80% to 90% smaller than an incremental backup,” says Brian Biles, vice president of product management at Data Domain. “Whatever size link you can afford, you can do quite a bit more data sets because the amount of bandwidth you need is a lot smaller.”
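The bandwidth savings Biles describes come from sending only data the remote system doesn’t already hold. The sketch below is a simplified, hypothetical model of that idea, not Replicator’s actual protocol: a dict stands in for the remote site’s chunk index, and only chunks whose SHA-256 hash is missing there cross the “WAN.”

```python
import hashlib
import os

CHUNK_SIZE = 4096

def replicate(data: bytes, remote_index: dict) -> int:
    """Ship to the remote site only the chunks it does not already hold.
    Returns the number of chunk bytes that crossed the WAN."""
    sent = 0
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).digest()
        # Simplification: a real system would exchange digests over the WAN
        # to learn what the target lacks; here we consult a local dict.
        if digest not in remote_index:
            remote_index[digest] = chunk
            sent += len(chunk)
    return sent

remote = {}                          # the DR site's chunk store, initially empty
day1 = os.urandom(400 * 1024)        # synthetic 400KB backup
day2 = bytearray(day1)
day2[5000] ^= 0xFF                   # flip one byte inside the second chunk

sent1 = replicate(day1, remote)
sent2 = replicate(bytes(day2), remote)
print("day 1 bytes sent:", sent1)
print("day 2 bytes sent:", sent2)
```

The first run ships the full 400KB; after a single byte changes, the second run ships one 4KB chunk.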
Many storage vendors, backup service providers, backup/replication software vendors, and WAN optimization vendors offer data de-duplication and compression as a way to significantly shrink data storage requirements or the WAN bandwidth consumed. On the backup side, some perform block-level incremental backups, while others focus more granularly on capturing byte-level changes. Some perform de-duplication during the initial backup process (inline), while others perform it after the initial backup completes (post-process). Your mileage may vary in terms of latency and de-duplication ratios, but all of these approaches can provide significant bandwidth and storage savings compared with more traditional backup processes.
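To make the block-level versus byte-level distinction concrete, the sketch below compares two versions of a file block by block (hypothetical 4KB blocks, one SHA-256 hash per block) and reports which blocks changed. A one-byte edit forces a whole 4KB block into a block-level incremental, whereas a byte-level approach needs to capture only the changed byte plus its location. The helper names are ours.

```python
import hashlib

def changed_blocks(old: bytes, new: bytes, block_size: int = 4096) -> list:
    """Indices of fixed-size blocks that differ between two versions of a
    file; a block-level incremental would copy each of these in full."""
    def digests(data):
        return [hashlib.sha256(data[i:i + block_size]).digest()
                for i in range(0, len(data), block_size)]
    old_d, new_d = digests(old), digests(new)
    return [i for i in range(max(len(old_d), len(new_d)))
            if i >= len(old_d) or i >= len(new_d) or old_d[i] != new_d[i]]

def changed_bytes(old: bytes, new: bytes) -> int:
    """Bytes that differ between two equal-length versions; roughly what a
    byte-level approach needs to capture (plus offsets)."""
    return sum(a != b for a, b in zip(old, new))

old = bytes(16 * 1024)            # 16KB file = four 4KB blocks
new = bytearray(old)
new[5000] = 1                     # a one-byte edit inside block 1
new = bytes(new)

blocks = changed_blocks(old, new)
print("changed blocks:", blocks, "->", len(blocks) * 4096, "bytes to copy")
print("changed bytes:", changed_bytes(old, new))
```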
In storage circles, data de-duplication is discussed primarily in terms of reducing the backup footprint at a primary site, not so much in terms of replication to another site. However, that appears to be changing, and InComm’s planned extension of de-duplication technology to remote DR is indicative of the trend. The extension makes sense, says Scott Robinson, chief technical officer at Datalink, an independent storage consulting and integration firm. “Companies have been incorporating disk into backup-and-recovery environments for some time,” says Robinson. “Since they already have data stored on disk for backup and recovery, many are now asking how they can use it for disaster recovery.”
Biles claims about two-thirds of Data Domain’s customers end up using their Data Domain systems over a WAN connection. He cites another customer, the US Army, which uses Data Domain technology to replicate backups inter-continentally via satellite. According to Biles, this became a viable option for the US Army only after it implemented solutions from both Data Domain and Juniper Networks, which together shrank the data enough to transport and significantly reduced satellite latency issues.
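Latency matters on a satellite link because a single TCP stream can have at most one window of unacknowledged data in flight per round trip, so its throughput is capped at roughly window size divided by round-trip time, regardless of link capacity. The sketch below applies that bound with illustrative values: a 64KB window (the TCP maximum without window scaling) and a 600ms round trip, typical of geostationary satellite links. The article doesn’t say which techniques the Army’s Juniper gear applied; WAN optimizers commonly work around this ceiling with measures such as TCP proxying and larger windows.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound for one TCP stream: one window of unacknowledged data
    per round trip, i.e. throughput <= window / RTT."""
    return window_bytes * 8 / (rtt_ms / 1000.0) / 1e6

WINDOW = 65_535  # largest TCP window without the window-scaling option

for label, rtt in (("terrestrial, 20ms", 20), ("satellite, 600ms", 600)):
    print(f"{label}: <= {max_tcp_throughput_mbps(WINDOW, rtt):.2f} Mbps")
```

Under those assumptions one stream tops out below 1Mbps on the satellite path, versus about 26Mbps at a 20ms terrestrial round trip, which is also why shrinking the data first helps so much: fewer bytes have to squeeze under the same ceiling.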
Optimizing The Pipe For DR
Taneja Group analysts Steve Norall and Jeff Boles have seen de-duplication solutions used more for backup and archival thus far, and less so for replication. When it comes to DR, says Boles, users tend to think more in terms of having a hot site available for near-immediate fail-over. That’s still different from using an off-site location for data protection. Taneja’s Norall concurs, noting that the first order of business for most off-site DR implementations tends to be how best to achieve a shorter RPO and/or RTO.
What the analysts do see, however, is the use of technologies the Taneja Group refers to as WDS to help optimize the efficiency of data transport between sites. Often including WAN optimization devices and/or wide area file services (WAFS) technology, such solutions are offered by several vendors, including Brocade, Cisco, Juniper, Riverbed, Silver Peak Systems, and others.
Datalink’s Robinson also sees considerable traction with these types of products for remote replication. “We’re seeing customers using replication products with block-level incrementals who can still benefit from using a WAFS [and/or WAN optimization] product on top of that.”
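Robinson’s point is that reduction techniques stack. As a minimal illustration, the sketch below runs an already-reduced replication payload through zlib, standing in for the compression stage of a WAN optimization device (real appliances also use byte caching and protocol acceleration, which this doesn’t model). Repetitive, record-like data shrinks considerably; random or already-compressed data does not.

```python
import os
import zlib

def wire_bytes(payload: bytes, level: int = 6) -> int:
    """Size of a replication payload after zlib compression, a stand-in
    for the compression stage of a WAN optimizer."""
    return len(zlib.compress(payload, level))

# Changed records from a block-level incremental: structured, repetitive text.
records = b"".join(f"order={i:06d},status=SHIPPED,region=EAST\n".encode()
                   for i in range(1000))
# Already-compressed or encrypted data looks random to a compressor.
opaque = os.urandom(len(records))

for name, payload in (("records", records), ("opaque", opaque)):
    print(f"{name}: {len(payload):,} -> {wire_bytes(payload):,} bytes")
```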
Taneja Group’s Boles acknowledges the lines can blur between solutions offering disaster recovery and high availability. “You can have two overlapping spheres of business objectives. One is high availability and the other is disaster recovery. You can now get some aspects of high availability with a robust disaster-recovery solution,” he says.
Tiered Data Protection
Disk-based data protection, DR, and high-availability solutions are coming together as well, often propelled by the adoption of a tiered data-protection strategy. An increasingly popular incarnation of the often-beleaguered concept of information lifecycle management (ILM), tiered data protection is the practice of classifying different types of data based on the level of protection, availability, and recovery each requires. The second part is developing data protection and recovery service level agreements (SLAs) for each class of data, then implementing the appropriate data protection, high-availability, and DR solutions to satisfy each data type’s SLA requirements.
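The classify-then-map approach can be sketched as a simple rule table. In the minimal example below, the data set names, the RTO/RPO thresholds, and the tier labels are all hypothetical; a real policy would come out of the SLA exercise described above.

```python
from dataclasses import dataclass

@dataclass
class DataSet:
    name: str
    rto_hours: float   # how long the data may be unavailable
    rpo_hours: float   # how much recent data may be lost
    production: bool   # dev/test copies often need no replication at all

def protection_tier(ds: DataSet) -> str:
    """Map a data set's recovery SLA to a protection approach.
    All thresholds and tier names are hypothetical placeholders."""
    if not ds.production:
        return "local backup only"
    if ds.rto_hours <= 1 and ds.rpo_hours <= 0.25:
        return "clustering plus continuous replication"
    if ds.rto_hours <= 24:
        return "asynchronous replication to DR site"
    return "nightly backup with off-site copies"

inventory = [
    DataSet("order database", rto_hours=0.5, rpo_hours=0.0, production=True),
    DataSet("file shares", rto_hours=8, rpo_hours=24, production=True),
    DataSet("HR archive", rto_hours=72, rpo_hours=24, production=True),
    DataSet("test copies", rto_hours=72, rpo_hours=72, production=False),
]
for ds in inventory:
    print(f"{ds.name}: {protection_tier(ds)}")
```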
This practice is typified in the initial questions that Stephen Foskett suggests companies ask about what they need to accomplish when attempting to cut costs for remote DR and remote replication. “Instead of trying to squeeze more content into the same size bag, try to figure out instead how to reduce that content,” says Foskett, a director of storage practice at Contoural, an independent consulting firm. “Instead of buying new technology and hardware, ask first what you really need to replicate.”
Foskett claims that as much as two-thirds of a large company’s current data doesn’t need to be replicated, since much of it consists of development and test data. “Ask the question, ‘Do I really need to replicate this?’ You can eliminate far more storage that way than you can by using de-duplication, WAFS, or [WAN] accelerators,” says Foskett.
As an example of how much savings these types of questions can translate into for remote DR, Foskett describes a recent tiered data-classification service Contoural performed for a well-known insurance company. The company was concerned about ensuring the availability of its systems and asked Contoural to help classify its data based on the type of protection it would require. Each type of data was categorized in several ways, based on its local protection needs, availability requirements, performance requirements, its need for either short- or long-distance replication, and its archiving requirements. “It was a very soup-to-nuts approach that asked, ‘What is this data and what does it need in terms of protection?’ ” says Foskett.
The result? “In the end, less than 20% of the company’s data was determined to be mission-critical and needed long-distance replication. For the rest, we were able to apply less costly concepts to it and still have a good service level for users’ needs,” says Foskett.
How Far And How Fast?
According to Foskett, you should also consider whether data needing remote replication requires long-distance (and potentially high-cost) replication. Companies can realize cost savings by implementing short-distance replication to another building or across an office park if their data recovery needs don’t require long-distance replication.
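Distance also sets a physical floor on replication latency. Light in optical fiber covers roughly 200km per millisecond, so every write that must be acknowledged by a remote site pays at least one round trip. The sketch below computes that floor (the function name and distances are illustrative, and real fiber routes run longer than straight-line distance); it helps explain why synchronous mirroring and clustering suit an office park far better than a cross-country link.

```python
FIBER_KM_PER_MS = 200.0  # light in glass fiber: roughly 200km per millisecond

def sync_write_floor_ms(route_km: float) -> float:
    """Minimum latency a synchronous write adds: one round trip over
    route_km of fiber, ignoring switch, protocol, and array overhead."""
    return 2 * route_km / FIBER_KM_PER_MS

for label, km in (("across an office park", 2), ("metro area", 100), ("cross-country", 4000)):
    print(f"{label} ({km}km): >= {sync_write_floor_ms(km):.2f} ms per write")
```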
Short-distance replication opens organizations up to the possibility of using high-availability solutions from vendors such as Neverfail, Sun, Symantec, and others instead of only replication solutions. Clustering technologies also have a fit here, according to Foskett.
In a recent end-user survey on disaster recovery, Symantec found that 48% of the companies surveyed had had to execute their DR plans in an actual failure scenario. Dan Lamorena, senior product marketing manager in Symantec’s data-center management group, says the most common triggers were hardware or software failures, followed by natural disasters such as fire or flooding. About 15% of the respondents had to invoke their DR plans because of security threats.
In terms of whether or not to use clustering or replication technologies or some mix of the two, Lamorena maintains it usually comes down to choosing a solution that fits the needs of your applications. “Some applications may need to be up in four to eight hours, whereas with other applications you might be able to be down for a few days,” he says.
Lamorena cites the case of a large shipping company in the Midwest with two sites: a primary site in the Midwest and another one on the East Coast. The customer has a large SAP environment with a variety of underlying, mission-critical databases it uses for applications such as Web-based transactions. The East Coast site serves as the company’s SAP test and development site.
This company ended up using a combination of Veritas Provisioning Manager, Veritas Cluster Server, and Veritas Storage Foundation for its tiered availability and protection needs. The company clusters its mission-critical databases to the remote site, and the East Coast site can re-provision its test/development servers to look like SAP production servers when needed. With clustering, the databases can start up automatically at the East Coast site in the event of a primary site disruption. Lamorena admits clustering can be more expensive than other options, but it may be warranted if you need high availability and rapid fail-over. For other applications that aren’t as critical, he recommends snapshot or replication technologies.
Reducing Off-Site DR Costs
As it turns out, cost reduction in off-site DR can be a bit of an oxymoron. Admittedly, companies that never really invested in off-site DR or remote replication before will be spending more of their IT budget than ever to get there. But Fairway Consulting Group CEO James Price usually weighs that investment against the losses companies could avoid in the event of a real disaster.
“Companies replicating from a primary site to a secondary site do so today to offset the downtime and financial loss due to an outage,” says Price. “Replication can be very expensive, especially if it’s done to the level where it’s actually usable. Companies doing it are looking at that cost and saying, ‘As expensive as that is, it’s a drop in the bucket compared to our primary data operations facility being offline for hours, days, weeks, or months.’ ”
Like Contoural’s Foskett, Price often starts the discussion of DR by asking customers, “What have you got to have to keep your company alive?” Specifically, he tries to determine the minimal amount of information a company needs to continue doing business during or after a disaster. “Some companies think they want DR for everything, but when you start wrapping costs around replicating everything, you may realize that you only want to replicate a subset of the data.”
One company that continues to refine its tiered data-protection process is Munich Reinsurance America. According to Robert Shapelow, an enterprise backup manager at Munich, the company has instituted a number of methods to protect both its critical applications and the rest of its systems. The most critical data, primarily from Oracle and SAP database systems, resides on an EMC Symmetrix array and is replicated via EMC’s SRDF to another Symmetrix at the company’s warm DR site. But Shapelow points out that “90% of our systems don’t require that. Those are backed up nightly through CommVault [Galaxy data-protection software].”
Shapelow says Munich Reinsurance also uses CommVault’s Continuous Data Replicator (CDR) software to replicate delta changes back to the company’s primary New Jersey data center from eight branch offices. While this is a common use of CDR, CommVault’s Lucia Trejos says the company also sees customers using CDR to leverage remote site or virtual environments for disaster recovery.
Shapelow’s future tiered data-protection plans include the ability to replicate both primary production data and backup data to the company’s warm disaster-recovery site. Although the company still backs up to tape, Shapelow says it plans to move toward using CommVault software with a virtual tape library (VTL) that has data de-duplication functionality. “We’ll put a de-duplication device on-site and replicate to the off-site location,” he explains. Shapelow says the company is also in the process of switching its VMware virtual servers to VMware Consolidated Backup (VCB), used in conjunction with CommVault’s Galaxy software to direct virtual server images to the Symmetrix array, where the data can be replicated to the warm site via SRDF.
Don’t Forget Virtualization
Server virtualization and its ability to help reduce the cost of remote DR has gained traction over the past several months. “The biggest expense, after personnel, on the secondary site tends to be hardware,” says Lamorena. “If you use server virtualization tools, you can reduce the number of boxes you have at the second site.”
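Lamorena’s point about hardware comes down to consolidation arithmetic. The sketch below uses hypothetical inputs (120 servers to recover, 10 VMs per host, one spare host) to show how the box count at the secondary site shrinks.

```python
import math

def dr_hosts_needed(servers: int, vms_per_host: int, spare_hosts: int = 1) -> int:
    """Physical hosts needed at the secondary site if each recovered server
    runs as a VM, packed vms_per_host to a box, plus spare capacity."""
    return math.ceil(servers / vms_per_host) + spare_hosts

servers = 120  # hypothetical number of servers to recover
print("one box per server:", servers)
print("virtualized, 10 VMs per host:", dr_hosts_needed(servers, 10))
```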
The growing popularity of server virtualization is also putting greater emphasis on virtual storage architectures as a source of cost savings in remote DR. When it comes to replication, companies can choose from a wide range of solutions. Software applications from vendors such as DoubleTake and Neverfail are examples of server-based replication solutions. In addition to server-based approaches, replication choices include appliance-, array-, and database-level replication.
Among this range of current choices for remote replication is also a growing mix of storage virtualization software vendors such as DataCore and FalconStor, as well as combined hardware/software virtualization vendors such as EMC, Hitachi Data Systems, and IBM. Users also have the choice to implement local and remote replication services on top of the vendors’ respective virtualization platforms.
The Taneja Group’s Norall sees three primary levels of technology at which data is moved across the wire from a primary to a secondary site: host-based replication products installed as software agents on the host, network virtualization products that support both asynchronous and synchronous replication, and array-level replication and mirroring.
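The synchronous/asynchronous distinction comes down to when a write is acknowledged. The toy model below (class and method names are hypothetical, and it ignores failures, cross-volume write ordering, and bandwidth) shows the trade-off: synchronous replication keeps the remote copy current at the cost of a remote round trip per write, while asynchronous replication acknowledges immediately and loses whatever is still queued if the primary site goes down.

```python
from collections import deque

class ReplicatedVolume:
    """Toy model of a volume replicated to a remote site. Ignores failures,
    write ordering across volumes, and link bandwidth."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary = {}       # block number -> data at the primary site
        self.remote = {}        # block number -> data at the DR site
        self.pending = deque()  # async writes not yet shipped

    def write(self, block: int, data: bytes) -> None:
        self.primary[block] = data
        if self.synchronous:
            # The host's write completes only after the remote copy is updated,
            # so every write pays the WAN round trip.
            self.remote[block] = data
        else:
            # Acknowledge at once; ship the change later.
            self.pending.append((block, data))

    def drain(self, max_writes: int) -> None:
        """Ship up to max_writes queued changes to the remote site, oldest first."""
        for _ in range(min(max_writes, len(self.pending))):
            block, data = self.pending.popleft()
            self.remote[block] = data

    def writes_lost_if_primary_fails(self) -> int:
        """Acknowledged writes that have not reached the remote site."""
        return len(self.pending)

sync_vol, async_vol = ReplicatedVolume(True), ReplicatedVolume(False)
for n in range(5):
    sync_vol.write(n, b"data")
    async_vol.write(n, b"data")
async_vol.drain(3)  # the link has caught up on only three writes
print("sync loses:", sync_vol.writes_lost_if_primary_fails())
print("async loses:", async_vol.writes_lost_if_primary_fails())
```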
Datalink’s Robinson says companies have a lot more choice and cost-savings opportunities in the type of storage system they can now deploy at their secondary sites, noting that there are now a variety of heterogeneous replication solutions available to help drive down the cost of remote DR.
Although Robinson doesn’t see large companies wanting to replicate between a Symmetrix and a Nexsan system, for example, he has seen the desire to replicate to lower-cost systems in the same vendor’s line. “We may see customers going from a Symmetrix to a Clariion. The same thing with Hitachi and its replication technology on the USP [Universal Storage Platform],” he says. “Technically, while I still need a USP to replicate on the other end, I can use cheap disk behind it. Although it’s still frame-to-frame, it’s a way to reduce the overall cost.”
Regardless of which technology you choose, there’s no doubt that there are more choices than ever when it comes to cost savings for remote DR.