Replication is the key to a solid disaster-recovery plan, but there is a wide variety of options.
By Craig Everett
IT managers are placing renewed emphasis on protecting their companies from any type of disaster. With the realization that a disaster can affect anyone at any time, you have to ask yourself how a substantial outage would affect your business.
Most companies aren't prepared for a site outage, in part because of financial constraints. As such, disaster planning has often taken a back seat to other business initiatives that consume IT budgets. However, with downtime costs becoming more exorbitant, disaster recovery, including site replication, is slowly becoming a common weapon against outages and disasters. To truly achieve "five 9's" (99.999%) uptime, site replication should be part of your disaster-recovery plan.
Understanding your downtime costs is key to finding the right disaster-recovery solution. In some cases, a recent downtime experience helps dictate the need for a disaster-recovery capability. In proactive situations, downtime numbers can be calculated from company revenue and used to justify a budget for a new disaster-recovery implementation. In either case, finding the correct replication application for disaster recovery can be difficult.
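As a rough illustration of the revenue-based calculation described above, the arithmetic can be sketched in a few lines of Python. The function names, the assumption that revenue is earned evenly around the clock, and the $50 million example figure are all hypothetical; a real estimate would also account for lost productivity, penalties, and recovery costs.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year
HOURS_PER_YEAR = 365 * 24         # 8,760 hours in a non-leap year


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def hourly_downtime_cost(annual_revenue: float,
                         revenue_hours: float = HOURS_PER_YEAR) -> float:
    """Revenue lost per hour of outage, assuming revenue is spread evenly
    across revenue_hours (an assumption, not a rule)."""
    return annual_revenue / revenue_hours


def outage_cost(annual_revenue: float, outage_hours: float) -> float:
    """Estimated revenue lost during an outage of the given length."""
    return hourly_downtime_cost(annual_revenue) * outage_hours


if __name__ == "__main__":
    # Compare what each availability target allows per year.
    for target in (99.9, 99.99, 99.999):
        print(f"{target}% uptime allows "
              f"{allowed_downtime_minutes(target):.1f} minutes of downtime a year")

    # Hypothetical company: $50 million in annual revenue, 8-hour site outage.
    print(f"Estimated revenue lost: ${outage_cost(50_000_000, 8):,.0f}")
```

At "five 9's," the allowance works out to a little over five minutes of downtime a year, which is why a single site outage can consume the entire budget.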
Figure 1: Synchronous replication provides slightly better uptime than asynchronous replication.
Software vs. hardware replication
There are basically two types of data-replication applications on the market today: software- and hardware-based. Software-based replication relies on software that runs on the involved host servers and is storage subsystem-independent. Hardware-based replication, which is available from most major storage hardware vendors, is dependent on a particular brand of hardware and sometimes requires software installation on the storage array as well as on the servers involved.
Each type of replication has advantages and disadvantages. Usually, smaller installations that are application focused use software-based replication, while larger storage-centric environments rely on a hardware-based solution due to the dependency on the storage itself. However, that trend is slowly changing as software-based replication improves.
Regarding disaster recovery, many companies have regrets about deployment of a replication application. The most common issues are communications infrastructure shortfalls, replication product compatibility issues, product immaturity, ongoing support costs, and administration overhead.
Gauging your communications infrastructure can be difficult, especially when replication is centered on a new application. For instance, a new 10GB application that you expected to be read intensive, writing only 20% of its capacity each day, may turn out to be write intensive, writing 100% of its capacity daily, and all of those changes need to be replicated. If you planned on a communications infrastructure capable of replicating only a few gigabytes a day, you may be heading back to the drawing board. For this reason, building scalability into your storage-networking plan is critical.
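The sizing problem in the 10GB example can be estimated with some simple arithmetic. The following Python sketch is only a first approximation: it assumes decimal gigabytes, changes spread evenly across the replication window, and a hypothetical 70% link efficiency to allow for protocol overhead. The T1 rate of 1.544Mbps is the nominal line rate; bursty write patterns would need more headroom than the averages suggest.

```python
T1_BITS_PER_SECOND = 1_544_000  # nominal T1 line rate


def daily_change_bytes(capacity_bytes: float, daily_write_fraction: float) -> float:
    """Bytes that must be replicated each day."""
    return capacity_bytes * daily_write_fraction


def required_bits_per_second(change_bytes_per_day: float,
                             window_hours: float = 24.0,
                             efficiency: float = 0.7) -> float:
    """Average link rate needed to ship a day's changes within the window.

    efficiency is an assumed fraction of raw link bandwidth that is
    actually usable for replicated data.
    """
    usable_seconds = window_hours * 3600 * efficiency
    return change_bytes_per_day * 8 / usable_seconds


if __name__ == "__main__":
    capacity = 10e9  # the 10GB application, in decimal bytes
    for label, fraction in (("read intensive (20%)", 0.20),
                            ("write intensive (100%)", 1.00)):
        rate = required_bits_per_second(daily_change_bytes(capacity, fraction))
        share = rate / T1_BITS_PER_SECOND
        print(f"{label}: {rate / 1e6:.2f} Mbps, {share:.0%} of a T1")
```

Under these assumptions, the write-intensive case needs five times the bandwidth of the read-intensive one and would consume most of a single T1, leaving little room for growth.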
Even if you don't have a large heterogeneous environment, choosing a replication solution that will accommodate any possible scenario is a good idea. A new company initiative could cause you to deploy a platform or operating system in your data center that you didn't anticipate supporting with your replication environment.
Product maturity is also a problem. Since the replication market is relatively young, there are several companies and products on the market that simply don't have enough time in the field. Lack of experience and product immaturity can cause some significant time-to-market issues.
It is also wise to evaluate the quality and cost of ongoing product support. This can be a long-term cost that can drain your valuable time and budget.
Finally, there is the issue of administration overhead. How will you support this product in your enterprise? Check customer endorsements and references. This is always a good way to find success stories or horror stories. Taking time to visit a reference site for a product you are interested in is probably your best defense against making a bad choice.
Getting the most "bang for your buck" is important, and choosing the correct feature set is the "bang" in that equation. Almost every replication product is unique and offers its own feature set. These solutions can be synchronous, asynchronous, application-aware, host-dependent, host-independent, etc. However, the problem is that none of the products on the market today offer a solution with every replication feature bundled into one product. So finding the solution that is right for your business can be a little tricky.
Figure 2: Asynchronous replication is only required to confirm application writes to the primary storage; then writes are written to the secondary storage while the application continues with its operations.
Synchronous vs. asynchronous
The common data-replication modes are synchronous and asynchronous. Synchronous replication involves application writes that are committed to both the primary and secondary storage before the application can continue. On the other hand, asynchronous replication is only required to confirm application writes to the primary storage or local staging area. Then writes are written to the secondary storage while the application continues with its operations. Asynchronous replication can be better for application performance because the application does not have to wait for its writes to be committed at both the primary and secondary storage sites.
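The difference between the two modes can be shown with a small in-memory model. This Python sketch is purely illustrative and does not represent any vendor's implementation; the class and method names are hypothetical, and a dictionary stands in for a disk volume.

```python
from collections import deque


class Volume:
    """Stand-in for a storage volume: maps block numbers to data."""

    def __init__(self):
        self.blocks = {}

    def write(self, block, data):
        self.blocks[block] = data


class SynchronousReplicator:
    """Acknowledges a write only after both sites have committed it."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def write(self, block, data):
        self.primary.write(block, data)
        self.secondary.write(block, data)  # the application waits for this step
        return "ack"


class AsynchronousReplicator:
    """Acknowledges after the primary commit; ships to the secondary later."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.pending = deque()  # local staging area, kept in write order

    def write(self, block, data):
        self.primary.write(block, data)
        self.pending.append((block, data))
        return "ack"  # the application continues immediately

    def drain(self, max_writes=None):
        """Ship queued writes to the secondary in order; return the count shipped."""
        shipped = 0
        while self.pending and (max_writes is None or shipped < max_writes):
            block, data = self.pending.popleft()
            self.secondary.write(block, data)
            shipped += 1
        return shipped


if __name__ == "__main__":
    primary, secondary = Volume(), Volume()
    replicator = AsynchronousReplicator(primary, secondary)
    for block in range(3):
        replicator.write(block, b"data")
    print("writes not yet at secondary:", len(replicator.pending))
    replicator.drain()
    print("sites match after drain:", primary.blocks == secondary.blocks)
```

The pending queue is the key design point: whatever sits in it when the primary site fails has been acknowledged to the application but does not yet exist at the secondary site.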
An asynchronous replication implementation would be a good fit for an application that needs to be replicated over a small communications infrastructure like a T1 line. That environment would benefit from an asynchronous approach because the database can continue accepting transactions even though replicated data isn't completely copied to the secondary site.
On the other hand, a synchronous implementation would not fare as well in that situation. The application being replicated would suffer some serious performance degradation while it waits for writes to be confirmed to the secondary storage.
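To see why a synchronous write suffers on a slow link, consider a simple latency model. The Python sketch below assumes each write pays, in sequence, for the local commit, the time to push the data onto the link, one network round trip, and the remote commit. The 8KB write size, 40ms round trip, and 1ms commit times are hypothetical figures chosen for illustration, and real products may overlap some of these steps.

```python
T1_BITS_PER_SECOND = 1_544_000  # nominal T1 line rate


def transfer_ms(write_bytes: int, link_bits_per_second: float) -> float:
    """Time to place one write on the link, in milliseconds."""
    return write_bytes * 8 / link_bits_per_second * 1000


def sync_write_latency_ms(local_ms: float, remote_ms: float, round_trip_ms: float,
                          write_bytes: int, link_bits_per_second: float) -> float:
    """Latency seen by the application for one synchronous write
    (simplified sequential model)."""
    return (local_ms
            + transfer_ms(write_bytes, link_bits_per_second)
            + round_trip_ms
            + remote_ms)


def serial_writes_per_second(latency_ms: float) -> float:
    """Upper bound on back-to-back writes when each must finish before the next."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    local_only = 1.0  # asynchronous mode: the application waits for the local commit only
    synchronous = sync_write_latency_ms(local_ms=1.0, remote_ms=1.0,
                                        round_trip_ms=40.0, write_bytes=8192,
                                        link_bits_per_second=T1_BITS_PER_SECOND)
    print(f"asynchronous: {local_only:.1f} ms per write, "
          f"{serial_writes_per_second(local_only):.0f} serial writes/sec")
    print(f"synchronous:  {synchronous:.1f} ms per write, "
          f"{serial_writes_per_second(synchronous):.0f} serial writes/sec")
```

With these figures, simply pushing one 8KB write onto a T1 takes longer than the assumed round trip, so the link speed, not just the distance, dominates the delay.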
A drawback to asynchronous replication is that if you have to completely fail over to your secondary site, the fail-over could take much longer than with synchronous replication, because the secondary site may not yet hold the most recent writes. But if you can afford the delays, asynchronous replication is the most cost-effective solution. For some applications, however, delays are unacceptable, in which case synchronous replication is mandatory.
With some synchronous solutions, a complete site fail-over takes only seconds. For instance, a company could be logging customer support calls at its California facility at 5 PM PST and migrate operations to its Australia facility within seconds. The only way customers would know there was a change is that the call center representatives now have a different accent. This scenario would require a much larger communications infrastructure than an asynchronous solution would.
The price for synchronous replication can go up significantly due to the storage and communications infrastructure requirements. Since performance is paramount for synchronous replication, you would need a robust storage environment capable of the best I/O performance.
Storage networking can be costly as well. Some of the most popular options available include Fibre Channel switches and Fibre Channel over IP (FCIP) routers if you plan to use an existing IP network between sites. These storage-networking options can add some zeros to the end of your total solution cost.
However, replication is growing in popularity throughout the industry. As the demand for replication increases and users employ more cost-effective solutions such as iSCSI and FCIP, vendors will begin to lower the cost of expensive gateway devices, which have traditionally been out of reach for some companies.
A good way to narrow down the search for a replication application is to focus on your requirements. What does your business need in order to implement a successful disaster-recovery solution? These requirements could include off-site backups at a secondary site, application analysis and development at the secondary site, site migration for 24x7 operations, site maintenance, disaster rehearsals, and numerous other options. Whatever you decide you need, it's best to create a checklist of overall requirements to use during an evaluation. Share this checklist with as many people as you can within your company who will be involved with the replication application and solicit their feedback.
A closer look at products
Hardware-based replication solutions are offered by vendors such as Compaq, EMC, Hewlett-Packard, Hitachi, and IBM. Software-based replication vendors include Legato, Microsoft, Oracle, Sun, and Veritas.
Figure 3: Hardware-based replication is tied to specific brands of disk arrays.
All hardware-based replication solutions require you to use specific disk arrays with the replication software. Most hardware-based approaches have traditionally operated in synchronous mode. However, most vendors have added other data-transfer modes to their products as users demand better functionality. Automation of these products, as with any application, can require advanced scripting and code development on the part of the end user.
Hardware-based replication benefits include the following:
- Less initial configuration is required;
- There is complete application consistency with secondary site storage in synchronous mode;
- The application doesn't need to be aware of replication; and
- Usually, all hosts attached to the storage environment can use the replication feature.
Compaq's Data Replication Manager is available for Compaq's StorageWorks disk arrays. Compaq recently added support for FCIP. In addition to Compaq disk arrays, Data Replication Manager requires Compaq's Fibre Channel switches.
Configuration of Data Replication Manager is fairly simple with the use of Remote Copy Sets (RCSs). The sets are configured by pairing volumes from a primary and secondary server, which are then managed in groups usually by application or database. Each RCS is controlled independently for better autonomy and flexibility. Data Replication Manager can be run in synchronous or asynchronous mode.
One of the most popular hardware-based replication solutions is EMC's Symmetrix Remote Data Facility (SRDF). According to Gartner Group, SRDF leads the industry in installations. Due to SRDF's maturity, it offers several unique options that other companies are still trying to duplicate. SRDF offers several different replication modes (synchronous, asynchronous, semi-synchronous, and adaptive copy) and synchronization options for data consistency. Different replication modes allow you to dictate application performance and fail-over timeliness.
SRDF works as an enhancement in the Symmetrix microcode and with software installed on the server managing the Symmetrix array. SRDF essentially mirrors devices on a primary disk array to a remote-site disk array.
Although Hitachi has historically trailed EMC in the marketplace, its recent agreement with Sun is a huge step toward pushing Hitachi to the head of the pack. From a feature perspective, Hitachi's TrueCopy and EMC's SRDF have many similarities. TrueCopy is an enhancement within the microcode on the Hitachi 7700/9900 storage arrays.
TrueCopy basically mirrors LUNs at the primary site to the secondary site. TrueCopy has an add-on product called NanoCopy, which gives it some additional functionality for IBM systems, as well as some advanced point-in-time copy capabilities. (Hewlett-Packard's Continuous Access XP for the HP XP512 storage array is OEM'd from Hitachi.)
IBM's PPRC is another hardware-based replication solution. This product operates only in synchronous mode and only with IBM storage products. IBM offers an asynchronous mode of replication with a product called XRC, but it works only with OS/390 environments.
Figure 4: Software-based replication can be used with any vendor's disk arrays.
The primary advantage of software-based replication products is that they are not tied to specific hardware brands. In addition, software-based solutions can be less expensive and more versatile.
Software-based solutions are designed to work at the application level or at the operating system level. Application-level replication products are transaction-aware and can provide some advanced recovery options. The products that are designed for the operating system level are usually focused on disks or volumes, as opposed to the application running on those devices.
Application-level replication is available from Microsoft and Oracle for their databases. There are several ways to provide replication for a database, especially with Oracle. You can replicate point-in-time copies, redo logs, etc., but this article focuses on replicating the entire database.
The original database-replication feature for Oracle was called Snapshots. There are two types of snapshots, which are point-in-time copies: fast snapshots (only logged changes) or complete snapshots (the entire table). The snapshot can be read-only or updateable. An updateable snapshot can be useful if you want to allow temporary modifications at a remote site that are valid until the changes are approved at the central site.
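The distinction between a complete refresh and a fast refresh can be sketched conceptually in Python. This is not Oracle code or Oracle's implementation; the class and function names are hypothetical, a dictionary stands in for a table, and the model supports only one snapshot per master table for simplicity.

```python
class MasterTable:
    """Stand-in for a master table that logs which rows have changed."""

    def __init__(self):
        self.rows = {}
        self.change_log = []  # keys changed since the last refresh

    def upsert(self, key, value):
        self.rows[key] = value
        self.change_log.append(key)

    def delete(self, key):
        self.rows.pop(key, None)
        self.change_log.append(key)


def complete_refresh(master, snapshot):
    """Rebuild the snapshot from the entire table; return rows transferred."""
    snapshot.clear()
    snapshot.update(master.rows)
    master.change_log.clear()
    return len(master.rows)


def fast_refresh(master, snapshot):
    """Apply only the logged changes; return the number of rows touched."""
    changed = set(master.change_log)
    for key in changed:
        if key in master.rows:
            snapshot[key] = master.rows[key]
        else:
            snapshot.pop(key, None)  # the row was deleted at the master
    master.change_log.clear()
    return len(changed)


if __name__ == "__main__":
    master, snapshot = MasterTable(), {}
    for key in range(100):
        master.upsert(key, f"row {key}")
    print("complete refresh moved", complete_refresh(master, snapshot), "rows")

    master.upsert(5, "updated")
    master.delete(7)
    print("fast refresh moved", fast_refresh(master, snapshot), "rows")
    print("snapshot matches master:", snapshot == master.rows)
```

The trade-off is visible in the row counts: the fast refresh moves far less data over the link, at the cost of maintaining a change log at the master site.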
For Oracle installations that require more replication functionality, there is Advanced Replication. With Advanced Replication you have several options (e.g., synchronous or asynchronous), as well as the ability to choose exactly what is replicated (e.g., complete tables or just changes).
Since Oracle replication uses standard Oracle database technology, database administrators can use their favorite DBA tool or SQL interpreter to administer replication.
Microsoft SQL Server has a replication option with several features similar to Oracle's. Microsoft's replication is commonly referred to as "publish and subscribe," with the master database as the publisher and the secondary database as the subscriber. SQL Server replication features snapshot, transactional, and merge (asynchronous) options for synchronization. The SQL Enterprise Manager is used to administer replication, which facilitates configuration.
Operating system level replication software is available from Legato, Sun, and Veritas. These products replicate volumes or disks to secondary sites using standard IP networks, so you don't have to install an expensive, dedicated fiber backbone. You also have more-flexible storage options so you can leverage existing disk arrays from multiple vendors.
Legato's Octopus, which will soon be renamed RepliStor, supports replication for Windows. This product is host-recovery-oriented and includes the ability to copy the identity of an entire server to a secondary site. You can also replicate domain controllers running Active Directory. Octopus is a good fit for Windows-only environments. One thing to keep in mind is that when using Octopus to replicate an entire server, the hardware and operating system version have to be the same at the primary and secondary sites.
Sun offers StorEdge Network Data Replicator (SNDR) for Solaris 2.6 to Solaris 8. SNDR works with Sun StorEdge products and also third-party storage. Sun supports Solstice Disk Suite for volume management with SNDR, which allows customers not using Veritas Volume Manager on their Solaris hosts to replicate data in the enterprise. Sun's Instant Image in conjunction with SNDR can give you snapshots on both the primary and secondary sites to allow greater replication functionality. Point-in-time snapshots can be used for backups, hot standby, and data analysis.
Veritas' replication software is called Veritas Volume Replicator (VVR), which works hand in hand with the well-known Veritas Volume Manager. If you're already using Volume Manager, the installation of VVR is just an addition of a license key. VVR uses Replicated Volume Groups (RVGs) to replicate volumes to as many as 32 secondary data storage sites. It can operate in both synchronous and asynchronous modes. Since VVR works with Veritas Cluster Server and Veritas Global Cluster Manager, you have several advanced site migration and fail-over options. These advanced features distinguish VVR from all other replication products. Windows support for VVR is also available.
Craig Everett is a senior storage consultant at Advanced Systems Group (www.virtual.com), an enterprise computing and storage consulting firm headquartered in Denver. He can be contacted at firstname.lastname@example.org.