By Eric Burgener
The cloud enables a new set of solutions to solve perennial storage problems much more cost-effectively. Data protection stands to benefit significantly from cloud-based computing, in particular because cloud computing can provide the foundation for easily accessible, affordable disaster recovery (DR) solutions. This easy access facilitates rapid implementation of off-site protection for new projects at larger enterprises, and can enable DR solutions that small and medium enterprises (SMEs) could not afford in the past. Given the increasing criticality of data, all enterprises should have a DR plan in place for at least key applications. But many do not, primarily due to cost and complexity issues. Cloud-based infrastructure provides an interesting DR alternative that addresses both of these issues.
DR plans are built around the existence of a remote, secondary site where data can be stored and/or application services can be restarted in the event of a catastrophic disaster that cripples a primary site location. Driven by an enterprise’s recovery point objectives (RPOs) and recovery time objectives (RTOs), a secondary copy of data will be regularly maintained at this secondary location. When enterprises must purchase, configure, and manage the secondary site location, DR can be a very expensive proposition – one out of reach for most small enterprises. But cloud providers can offer a very cost-effective secondary site location, making compute cycles and storage capacity immediately available in increments of varying sizes on a pay-as-you-go basis, while taking on all the infrastructure management issues.
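To make the RPO concept above concrete, here is a minimal sketch of the worst-case data-loss arithmetic for scheduled asynchronous replication. The function name and the interval figures are hypothetical, not drawn from any particular product:

```python
# Illustrative arithmetic only: worst-case data-loss window (RPO) for
# scheduled asynchronous replication. Figures below are hypothetical.

def worst_case_rpo_minutes(replication_interval_min: float,
                           transfer_time_min: float) -> float:
    """A change made just after a replication cycle begins is not safe
    at the secondary site until the *next* cycle finishes shipping."""
    return replication_interval_min + transfer_time_min

# Replicating every 60 minutes, with 15 minutes to ship the deltas:
print(worst_case_rpo_minutes(60, 15))  # -> 75.0
```

In other words, an enterprise's stated RPO has to cover both the replication schedule and the time it takes deltas to cross the wire to the secondary site.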
Cloud providers such as Amazon Web Services, with its Elastic Compute Cloud (EC2), and GoGrid, with its GoGrid Cloud Hosting, make compute cycles and storage capacity available on an immediate basis. Replication can be used to create and maintain copies of an enterprise's data at these sites. When events affecting an enterprise's primary location occur, key application services can be restarted and run at the remote location – incurring no capital expenditure, only operational expenditure – until the primary site is brought back online.
Replication technology is available in storage arrays, network-based appliances, and through host-based software.
Array-based replication typically requires similar arrays at both the source and target locations, making it a poor choice for replicating data to cloud providers, which are unlikely to have the same array you do in their cloud infrastructure.
Network-based appliances require an appliance at both the source and target locations as well. While they are much more cost-effective to implement than array-based approaches, they suffer from the same infrastructure issue: the cloud provider is unlikely to have, or make available to you, the same type of network appliance deployed at your site.
Host-based replication, which runs on an industry-standard server, is an excellent fit. Cloud providers let you request Windows, Linux, and in some cases even other Unix servers when you rent compute cycles, so you can replicate very cost-effectively from servers of the same type at your location to theirs.
Host-based replication comes in two flavors. Vendors such as CA (the XOsoft product line) and SteelEye use block-based replication approaches, while vendors such as DoubleTake and NeverFail use file-based approaches. Both approaches can be used to replicate entire virtual machines in real time, but block-based approaches offer a more comprehensive solution (due to the ability to replicate all data, not just files) when replicating physical machines to cloud-based infrastructure. Products that support multiple operating systems, as opposed to just Windows, can also offer more comprehensive solutions with a common management paradigm across platforms. Host-based replication solutions can be configured for just a few thousand dollars, and when combined with cloud-based infrastructure offer a very low-cost DR solution that allows protection to be extended lower in the organization for larger enterprises and makes DR an affordable option for smaller enterprises.
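To illustrate the file-based flavor described above, here is a minimal sketch of one replication pass: walk a source tree and copy any file that is new or whose checksum differs at the target. This is a toy, not any vendor's product; a real solution also handles deletions, open files, and continuous change capture:

```python
import hashlib
import os
import shutil

def file_digest(path: str) -> str:
    """SHA-256 checksum used to detect changed files."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def replicate_tree(source: str, target: str) -> list:
    """Copy new or changed files from source to target; return the
    relative paths copied. Illustrative only: real host-based products
    also replicate deletions, permissions, and in-use files."""
    copied = []
    for root, _dirs, files in os.walk(source):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, source)
            dst = os.path.join(target, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            if not os.path.exists(dst) or file_digest(src) != file_digest(dst):
                shutil.copy2(src, dst)
                copied.append(rel)
    return copied
```

A block-based product works below the file system instead, shipping changed disk blocks – which is why it can replicate everything on a volume, not just files.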
When host-based replication is combined with cloud-based infrastructure, the target devices are kept online at all times, ready to support very rapid server and application recovery if needed. When RPO and RTO requirements are less than 48 hours, these types of configurations meet the need, and can in fact meet much more stringent RPO/RTO requirements (e.g., under four hours) depending on how they are configured.
Replication and cloud computing can also be considered as an alternative to local backup. Disk-based backup has a lot to offer companies, including faster backups, faster restores, and more reliable recovery (relative to tape-based infrastructures). If you’re considering moving to disk, don’t overlook the fact that it gives you access to replication technology. For data sets that require stringent RPOs/RTOs, replication can be used to kill two birds with one stone: data is quickly and easily available for file- and even system-level restores from the remote location, but the fact that the location is remote provides the resilience demanded by a DR plan.
When considering cloud-based infrastructure offerings, security is a common concern. Larger enterprises may have implemented very strong security approaches that may or may not be equaled by cloud providers, but don’t just assume that security is a problem. Look for the type of security functionality you would look for in an in-house solution. Is data encrypted in-flight? Is it stored at the cloud provider site in encrypted form? What level of encryption is implemented? Your cloud provider may or may not give you the option to host your servers and data on dedicated resources, but if they don’t bring it up, you should.
The use of server virtualization technology is widespread in cloud provider infrastructures, so ask providers to at least dedicate virtual servers for your implementations. If data is stored in encrypted form – provided the encryption is reasonably strong – it is protected against theft and misuse. Look for cryptographic strength at least equivalent to SHA-1 (a 160-bit digest), with SHA-2 equivalence (a minimum of 224 bits) being even better. Does the cloud provider offer access through secure virtual private networks (VPNs)? For many smaller enterprises, the level of security provided by cloud providers may in fact be greater than they could provide themselves. Don't automatically assume that just because it's not behind your firewall, it's not secure.
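The 160-bit and 224-bit figures above refer to digest lengths; strictly speaking, SHA-1 and SHA-2 are hash standards used for integrity checks, while bulk encryption is typically done with a cipher such as AES. The digest sizes themselves can be verified with Python's standard library:

```python
import hashlib

# SHA-1 produces a 160-bit digest; SHA-224, the smallest SHA-2
# variant, produces a 224-bit digest. digest_size is in bytes.
sha1_bits = hashlib.sha1(b"example").digest_size * 8
sha224_bits = hashlib.sha224(b"example").digest_size * 8
print(sha1_bits, sha224_bits)  # -> 160 224
```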
If you're a cloud provider, you have probably already been asked by your customers about your ability to create and maintain two copies of their secondary data at separate geographic locations. Replication is clearly the technology you will consider to offer this service. If your customers require that this data be kept online and rapidly accessible at all times, then you will likely not be able to use array-based replication, and will be interested in host-based replication for many of the same reasons your customers are: it's cost-effective, supports more stringent RPOs/RTOs than offline replication technologies can, and is more flexible in supporting a variety of different storage products. It also helps you offer a more comprehensive DR service than other forms of replication, complete with the ability for your customers to recover their servers in the cloud.
If you're considering what combining replication and cloud-based computing can offer you, regardless of whether you're an end user or a cloud provider, look for the following features:
- Synchronous and asynchronous replication options so that the technology can be used to address both short distance and long distance requirements; understand also whether it provides real-time, scheduled, or both forms of replication.
- Good integration points with snapshot technologies to facilitate data protection operations and server virtualization technology to lower the cost of DR operations.
- A conscious approach to maintaining the write ordering established by the production application; in order to maintain data integrity, it is critical that data is written to the target disk in exactly the same order that it is written to the primary disk (this is more of a concern when asynchronous replication is used).
- Fault management that will automatically re-synchronize source and target devices once live network connections are re-established; look at exactly how this is done to ensure that devices can be re-synchronized with minimal bandwidth and very quickly.
- Integrated technologies that minimize network bandwidth requirements during normal and re-synchronization operations.
- Support for encrypting data both in-flight and at rest, to at least a level of SHA-1 equivalence (with SHA-2 equivalence being preferred).
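The write-ordering point in the checklist above can be made concrete with a small sketch: if replicated writes carry source-side sequence numbers, the target can buffer anything that arrives out of order and apply writes strictly in the order the production application issued them. This is purely illustrative; the class and its structure are hypothetical, not taken from any product:

```python
import heapq

class WriteOrderedTarget:
    """Applies replicated writes strictly in source order, even when
    they arrive out of order over the network - the write-ordering
    concern asynchronous replication must address. Illustrative only."""

    def __init__(self):
        self.next_seq = 0   # next sequence number we are allowed to apply
        self.pending = []   # min-heap of (seq, offset, data) buffered writes
        self.disk = {}      # offset -> data, stands in for the target disk

    def receive(self, seq: int, offset: int, data: bytes) -> None:
        heapq.heappush(self.pending, (seq, offset, data))
        # Drain every write whose predecessors have all been applied.
        while self.pending and self.pending[0][0] == self.next_seq:
            _seq, off, payload = heapq.heappop(self.pending)
            self.disk[off] = payload
            self.next_seq += 1
```

If writes 0 and 2 arrive before write 1, write 2 is held in the buffer; only once write 1 lands are both applied, so the target disk always reflects a state the source disk actually passed through – exactly the data-integrity guarantee the checklist asks for.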
As the bar becomes ever higher for building resiliency into computing infrastructures, replication technologies will become part of the storage foundation. Cloud providers are in a good position to leverage this technology to meet existing as well as evolving customer requirements. In the near term, replication enables not only data recovery in the cloud, but server recovery in the cloud as well. Now that affordable, host-based replication approaches that can securely handle sizable data volumes over IP-based networks are available, don’t overlook what the combination of replication and cloud computing has to offer – regardless of whether you’re an end user or a cloud provider.
Eric Burgener is a senior analyst and consultant with the Taneja Group research and consulting firm, www.taneja.com.