There's one major problem with cloud storage providers like Amazon that offer virtually limitless, low-cost storage resources: cloud storage isn't as easy to use as you might think.
But the good news is that it is getting easier.
The root of the problem is that service providers like Amazon offer a cloud-based object store with interfaces such as REST or SOAP, but applications expect storage resources with block-based iSCSI or Fibre Channel interfaces or file-based interfaces, such as NFS or CIFS.
The solution for many organizations is a cloud storage gateway. This is a physical or virtual appliance that sits in your data center and presents file- and block-based storage interfaces to your applications. It then does some nifty protocol conversion so that data can be sent directly to one or (sometimes) more cloud storage services. To provide security for the data sent to the cloud, most gateways automatically encrypt the data before it is sent.
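To make the protocol conversion concrete, here is a minimal sketch of how fixed-size block writes might be mapped onto object-store puts and gets. Everything in it is an assumption for illustration: the `InMemoryObjectStore` stand-in, the `BlockGateway` class, the 4 KB block size, and the key naming scheme are hypothetical and not taken from any vendor's product. Encryption, which most gateways apply before upload, is left out to keep the example short.

```python
BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


class InMemoryObjectStore:
    """Hypothetical stand-in for a cloud object store reached over REST."""

    def __init__(self):
        self._objects = {}

    def put(self, key, body):
        self._objects[key] = bytes(body)

    def get(self, key):
        return self._objects[key]


class BlockGateway:
    """Presents block reads and writes; stores each block as one object."""

    def __init__(self, store, volume_id):
        self._store = store
        self._volume_id = volume_id

    def _key(self, block_number):
        # Hypothetical naming scheme: one object per block of the volume.
        return f"{self._volume_id}/block-{block_number:012d}"

    def write_block(self, block_number, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("block writes must be exactly BLOCK_SIZE bytes")
        self._store.put(self._key(block_number), data)

    def read_block(self, block_number):
        try:
            return self._store.get(self._key(block_number))
        except KeyError:
            # Blocks that were never written read back as zeros.
            return bytes(BLOCK_SIZE)
```

An application would call `write_block` and `read_block` as if it were talking to a disk, while the store only ever sees keyed objects.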
Connections to cloud storage tend to be slow and expensive, and they suffer from high latency. To speed up data transmission times (as well as to minimize cloud storage costs), most gateways also handle data deduplication and compression. A device like Riverbed's Whitewater cloud storage gateway will typically achieve data reduction ratios of about 25:1, reducing the data to just 4 percent of its original size.
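As a rough illustration of how deduplication and compression reduce what has to cross the wire, the sketch below splits data into fixed-size chunks, keeps only one compressed copy of each distinct chunk, and records the order of chunks so the original can be rebuilt. The fixed 4 KB chunk size, the SHA-256 fingerprints, and all function names are assumptions made for this example; commercial gateways may use quite different techniques, such as variable-size chunking.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed chunk size, for illustration only


def dedupe_and_compress(data, chunk_size=CHUNK_SIZE):
    """Return (recipe, store): chunk order plus one compressed copy per unique chunk."""
    recipe = []  # ordered list of chunk fingerprints
    store = {}   # fingerprint -> compressed chunk
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            # Only previously unseen chunks are compressed and kept.
            store[digest] = zlib.compress(chunk)
        recipe.append(digest)
    return recipe, store


def rehydrate(recipe, store):
    """Rebuild the original bytes by decompressing chunks in recipe order."""
    return b"".join(zlib.decompress(store[digest]) for digest in recipe)


def stored_fraction(reduction_ratio):
    """Fraction of the original size left after an N:1 reduction."""
    return 1.0 / reduction_ratio
```

With a 25:1 ratio, `stored_fraction(25)` gives 0.04, which is the 4 percent figure quoted above. Highly repetitive data such as successive backups tends to benefit most, because most chunks have already been seen.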
But the speed gains are generally not enough to allow the use of cloud storage for primary data. That's where caching can help.
Most cloud storage gateways include significant amounts of local storage where recent backups and other data can be stored. From there, it can be quickly retrieved without having to fetch it across a slow wide area link from the cloud. In this respect, the gateways act in a similar fashion to wide area network accelerators, which explains the involvement of companies like Riverbed in the market.
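The caching behavior can be sketched as a small least-recently-used (LRU) cache sitting in front of a slow cloud fetch. This is a simplified model under stated assumptions: the `CachingGateway` class, its `capacity` measured in number of objects, and the `fetch_from_cloud` callable are all hypothetical, and real gateways decide what to keep locally using their own policies.

```python
from collections import OrderedDict


class CachingGateway:
    """Serves reads from local cache when possible, else fetches from the cloud."""

    def __init__(self, fetch_from_cloud, capacity):
        self._fetch = fetch_from_cloud  # hypothetical callable: key -> bytes
        self._capacity = capacity       # max number of cached objects (assumed unit)
        self._cache = OrderedDict()     # ordered from least to most recently used
        self.cloud_reads = 0            # counts trips across the WAN

    def read(self, key):
        if key in self._cache:
            # Cache hit: mark as most recently used and return locally.
            self._cache.move_to_end(key)
            return self._cache[key]
        # Cache miss: go to the cloud, then keep a local copy.
        data = self._fetch(key)
        self.cloud_reads += 1
        self._cache[key] = data
        if len(self._cache) > self._capacity:
            # Evict the least recently used entry to stay within capacity.
            self._cache.popitem(last=False)
        return data
```

Repeated reads of recent data never leave the data center, and only reads of older, evicted data pay the wide area latency.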
Cloud Storage Gateways and Disaster Recovery
Let's imagine what happens if a disaster strikes your data center, taking out all your compute and storage resources and your cloud gateway appliance. Assuming you have a secondary data center, you can either use a secondary cloud gateway appliance at that location, or simply download and install a virtual appliance (if the vendor supplies one). Recovering your data is then a matter of transferring backup sets back to the gateway where the data is "rehydrated" — decompressed and undeduped — and passed back to your applications. It may be slow, but then so is recovering from tape backups.
As an example, California-based engineering company Psomas installed eleven Riverbed Whitewater cloud gateway appliances to link backup servers in separate locations to Amazon S3 cloud storage, replacing legacy tape storage systems. By switching to cloud storage for its offsite backups, with recent backups cached locally on Whitewater appliances, the company realized 60 percent savings in total infrastructure costs, including tape media, tape technology refreshes, and tape collection fees (although this does not include cloud storage fees), according to Chris Pinckney, Psomas's CIO. Other benefits included shorter backup times and faster restores when the backup data exists in the cloud gateway cache (because local restores are faster than restores that involve collecting tapes from an offsite location).
Restoring backups from the cloud to a data center is always going to be slower than restoring them from local storage, but in the event of a disaster, the local storage is likely to be unavailable.
A popular way around the WAN bottleneck is to recover your compute environment in the same cloud as your storage. Using Amazon, you could recover your applications to Amazon's EC2 service, and then rehydrate your data stored in S3 using a virtual gateway appliance before sending it across to EC2.
Cloud Storage Gateway Vendors
Until recently, most of the companies involved in this market were quite small, but at the beginning of 2012, an 800-pound gorilla entered the market in the form of Amazon with its Storage Gateway. This virtual appliance presents itself as an iSCSI target and provides a bridge to Amazon's own S3 storage. It effectively offers "gateway-stored volumes": all primary data is stored on local DAS, NAS or SAN devices, and point-in-time snapshots of this data are asynchronously backed up to Amazon S3.
Last October, the company made a significant upgrade to its gateway appliance by introducing "gateway-cached volumes." The difference is that in this configuration primary data is stored in the cloud in S3, with frequently accessed data cached locally on DAS, NAS or SAN devices. The idea is that by using the cloud for primary storage, the overall requirement for more expensive local storage can be cut, while still providing low-latency access to frequently accessed data.
StorSimple (now owned by Microsoft) is also at the forefront of a push towards this type of "cloud-integrated storage." It provides devices that offer gateway features, like protocol conversion and caching, but also local primary storage. "A key capability is the ability to mount cloud-stored volumes — through application-aware storage — and access only the needed data or objects for an application, rather than having to download a full volume from the cloud to the data center and then restore it — which is what a gateway has to do," says Mark Weiner, director of product marketing for Microsoft.
Mark Peters, an analyst at Enterprise Strategy Group, describes cloud-integrated storage such as StorSimple's like this: "Basically, users can enjoy the benefits of the cloud (economy, data protection, and access to features and functions that they might not otherwise have or afford) while still retaining control of their data and operations. It can be thought of as your own storage that's just on a very long wire (or even multiple wires) from the controller."
It's unlikely that traditional storage vendors will embrace this type of cloud-integrated storage, since it does away with the need for many of their products. For the moment, it looks like it will be companies with a stake in cloud storage services, such as Amazon with S3 and Microsoft with Azure, that will drive the market from "simple" storage gateways to more sophisticated cloud-integrated storage devices.