Cloud storage opportunities and challenges

Providers (internal or external) have to address scalability, privacy, data protection, manageability, and security issues.

By Saqib Jang

With total IT spending on cloud computing projected to grow at least threefold by 2012, much is being said today about the potential benefits of cloud computing. From an IT standpoint, cloud computing for enterprises promises to deliver elastic scalability, pay-as-you-grow efficiency, and a predictable cost structure, while at the same time improving access to data. On the business side, this translates into an ability to turn capital expenses into operating expenses and increase productivity and innovation, all while reducing IT expenses and business costs.

Technological maturity is making workable cloud solutions both possible and affordable. Most large companies are already exploring ways to make corporate data centers more "cloud-like" to boost efficiency, cut capital costs, and provide the elastic scaling needed to adapt to rapidly changing business requirements. However, the best ways to accomplish this, especially where storage is concerned, may still be unclear.

Whether a cloud is public or private, the key to success is creating an appropriate server, storage, and network infrastructure in which all resources can be efficiently utilized and shared. Because all data resides on the same storage systems, data storage becomes even more crucial in a shared infrastructure model.

It is already clear that on the compute side, server virtualization technology provides an appropriate infrastructure for cloud services, because it allows compute resources to be efficiently partitioned and quickly allocated, increased, decreased, or de-allocated as needs change. A rapidly maturing set of virtualization management services also helps provide speed, flexibility, and enhanced availability. Leading providers of cloud services such as Amazon Web Services (AWS) are already taking this approach in order to leverage the latest virtualization technologies.

The pressure to reduce storage operating costs and do more with less has also never been higher, especially in the face of accelerated data growth, even in a down economy.  Traditional storage technologies were not designed for use in the multi-PB scale Web 2.0 era. With traditional storage architectures, new arrays need to be added as capacity requirements increase. As the number of arrays under management grows, the storage environment becomes increasingly complex, harder to manage, and more expensive to operate. This has negative consequences for business—increased time to market, loss of productivity, and decreased flexibility.

A corresponding challenge is the much faster rate of growth of file-based data versus block-based data driven by the exploding growth in digital content. By 2012, industry analysts expect that more than 80% of all storage capacity will be file-based. This is not just for primary systems, but also for systems that store copies of data for data protection, disaster recovery, test and development, archiving, and collaboration. While traditional storage technologies continue to excel in the areas they were designed to address—namely, transactional computing—such solutions fail to curb file-based data sprawl. These factors are driving users to consider new storage deployment models, such as cloud storage.

Cloud storage is delivered as a service via a subscription model. The service provider can be a company's internal IT group (private cloud), a third-party company that delivers storage services (public cloud), or a combination of both (hybrid cloud). Cloud storage economics benefit both the service provider and the enterprise customer. Service providers gain economies of scale via multi-tenant infrastructure and predictable, recurring revenue. Enterprises gain the ability to grow storage elastically, allocating and de-allocating storage resources dynamically to deliver appropriate levels of both capacity and data protection.

Cloud requirements
At a top level, cloud storage must be elastic, to rapidly adjust the underlying infrastructure to changing demands, and automated so that policies can be leveraged to make underlying infrastructure changes quickly and without human intervention.

To deliver seamless and manageable elasticity, cloud storage service offerings must meet a number of requirements if the benefits sought by enterprise customers are to be fully realized. First, cloud storage needs to scale quickly and to tremendous capacities. This translates into scalability across objects (billions), performance, users, clients (tens to hundreds of thousands of virtual servers accessing storage in parallel), and capacity (well into the PB range). A single name space across all storage capacity is critical for keeping operating expenses low.

Data privacy has emerged as both a concern and a priority in shared cloud storage and data center environments. Cloud storage service clients opt to store their data on a partition of a shared storage system, which alleviates the need for cloud storage providers to purchase and administer dedicated hardware. While sharing the storage infrastructure reduces costs, clients want to keep their data private from other customers. Cloud storage providers must establish multi-tenancy policies to allow multiple business units or separate companies to securely share the same storage hardware.
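As a rough illustration of the partitioning idea, the sketch below scopes every object in one shared store by a tenant identifier. The class and method names (SharedStore, put, get, list_keys) are hypothetical and chosen for this example only; a real service would add authentication and encryption on top.

```python
class TenantAccessError(Exception):
    """Raised when a tenant asks for an object that is not in its partition."""


class SharedStore:
    """One physical store (a dict here) shared by several tenants."""

    def __init__(self):
        self._objects = {}

    def put(self, tenant_id, key, data):
        # Objects are stored under a (tenant, key) pair, so two tenants
        # can use the same key without seeing each other's data.
        self._objects[(tenant_id, key)] = data

    def get(self, tenant_id, key):
        try:
            return self._objects[(tenant_id, key)]
        except KeyError:
            raise TenantAccessError(f"{tenant_id!r} has no object {key!r}") from None

    def list_keys(self, tenant_id):
        # Listings are always filtered by tenant, never global.
        return sorted(k for (t, k) in self._objects if t == tenant_id)


store = SharedStore()
store.put("acme", "q3.xls", b"acme numbers")
store.put("globex", "q3.xls", b"globex numbers")
```

Both tenants write the same key to the same hardware, yet each can read back only its own copy.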

A proven storage infrastructure providing fast, robust data recovery is an essential element of the services that a cloud storage provider delivers. Enterprise clients expect their stored data to be available immediately, round the clock, which requires cloud storage infrastructure with high mean time between failures (MTBF) and mean time to data loss (MTTDL).

Enterprise users also want to make sure that their data is reliably backed up for disaster recovery (DR) purposes and that it meets pertinent compliance guidelines. DR and compliance (PCI, SAS 70, HIPAA, etc.) are often neglected by IT organizations because of the cost and technical know-how required, making them among the most important services a cloud storage provider can offer. Cloud storage providers must automatically replicate customers' data to one or more data centers according to a service level agreement (SLA) policy that specifies the timeframe for data availability in case of disaster.
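One way to picture SLA-driven replication is as a small policy table that maps each service tier to a number of replica sites and a promised recovery time. The sketch below is only an illustration: the tier names, the numbers, and the site codes are all made up for this example and do not come from any real provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaPolicy:
    name: str
    replica_sites: int          # additional data centers that hold a copy
    recovery_time_hours: float  # promised time until data is available again


# Hypothetical tiers; a real SLA would define its own names and numbers.
POLICIES = {
    "bronze": SlaPolicy("bronze", replica_sites=1, recovery_time_hours=24.0),
    "gold": SlaPolicy("gold", replica_sites=2, recovery_time_hours=1.0),
}


def replication_targets(policy, primary_site, all_sites):
    """Pick the data centers that should receive replicas under a policy."""
    candidates = [site for site in all_sites if site != primary_site]
    if len(candidates) < policy.replica_sites:
        raise ValueError("not enough data centers to satisfy the SLA")
    return candidates[:policy.replica_sites]


targets = replication_targets(POLICIES["gold"], "sfo", ["sfo", "nyc", "lon"])
```

Because the policy is data rather than code, the provider can apply it automatically to every customer volume without administrator involvement.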

Improved manageability in the face of exploding storage capacity and costs is a major benefit enterprises expect from cloud storage deployment. If Amazon Web Services (AWS) is selling S3 cloud storage capacity at 12 cents per GB per month, then one can imagine what its internal cost is, or what the cost might be for an enterprise implementing private cloud storage or a service provider implementing public cloud storage. Extrapolating from cloud storage provider pricing models, one storage administrator has to be able to manage 1PB or more of storage, driving tremendous opex savings not possible with traditional enterprise storage deployment models.
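The arithmetic behind that extrapolation can be made explicit. The sketch below uses the 12 cents per GB per month figure quoted above; the decimal petabyte (1,000,000 GB) and the administrator cost are assumptions introduced only for illustration.

```python
PRICE_CENTS_PER_GB_MONTH = 12   # the S3 list price quoted in the text
GB_PER_PB = 1_000_000           # assumes decimal units (1 PB = 10^6 GB)


def monthly_revenue_usd(capacity_pb):
    """List-price revenue per month for a given capacity in PB."""
    return capacity_pb * GB_PER_PB * PRICE_CENTS_PER_GB_MONTH / 100


def admin_cost_share(admin_cost_per_year_usd, capacity_pb):
    """Fraction of annual list-price revenue consumed by one administrator."""
    annual_revenue = 12 * monthly_revenue_usd(capacity_pb)
    return admin_cost_per_year_usd / annual_revenue


per_month = monthly_revenue_usd(1)   # 120000.0 USD
per_year = 12 * per_month            # 1440000.0 USD
# Hypothetical fully loaded cost of one storage administrator.
share = admin_cost_share(150_000, 1)
```

At list price, 1PB brings in $120,000 per month, or $1.44 million per year, and that sum must cover hardware, power, facilities, and staff. If each administrator managed far less than 1PB, labor alone would consume a large part of it.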

Secure data access
Enterprise IT's use of cloud storage also requires flexibility. Enterprise users have varying degrees of comfort when it comes to storing corporate data offsite, no matter how secure the external storage service provider is (or claims to be). The need to meet data security concerns and gain a level of comfort with the cloud storage model is driving the public, private, and hybrid deployment models for cloud storage.

IT organizations will be responsible for identifying the right location for specific types of storage on multiple cloud storage platforms, some within the enterprise and some external. In a hybrid cloud, they will also need infrastructure components and policies that enable multiple clouds to be managed as a single entity.
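A placement decision of this kind can be expressed as a simple policy function. The rule set below is purely hypothetical (the sensitivity labels and the latency flag are assumptions for this sketch); it only shows how a hybrid deployment might route data sets between internal and external clouds under one policy.

```python
from enum import Enum


class Location(Enum):
    PRIVATE = "private cloud"
    PUBLIC = "public cloud"


def choose_location(sensitivity, latency_sensitive):
    """Decide where a data set should live.

    sensitivity: "restricted" or "general" (hypothetical labels).
    latency_sensitive: True if the application needs storage close by.
    """
    # Restricted or latency-sensitive data stays inside the enterprise.
    if sensitivity == "restricted" or latency_sensitive:
        return Location.PRIVATE
    # Everything else can go to an external provider.
    return Location.PUBLIC


placements = {
    "payroll database": choose_location("restricted", True),
    "marketing video archive": choose_location("general", False),
}
```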

An important piece in the cloud storage puzzle is ease of access to data in the cloud, which is critical to integrating cloud storage seamlessly into existing enterprise workflows and to minimizing the learning curve for cloud storage adoption. To date, cloud storage providers have focused on enabling network storage for web applications through support of web-oriented interfaces such as Representational State Transfer (REST), in which there is growing interest. (It is possible that an enhanced version of the Amazon S3 RESTful API might become the de facto API for cloud storage access for web services.)

However, requiring enterprises to use programmatic interfaces to access cloud storage can significantly increase management costs. IT managers are not keen on training their personnel on new, unfamiliar protocols or having them learn to write to proprietary cloud storage APIs as a condition of using cloud storage.

Enterprise users need standard NFS and CIFS file access protocols to access data in the cloud, so that once data is written to specific NFS or CIFS mounts supported by the storage cloud, users can read, open, and modify files as though they were working off a local NAS system. This could be done, for example, through deployment of an emerging class of cloud storage "on-ramp" customer-premise equipment (CPE) devices that abstract APIs and protocols supported by cloud storage providers in a way that makes them transparent to IT administrators.

For the foreseeable future, there will be multiple cloud storage offerings. Cloud storage CPE devices can also allow enterprises to transparently leverage best-of-breed cloud storage offerings while addressing the risk of vendor lock-in. Furthermore, operations such as authentication, encryption, compression, and data integrity could be implemented in the CPE device, enabling simplified and secure access to multiple cloud storage providers.
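The role of such a CPE device can be sketched in a few lines. In the example below, a gateway compresses data and records a checksum before handing it to a pluggable provider back end, then verifies the checksum on read. Everything here is an assumption made for illustration: InMemoryProvider stands in for a real provider API, the method names are invented, and encryption and authentication are left out because the Python standard library has no cipher module.

```python
import hashlib
import zlib


class InMemoryProvider:
    """Stand-in for a cloud provider's object API (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self._blobs = {}

    def put_object(self, key, blob):
        self._blobs[key] = blob

    def get_object(self, key):
        return self._blobs[key]


class Gateway:
    """File-style front end that hides the provider API from users."""

    def __init__(self, provider):
        self._provider = provider
        self._digests = {}  # path -> SHA-256 of original bytes, kept on premises

    def write_file(self, path, data):
        # Record a checksum locally, then ship a compressed copy to the cloud.
        self._digests[path] = hashlib.sha256(data).hexdigest()
        self._provider.put_object(path, zlib.compress(data))

    def read_file(self, path):
        data = zlib.decompress(self._provider.get_object(path))
        # Verify that what came back matches what was written.
        if hashlib.sha256(data).hexdigest() != self._digests[path]:
            raise IOError("integrity check failed for " + path)
        return data


gateway = Gateway(InMemoryProvider("provider-a"))
gateway.write_file("/home/alice/report.txt", b"quarterly report " * 50)
restored = gateway.read_file("/home/alice/report.txt")
```

Swapping InMemoryProvider for another back end with the same two methods would change where the data lives without changing how users read and write files, which is how a device of this kind could reduce lock-in.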

Sweet spots
Not all enterprise applications are appropriate for cloud storage. The reality is that for performance-sensitive applications, the need to co-locate application servers and storage does not change with cloud storage. While internet bandwidth has increased, it has not done so enough to allow, for example, transactional database applications to run in separate locations from their storage. What this means is that cloud storage must either be part of a larger, cloud-delivered whole application or focus on a storage niche that lends itself to disaggregation.

The most mature deployment scenario for cloud storage is for applications which are well bounded in function and thus can easily be deployed as an integrated whole (i.e., using software, servers, storage, and networking) using cloud-based hosted software as a service (SaaS) or platform as a service (PaaS) models. Web-based e-mail or CRM (e.g. salesforce.com) are good examples of complete applications which have limited inter-relationships with non-cloud applications and thus can easily be delivered by the cloud.

Leveraging whole cloud-based applications provides enterprises with major ROI benefits. First, enterprises can offload the complete application stack (including software and hardware components) to the cloud application provider. Second, the incremental cost and time to add a new application user is effectively zero, enabling major gains in efficiency and productivity.

Storage-intensive business continuity (BC) and disaster recovery (DR) applications are also very well suited to cloud storage. Typically, BC/DR best practices require backups and archives to have an off-site component, and many firms need advanced BC/DR capabilities (for regulatory, risk mitigation, or business partner reasons, for example) but do not have the real estate or the expertise to build them themselves. Backup/archive, medical imaging archiving, digital content serving, video surveillance, and data warehousing as cloud services have developed from co-location type services to shared tenancy models with advanced purpose-built architectures providing archiving, indexing and search, and long-term retention with support for video and large object access. Client systems leveraging cloud-based BC/DR services can range from a single server to a server farm providing ingest/playback, optimization, and management functions.

Network-based storage for applications such as home directories, CAD/CAM, seismic, and manufacturing is the third major use case for cloud storage. Most file storage is loosely coupled or completely disaggregated from application servers, and users typically make a network connection to file storage systems and browse and access files. Unlike transactional applications, such as online transaction processing and electronic mail, which require high-speed access to block storage, file-based storage does not treat access speed as a critical requirement. In addition, enterprises with many remote offices already use WAN-based file storage, which makes the use of cloud-based file storage very attractive.

Cloud-based BC/DR and file storage services also enable significant ROI gains for enterprises through leveraging of storage multi-tenancy as well as offloading the management of enormous growth in unstructured content to cloud storage providers.

Barriers to adoption
The risks enterprises face in moving to cloud storage involve shared tenancy, data migration and integration, contractual issues, and IT conservatism about outsourcing storage operations.

Cloud storage users must by definition be willing to have their data reside side by side with that of a competitor, which is why authentication and encryption mechanisms must be bullet-proof and clearly defined to alleviate this risk. Quality of service is also a key concern arising out of multi-tenancy; enterprises worry that spikes unrelated to their business could cause performance disruptions or outages.

When data is in the cloud, there is a major risk that it will be hard to reach and even harder to integrate with applications. As discussed, cloud providers requiring enterprises to use programmatic interfaces for data access is a major risk for enterprise cloud storage deployment. Enterprises are concerned that ceding control over data and moving it out of the enterprise can cut off application integration projects, such as information management and data warehousing, and make data migration for refreshes very challenging.

Last, but not least, is the issue that storage administrators are very slow to adopt new technologies or paradigms, and anything that represents added risk or the unknown will see a slow adoption curve. Most enterprises will want to see well-known peers adopt before them. It is interesting to note that the first movers to cloud storage have been firms that cannot run their business within a traditional enterprise storage cost structure, such as Web 2.0 firms with file data and free or advertisement-supported products, and consumer/SME backup providers.

In summary, cloud storage is likely to restructure the enterprise storage market in ways that conventional models struggle with, and getting data management as a service is likely to provide a much better return on investment and address the growing problem of uncontrolled data management costs. However, significant challenges need to be resolved before the paradigm becomes feasible for mainstream enterprises, and initial momentum in "sweet spot" deployment scenarios will shape the industry in years to come.

Saqib Jang is the founder and principal with Margalla Communications. He can be contacted at saqibj@margallacomm.com.

This article was originally published on September 16, 2009