What are the core capabilities that end users should look for when evaluating "storage in the cloud" solutions?
By Jeff Boles
As discussed in part one of this three-part series, at the heart of cloud-based computing is a loosely coupled infrastructure that is self-healing, geographically dispersed, and instantaneously scalable in response to business demands. Cloud-based computing virtualizes the location, connectivity, and resources behind loosely coupled application components in order to be elastic: able to move and shift computing and storage resources, and to rapidly deploy new systems or applications, in response to any demand. Moreover, cloud-based computing promises to make infrastructure, applications, and storage easier to manage, and much easier to integrate with other applications or changing business processes.
What does this mean to you as an end user, application developer, or IT manager who may be considering where to store data? Cloud-based storage today can let you store and manipulate any type of data on higher-performance, more scalable, more accessible, and cheaper storage. Moreover, it can free you from the costly management overhead that surrounds data storage by serving up file storage in a self-managed, easy-to-access manner. Cloud-based storage lets users not only provision and manage storage themselves, but also store data in XML files, text files, or many other data formats. Meanwhile, cloud-based storage solutions provide users with database-like data manipulation through innovative file-filtering mechanisms, metadata tagging, and the virtual presentation of files in many places at once.
These solutions are available today. Some users have moved entire application sets to API-accessible storage services such as Amazon S3, and in turn, have access to a dynamically scaling infrastructure. Some businesses are currently shopping for, or building, similar solutions within their own corporate networks, to harness the flexibility of such an infrastructure while reducing their storage infrastructure cost of ownership by enabling user self-service.
But for many other users, cloud-based storage is a fuzzy new technology, with neither clearly defined capabilities nor benefits.
In this second part of our series, we'll look at a set of five core capabilities that are important to end users looking to store data in either private clouds or the public Internet cloud. These capabilities and the associated benefits will shed light on how cloud-based storage may be beneficial to you.
Cloud-based storage is amorphous today, with neither a clearly defined set of capabilities nor any single architecture. Choices abound, with many traditional hosted or managed service providers (MSPs) offering block or file storage, usually alongside traditional remote access protocols or virtual or physical server hosting. Other solutions have emerged, typified by the Amazon S3 service, that resemble flat databases designed to store large objects.
The Taneja Group defines cloud-based storage as a specific category within the larger field of "storage in the cloud" solutions. Storage in the cloud encompasses traditional hosted storage, including offerings accessed by FTP, WebDAV, NFS/CIFS, or block protocols, either remotely or from within a hosted environment. Cloud-based storage is an evolution of this hosted storage technology that wraps more sophisticated APIs, namespaces, file or data location virtualization, and management tools around storage.
Whether you are building a multi-tenant hosted application or moving your enterprise applications to the cloud, emerging cloud-based storage solutions share a core set of capabilities. How well a specific solution delivers on these capabilities will be key to determining how well you can 1) integrate stored data with different applications and systems in versatile ways; 2) harness cloud-based storage performance, scalability, and distribution to increase your infrastructure flexibility, responsiveness, and availability; and 3) reduce your cost of owning and managing storage.
API-accessible. Today, businesses are surrounded by a world of Web services, scripting, lightweight development frameworks, mash-ups, and various other dynamic, easily integrated Web technologies. Access to stored data through a sophisticated API makes cloud-based storage extremely versatile, and in fact re-invents how stored data can be leveraged for the support of applications and business processes. Moreover, APIs can be tuned for general storage management as well, and allow administrators to overlay nearly any management, reporting, or governance process on storage. A few potential usage cases for API-accessible storage include:
--APIs will let administrators wrap storage management with nearly any business process, including customized automation of provisioning, snapshots, file versioning and rollback, replication, and more. Web services APIs can be easily discovered and integrated, removing the hurdles associated with the management protocols of the past. Because Web services are self-documenting and discoverable, management capabilities can be exposed, even with different APIs for different systems, without lengthy standardization efforts such as SMI-S;
--APIs will let developers create, store, access, and re-use complex sets of data more easily. This will encourage lighter-weight, more flexible application architectures, easier data re-use, and rapid application development, at lower cost and effort. Think of API-accessible data as a gateway for Web application access to any unstructured data in the enterprise, with the simple efficiency of databases;
--APIs will also allow administrators and/or developers to empower user self-service by creating portals or applications where users can manage, protect, and control their own storage. This will drive down the cost of ownership for storage;
--APIs will make storage extensible, and data more portable. For example, an open API could be used to create a gateway that mimics yesterday's protocols or interfaces with today's protocols such as XAM; and
--Storage vendors are developing APIs on top of flexible infrastructures that will make Web-service-based access to storage commonplace. Innovators will make their storage even more extensible through the use of APIs, which will enable integration with other applications for data tiering, classification, control, conversion, or other file manipulation. Some examples include Ibrix's Cirrus API, which provides access to user management, data sharing, snapshots, and versioning, and Omneon's Media Services Framework, which provides API access to video transcoding and QoS-like storage optimization.
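To make the versioning-and-rollback use case concrete, here is a minimal Python sketch of what such an API might look like from the developer's side. The `VersionedBucket` class and its methods are hypothetical: this is not Ibrix's Cirrus API, Amazon S3, or any other vendor's interface, and a real service would sit behind HTTP calls rather than an in-memory dictionary.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VersionedBucket:
    """Hypothetical in-memory stand-in for an API-accessible storage service.

    Every put() keeps earlier contents, so a caller (or a self-service
    portal built on this API) can list versions and roll back on its own.
    """

    _versions: dict[str, list[bytes]] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> int:
        """Store a new version of `key` and return its 0-based version number."""
        history = self._versions.setdefault(key, [])
        history.append(data)
        return len(history) - 1

    def get(self, key: str, version: int | None = None) -> bytes:
        """Return the latest version of `key`, or a specific older one."""
        history = self._versions[key]
        return history[-1] if version is None else history[version]

    def version_count(self, key: str) -> int:
        return len(self._versions.get(key, []))

    def rollback(self, key: str, version: int) -> int:
        """Roll back by re-publishing an older version as the newest one."""
        return self.put(key, self.get(key, version))


# Usage: two writes, then a rollback to the first draft.
bucket = VersionedBucket()
bucket.put("reports/q1.xml", b"<draft/>")
bucket.put("reports/q1.xml", b"<final/>")
bucket.rollback("reports/q1.xml", 0)  # the draft becomes version 2
```

Rolling back by re-publishing, rather than deleting newer versions, keeps the full history intact, which is the kind of behavior an administrator would want to be able to audit.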
Innovative in organization and management. Cloud-based storage cannot grow to the scale necessary without flexible management, organization, and presentation of storage that removes cumbersome semantics such as hostnames, directories, and permissions. When users turn to cloud-based storage, they will realize enormous savings in the time and effort associated with administration of storage and data management. And developers can store and integrate data faster and with less administrative overhead.
Cloud-based storage providers will enable self-service storage provisioning and management of data that is not only API-enabled, but also breaks with current conventions. Innovative providers will not only cover basic storage management operations (file protection, tiering), but also provide data presentation that can mimic some of the capabilities of file virtualization through virtual views or containers for data that are completely abstracted from the on-disk location of data. Users will be able to place data into different virtual views that are accessible by different users. Such organization through virtual views and lightweight tags, coupled with self-service management of storage, may change how the industry approaches traditional file storage as well.
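The idea of virtual views built from lightweight tags can be sketched in a few lines of Python. This illustration rests on our own assumptions (the hypothetical `TaggedStore` name, opaque object IDs, and views defined as "carries all of these tags" queries) rather than on any provider's actual data model, and it leaves out per-view access control.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaggedStore:
    """Hypothetical store that organizes objects by tags instead of paths.

    Objects are known only by opaque IDs. A "view" is a saved tag query,
    so one object can show up in many views without being copied or moved.
    """

    _tags: dict[str, set[str]] = field(default_factory=dict)
    _views: dict[str, set[str]] = field(default_factory=dict)

    def tag(self, object_id: str, *tags: str) -> None:
        self._tags.setdefault(object_id, set()).update(tags)

    def define_view(self, name: str, *required_tags: str) -> None:
        # A view matches every object that carries all of the required tags.
        self._views[name] = set(required_tags)

    def view(self, name: str) -> list[str]:
        required = self._views[name]
        return sorted(oid for oid, tags in self._tags.items() if required <= tags)


store = TaggedStore()
store.tag("obj-1", "finance", "2008", "shared")
store.tag("obj-2", "finance", "draft")
store.define_view("finance-team", "finance")
store.define_view("partner-portal", "finance", "shared")
```

Here "obj-1" appears in both views while existing once; re-tagging it changes what each audience sees without touching its on-disk location.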
Responsive and scalable. Users of cloud-based storage should assess the responsiveness, availability, and scalability of their hosting service. Vendor innovation will drive these features to new levels that will surpass even the best enterprise systems. Users should minimize their risks not only through SLAs focused on performance, responsiveness, and scalability, but also through an awareness of their service provider's storage capabilities. While provider capabilities will likely always remain somewhat opaque to customers, cloud-based storage should demonstrate the ability to transparently move data across locations, and potentially across service providers, to self-heal, and to scale up in performance and capacity to meet rapidly changing customer demands. Equally important, users should match service provider capabilities in these areas with current and anticipated future needs, and do so with their planned application architecture in mind. Users with many small, separate I/O streams may be able to easily distribute their demands and work with any, or many, provider(s), regardless of the provider's ability to move or distribute data.
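For the case of many small, independent I/O streams, spreading load across providers can be as simple as hashing each stream ID to a provider. A minimal Python sketch follows; the provider names are placeholders, and modulo hashing is our own choice for brevity, not a recommendation from any provider.

```python
from __future__ import annotations

import hashlib

PROVIDERS = ["provider-a", "provider-b", "provider-c"]  # hypothetical names


def provider_for(stream_id: str, providers: list[str] = PROVIDERS) -> str:
    """Assign an I/O stream to a provider via a stable hash of its ID.

    Python's built-in hash() is randomized per process for strings, so
    SHA-256 is used to keep the assignment identical across runs and hosts.
    """
    digest = hashlib.sha256(stream_id.encode("utf-8")).digest()
    return providers[int.from_bytes(digest[:8], "big") % len(providers)]


assignments = {s: provider_for(s) for s in ("upload-17", "log-feed-3", "thumbs-9")}
```

Modulo hashing remaps most streams whenever the provider list changes; consistent hashing is the usual fix if providers are added or dropped often.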
Open, well-documented, portable. Today, cloud-based storage is too new for standardization, and recent attempts at standardization (XAM, SMI-S, OpenXML, and others) have been slow or have left a sour taste in the mouths of many users, both within the storage industry and across the IT field in general. This leads ambivalent users to anticipate wading through a bog of APIs with excessive overhead and incompatibilities, with no hope of moving data between systems without starting from scratch. We believe concerns about standardization and portability for cloud-based storage are largely unwarranted, and that cloud-based storage will in fact remedy many of the standardization issues we have today. That is because cloud-based storage is centered on lightweight APIs and access frameworks, such as HTTP-based REST, that are already well established. But users should pay attention both to what these frameworks give them and to whether the frameworks are served up on top of the right underlying storage.
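To show why REST keeps these APIs lightweight, the Python sketch below builds the basic object requests with nothing but the standard library. The endpoint and the bucket/key URL layout are assumptions modeled loosely on the S3 style, not a specific provider's documented interface, and the requests are only constructed, never sent.

```python
from __future__ import annotations

from urllib.parse import quote
from urllib.request import Request

ENDPOINT = "https://storage.example.com"  # hypothetical service endpoint


def object_request(method: str, bucket: str, key: str, body: bytes | None = None) -> Request:
    """Build, but do not send, a REST-style request for one stored object.

    The URL names the resource and the HTTP verb names the operation:
    PUT stores, GET retrieves, DELETE removes. Real services also require
    authentication headers, which are omitted here.
    """
    url = f"{ENDPOINT}/{quote(bucket)}/{quote(key)}"
    headers = {"Content-Type": "application/octet-stream"} if body is not None else {}
    return Request(url, data=body, headers=headers, method=method)


put_req = object_request("PUT", "my-bucket", "reports/q1.xml", b"<report/>")
get_req = object_request("GET", "my-bucket", "reports/q1.xml")
```

Because the verbs and URL conventions are plain HTTP, any language or tool with an HTTP client can talk to such a service, which is much of the portability argument above.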
In our view, users of cloud-based storage will be best served by innovative storage vendors who develop deeply integrated, full-featured APIs on top of their next-generation storage systems. This creates a turnkey system that can deliver advanced storage features, such as snapshots or file versioning, while assuring both the service provider and the customer that the solution will work without incompatibilities or multi-vendor finger-pointing. Providers that turn to these solutions and APIs will be able to deploy cloud-based storage services quickly and cost-effectively.
More importantly, while we believe the lightweight and simplified nature of REST-based APIs will make application and data porting simple, out-of-the-box solutions will drive standardization. Since cloud-based storage will only be possible on top of a relatively small number of systems that can scale to very high levels of performance and capacity, there will be a relatively small number of solution vendors and APIs in the market. And since storage Web services APIs will support similar basic operations (even if they also support more advanced ones), developers can quickly map APIs between solutions to mask differences and enable better portability.
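Mapping similar basic operations across vendors is a classic adapter pattern. In the Python sketch below, `VendorAClient` and `VendorBClient` are invented stand-ins for two providers' SDKs (kept in memory so the example runs offline). The application codes against a small common `ObjectStore` interface, and moving data between providers needs nothing beyond that interface.

```python
from __future__ import annotations

from typing import Protocol


class ObjectStore(Protocol):
    """The common subset of operations the application is written against."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class VendorAClient:
    """Invented stand-in for one provider's SDK."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def upload_blob(self, name: str, payload: bytes) -> None:
        self._blobs[name] = payload

    def download_blob(self, name: str) -> bytes:
        return self._blobs[name]


class VendorBClient:
    """Invented stand-in for a second SDK that scopes objects by bucket."""

    def __init__(self) -> None:
        self._objects: dict[tuple[str, str], bytes] = {}

    def store_object(self, bucket: str, key: str, body: bytes) -> None:
        self._objects[(bucket, key)] = body

    def fetch_object(self, bucket: str, key: str) -> bytes:
        return self._objects[(bucket, key)]


class VendorAAdapter:
    """Maps the common put/get calls onto VendorAClient's method names."""

    def __init__(self, client: VendorAClient) -> None:
        self._client = client

    def put(self, key: str, data: bytes) -> None:
        self._client.upload_blob(key, data)

    def get(self, key: str) -> bytes:
        return self._client.download_blob(key)


class VendorBAdapter:
    """Maps put/get onto VendorBClient, hiding its bucket parameter."""

    def __init__(self, client: VendorBClient, bucket: str) -> None:
        self._client, self._bucket = client, bucket

    def put(self, key: str, data: bytes) -> None:
        self._client.store_object(self._bucket, key, data)

    def get(self, key: str) -> bytes:
        return self._client.fetch_object(self._bucket, key)


def migrate(src: ObjectStore, dst: ObjectStore, keys: list[str]) -> None:
    """Copy objects between providers using only the common interface."""
    for key in keys:
        dst.put(key, src.get(key))


old = VendorAAdapter(VendorAClient())
new = VendorBAdapter(VendorBClient(), bucket="archive")
old.put("contracts/2008.pdf", b"%PDF-1.4 ...")
migrate(old, new, ["contracts/2008.pdf"])
```

Vendor-specific advanced features (snapshots, transcoding, and so on) would sit outside the common interface, so the portable core stays small.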
Ready for versatile usage cases. Flexible presentation of storage as traditional file/block, remote storage (FTP, HTTP, WebDAV), or API-accessible storage will open the doors to versatile use cases for cloud-based storage. It was not long after Amazon S3 sprang up that users were trying, and demanding, hosting of entire virtual machine computing environments. This enabled more uses for Amazon's storage cloud, and enabled users to collocate complex compute resources alongside rich Web-integrated data. Today, users are able to perform complex content creation and/or business logic processing while simultaneously using generated data within a loosely coupled and widely distributed Web application architecture. Many users will find value in combining multiple computing approaches when cloud-based storage can be accessed as traditional file/block storage in a hosted infrastructure.
This will be an ideal use case for the next generation of cloud-based storage. It is easy to imagine large hosting providers with unique speed and data differentiation taking advantage of a Web services overlay for their data. As one example, a solution at a service provider like XASAX, which collocates HPC-like infrastructure and financial applications next to high-speed financial data feeds for market data analysis, may enable a new generation of dynamic Web-based data reporting/analysis and mash-up applications for financial customers.
Users initially consider cloud-based storage for its potential cost savings and improvements in storage scalability and availability. While such savings are compelling, users shouldn't neglect to measure the fundamental capabilities of cloud-based storage against future strategic IT and business needs. Those fundamental capabilities take center stage when strategic business needs are considered, and they will differentiate providers.
In the next article in this series, we'll look at cloud-based storage capabilities that are key considerations for service providers.
Jeff Boles is a senior analyst and director of validation services at the Taneja Group research and consulting firm.
Challenges for cloud-based storage
Users should be aware of the potential downsides to cloud-based storage. These include portability and vendor lock-in, gaps in support for databases, regulatory compliance, and availability risk when one vendor's solution is unique in architecture or APIs.
First, there are currently no standards for cloud-based storage or computing. This can make porting an infrastructure from one vendor to another dicey at best, and may mean you're subject to the whims of an infrastructure provider. This is a key issue that emerging cloud-based storage solutions will address, but it nonetheless remains a major challenge today. Once solutions are available from major vendors, more services with common APIs will become available, and developers will come up with mappings between other popular APIs (such as Amazon S3, and potentially even XAM).
Second, cloud-based storage solutions still fall short of meeting all IT storage needs. The biggest gap is where databases are concerned. While Amazon S3 started life looking very much like a widely distributed, extremely flat database, it has never been capable of meeting traditional enterprise database needs: It is not relational in the traditional sense, it lacks DBMS tools, and because it is designed to support loosely coupled applications, it does not support high loads of guaranteed, consistent transactions expected in traditional database environments. More importantly, without a distributable database, the cloud looks like a poor place for databases -- applications that depend on access to single instances of databases in the cloud will never be able to benefit from load-balancing, scalability, and improved availability; all of which may imply the use of multiple copies of data or stateless redirection of data connections. This is an area of critical importance in which next-generation cloud-based storage vendors must begin to innovate.
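The "multiple copies of data" approach can be illustrated with a simple read/write splitter, a common pattern for scaling databases. This Python sketch is our own illustration with hypothetical host names, not a feature of Amazon S3 or of any particular cloud database, and it shows the catch as well: reads served from replicas may not reflect the latest committed write.

```python
from __future__ import annotations

from itertools import cycle


class ReplicaRouter:
    """Hypothetical router: writes go to the primary, reads rotate over replicas.

    Replicas may lag the primary, so a read can return stale data. That is
    exactly the consistency trade-off traditional transactional databases
    are built to avoid.
    """

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        self._read_targets = cycle(replicas or [primary])

    def route(self, statement: str) -> str:
        # Deliberately naive classification: only SELECT statements count as reads.
        if statement.lstrip().upper().startswith("SELECT"):
            return next(self._read_targets)
        return self.primary


router = ReplicaRouter("db-primary", ["db-replica-1", "db-replica-2"])
```

Round-robin spreads read load evenly but ignores replica health and lag; a production router would need both, plus a way to pin reads that must see a just-committed write.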
Meanwhile, users with legacy architectures or traditional databases face a quandary: they must decide whether to re-architect their applications for the cloud and/or deal with sticky issues around transactional consistency and other features that are often tied to concrete business requirements, or to choose another path.
Third, users need to be sensitive to where they store their data in the face of ever-changing regulations (SOX, HIPAA, PCI, etc.). Many regulations dictate that users must be able to identify and control the location of their data, which isn't feasible when that data is virtualized across a cloud.
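Meeting such a requirement presupposes that the provider reports where each copy of the data lives. Given that information, checking a residency policy is straightforward, as in this Python sketch. The region names, the policy, and the `Placement` record are hypothetical, and which locations a given regulation actually permits is a legal question the code cannot answer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Where one copy of one object currently lives, as reported by a provider."""

    object_id: str
    region: str


# Hypothetical policy: these records may only be stored in US regions.
ALLOWED_REGIONS = frozenset({"us-east", "us-west"})


def residency_violations(placements: list[Placement],
                         allowed: frozenset[str] = ALLOWED_REGIONS) -> list[Placement]:
    """Return every copy stored outside the allowed regions."""
    return [p for p in placements if p.region not in allowed]


report = residency_violations([
    Placement("patient-42", "us-east"),
    Placement("patient-42", "eu-west"),  # a replica the provider placed abroad
    Placement("invoice-7", "us-west"),
])
```

The check is only as good as the placement data behind it, which is why visibility into data location matters more than the check itself.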
Finally, because of availability risks in a shared environment that is not fully under their control, users often remain hesitant to turn to cloud-based storage for anything more important than testing and development, or infrequently accessed data storage. Until cloud-based storage is highly available, users will be unable to host mission-critical data on it. Increased availability will come on two fronts: 1) the entry of enterprise-class storage systems into the cloud storage market; and 2) ubiquitous and compatible multi-provider solutions that deliver availability through data dispersal and flexible delta-based replication. Delivering more availability in the cloud may be largely a matter of dispersal -- spreading partial or complete copies of data across the cloud can keep it available even during dramatic failures. Vendors such as Cleversafe have come up with innovative and unique algorithms for this, and other technologies exist that have not yet come to market.
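Dispersal can be illustrated with the simplest possible scheme: split the data into slices and add one XOR parity slice, much as RAID 5 does within a single array. This Python sketch is that simplified stand-in, not Cleversafe's algorithm; production dispersal systems use erasure codes such as Reed-Solomon that can tolerate the loss of several slices rather than just one.

```python
from __future__ import annotations

from functools import reduce


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def disperse(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal slices plus one XOR parity slice.

    Storing each of the k + 1 slices at a different site lets the data
    survive the loss of any single site.
    """
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    slices = [padded[i * size:(i + 1) * size] for i in range(k)]
    return slices + [reduce(_xor, slices)]


def recover(slices: list[bytes | None], length: int) -> bytes:
    """Rebuild the original data when at most one slice is missing (None)."""
    missing = [i for i, s in enumerate(slices) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost slice")
    if missing:
        # XOR of all surviving slices (parity included) equals the lost one.
        present = [s for s in slices if s is not None]
        slices = list(slices)
        slices[missing[0]] = reduce(_xor, present)
    return b"".join(slices[:-1])[:length]


data = b"quarterly results, final"
slices = disperse(data, k=4)  # 4 data slices + 1 parity slice, for 5 sites
slices[2] = None              # one site is unreachable
```

With this scheme each site holds only a fraction of the data, so surviving one failure costs one extra slice of storage rather than a full second copy.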
But this requires widespread cloud-based storage, which will come from service providers ramping up cloud services built on top of out-of-the-box offerings, forgoing the intensive and drawn-out development cycles that created today's cloud-based storage offerings. Selecting a vendor that is using a commonly available out-of-the-box platform, rather than a custom-developed one, may put you in a position to make use of multiple service providers for increased availability sooner rather than later.
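The payoff from using multiple providers can be estimated with basic probability. If each of n providers keeps a full copy reachable a fraction a of the time, independently of the others, the chance that every copy is unreachable at once is (1 - a)^n. A minimal Python sketch, with made-up availability figures in the example:

```python
from __future__ import annotations

import math


def combined_availability(per_provider: list[float]) -> float:
    """Probability that at least one complete copy is reachable.

    Assumes every provider holds a full copy and that outages are
    independent, which is optimistic if providers share networks,
    facilities, or software.
    """
    all_down = math.prod(1.0 - a for a in per_provider)
    return 1.0 - all_down


two_providers = combined_availability([0.99, 0.99])  # about 0.9999
```

Two providers at 99 percent each come out near 99.99 percent combined, but only under the independence assumption; correlated failures, such as two services built on the same platform sharing a bug, erode that gain.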