Cloud-based storage, part 3

The final installment in our series on cloud storage focuses on capabilities for service providers and the data center.

By Jeff Boles

As we have discussed in parts one and two of this three-part series, at the heart of cloud-based computing is a loosely coupled infrastructure that is self-healing, geographically dispersed, designed for user self-service, and instantaneously scalable in response to the ebb and flow of business demands. Cloud-based computing virtualizes the location, connectivity, and resources behind loosely coupled application components in order to be elastic--able to move and shift computing and storage resources, and rapidly deploy new systems or applications, in response to any demand. Moreover, cloud computing promises to make infrastructure, applications, and storage easier to manage, and much easier to integrate with other applications or changing business processes.

To recap, Taneja Group considers cloud-based storage (CBS) an emerging technology within a larger solution category of file-centric "storage in the cloud." Storage in the cloud has previously included remote file storage offerings accessible by way of FTP, WebDAV, or NFS/CIFS. Cloud-based storage is an evolution of hosted file storage technology that wraps sophisticated APIs, new data presentation and access semantics, location virtualization, and management tools around file storage. While file storage may be used to support block-like storage through virtual server images, CBS is about serving up data stored in files across the Internet or internal enterprise networks.

Strategic asset for service providers
An emerging set of out-of-the-box CBS solutions will be a strategic and critical consideration for managed service providers. CBS solutions today include EMC's Atmos, Ibrix's Cirrus, and Nirvanix, but we expect that every major vendor, and many scale-out NAS vendors, will soon come to market with their own take on CBS.

CBS can serve as a foundation for building more sophisticated and "stickier" Web applications around existing or new applications and upper layer storage services, including basic but unique data storage, backup, file sharing, collaboration applications, more sophisticated Web hosting and Website support, VoIP application services, data archiving and discovery, contact management, customer relationship management, and others. These "stickier" services can increase customer retention and create significant new revenue streams. In a competitive landscape where customer mindshare is increasingly dominated by services from Amazon, Microsoft (Azure), and Google, stickier competitive services are necessary business weapons.

Simultaneously, large enterprises may consider a role as a service provider to their own organization. For the large enterprise, cloud-based storage hosted internally may reduce administrative complexity and cost of ownership for storage by enabling user self-service and providing a foundation for pay-for-use utility storage. The potential pay-for-use aspect of CBS can be compelling in chargeback-driven organizations solely because accurately allocating storage utilization is so difficult in traditional infrastructures. CBS, designed to support incredibly large numbers of users, scales both up and down and can provide the granularity necessary to support pay-for-use. Moreover, cloud-based storage in the enterprise may heighten collaboration and enable more rapid, lightweight application development. It is easy to imagine Web portals that are suddenly enabled with rich file and data access, even to the point of hosting Google Apps-like solutions within the enterprise.
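To make the pay-for-use idea concrete, here is a minimal sketch of how per-department chargeback might be computed from daily usage samples. The rate, the department names, and the `chargeback` function are all hypothetical; no vendor's metering interface is implied.

```python
from collections import defaultdict

# Hypothetical price per gigabyte per day; an assumption for illustration only.
RATE_PER_GB_DAY = 0.005


def chargeback(samples, rate_per_gb_day=RATE_PER_GB_DAY):
    """Sum daily usage samples into a charge per department.

    samples: iterable of (department, gb_stored) pairs, one pair per
    department per day. Returns a dict mapping department to charge.
    """
    gb_days = defaultdict(float)
    for department, gb_stored in samples:
        gb_days[department] += gb_stored  # accumulate gigabyte-days
    return {
        department: round(total * rate_per_gb_day, 2)
        for department, total in gb_days.items()
    }


if __name__ == "__main__":
    usage = [("engineering", 100), ("engineering", 120), ("sales", 10)]
    print(chargeback(usage))
```

The point of the sketch is granularity: once usage is recorded per user or per department, allocating cost becomes simple arithmetic rather than an estimate.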

Whether it is to take advantage of the tremendous opportunities surrounding storage and application services, or respond to repeated demands by users and customers, it is clear in our conversations with service providers--both external and internal within some enterprises--that they are struggling with how they can implement cloud-based storage as a part of their service portfolio. Building a CBS infrastructure has to date been fraught with difficulty and complexity. Most mature solutions today have been built on top of white box infrastructures with millions of dollars of custom development. The development and management requirements for these infrastructures would overwhelm even sizable service providers or enterprises.

In turn, many organizations have evaluated the re-branding or resale of existing cloud-based storage such as Amazon's S3. While engaging a third party's storage infrastructure is appealing, service providers are often averse to the risks associated with depending on cloud-based services over which they have little or no control.

Cloud-based storage has evolved from continuing attempts to de-couple storage from applications so that each resource can be optimally scaled and managed.

The good news is that this is changing. We see an emerging next generation of storage solutions that will provide service providers with out-of-the-box storage in the cloud. Let's take a look at what that out-of-the-box solution will look like, and at the fundamental capabilities that will let service providers easily build higher-level storage and computing services, while providing cloud-based storage to customers. While service providers are tied to meeting end-user requirements for storage services, they are necessarily focused on a different set of core capabilities that will allow them to economically manage and grow a storage infrastructure across many widespread users.

CBS capabilities for MSPs
Managed service providers (MSPs) face dueling challenges in building a CBS infrastructure: A CBS service must serve the needs of the customer and have a unique set of capabilities that will allow the MSP to manage massive amounts of storage and users. Moreover, MSPs providing storage today have already managed storage long enough to realize many of the shortcomings of traditional monolithic NAS. Consequently, CBS must also reduce, mitigate, or eliminate the current issues with file storage infrastructures, including costly NAS sprawl, isolated storage silos that must be managed separately, and costly, disruptive NAS migrations and service events.

There are no hard and fast rules for what will make an ideal CBS platform for a given MSP. The MSP storage market is still evolving, and each MSP will have a unique combination of services on its road map, resulting in variation in its specific requirements.

Today, there is demand for backup, archiving, Web application data storage, virtual infrastructure hosting, e-discovery, and a number of other storage services in the cloud. While these services are often SMB-oriented today, we expect they'll rapidly take on other customers, including the vast potential market of individual users, as soon as MSP solutions can handle the scale and complexity associated with enormous numbers of customers. In addition, the near future will bring more enterprise application hosting, distributed parallel processing, much more comprehensive virtual infrastructure hosting (in the form of entire virtual environments or virtual private data centers, as well as virtual desktop services), more sophisticated collaboration solutions, and more.

CBS can be a key enabler of every one of these potential services. Using this range of services, and an open mind toward potential future hosted services, we've identified a set of core CBS capabilities that merit special attention by service providers evaluating CBS solutions. MSPs should consider how differentiation in each of these areas may support their plans for services and make their infrastructures more flexible in the future. Moreover, while this article is targeted toward MSPs, enterprise users considering internal service provider models for storage should take these capabilities into consideration as well.

Established, rich APIs. API enablement is at least as important to service providers as it is to end users. First, end users expect API access in true cloud-based storage and will select a provider based on the capabilities of an API. Second, API-enabled storage infrastructure can enable service providers to wrap customized storage management and business processes around their storage services and optimize their management practices. Given the enormous capacities and number of users that will be served by a CBS solution, APIs will be critical to infrastructure management. Similarly, an API for fundamental storage tasks may be the only way to build self-service portals for user creation and management of storage spaces--a key requirement for serving up storage services over the Web.

Moreover, APIs can serve as the foundation for other higher-level storage service offerings. APIs can help service providers develop unique applications or service offerings, or meet specialized needs of key customers. MSPs should carefully evaluate the depth and versatility of an API in combination with their anticipated service offerings, and evaluate whether it exposes the right storage capabilities, in the right way.

Storage management and organizational tools for Web-based storage. Service providers face a significant management hurdle when they're operating cloud-based storage. Traditional approaches to storage management--provisioning, optimization, reporting, and control--will not hold for cloud-based storage as they are too time-, effort-, and cost-intensive when scaled for enormous amounts of capacity and users. Beyond API enablement, Web-scale storage will need a new storage management paradigm. This new approach to management will have several fundamental capabilities.

First, cloud-scale storage management will be user-centric. Users will manage their own allocation, provisioning, protection, and other operations within strongly partitioned areas of the storage infrastructure. Nonetheless, strong partitioning should not hinder the MSP's ability to holistically view and manage the storage infrastructure. Even with end users made responsible for most basic storage operations, the MSP will still require either more sophisticated reporting and utilization tools than ever before, or an API that can easily enable custom development of tools for reporting, planning, accounting, and optimization.

Second, CBS will need to bridge today's sprawling NAS silos and aggregate even geographically dispersed storage systems behind huge namespaces that can effectively virtualize file locations and provide multiple views of data. With this evolution of file virtualization, CBS solutions will not only abstract data location, but will also be able to present storage for pure file storage, present the same storage for virtual environments, control different levels of visibility for different users and applications, and even meet MSP reporting requirements. Moreover, this combination of file virtualization and global namespaces is only a beginning: CBS may extend file virtualization across any system by redirecting requests and API commands to heterogeneous storage systems at other service providers, or even within a customer's own data center. By virtualizing other data behind a cloud and a single API, customers may be able to ingest foreign data into a CBS solution with ease and extend CBS-based storage services to local data or data hosted across heterogeneous storage clouds.

Third, behind a large, partitioned, but flexibly presentable CBS solution, the MSP should look for data replication and movement capabilities. This will allow the MSP to manage storage seamlessly across locations while making service events, migrations, or other outages nearly transparent. Moreover, replication can be a key component of CBS infrastructures built for high availability, and encourage customers to host more mission-critical applications in the cloud.

Storage systems that deliver simplified management, offer extensible APIs, and abstract the location of data will radically change file storage for the service provider by easing management, simplifying high availability, and making migrations and maintenance activities transparent to end users.

User management and access control for cloud-based storage. While users expect flexible organizational tools for Web-based storage, service providers looking to build out cloud-based storage infrastructures will face challenges of providing storage for multiple types of customers and usage cases.

We see customer usage cases for Web-based storage varying between two extremes. On one end is bucket-like storage where individuals simply store data for access. At the other extreme are customers who will want to store complex data that may be shared in complex ways across large numbers of users, multiple groups of users, and/or between organizations. These customers will want granular control and easy configuration of a CBS solution to support these complex configurations. The MSP will require a framework for managing users and providing data visibility and access control that can support the most complex organizations and inter-organization relationships. Traditional user directories and file and directory access controls simply are not flexible enough for this task. Validated architectures such as extensible metadata tags at the file level or database-like file system overlays may be the only current paths to meeting these data presentation requirements. Both of these approaches are used in and validated by file virtualization and information classification and management (ICM) vendors today, although more innovation is sure to come. Such technologies will allow organizations to structure multiple presentations of the same data to filter and control content for different types of users and serve as a foundation for enabling user-based access control and sharing of data through simple metadata tagging.
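As a rough illustration of access control through metadata tagging, the sketch below filters one set of files into different presentations depending on the tags a user holds. The `StoredFile` type, the `visible_files` function, and the tag names are hypothetical and simplified; a real CBS solution would enforce this inside the storage layer rather than in client code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StoredFile:
    """A file plus the metadata tags attached to it (hypothetical model)."""
    path: str
    tags: frozenset = field(default_factory=frozenset)


def visible_files(files, user_tags):
    """Return the files that share at least one tag with the user.

    Each user (or group, or partner organization) sees a different
    presentation of the same underlying data.
    """
    return [f for f in files if f.tags & user_tags]


if __name__ == "__main__":
    files = [
        StoredFile("/cases/acme/brief.doc", frozenset({"legal", "case:acme"})),
        StoredFile("/hr/payroll.xls", frozenset({"hr"})),
        StoredFile("/public/logo.png", frozenset({"legal", "hr", "public"})),
    ]
    for f in visible_files(files, frozenset({"legal"})):
        print(f.path)
```

Sharing a file with another group then amounts to adding a tag, with no change to directory layout or to a traditional access control list.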

Altogether, MSPs should carefully consider whether the depth and flexibility of user partitioning, namespace, quota, and sharing features can support their potential service offerings and customers today and in the future.

Scalable storage. While the abstractions of CBS--including namespaces, APIs, and management tools--could possibly be layered over many different types of storage systems, the storage system requirements behind CBS are nonetheless unique. Cloud-based storage will impose new demands for scalability, performance, and ease of management on a scale that has not been seen before. Moreover, there are few markets as cost-sensitive as MSPs. Service providers should look for a cloud-based storage infrastructure that will scale indefinitely and linearly in both performance and capacity with extreme cost efficiency, be easy to manage at any scale, deliver high availability, and support wide geographic dispersal with sophisticated replication and data movement tools. A number of block and file storage systems exist today that can meet these requirements.

Extensibility. While APIs can provide a foundation for extending CBS service offerings with upper layer services, extensibility merits considerable evaluation in the context of how an MSP plans to build out a services portfolio. While we expect APIs to provide a foundation for accessing core storage functions, we also believe two other areas of extensibility will emerge. First, sophisticated APIs will allow a CBS infrastructure to trigger and interface with other Web services. In the first generation of solutions, this capability will create a plug-in-like architecture that will support the delivery of other services, such as proactive data classification as data is ingested. Second, MSPs should carefully evaluate how much more capability a CBS solution brings to the table beyond basic storage operations. Innovative features will build a foundation for data ingest, across-the-cloud data interaction, secure data locking, file versioning, and other capabilities.

Enormous change is on the horizon around how we work with stored data, and with emerging CBS solutions MSPs will provide new approaches to storage management, data presentation, and flexible storage architectures. Moreover, while service providers will be at the forefront of adopting these technologies, these changes will eventually influence the core practices of data storage across the entire industry, from small businesses to large enterprises. Driven by MSP requirements, the changes surrounding cloud-based storage will alter infrastructure systems drastically, as vendors attempt to look at data storage with a new perspective to address the core requirements.

While the number of vendors currently in this market is small--including EMC, Ibrix, and Nirvanix--other offerings will be available from a number of other vendors in the near future.

The message to the service provider is clear: You must have cloud-based technologies on your strategic radar to remain competitive. Whether it is a core technology or a technology that competes with your core technologies, the likelihood that CBS will influence your business is high. If you are an MSP, we recommend you keep an eye on the emerging vendors, track the rapidly changing capabilities of their solutions, and fully consider how they can be integrated into your services to drive new revenue, increase cost efficiency, and enhance the competitiveness of your business.

Jeff Boles is a senior analyst and director of validation services at the Taneja Group research and consulting firm.

Policies and extensibility in the cloud 

Sophisticated and flexible optimization of files stored within a cloud-based storage (CBS) infrastructure will be a fundamental requirement, and in many solutions will be delivered by a policy engine interacting with a CBS solution's API.

Take for example a video stream that starts in obscurity and then becomes very popular. To deliver the streaming performance required for that video stream, a CBS infrastructure may need to create multiple copies of the video file and even geographically distribute it. If spare capacity and bandwidth exist, perhaps a CBS solution should recognize every video file and cache multiple copies ahead of time, to be used in case popularity increases.
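A policy of this kind can be reduced to a small rule. The sketch below, under assumed numbers, picks a copy count from the current request rate; the function name, the per-copy capacity of 200 requests per second, and the floor and ceiling values are all hypothetical.

```python
import math


def replicas_needed(requests_per_sec, per_replica_capacity=200,
                    min_replicas=2, max_replicas=20):
    """Pick how many copies of a file to keep, based on demand.

    per_replica_capacity is an assumed number of requests per second that
    one copy can serve. The result is clamped between a floor (copies kept
    even for obscure files) and a ceiling (a cap on capacity spent).
    """
    needed = math.ceil(requests_per_sec / per_replica_capacity)
    return max(min_replicas, min(needed, max_replicas))


if __name__ == "__main__":
    for rate in (5, 1500, 100000):
        print(rate, "req/s ->", replicas_needed(rate), "copies")
```

Raising `min_replicas` corresponds to the speculative caching described above: spare capacity is spent up front so that a sudden rise in popularity is already covered.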

As a non-video example, a service provider may have different costs or classes of service, varying in performance, protection, or availability. A customer may wish to have their data automatically tiered across different storage classes based on file age, utilization, or file type.
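A tiering policy along these lines might be sketched as follows. The tier names, thresholds, and the `FileInfo` and `choose_tier` names are assumptions made for illustration, not features of any particular product.

```python
from dataclasses import dataclass

# File extensions treated as video in this sketch (an assumption).
VIDEO_EXTENSIONS = (".mp4", ".mov", ".avi")


@dataclass
class FileInfo:
    name: str
    age_days: int            # file age
    reads_last_30_days: int  # utilization


def choose_tier(f):
    """Map a file to a hypothetical storage class by type, use, and age."""
    is_video = f.name.lower().endswith(VIDEO_EXTENSIONS)
    if is_video and f.reads_last_30_days > 1000:
        return "performance"  # heavily used video gets the fast tier
    if f.age_days > 365 and f.reads_last_30_days == 0:
        return "archive"      # old and untouched goes to the cheap tier
    return "standard"


if __name__ == "__main__":
    print(choose_tier(FileInfo("launch.mp4", 3, 50000)))
    print(choose_tier(FileInfo("report-2006.doc", 800, 0)))
    print(choose_tier(FileInfo("notes.txt", 10, 4)))
```

Rules are evaluated in order, so the first match wins; a production policy engine would also need to decide when and how often files are re-evaluated.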

But there is debate about how the policy engine should interact, and even whether that policy engine should reside outside of or within a cloud-based storage solution. We see three different types of solutions currently:
--Some solutions are coming to market where a sophisticated policy engine is built into and distributed throughout the solution;
--Some solutions are coming to market where there are efficient hooks within the API or even a different API for handing off data to third-party solutions that can perform an operation on data (such as classification based on content), and return instructions to the CBS solution to take action on the original file (such as moving the file to a different tier or locking the file against future change); and
--Some solutions will provide no more than a Web services API for access, and all policies will need to be supported in a Web application business rules layer that interacts with the API.
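The second approach, in which data is handed off to a third-party service that returns instructions, can be sketched with a toy in-memory store. Everything here is hypothetical: `ToyStore`, the `on_ingest` hook, and the `classify` stand-in do not correspond to any vendor's API.

```python
def classify(content):
    """Stand-in for a third-party classification service.

    Returns an instruction for the store to act on the original file.
    """
    if b"CONFIDENTIAL" in content:
        return {"action": "lock"}
    return {"action": "none"}


class ToyStore:
    """In-memory store that calls a hook on every ingested file."""

    def __init__(self, on_ingest):
        self.on_ingest = on_ingest
        self.files = {}
        self.locked = set()

    def put(self, name, content):
        if name in self.locked:
            raise PermissionError(f"{name} is locked against change")
        self.files[name] = content
        # Hand the data off, then apply whatever instruction comes back.
        instruction = self.on_ingest(content)
        if instruction.get("action") == "lock":
            self.locked.add(name)


if __name__ == "__main__":
    store = ToyStore(on_ingest=classify)
    store.put("memo.txt", b"CONFIDENTIAL: merger plan")
    store.put("lunch.txt", b"menu for Friday")
    print(sorted(store.locked))
```

In the first approach the equivalent of `classify` would live inside the storage system itself; in the third, the whole `put` wrapper would sit in the application's business rules layer.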

Policy-based management of data will vary in importance depending on the presence of classes of service within a cloud, the size and distribution of storage resources behind a cloud, and the need to deliver lifecycle management or other complex services against customer data.

The key questions to consider are the following:
--Do the key data interactions meet your needs around the services you plan to offer--e.g., can it take action on video files, perform file and owner characteristic-based tiering, or perform other functions you require?
--Is the architecture scalable enough to service the number of files and users you anticipate? There may be significant performance differences based on where the policy engine is, and how it interacts with data. Moreover, depending on how it interacts with data, a heavily used policy engine may cut into total system performance.
--Can policies be applied with the granularity you require--e.g., if you are building a hosted, highly automated litigation support service on top of CBS, you may have tens of thousands of users that each demand complex policies with hundreds of rules. Can the CBS architecture support isolated but complex rule sets?
--Finally, are there any unique twists in your architecture that you require a policy engine to interact with, and can you customize the solution's policy engine to do so? Today, CBS is in its infancy. But in the near future, it is possible to imagine a policy engine interacting with an application delivery network to cache or distribute content that is in high demand.

Depending upon your response to these questions, any one of the architectures may be the better fit.

This article was originally published on November 11, 2008