Several years ago, Taneja Group predicted the inevitable emergence of what we called “cloud-based storage.” Today, this technology is behind most cloud offerings for unstructured data, whether public or private. We defined cloud-based storage as highly scalable, object-based technology that is accessible through RESTful APIs. It is no longer just an Amazon S3 offering, but is served up by all manner of product vendors and providers.
Adoption of these technologies has taken off in a number of different verticals and use cases. Currently, we see increasing diversity among solutions in the market, especially when it comes to vendors offering products for customers to build their own clouds for unstructured data. The vendors of such products promise to give customers unprecedented flexibility and power in storing and accessing data, and in scaling the storage system for that data. But the reality is that truly delivering on the promises of cloud, especially when it comes to scale, requires a unique architecture that is focused on the ideals and complexity of data “distribution.”
In moving into the age of storage in the cloud, we believe practitioners should become intimately familiar with this term, distribution, in a context entirely different from that of age-old distributed computing.
Distribution is so important that we’ve taken to recognizing a particular type of object storage as “Distributed Object Storage.” Such storage is designed for distribution and in turn unlocks unique efficiencies and capabilities when distributed across locations and grown to high levels of scale. Since those are fundamental long-term goals driving customer cloud initiatives, an understanding of distribution, and of the architecture for achieving it, should guide product and architectural decisions for every cloud data storage initiative.
Object Storage—an Architectural Shift
In a nutshell, object storage was first envisioned a number of years ago to tackle the complexities in programmatically storing and accessing data. For many application models, traditional file system semantics simply created enormous overhead for accessing data. Many hundreds of lines of code and interactions could be required to negotiate file system attachment, security, file location and name resolution. When locating files and data many times over, especially amidst thousands or millions of potentially unrelated files stored in the same location, these interactions excessively encumbered the use of data in applications.
Object storage came about, before cloud became popular, to provide more versatile access. Using a single unique identifier per object, objects in an object storage system could be stored, retrieved and manipulated without concern over file system namespaces and semantics.
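To make the idea of identifier-based access concrete, here is a minimal sketch of a flat object store held in memory. It is not modeled on any vendor's product; the class name `FlatObjectStore`, its method names, and the sample metadata are all assumptions chosen for illustration. The point is only that every operation takes an object ID and nothing else: no mount, no directory path, no name resolution.

```python
import uuid


class FlatObjectStore:
    """Toy in-memory object store (hypothetical): a flat namespace keyed by object ID."""

    def __init__(self):
        # Maps object ID -> (data bytes, metadata dict). There is no directory tree.
        self._objects = {}

    def put(self, data, metadata=None):
        """Store an object and return the unique ID the store generated for it."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = (bytes(data), dict(metadata or {}))
        return object_id

    def get(self, object_id):
        """Retrieve the object's data using only its ID."""
        return self._objects[object_id][0]

    def head(self, object_id):
        """Return a copy of the object's metadata without returning the data."""
        return dict(self._objects[object_id][1])

    def delete(self, object_id):
        """Remove the object."""
        del self._objects[object_id]


store = FlatObjectStore()
oid = store.put(b"scan-0001 pixel data", {"content-type": "application/dicom"})
data = store.get(oid)
```

An application that holds `oid` needs nothing more to read the object back, which is the contrast with the file system negotiation described above.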
The idea of cloud computing has taken shape over the past several years in answer to demands for further abstraction and automation so that IT can operate with greater efficiency and at greater scale. Object storage has helped meet these demands in several ways:
- Scale: object storage, in dispensing with much of the complexity of traditional file systems, is inherently more scalable. It allows vendors to design systems that can easily store millions or billions of data objects with very low complexity and little management overhead. Moreover, data can easily be spread across multiple storage repositories, since storage and access are performed and resolved by object ID. A single administrator is capable of managing petabytes of space.
- Flexibility in protection and data management: treating pieces of data as standalone digital objects has allowed vendors to develop innovative redundancy, protection and management tools that can be applied to objects and arbitrary groups of objects. This allows users to meet a wide variety of service levels from a single object pool. It is also much more versatile and cost-effective than RAID or other technologies that must be applied to entire file systems or volumes.
- Accessibility: the simplicity of object addressing makes object storage a natural fit for access protocols like HTTP. It has been built into SOAP and RESTful implementations that enable very lightweight and easy-to-develop programmatic interactions. Today, object storage supports all manner of applications across all types of business and consumer devices.
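The first two points above can be sketched together: because an object is located by its ID alone, a system can compute where it lives and choose a protection level per object. The example below is a deliberately simple illustration that uses hash-modulo placement, which is an assumption made here for brevity and not a description of any particular product; real systems generally use more sophisticated placement schemes. The repository names, the `place` function, and the sample object IDs are all hypothetical.

```python
import hashlib

# Hypothetical repository names; in practice these could be nodes or sites.
REPOSITORIES = ["repo-a", "repo-b", "repo-c", "repo-d"]


def place(object_id, replicas=1, repositories=REPOSITORIES):
    """Return the repositories that should hold copies of an object.

    Placement is derived purely from the object ID, so any client can
    compute it without consulting a directory or file system namespace.
    """
    if not 1 <= replicas <= len(repositories):
        raise ValueError("replicas must be between 1 and the repository count")
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(repositories)
    # Take `replicas` consecutive repositories, wrapping around the list,
    # so that every copy lands on a different repository.
    return [repositories[(start + i) % len(repositories)] for i in range(replicas)]


# Protection is chosen per object rather than per volume.
archive_copy = place("invoice-2011-0042", replicas=1)
critical_copy = place("patient-scan-7781", replicas=3)
```

Here a low-value archive object gets one copy while a critical object gets three, both from the same pool. Note that plain modulo placement reshuffles most objects when a repository is added or removed, which is one reason this sketch should not be read as a production design.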
Storage has gradually and continuously evolved over the past several decades in the interest of enabling better accessibility to larger and more varied types of data. Object-based storage, commonly referred to as cloud-based storage, has become the latest incarnation of this evolution.
Today, object storage with these characteristics is enabling distributed, more easily accessible, and highly scaled storage systems that have fulfilled critical demands for large active archives, medical image systems, digital media sites, content depots and much more.
Figure 1: Object storage is an ongoing advancement in access to data, bringing further simplification to the way we interact with data.
Object Doesn’t Always Go The Distance
But as object storage has become the prevalent technology behind cloud and high-scale unstructured storage infrastructures, differences among products on the market have rapidly emerged. While nearly every object vendor positions their storage as a technology for high scale and multiple locations, architectures vary greatly in how well they enable both without sacrificing efficiency.
In reality, limited sophistication in features that manage the “distribution” of data means that many object storage systems are better suited to localized private clouds than to the globally dispersed, high-scale uses that many customers initially pursue. Such localized architectures are inherently limiting in the age of the cloud. In a storage system where an object is the unit of storage, this is an artificially imposed constraint and may mean that customers run into physical limits on scaling long before they’ve realized the full value of their cloud initiative. Worse yet, some customers are bitten by such limitations only when they eventually grow from an initial deployment into larger deployments and multiple sites.
Superior architectures can deliver superior resiliency, while also enabling other key high-scale services, such as security and response optimization.