Object-based storage manages stored data as, you guessed it, objects. Each object includes the data itself, extended metadata, and a unique identifier. Unique object IDs enable a global namespace that provides access to data across all federated locations and supports a robust compliance environment. Detailed metadata supports granular data management activities such as deep indexing, identifying objects for deletion, and managing automatic storage tiering.
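To make the model concrete, here is a minimal Python sketch of an object store: each object carries data, metadata, and a unique ID, and metadata drives management tasks like finding candidates for deletion or tiering. The class and field names are illustrative, not any vendor's API.

```python
import uuid

class ObjectStore:
    """Toy flat-namespace store: each object = data + metadata + unique ID."""
    def __init__(self):
        self._objects = {}  # object ID -> (data, metadata)

    def put(self, data: bytes, metadata: dict) -> str:
        obj_id = str(uuid.uuid4())           # globally unique identifier
        self._objects[obj_id] = (data, metadata)
        return obj_id

    def get(self, obj_id: str):
        return self._objects[obj_id]

    def find(self, **criteria):
        """Metadata query, e.g. to select objects for deletion or tiering."""
        return [oid for oid, (_, md) in self._objects.items()
                if all(md.get(k) == v for k, v in criteria.items())]

store = ObjectStore()
oid = store.put(b"scan results", {"department": "radiology", "retain_until": "2030-01-01"})
```

Because the ID, not a directory path, addresses the object, the namespace stays flat and can span federated locations.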
Object storage also goes a long way toward eliminating traditional RAID; in fact, it was designed with RAID replacement in mind. RAID has trouble scaling to multi-TB environments, and the popular RAID 5 and 6 levels can hurt performance during array rebuilds. Object storage instead mirrors disk contents to one or more nodes; on a failure, the environment redirects requests to active nodes holding the mirrored copies. (Granted, RAID 1 is also mirroring, but most object storage systems mirror with less overhead. In EMC Atmos' case, for example, mirroring takes 33% of traditional RAID 1 overhead.)
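The redirect-on-failure behavior can be sketched in a few lines of Python; the object and node names here are invented for illustration.

```python
# Each object ID maps to the nodes holding its mirrored copies.
REPLICAS = {"obj-123": ["node-a", "node-b", "node-c"]}
# Node health as seen by the environment; node-a has failed.
ACTIVE = {"node-a": False, "node-b": True, "node-c": True}

def locate(obj_id: str) -> str:
    """Route a request to the first replica node that is still active."""
    for node in REPLICAS[obj_id]:
        if ACTIVE[node]:
            return node
    raise RuntimeError("all replicas unavailable")

print(locate("obj-123"))  # node-a is down, so the request lands on node-b
```

No array rebuild is needed before serving the request; the surviving copies answer immediately while the failed node is repaired in the background.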
Today’s Top Use Cases
Object-based storage has always been defined by specific use cases, traditionally active archiving. Archiving remains a very active use case, but additional ones are growing fast.
Active Archiving
Archiving was object storage’s primary use case for many years. Active archives depend on a global namespace that allows distributed access to content. And because objects are immutable, active archives support compliance and can prove data integrity for eDiscovery and governance processes.
Although active archiving is no longer object storage’s fastest-growing use case, it remains a major purchasing driver. EMC Centera is an actively selling product for object-based archives; EMC positions Centera’s WORM capability as content-addressable storage (CAS) for eDiscovery and compliance. HP StoreAll 8800 is an object store targeted directly at very large active archives. The store, which also has file storage capabilities, scales to 16PB of capacity and billions of objects.
Cloud Storage
Large-scale cloud storage is overwhelmingly object-based. Cloud storage giants with object-based stores include Amazon S3, Rackspace Cloud Files, Microsoft Azure, and Google Cloud Storage. These are big installations: as of April 2014, Azure alone was storing more than 20 trillion objects. (Yes, trillion.) And object storage is by no means limited to these big cloud providers: many application and storage vendors use object-based storage to build other private, public, or hybrid clouds.
Hitachi Content Platform (HCP) stores object metadata on site while supporting data movement to public clouds of the user’s choice, while Atmos is EMC’s flagship object environment for the cloud. Massive online applications such as Facebook and Spotify also run on object stores: Facebook uses Haystack to store its huge collection of user photos, and Spotify stores its song catalog on AWS.
Massive Content Repositories/Big Data
Whether in the cloud or out of it, massive scalability is crucial for storing large active data collections. Scality Ring software supports enterprise clouds, content distribution, and active archives. EMC also addresses this use case with Atmos and ViPR, the latter of which also supports Hadoop’s file system and block-based storage.
IBM’s acquired SoftLayer platform offers an object storage service optimized for big data. IBM Elastic Storage software virtualizes file and object storage into a single addressable repository for big data analytics; Elastic Storage layered atop OpenStack Swift forms IBM’s Elastic Storage Object.
In New York, Albert Einstein College of Medicine of Yeshiva University uses object-based storage to feed raw data from its research environment into a massive content repository. The product is DataDirect Networks’ Web Object Scaler (WOS), which delivers massive scalability at a much lower cost than sending the same raw data to a block-based SAN.
Quantum Lattus Object Storage extends Quantum StorNext primary storage with a massively scalable online storage tier. StorNext management policies extend to the Lattus tier, which sports a native cloud interface. NetApp acquired object storage maker Bycast some years ago, and its technology became the foundation for NetApp StorageGRID. The StorageGRID virtual appliance works with Amazon S3 to let users store and distribute billions of objects with global data center access.
Object Store Challenges
We have painted a rosy picture of object storage, but for all its benefits it has its own set of challenges around availability, protocols, and performance. Users need to carry out due diligence when purchasing an object store product or service.
· Availability. Mirroring and replication build data redundancy into the object store. However, availability also encompasses object reliability, integrity, and security, which object storage users may overlook. Erasure coding and data dispersion are two common additions that increase system reliability. Erasure coding parses objects into blocks, which are then expanded into a larger set of blocks. A minimum number of blocks is necessary to recreate the original object, which means that malicious users would have to access multiple block segments to reconstruct it. Data dispersion distributes those blocks across multiple nodes for greater security, protecting objects against unauthorized access while allowing authorized rebuilds to happen quickly. Cleversafe dsNet, for example, is an object store that stores and tracks object slices on different nodes. Add encryption to this tool base and the result is a highly secure object store.
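As a toy illustration of the encode-and-rebuild idea, the Python sketch below uses a single XOR parity block: data is split into k blocks plus one parity block, so any one lost block can be rebuilt from the survivors. Production systems such as Cleversafe use stronger Reed-Solomon-style codes that tolerate multiple simultaneous losses; this is a minimal stand-in, not any vendor's algorithm.

```python
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> list:
    """Split data into k equal blocks and append one XOR parity block."""
    size = -(-len(data) // k)                  # ceiling division
    data = data.ljust(k * size, b"\0")         # pad to a multiple of k
    blocks = [data[i * size:(i + 1) * size] for i in range(k)]
    return blocks + [reduce(xor, blocks)]      # k + 1 blocks total

def rebuild(blocks: list, missing: int) -> bytes:
    """Recreate the block at index `missing` by XOR-ing the survivors."""
    return reduce(xor, [b for i, b in enumerate(blocks) if i != missing])

blocks = encode(b"patient scan data", 3)       # 3 data blocks + 1 parity
assert rebuild(blocks, 1) == blocks[1]         # a lost block is recoverable
```

The security angle follows from the same math: any single stolen block reveals only a fragment, and an attacker must gather the minimum set of blocks before the object can be reconstructed.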
· Protocols. Applications communicate with object storage via APIs. Proprietary APIs were common in older object storage, and application developers were understandably resistant to programming against many of them. Although there is still no standardized protocol (and probably never will be), newer developments build on the ubiquitous RESTful API, which works with any application or client that speaks HTTP/HTTPS. The popular Amazon S3 and OpenStack Swift APIs are both based on REST, as are Microsoft Azure’s APIs.
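The REST idea is simply that standard HTTP verbs map onto object operations, as this Python sketch of S3-style addressing shows. The host name is a placeholder, and real requests also carry authentication headers; this illustrates the verb-to-operation mapping only.

```python
def rest_request(op: str, bucket: str, key: str, host: str = "s3.example.com"):
    """Map an object operation to an HTTP verb and a bucket-style URL."""
    verbs = {"create": "PUT", "read": "GET", "delete": "DELETE"}
    return verbs[op], f"https://{bucket}.{host}/{key}"

method, url = rest_request("read", "photos", "2014/cat.jpg")
# method == "GET", url == "https://photos.s3.example.com/2014/cat.jpg"
```

Because the interface is nothing more than HTTP, any language with an HTTP client can talk to the store, which is why REST displaced the proprietary APIs.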
· Performance. Low latency, high throughput, and high IOPS are the hallmarks of fast storage performance. Until recently, most object storage systems lacked this trifecta: active archives did not need it, and it was expensive to engineer into the object store. However, big data with active analytics and streaming cloud storage benefit greatly from performance improvements. Today’s development centers on SSD tiers and high-performance storage adapters. For example, Beijing-based TEAMSUN recently worked with Intel to improve object storage performance on OpenStack Swift using 10Gb Intel Ethernet Converged Network Adapters and Intel SSDs.
Object storage is only going to grow from here. Its PB-scale capacity alone is critical to cloud providers, and built-in redundancy is a big improvement over the cost and complexity of third-party data protection. And while there may never be a single standard protocol, the wide adoption of the Amazon S3 and OpenStack Swift APIs has improved access across many different applications. Performance will also continue to improve as SSD tiers become more common.