by Mike Matchett, Sr. Analyst and Consultant
Object storage has long been pigeon-holed as a necessary overhead expense for long-term archive storage—a data purgatory one step before tape or deletion. We have seen many IT shops view object storage as something exotic they have to implement to meet government regulations rather than as a competitive strategic asset that can help their businesses make money.
Normally, when companies invest in high-end IT assets like enterprise-class storage, they hope to recoup those investments in big ways. For example, they might accelerate the performance of market competitive applications or efficiently consolidate data centers. Maybe they are even starting to analyze big data to find better ways to run the business.
These kinds of “money-making” initiatives have been mainly associated with file and block types of storage—the primary storage commonly used to power databases, host office productivity applications, and build pools of shared resources for virtualization projects.
But that’s about to change.
If you’ve intentionally dismissed or just overlooked object storage, it is time to take a deeper look. Today’s object storage provides brilliant capabilities for enhancing productivity, creating global platforms, and developing new revenue streams.
Object storage has been evolving from its historical role as a second-tier data-dumping ground into a value-building primary storage platform for content and collaboration. And the latest high-performance cloud storage solutions could transform the whole nature of enterprise data storage. To really exploit this new generation of object storage, it is important not only to understand what it is and how it has evolved, but also to start thinking about how to harness its emerging capabilities to build new business.
Object storage is designed to take an arbitrary chunk of data—a file, an image, a stream of encoded Mars rover sensor data—and efficiently store it as a single object on disk.
Unlike file systems, there is no navigable tree or directory to maintain. Instead, as it’s stored, each object receives a unique ID which is drawn from a huge “flat” (often 128-bit) address space. That means a very large number of objects can theoretically be stored while avoiding file system limitations and overhead.
Object stores also maintain metadata about each object. In archiving scenarios, metadata is used to enable life cycle data management (e.g., “compress this object after three months; delete it after three years”).
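To make the idea concrete, here is a minimal Python sketch of an object that carries a flat 128-bit ID plus policy metadata, and a function that decides what a life cycle manager might do with it. The names `StoredObject`, `lifecycle_action`, `compress_after_days` and `delete_after_days` are hypothetical, chosen only for illustration; real products define their own metadata schemas and policy engines.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class StoredObject:
    """An opaque blob plus the metadata the store keeps about it."""
    data: bytes
    created: datetime
    # uuid4 yields a random 128-bit ID: a flat namespace, no directory tree.
    object_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    metadata: dict = field(default_factory=dict)


def lifecycle_action(obj: StoredObject, now: datetime) -> str:
    """Return "delete", "compress" or "keep" based on the object's age.

    The policy keys are assumptions for this sketch, not a real API.
    """
    age_days = (now - obj.created).days
    delete_after = obj.metadata.get("delete_after_days")
    compress_after = obj.metadata.get("compress_after_days")
    if delete_after is not None and age_days >= delete_after:
        return "delete"
    if compress_after is not None and age_days >= compress_after:
        return "compress"
    return "keep"


if __name__ == "__main__":
    created = datetime(2020, 1, 1)
    obj = StoredObject(
        data=b"quarterly report",
        created=created,
        metadata={"compress_after_days": 90, "delete_after_days": 3 * 365},
    )
    print(obj.object_id, lifecycle_action(obj, created + timedelta(days=120)))
```

The point of the sketch is that the policy travels with the object as metadata, so the store can apply it without knowing anything about the application that wrote the data.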
The first widely produced class of object storage, called Content Addressable Storage (CAS), creates object IDs by applying a hashing function on the data being stored. If the resulting ID already exists in the system, then that data has already been stored, leading to built-in object level de-duplication. When a CAS object is retrieved, it is hashed again to check against the ID, proving that the object is what has been requested and hasn’t been modified in the meantime.
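The CAS behavior described above can be sketched in a few lines of Python. This is a toy in-memory model, not how any particular product is implemented: the class name `ContentAddressableStore` is hypothetical, and SHA-256 is an assumed choice of hash function.

```python
import hashlib


class ContentAddressableStore:
    """Toy in-memory CAS: an object's ID is the hash of its content."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        object_id = hashlib.sha256(data).hexdigest()
        # If the ID already exists, identical content is already stored,
        # so nothing new is written: object-level de-duplication.
        self._objects.setdefault(object_id, data)
        return object_id

    def get(self, object_id: str) -> bytes:
        data = self._objects[object_id]
        # Re-hash on retrieval to confirm the content was not modified.
        if hashlib.sha256(data).hexdigest() != object_id:
            raise ValueError("object failed integrity check")
        return data

    def __len__(self) -> int:
        return len(self._objects)


if __name__ == "__main__":
    store = ContentAddressableStore()
    first = store.put(b"signed contract, final version")
    second = store.put(b"signed contract, final version")
    print(first == second, len(store))
```

Storing the same bytes twice returns the same ID and keeps a single copy, and any change to the stored bytes makes `get` fail its check, which is what makes this style of storage attractive for compliance archives.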
CAS systems like EMC’s Centera and Dell’s DX (based on Caringo CAStor) are ideal for archive compliance scenarios.
Transforming Archives into Content
Traditional CAS-type object storage has largely been deployed by organizations as a defensive measure. Object storage is a great way to archive static data to ensure regulatory compliance. That's because it can be easily leveraged to enforce arbitrary access, retention and deletion policies over scalable architectures capable of storing millions of objects.
But at the end of the day, if all object storage delivers is a better archive, it is not helping organizations enhance revenue streams, much less create new ones.
We recommend instead that IT strive to contribute to business productivity at every turn. Object storage can be used as a transformative service to add significant value to an organization.
Because object storage supports the online recall of large archives with near linear performance at scale, passive archive processes can support new active use cases that take advantage of ever greater amounts of historical data. For example, consider a solution like RainStor deployed over CAS object storage. It enables archiving older data out of production databases while maintaining full online SQL query access to the archived records. As a result, it boosts the performance of production databases while supporting mining additional value out of historical data. This is a good example of how object-based archives can contribute significant tangible business value.
With the growing popularity of cloud storage and collaboration solutions, great opportunities now exist for taking object storage on the offensive. Object storage has become a great platform (either in-house or through IaaS) on which to deliver the private dropbox that corporate denizens are demanding. It can boost collaboration while giving IT a sure way to visibly increase productivity.
Object storage designed for cloud-building integrates advanced capabilities for geographically distributing objects, securing multi-tenancy, and hosting web-friendly APIs. Cloud-size scalability and global content distribution are hallmarks of cloud object storage offerings, as exemplified by EMC’s Atmos and Amazon Web Services S3. Cloud-based sharing and collaboration solutions are ideally hosted on cloud object storage—Dropbox itself is built on S3.
One of the key benefits of cloud-ready object storage is its support for programmer-friendly APIs. Web and mobile application developers can simply “post” and “get” arbitrary data into and out of cloud-shared storage over HTTP (through REST-based API URLs that treat the data object as a web resource). They don't have to write to local disk or to complexly cross-mounted file systems that might not be as reliable and are certainly not as easily accessible.
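As a rough illustration of the "post" and "get" pattern, the Python sketch below starts a tiny object service on the local machine and talks to it over HTTP using only the standard library. Everything here is an assumption made for the demo: the `/objects` URL layout, the `ObjectHandler`, `post_object` and `get_object` names, and the in-memory dictionary standing in for the storage back end. It does not reproduce the API of S3, Atmos or any other real service.

```python
import threading
import urllib.request
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer

OBJECTS = {}  # in-memory stand-in for the storage back end

# Bypass any system proxy so requests to the local demo server go direct.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class ObjectHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Store the request body and reply with a newly assigned object ID.
        length = int(self.headers.get("Content-Length", 0))
        object_id = uuid.uuid4().hex
        OBJECTS[object_id] = self.rfile.read(length)
        body = object_id.encode()
        self.send_response(201)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # The last path segment is treated as the object ID.
        data = OBJECTS.get(self.path.rsplit("/", 1)[-1])
        if data is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep the demo quiet


def start_server() -> HTTPServer:
    server = HTTPServer(("127.0.0.1", 0), ObjectHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_object(base_url: str, data: bytes) -> str:
    req = urllib.request.Request(base_url + "/objects", data=data, method="POST")
    with OPENER.open(req) as resp:
        return resp.read().decode()


def get_object(base_url: str, object_id: str) -> bytes:
    with OPENER.open(f"{base_url}/objects/{object_id}") as resp:
        return resp.read()


if __name__ == "__main__":
    server = start_server()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    oid = post_object(base, b"photo bytes from a mobile app")
    print(oid, get_object(base, oid))
    server.shutdown()
```

The client side is just two short functions, which is the appeal for web and mobile developers: the object is a web resource with a URL, and no file system mount is involved.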
Data protection is also evolving in cloud object storage. It's moving away from RAID and beyond full replication through the implementation of space-efficient erasure coding algorithms that protect data for longer periods of time under more failures. Erasure coding enables dialing in a targeted level of data redundancy when writing new objects. Encoded objects are made up of multiple segments—only a subset of which are necessary to read the original data. These segments are cleverly distributed across the cloud to enhance security and resiliency.
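The "only a subset is needed" property can be demonstrated with the simplest possible scheme: split an object into k data segments and add one XOR parity segment, so that any k of the k+1 segments are enough to rebuild it. This Python sketch is a deliberately simplified stand-in. Production systems typically use Reed-Solomon style codes that survive several simultaneous losses, and the function names `encode` and `decode` are hypothetical.

```python
def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal segments and append one XOR parity segment."""
    seg_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(seg_len * k, b"\0")
    segments = [padded[i * seg_len:(i + 1) * seg_len] for i in range(k)]
    parity = bytes(seg_len)
    for seg in segments:
        parity = _xor(parity, seg)
    return segments + [parity]


def decode(segments: list, original_len: int) -> bytes:
    """Rebuild the object from k+1 slots, at most one of which is None."""
    missing = [i for i, seg in enumerate(segments) if seg is None]
    if len(missing) > 1:
        raise ValueError("single parity can recover only one lost segment")
    segments = list(segments)
    if missing:
        present = [seg for seg in segments if seg is not None]
        rebuilt = bytes(len(present[0]))
        for seg in present:
            rebuilt = _xor(rebuilt, seg)
        # XOR of all surviving segments equals the lost one.
        segments[missing[0]] = rebuilt
    # Drop the parity segment and the padding added during encoding.
    return b"".join(segments[:-1])[:original_len]


if __name__ == "__main__":
    payload = b"object storage payload"
    stored = encode(payload, k=4)
    stored[2] = None  # simulate a failed disk or node
    print(decode(stored, len(payload)) == payload)
```

With k = 4 this stores 1.25 times the original data and survives one lost segment, whereas keeping a full second copy would cost 2 times. Stronger codes extend the same trade-off to multiple failures, which is the "dialing in" of redundancy described above.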