Life was much easier in the ’90s! There was block storage and file storage, and each had their place. Block was for highly transactional data, and file for unstructured and departmental storage.
Network attached storage (NAS) improved performance so that by the end of the ’90s, file storage was suitable for running Oracle databases, and it was growing significantly faster than block storage. Administrators preferred easy-to-manage file storage over complex block storage management with dedicated fibre SAN switches.
Then at the turn of the century, new technologies for storage devices and architectures multiplied. Unified storage emerged, layering block access over file storage or file storage over block devices. Similarly, first generation multi-node scale-out NAS improved scalability but compromised small file performance. When NAS scale-out file storage could not keep pace with the capacity needed for web-scale requirements, a new access protocol called object storage was developed, adding global scale-out but relinquishing easy file access. Initially these products were used for born-in-the-cloud applications such as sync and share, web-mail, backup and archiving.
Today, storage architectures are undergoing yet another makeover, and the once dominant legacy products are being overtaken. Reviewing the quarterly reports of leading legacy storage providers indicates how tough the headwinds have been. For traditional unified and block storage, the transition to all flash systems over the last few years has been breathtaking, spawning numerous storage startups.
But what about scale-out NAS? Beginning in 2001, Isilon made this category a commercial success, contributing to their acquisition by EMC in 2010. NetApp purchased Spinnaker Networks in 2003 and struggled for years to bring out Clustered Data ONTAP. What do these two scale-out NAS architectures have in common? They were both designed early in this century, well before flash and inexpensive multi-core servers used in hyper-scale architectures became ubiquitous.
Instead of redesigning legacy scale-out NAS products, most of the industry continues to focus on object storage, which spawned startups such as Cleversafe and Caringo. Unfortunately, object-based storage has failed to provide the high-performance, enterprise-grade, POSIX-compliant file access that thousands of legacy applications require. It also fails to provide a performance level that can meet the requirements of many big data workloads like media and entertainment, life sciences and commercial HPC.
Object storage companies are trying to address these issues by adding file gateway accelerators in front of their object-based backend. But this approach adds another layer of complexity, leaving the door open for a new architecture focused on modernizing enterprise-capable scale-out NAS.
Will history repeat itself? If a modern file-based architecture can match object storage on enterprise performance and scale, will administrators follow, pursuing ease of use and application compatibility? The answer is “Yes.”
Ideal Scale-Out NAS Design Principles
What if we had a clean design sheet for modern, modular, scalable storage, and in particular for scale-out NAS? The following attributes should be considered and adapted for a new, modern scale-out storage architecture:
Flash-First Design: No other technology in the last 10 years has done more to challenge the paradigm of traditional storage designs than flash. When flash-based drives first came out, they were an expensive luxury and used sparingly. Now flash is ubiquitous, and any storage device that does not exploit its inherent advantages over spinning disk is going to be left behind. Flash enables many new storage attributes, including enhanced metadata, removal of battery-backed cache, deduplication performance, and data tiering that actually makes sense. For scale-out NAS, this means the design can now be centered on flexible software-defined services running on industry-standard servers, removing the expensive specialty hardware traditionally used to create high-performance cache-coherent designs.
Data-Aware Enhanced Metadata: No longer must metadata be pared down to conserve precious persistent memory space, or be throttled by the limited IOPS of spinning media. New storage devices should expand metadata approaches and give storage an extra boost of brains. Let’s eliminate dumb storage once and for all! For scale-out NAS, this enables the device to become data-aware and provide a rich set of real-time analytics at scale. Time-consuming file system tree walks, metadata scans and file system lookups are eliminated because metadata aggregates are updated and stored in real time. In addition, metadata will be flexible and extensible over time as new device capabilities are added.
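The metadata-aggregate idea above can be sketched in a few lines. This is a minimal illustration with hypothetical names, not any vendor's implementation: each directory keeps running subtree totals (file count, bytes) that are updated as changes occur, so an analytics query like "how many files, how big?" is answered from stored aggregates instead of a full tree walk.

```python
# Sketch: per-directory metadata aggregates updated in real time.
# Each change propagates a small delta up the tree (O(depth) per
# update), so queries read precomputed totals (O(1)) rather than
# walking the subtree.

class DirNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        self.file_count = 0   # aggregate: files in this subtree
        self.total_bytes = 0  # aggregate: bytes in this subtree

    def mkdir(self, name):
        child = DirNode(name, parent=self)
        self.children[name] = child
        return child

    def add_file(self, size):
        # Propagate the delta to every ancestor, keeping
        # aggregates current without any later scan.
        node = self
        while node is not None:
            node.file_count += 1
            node.total_bytes += size
            node = node.parent

root = DirNode("/")
projects = root.mkdir("projects")
media = projects.mkdir("media")
media.add_file(4_000_000)
projects.add_file(10_000)

# Analytics answered from aggregates, no tree walk:
print(root.file_count, root.total_bytes)  # 2 4010000
```

A real file system would fold deletes, renames and richer attributes (owners, ages, access patterns) into the same propagation path; the point is simply that keeping aggregates current at write time is what makes analytics at scale cheap at read time.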
Massive Scalability without Compromising Performance: Object storage vendors would say the reason you cannot use scale-out NAS for big data is because it does not scale. Until now, they were right. Traditional scale-out NAS becomes bogged down at hundreds of millions of files, which pushes it toward workloads dominated by very large files and leaves high-performance NAS to NetApp and EMC. With a flash-first design and enhanced metadata techniques, modern scale-out NAS should be able to scale to tens of billions of files (a more than 100x improvement) with uncompromised performance for both large and small files. While this scalability might not satisfy the largest of capacity requirements, it should cover the majority of big data use cases with tremendously higher performance and much better legacy application support.
Software-Defined Design: Since flash-first design eliminates special hardware requirements, scale-out NAS software should be portable and run on commodity industry-standard servers. This allows the storage to be deployed on the latest hyper-scale architectures or even run in a virtual machine on a public cloud. This approach ensures that scale-out NAS can be deployed using the same hardware and economics as object storage, making the product extremely cost effective.