BY TERRI McCLURE
In our last column (see “What’s next for NAS?” at infostor.com), we discussed scale-out NAS. As a follow-on to that article, we’re going to explore some new functionality that leverages scale-out architectures to drive performance improvements: parallel NFS (pNFS). While scale-out is usually limited to, or defined as, the independent scaling of storage and processor nodes, pNFS adds bandwidth as a third dimension of scalability.
Scale-out NAS is composed of a clustered file system running concurrently on multiple physical nodes and managed—leveraging a global namespace—as a single entity. Essentially, this transcends the limitations of individual devices, removing the boundaries of the boxes and enabling efficient management of multiple file servers. Scale-out NAS offers scale beyond that which can be attained in a scale-up system, and still offers users ease of use through single-level management.
Scale-out systems may start with as few as two nodes, but can expand well beyond. Users can start out small with relatively little capital investment and then grow to a massively parallel system. The performance ceiling is raised by adding more processors, and capacity is increased by adding more storage, enabling “just-in-time” scalability.
And management is still simple because the entire cluster is managed as a single entity, no matter how large it gets. IT managers simply cannot afford to manage hundreds of file systems individually, and scale-out architectures mitigate the issue.
Challenges with NFS
NFS is a widely adopted, standard protocol used for file sharing. With NFS, directories and files can be stored on a central server or NAS array and accessed remotely by authorized users. The benefits of deploying dedicated NAS devices include centralized management of storage resources, storage consolidation, easier data protection and DR planning, file sharing and collaboration, storage optimization, and space savings via quotas. This file-based data, also known as unstructured data, is growing at a much faster rate than database or e-mail data, and is projected to make up 70% of total storage capacity by 2012.
One of the key challenges with NFS, especially for large files, is that performance is gated by the bandwidth of the NAS head or processor node that controls, or “owns,” the directory and file being accessed. NAS has been described as a “bump” in the path between the requestor and the data, but that bump can get pretty big. Not only does the NAS head handle locking, permissions, and other file metadata (among other NFS tasks), but all data delivered to the client must also be routed through the NAS head. One person accessing a large file can bring the performance of the NAS head to its knees, leaving other users with shares on that head twiddling their thumbs while waiting for their files.
This issue is exacerbated in high performance computing (HPC) environments, which have already experienced a shift to parallel processing where multiple processors accessing shared data can overwhelm NAS heads. In such environments, single files can be in the TB range—or larger. The massive file sizes and shift to parallel processing in HPC, as well as the emergence of parallel processing in commercial computing, combined with the richer unstructured data generated by Web 2.0 applications, are all contributing to a shift towards clustered NAS to meet performance requirements.
Parallel file access further unlocks the performance potential of clustered NAS. A number of vendors have introduced parallel file serving technology to meet this demand, but adoption of these solutions has been limited because of their proprietary nature and the need to add special clients into the mix. Widespread adoption of parallel file services, if it is indeed going to take off, requires a standard approach.
Parallel NFS takes clustered, global namespace-enabled NAS systems to the next level by introducing the ability to leverage multiple paths from clients directly to the storage, delivering data in parallel. That’s a big performance boost, especially for large files. Files can be striped across NAS heads and, leveraging multiple data paths and processors, delivered in parallel to the requestor. It also introduces the ability to bypass NAS heads for file delivery altogether.
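To make the striping idea concrete, here is a minimal Python sketch of how a byte range in a file could be mapped onto NAS heads. It assumes simple round-robin striping with a fixed stripe size. That scheme, and the names `locate`, `split_read`, and `StripeLocation`, are illustrative assumptions and are not taken from the NFS v4.1 specification or any vendor's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StripeLocation:
    node: int         # index of the NAS head holding this byte (hypothetical numbering)
    node_offset: int  # byte offset within that node's share of the file


def locate(offset: int, stripe_size: int, node_count: int) -> StripeLocation:
    """Map a file byte offset to a node, assuming round-robin striping."""
    stripe_index = offset // stripe_size
    node = stripe_index % node_count
    # Number of earlier stripes of this file that landed on the same node.
    stripes_before_on_node = stripe_index // node_count
    return StripeLocation(
        node, stripes_before_on_node * stripe_size + offset % stripe_size
    )


def split_read(offset: int, length: int, stripe_size: int, node_count: int):
    """Split one read into per-node chunks: (node, node_offset, chunk_length)."""
    chunks = []
    end = offset + length
    while offset < end:
        loc = locate(offset, stripe_size, node_count)
        # A chunk stops at the stripe boundary or at the end of the request.
        chunk = min(stripe_size - offset % stripe_size, end - offset)
        chunks.append((loc.node, loc.node_offset, chunk))
        offset += chunk
    return chunks


if __name__ == "__main__":
    # A 100-byte read starting at offset 32, 64-byte stripes, four nodes.
    for node, node_offset, size in split_read(32, 100, 64, 4):
        print(f"node {node}: offset {node_offset}, {size} bytes")
```

Because consecutive stripes land on different nodes, a large sequential read touches every node in turn, which is what allows the chunks to be fetched over separate paths at the same time.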
One of the keys to providing parallel data delivery is the addition of an out-of-band metadata server. The metadata server contains a map detailing how and where data is stored. When a file request is made by a client, that request is routed to the metadata server first. The metadata server returns information to the client about where the file “lives” on the associated file servers, and the client can then retrieve the data directly from those servers. If the file is striped across multiple processor nodes, all the processor nodes can be leveraged to fill the request, providing a boost in both bandwidth and processing power.
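The request flow described above can be sketched as a small in-memory simulation in Python. It is a toy model, not the pNFS wire protocol: the classes `MetadataServer` and `DataServer` and the functions `store_file` and `read_file` are hypothetical names invented for illustration, and threads stand in for separate network paths.

```python
from concurrent.futures import ThreadPoolExecutor


class DataServer:
    """Hypothetical stand-in for one processor node; keeps stripes in memory."""

    def __init__(self):
        self.stripes = {}

    def write(self, key, data):
        self.stripes[key] = data

    def read(self, key):
        return self.stripes[key]


class MetadataServer:
    """Hypothetical out-of-band metadata server: holds the map, never file data."""

    def __init__(self):
        self.layouts = {}  # path -> ordered list of (server_index, stripe_key)

    def record_layout(self, path, layout):
        self.layouts[path] = list(layout)

    def get_layout(self, path):
        return list(self.layouts[path])


def store_file(path, data, stripe_size, mds, servers):
    """Stripe data round-robin across servers and record the map."""
    layout = []
    for start in range(0, len(data), stripe_size):
        stripe_no = start // stripe_size
        server_index = stripe_no % len(servers)
        key = (path, stripe_no)
        servers[server_index].write(key, data[start:start + stripe_size])
        layout.append((server_index, key))
    mds.record_layout(path, layout)


def read_file(path, mds, servers):
    # Step 1: ask the metadata server where the file lives.
    layout = mds.get_layout(path)
    # Step 2: fetch every stripe directly from its data server, in parallel.
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        # pool.map yields results in layout order, so the stripes join correctly.
        parts = pool.map(lambda entry: servers[entry[0]].read(entry[1]), layout)
        return b"".join(parts)


if __name__ == "__main__":
    servers = [DataServer() for _ in range(3)]
    mds = MetadataServer()
    payload = bytes(range(256)) * 4
    store_file("/big.dat", payload, 100, mds, servers)
    print(read_file("/big.dat", mds, servers) == payload)
```

The point of the design is that the metadata server is consulted once per request and stays out of the data path, so no single node has to carry the whole file.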
pNFS takes the solution one step further by introducing support for direct block data access, essentially bypassing NAS heads entirely in the delivery of file data. When file access is requested by an authorized client in block data mode, the block layout of the file is returned to the requesting client rather than a file layout. Then the client can go directly to the storage devices themselves (rather than NAS heads) to get data. Both iSCSI and Fibre Channel block storage will be supported. In HPC-type environments, where the clients are often servers in the data center, this means they can be connected directly to block storage devices via Fibre Channel or iSCSI and access files (as block data) via multiple fast channel paths—a huge performance boost compared to accessing shared files over NFS and a single NAS head.
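As a rough illustration of block data mode, the Python sketch below models a block layout as a list of extents and has the client assemble the file by reading each extent straight from a simulated block device. The `Extent` fields, the 512-byte block size, and the device labels are assumptions made for this example. The actual layout format is defined by the NFS v4.1 specifications and is not reproduced here.

```python
from dataclasses import dataclass

BLOCK_SIZE = 512  # assumed block size for this sketch


@dataclass(frozen=True)
class Extent:
    device: str       # hypothetical label for a block device (e.g. a LUN)
    start_block: int  # first block of the extent on that device
    block_count: int  # number of contiguous blocks


def read_via_block_layout(extents, devices, file_length):
    """Assemble a file by reading each extent directly from its device.

    `devices` maps a device label to a bytes-like object that simulates
    raw block storage; no NAS head is involved in the data path.
    """
    data = bytearray()
    for ext in extents:
        start = ext.start_block * BLOCK_SIZE
        end = start + ext.block_count * BLOCK_SIZE
        data += devices[ext.device][start:end]
    # The last block is usually only partly used, so trim to the file length.
    return bytes(data[:file_length])


if __name__ == "__main__":
    lun_a = bytearray(4 * BLOCK_SIZE)
    lun_b = bytearray(4 * BLOCK_SIZE)
    payload = b"x" * 512 + b"y" * 512 + b"z" * 100
    lun_a[512:1536] = payload[:1024]  # blocks 1-2 of lun_a
    lun_b[0:100] = payload[1024:]     # start of block 0 of lun_b
    layout = [Extent("lun_a", 1, 2), Extent("lun_b", 0, 1)]
    devices = {"lun_a": lun_a, "lun_b": lun_b}
    print(read_via_block_layout(layout, devices, len(payload)) == payload)
```

In a real deployment the extents on different devices could be read concurrently over separate Fibre Channel or iSCSI paths. The sketch reads them in sequence only to keep the example short.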
On September 23, 2008, one of the last hurdles for NFS v4.1 was cleared when the Internet Engineering Task Force (IETF) put out a last call notice for the specification. The IETF asked for comments to be submitted by November 27, 2008. The long journey from concept to standard, which started in 2003, is almost finished. NFS v4.1 should be unleashed early this year. New dot releases of the NFS protocol are rarely celebrated or anticipated with such enthusiasm, but this release holds significance since it introduces pNFS functionality.
Some true clustered file systems already offer parallel file delivery—they’ve been developed from the ground up to address this problem. Some require a proprietary client, some don’t. It mostly depends on the type of cluster developed. The introduction of NFS v4.1 and pNFS opens the door for a standardized approach, enabling users to benefit from parallel performance—no matter whose NAS storage they have deployed.
Recently, there have been discussions about adding data de-duplication and metadata striping to the NFS v4.x spec, which would optimize capacity and further enhance performance as richer metadata is introduced. Those making the proposal say that these additions are not expected to create delays in the v4.1 roadmap, but only time will tell. Stay tuned!
TERRI McCLURE is an analyst with the Enterprise Strategy Group (www.enterprisestrategygroup.com).