Traditional NAS approaches often fall short in environments with rapid growth in file-based data.
By Noemi Greyzdorf
First, there was direct-attached storage (DAS), where every server came with its own disk drives, and the application and storage managers were glad. Then came networked storage in the form of SANs or NAS, which created economies of scale and scope, and the application owners and storage managers were glad. Today, IT requirements are forcing application owners and storage managers to once again revisit their storage strategies and determine ways to manage increasingly large stores of data.
In 2008, file-based data consumed more than 50% of all storage, and by 2012 nearly 80% of all storage will be consumed by file-based data. As market demands continue to evolve, storage managers are seeking solutions that are dynamic, scalable, flexible, and persistent.
The file-based storage market can be segmented into five primary areas:
Application availability
File services consolidation
Storage for virtual server environments
Large file-based data repositories
High performance computing (HPC)
Some of the solutions on the market can address more than one of these segments, so the purpose of this article is not to delve into specific vendors and products, but to identify requirements and potential architectures to address emerging IT requirements.
The file-based market is represented by file system software that has been designed to support the delivery of storage capacity to users and applications. File system software offerings can be placed into two primary categories: one file system to one server, or one file system to many servers. The latter is often referred to as a distributed file system, where a single file system namespace can span multiple server nodes, allowing for access to data through any of the nodes in the cluster.
Another way to segment file-based storage is to look at how the solutions are packaged. The traditional approach has been a fully integrated NAS appliance, where the file system and server run on what is often referred to as a black box, which may or may not include back-end storage. When it comes with storage, it is referred to as a NAS appliance; when it is just the file system and the server, it is referred to as a NAS gateway.
File systems can also be acquired as a software-only solution and integrated with virtually any hardware. As long as access to the file system is through standard network protocols such as NFS or CIFS, the final configuration is still referred to as NAS. Of course, file system software can also be used natively, either accessed through proprietary client software or deployed under an application, allowing for direct access to storage via iSCSI or Fibre Channel. Regardless of how you package or use a file-based storage solution, the key to the final configuration's design is the file system.
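The packaging terms above can be summarized as a small decision rule. The Python sketch below is only an illustration of the terminology; the `Deployment` fields and the `packaging_label` function are hypothetical names introduced here, not part of any product.

```python
from dataclasses import dataclass

# Standard network file protocols named in the article.
NETWORK_FILE_PROTOCOLS = {"NFS", "CIFS"}


@dataclass(frozen=True)
class Deployment:
    """Hypothetical description of how a file system is packaged."""
    integrated_box: bool    # file system and server delivered as one "black box"
    bundled_storage: bool   # back-end storage included with the box
    access: str             # e.g. "NFS", "CIFS", "iSCSI", "FC", "client"


def packaging_label(d: Deployment) -> str:
    """Return the term the article would use for this configuration."""
    # Anything not reached over NFS or CIFS is native file system use.
    if d.access.upper() not in NETWORK_FILE_PROTOCOLS:
        return "native file system"
    # Integrated boxes are appliances or gateways depending on storage.
    if d.integrated_box:
        return "NAS appliance" if d.bundled_storage else "NAS gateway"
    # Software-only file system on arbitrary hardware, served over NFS/CIFS.
    return "NAS (software-only file system)"
```

For example, `packaging_label(Deployment(True, False, "NFS"))` yields the gateway label, while the same software reached through a proprietary client falls under native use.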
When discussing the various segments of the market utilizing file-based storage, there is often a distinction made between access protocols, architectures, and packaging.
Application availability
The globalization of markets and the advent of Internet technologies have had a significant effect on how companies conduct business. Enterprises large and small are continuously in contact with their customers, suppliers, partners, and employees around the world, and any disruption to this contact can result in loss of revenue, loss of productivity, and loss of goodwill. Applications that enable organizations to share information and interact with the larger community are critical to the success of the business.
There have been a number of products to help assure application and data availability, including high availability software, parallel applications, replication, replication with failover, and bare metal restore. Some of these approaches offer shorter downtime, some are less complex, some support local environments only, and others support a limited number of applications.
Recently, the increasing focus on clustered file-based storage architectures has created another option: application availability with less complexity and shorter downtime via a combination of application high availability on top of a clustered file system. In such deployments, an application can run on a server node that is part of a cluster of server nodes all interconnected with a clustered file system. The application accesses storage directly through iSCSI, Fibre Channel, or InfiniBand.
The centralized storage repository is mounted to all the server nodes in the cluster, and though all the nodes can see the storage, only the server actively running the application has read and write permissions. The only exception is an application that runs in parallel mode, in which case all the servers running the application have read and write access to storage.
In case of a failure, the application simply has to be restarted on one of the surviving nodes in the cluster. This avoids the interim steps otherwise required, such as dismounting storage, closing down services, checking data consistency and storage configuration, and mounting storage to a new node, which can take twenty minutes or more. Using a clustered file system under application high availability allows for the recovery of the application in less than two minutes, with little complexity or room for error.
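The failover behavior described above can be modeled in a few lines of Python. This is a conceptual sketch only, assuming a non-parallel application; the `SharedStorageCluster` class and its method names are invented for illustration and do not correspond to any clustered file system's actual interface.

```python
class SharedStorageCluster:
    """Toy model: every node mounts the shared storage, one node runs the app."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("a cluster needs at least one node")
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self.active = self.nodes[0]   # node currently running the application

    def access_mode(self, node):
        """Describe what a node may do with the shared storage."""
        if node not in self.healthy:
            return "offline"
        # Only the node running the application reads and writes;
        # the others already have the storage mounted and visible.
        return "read-write" if node == self.active else "mounted, no I/O"

    def fail(self, node):
        """Mark a node as failed; restart the application elsewhere if needed."""
        self.healthy.discard(node)
        if node == self.active:
            survivors = [n for n in self.nodes if n in self.healthy]
            if not survivors:
                raise RuntimeError("no surviving node to restart the application")
            # No dismount, consistency check, or remount step is modeled,
            # because the surviving node already has the storage mounted.
            self.active = survivors[0]
        return self.active
```

In this model, failover reduces to choosing a surviving node and restarting the application there, which is the reason the article cites for the shorter recovery time.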
File services consolidation
A number of larger organizations are looking for ways to consolidate file services. The goals are to reduce operational complexity, consolidate remote offices into a centrally managed system, assure data persistence, and facilitate information governance.
It is not uncommon for a large enterprise to manage multiple file servers, which creates management challenges. These challenges include load balancing users, migrating data and users, assuring adequate performance, avoiding downtime during upgrades, and finding the administrative time required to manage the environment. Additionally, data is often dispersed across many systems, and sometimes across many sites, which makes it more difficult to apply capacity optimization technologies and information governance policies.
The advent of distributed file-based storage systems creates another option for file services. Deploying a clustered file-based storage solution allows administrators to consolidate storage resources and parcel them out based on need without a lot of manual intervention to load balance capacity and performance.
Clustered architectures also promise a higher level of data availability and system persistence, seamless upgrades, and zero downtime due to maintenance or drive or node failure. These clustered file-based storage solutions can be deployed packaged as NAS with standard network protocols supported for access, or as a file system that requires client agent software to access the file system.
The distributed nature of the file system itself allows for easy scalability and data sharing. For smaller environments, there may not be a need to scale out file-based storage architectures; a traditional file server or NAS appliance delivers the necessary capacity and performance. In smaller environments, clustering technologies may appear primarily where the machines being clustered for file services consolidation are virtual machines that are part of a broader data center virtualization program, which gives these environments some investment protection and flexibility without the added cost.
Server virtualization storage
Virtualization is clearly a way to achieve efficiency and consolidation of resources, but storage has been one of the more challenging aspects of server consolidation and virtualization. Although it is often an afterthought, storage is a critical component of any server virtualization initiative. Since each virtual machine is really a file, and the data associated with the virtual machine is also a file (such as a VMDK or VHD file), it makes sense to use a file-based storage platform with server virtualization platforms.
Because file-based storage solutions are flexible in how they are accessed and packaged, there are a number of options. There are native file system solutions available to organize data and provide sharing of data across the environment. There are clustered file systems that can be deployed on top of the hypervisor platform, allowing all the virtual machines to share storage resources but have direct block-level access to the storage resource. Finally, the hypervisor platform and the virtual machines can access storage via standard network protocols such as NFS.
Using a file-based storage system with server virtualization provides a high level of management simplicity and flexibility, and in some instances increased scalability and performance. However, not all environments will deploy file-based storage for virtual server environments because there are many applications that require block-level access to storage.
Large file-based data repositories
All the file data that is created must be stored somewhere. The large file-based storage repositories of unstructured data have been growing rapidly over the past few years and this trend will continue. The primary drivers behind the growth are retention requirements. Many industries are required by regulation to retain information for extended periods of time. For example, healthcare organizations are typically required to keep medical record information for the life of the patient.
Other drivers include product support, collaboration with other organizations, or data that serves as a product itself, such as media and entertainment. A lot of this content is relatively static, with infrequent access patterns. Storage managers and service providers are seeking more efficient ways to store all this file-based unstructured data and yet allow timely access to the data if there is a need. The efficiency is in capacity consumed, facilities consumed, and management hours consumed. The appropriate file-based storage solution will depend on the amount of data being stored and the rate of growth of such data.
High performance computing
High performance computing (HPC) environments are the traditional space of distributed, clustered file-based storage solutions. In many instances, the applications in HPC environments require either high sequential throughput or parallel access to the same piece of data. Traditional NAS approaches may not have the performance necessary to run these applications. Unlike the enterprise, though, many HPC environments have a different set of requirements related to data protection, data availability, and overall system resiliency.
The demand for file-based storage systems that address changing IT requirements is on the rise. The market can be divided into a handful of main categories, but often within each category the requirements vary based on workloads. A variety of solutions have emerged that target a particular workload type or market segment, specializing in what these users require.
The old days of one-NAS-fits-all are gone. Organizations will adopt the right solution for each problem, but instead of deploying islands of solutions throughout the enterprise, IT will seek to create storage pools with predefined capacity, protection, and performance parameters, and offer storage services based on this platform to various users within the enterprise. The gains achieved through this approach include economies of scale, lower operating complexity and cost, and greater flexibility and accountability for resources being consumed.
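One way to picture a storage pool with predefined parameters is as a record that carries its capacity, protection, and performance settings and tracks who consumes what. The Python sketch below is a hypothetical illustration; the `StoragePool` class, its field names, and the sample values are assumptions made for the example, not a description of any vendor's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class StoragePool:
    """Hypothetical pool with predefined service parameters."""
    name: str
    capacity_tb: float
    protection: str          # illustrative label, e.g. "replicated"
    performance_tier: str    # illustrative label, e.g. "capacity"
    allocations: dict = field(default_factory=dict)  # consumer -> TB allocated

    def free_tb(self) -> float:
        """Capacity not yet handed out to any consumer."""
        return self.capacity_tb - sum(self.allocations.values())

    def allocate(self, consumer: str, tb: float) -> None:
        """Grant capacity to a consumer, refusing requests the pool cannot meet."""
        if tb <= 0:
            raise ValueError("allocation must be positive")
        if tb > self.free_tb():
            raise ValueError("pool has insufficient free capacity")
        self.allocations[consumer] = self.allocations.get(consumer, 0.0) + tb

    def usage_report(self) -> dict:
        """Per-consumer totals, the basis for accountability or chargeback."""
        return dict(sorted(self.allocations.items()))
```

Recording allocations per consumer is what makes the accountability mentioned above possible: the same structure that enforces the pool's limits also reports who is using its resources.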
Noemi Greyzdorf is a research manager with IDC. She can be contacted at firstname.lastname@example.org.