Options include single-instance file systems and distributed file systems, both federated and clustered.
By Noemi Greyzdorf
One of the biggest challenges facing IT organizations today is the exponential growth of unstructured data. This trend is the result of numerous business drivers, such as regulatory requirements to retain data, increasing amounts of information being created in digital form, and Web 2.0 applications. The growth is especially pronounced in industries such as financial services, biotechnology, and media and entertainment.
The storage industry is responding to this new reality by developing file system software products that deliver scalability, capacity optimization, a single global namespace, and data availability at a price point that won’t break the IT budget.
Traditionally, two primary environments used file system software: high performance computing (HPC) and file sharing in the enterprise. HPC environments used specialized hardware and software but have recently moved to a Linux-based platform with a distributed file system to support parallel processing. In the enterprise, significant changes have occurred in how file system services are delivered and in which challenges remain to be addressed. File sharing and file and print services have been supported by a single-instance file system, such as NTFS on Windows, or by a purpose-built file system packaged as an appliance (more commonly referred to as NAS). However, the diversity of requirements across organizations is driving managers to consider new approaches.
A file system is “a software component that imposes structure on the address space of one or more physical or virtual disks,” according to the Storage Networking Industry Association’s definition. Generally speaking, file systems can be categorized as either single-instance or distributed.
Single-instance file systems are either tied directly to the operating platform (e.g., NTFS on Windows) or purpose-built (e.g., Network Appliance’s file system). A distributed file system is either federated, where a management node keeps track of all metadata and a series of nodes deliver the data, or clustered, where the metadata and I/O processing are evenly distributed across all the nodes in a cluster. Deploying the right kind of file system can make a difference in an IT organization’s ability to manage its unstructured data assets.
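To make the distinction concrete, the following is a minimal sketch of where metadata lives in each distributed model. The node names, the sample paths, and the use of a hash to spread metadata ownership are illustrative assumptions only; real clustered file systems use their own distribution schemes.

```python
import hashlib
from collections import Counter

# Hypothetical cluster members, named for illustration only.
NODES = ["node-a", "node-b", "node-c", "node-d"]


def federated_metadata_owner(path: str) -> str:
    """Federated model: one management node tracks metadata for every file."""
    return "mgmt-node"


def clustered_metadata_owner(path: str, nodes=NODES) -> str:
    """Clustered model (assumed hash placement): metadata ownership is
    spread across all nodes in the cluster."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


# Count how many files each node is responsible for under each model.
paths = [f"/data/file{i}.dat" for i in range(1000)]
federated_load = Counter(federated_metadata_owner(p) for p in paths)
clustered_load = Counter(clustered_metadata_owner(p) for p in paths)

print("federated:", dict(federated_load))
print("clustered:", dict(clustered_load))
```

In the federated case every lookup lands on the single management node, while in the clustered case the same lookups are shared among all four nodes.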
Single-instance file systems
Single-instance file systems deliver file-sharing services, often bundled with storage and accessed via the NFS or CIFS protocols. The most common single-instance file system is NTFS on Windows, which is acquired along with the Windows server. A Windows server is often used for file sharing and file and print services. When there is a need for more capacity, a new server running NTFS is purchased and space is allocated to users. NetApp’s platform is another example of a single-instance file system. NetApp delivers its file system packaged with storage, which is accessed via NFS or CIFS and is referred to as a NAS appliance.
In many organizations, the growth of file-based data has created a management issue for storage managers. For example, so many servers may be functioning as file servers that managers spend too much time on data migration, capacity provisioning, and performance load-balancing.
Distributed: Federated FS
IT organizations face the challenge of managing many file servers used for file and print and file-sharing purposes. Their goal is to simplify tasks such as file migration and re-allocation of capacity, and to improve capacity utilization.
One approach is file virtualization, which is designed to provide a single mount point for all files. A common deployment consists of a file virtualization system as a front-end to a number of file servers or NAS appliances. Users are provided a share on this file system; the directory structure and location of files do not change regardless of how data is moved across hardware in the background. File virtualization software manages the physical location of files based on characteristics defined by policies.
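The idea can be sketched as a lookup table that sits between the path users see and the hardware that holds the file. The class below is a toy model, not any vendor’s implementation; the backend names and the size-based placement policy are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualNamespace:
    """Toy file virtualization layer: logical path -> physical backend."""

    # Hypothetical policy: files at or above this size go to a capacity tier.
    large_file_bytes: int = 100 * 1024 * 1024
    locations: dict = field(default_factory=dict)

    def place(self, logical_path: str, size_bytes: int) -> str:
        """Choose a backend by policy and record where the file lives."""
        if size_bytes >= self.large_file_bytes:
            backend = "nas-capacity"
        else:
            backend = "nas-fast"
        self.locations[logical_path] = backend
        return backend

    def migrate(self, logical_path: str, new_backend: str) -> None:
        """Move a file in the background; the logical path is untouched."""
        if logical_path not in self.locations:
            raise KeyError(logical_path)
        self.locations[logical_path] = new_backend

    def resolve(self, logical_path: str) -> str:
        """Return the physical location a client request is routed to."""
        return f"{self.locations[logical_path]}:{logical_path}"


ns = VirtualNamespace()
ns.place("/projects/report.doc", 40_000)
before = ns.resolve("/projects/report.doc")

# An administrator (or a policy engine) relocates the file to another tier.
ns.migrate("/projects/report.doc", "nas-archive")
after = ns.resolve("/projects/report.doc")

print(before)
print(after)
```

The user keeps opening `/projects/report.doc` before and after the move; only the backend recorded by the virtualization layer changes.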
File virtualization is a good example of a federated file system. Implementation of file virtualization enables automation for file migration, snapshots, replication across different platforms, and storage tiers. Virtualizing file services enables organizations to explore other vendors’ products as options, as well as to deploy cost-efficient data availability, business continuity, and scalability.
Clustered file systems
There are segments of the market where files represent not just data being shared among employees, but also critical business intelligence that may differentiate the organization from its competitors or provide necessary information for regulatory auditors and investors. File-based data that falls into this category is stored for long periods of time and requires high availability and integrity. In such environments, the challenge is to deliver scalability, availability, and ease of use while keeping costs down.
A federated file system can address the challenge by leveraging existing investments (file servers or NAS appliances) through file virtualization. File virtualization software centralizes the management of existing file resources, eases migration, improves scalability, and simplifies overall management of the environment. However, as performance requirements increase, the management node in a federated file-system architecture may become a bottleneck.
Replacing file server/NAS environment
Another approach to solving the increasing challenge of managing file-based data is to replace the file server/NAS environment. The new solution may involve a different federated file-system architecture that minimizes the performance impact of the metadata node by limiting the exchange between the clients and that node to metadata queries. Since only a small amount of data has to be exchanged, the impact on performance is minimized. Another approach is a clustered file system that delivers parallel processing capabilities and scales with the addition of every new server node. In this architecture, files may be moved to different tiers of storage without impacting end users.
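A rough sketch of that read path follows. The class names, the file name, and the byte accounting are hypothetical and serve only to show that the metadata node handles a few bytes per request while the data node carries the payload.

```python
class DataNode:
    """Holds file contents and serves them directly to clients."""

    def __init__(self, name: str):
        self.name = name
        self.files = {}
        self.bytes_served = 0

    def read(self, path: str) -> bytes:
        data = self.files[path]
        self.bytes_served += len(data)
        return data


class MetadataNode:
    """Answers only 'which node holds this file?' queries."""

    def __init__(self):
        self.index = {}
        self.bytes_handled = 0

    def register(self, path: str, node: DataNode) -> None:
        self.index[path] = node

    def lookup(self, path: str) -> DataNode:
        node = self.index[path]
        # Only the query (path) and the answer (node name) cross this node.
        self.bytes_handled += len(path.encode()) + len(node.name.encode())
        return node


def client_read(meta: MetadataNode, path: str) -> bytes:
    """Step 1: small metadata query. Step 2: bulk read from the data node."""
    return meta.lookup(path).read(path)


meta = MetadataNode()
data_node = DataNode("data-1")
data_node.files["/video/clip.bin"] = b"x" * 1_000_000  # 1 MB sample payload
meta.register("/video/clip.bin", data_node)

payload = client_read(meta, "/video/clip.bin")
print(meta.bytes_handled, "bytes via metadata node")
print(data_node.bytes_served, "bytes via data node")
```

Because the payload never passes through the metadata node, adding data nodes increases throughput without adding proportional load to that node.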
File-system software is becoming an important component of the data-center infrastructure. This is especially true for organizations that rely on storing and delivering file-based data as their primary revenue-generating activity. These include sites hosting photographs, providing streaming videos, and delivering online storage space for personal data.
As data-center managers strive to tame the explosion in file-based data, they look to file system software developers to deliver value-added features that not only help manage, scale, and protect the data, but also minimize its impact on data-center operations. Features such as automatic migration, storage tiers, storage optimization (e.g., compression and de-duplication), and security can make a significant difference in power and cooling consumption, data retention and protection, and the personnel resources required to manage the environment.
Not all IT requirements are equal, and a one-solution-fits-all approach no longer applies. The diverse requirements of different market segments call for solutions that excel at addressing each organization’s specific issues. Enterprises face the challenge of identifying and selecting a file system option that fits the criteria for a given application. To make the right decision, they must first understand the options.
Noemi Greyzdorf is a research manager at International Data Corp. (IDC).