By Brad O'Neill
An article we posted recently on the emerging concept of file area networks, by the Taneja Group's Brad O'Neill, generated a lot of interest, so we asked Brad to expound further on the topic. Here is a longer version of the article that we originally posted on June 27, 2006.
We are at a crossroads in the world of file management. The traditional approaches to managing the growth, cost, and complexity of unstructured file-based information have started to hit a wall across enterprises of all sizes.
According to a June 2006 Taneja Group survey of global IT decision-makers, 62% of respondents now identify file management as either the top priority or one of the top priorities requiring immediate attention in their data centers. The finding is consistent with the growing number of users who are evaluating a wide range of new technologies to improve their file management approaches, including:
- Wide area file services (WAFS);
- WAN optimization and application acceleration;
- Distributed and clustered file systems;
- Network file management (NFM) and file virtualization;
- File/document management software;
- File classification software; and
- File data placement and movement controls.
What accounts for this concerted focus and innovation around the management and control of files?
Quite simply, file data now plays a far more critical role in core business processes than it once did. Nearly all workflows ultimately run through a file infrastructure that increasingly spans multiple geographies, business partners, and IT infrastructures, all with real-time performance and access requirements. This explains the rapid file growth and management complexity, as well as vendors' innovations.
IT déjà vu
This scenario is not unprecedented. About a decade ago, the block storage world went through a similar shift as open systems amplified the ease with which storage resources could be utilized and shared across business processes. Storage managers at that time were tearing their hair out trying to control storage growth, cost, and complexity issues. Of course, the breakthrough at that time was the advent of the SAN.
At the highest level, the SAN transformation of the 1990s represented an explicit agreement by users and vendors to pursue a common architectural approach for deploying and sharing storage resources. The extension of the shared resources concept inherent in the "area-network" framework opened up a new dimension for the storage industry.
By analogy, we are now at a similar inflection point with enterprise file management. In short, the industry is again abstracting and extending the "area-network" concept to another layer of the infrastructure. The Taneja Group refers to this as the file area network (FAN).
What's a FAN?
A FAN involves a systematic approach to organizing the various file-related technologies in today's enterprise. The goal of a FAN is to provide IT managers with a scalable, flexible, and intelligent platform for the cost-effective delivery of enterprise file information and a much higher level of file control.
Some of the capabilities that define a FAN include the following:
- Enterprise-wide, pervasive controls of all file information, and management of file attributes based on metadata and content values, regardless of platform;
- Ability to establish user file visibility and access rights based on business values (e.g., departments, projects, geographies) regardless of physical device;
- Non-disruptive, transparent movement of file information across geographical boundaries;
- Creation of file management services that are deployed as true "services" to the entire infrastructure (e.g., not deployed in application-specific silos); and
- Measurable return on investment (ROI) for file management due to consolidation of redundant file resources (e.g., de-duplication of redundant files).
These capabilities may sound familiar because a FAN is to traditional file management what a SAN was to direct-attached storage: a massive step up in capabilities, control, and ROI.
As with a SAN, there are many technologies and approaches that will be possible in the design and deployment of a FAN. Many vendors will participate in the FAN market, and innovation will continue at a fast pace over the next several years.
Establishing an accepted definition of a FAN is critical because it will allow IT teams to develop common shorthand and reference models for how they architect, deploy, manage, and augment their file infrastructures. In the absence of this kind of framework, many enterprises will simply drown in coming years, not only from a deluge of mismanaged file data, but from the inevitable confusion that would result without a common nomenclature.
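To make the consolidation capability listed above (de-duplication of redundant files) more concrete, here is a minimal Python sketch that groups files by a hash of their content and estimates how much space one copy per group would free. It is an illustration only, not any vendor's method: the function names, the in-memory `sample` dictionary, and the server paths are all hypothetical stand-ins for files spread across real file servers.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files):
    """Group file names by the SHA-256 digest of their content.

    `files` maps a file name to its content as bytes (a hypothetical
    in-memory stand-in for files held on several file servers).
    """
    by_digest = defaultdict(list)
    for name, content in files.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(name)
    # Keep only the digests shared by more than one file.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


def reclaimable_bytes(files):
    """Bytes freed if only one copy of each duplicate group is kept."""
    total = 0
    for group in find_duplicates(files):
        total += len(files[group[0]]) * (len(group) - 1)
    return total


# Hypothetical sample data: two identical decks on different servers.
sample = {
    "//nas1/sales/q2.ppt": b"quarterly deck",
    "//nas2/home/ann/q2-copy.ppt": b"quarterly deck",
    "//nas3/eng/spec.doc": b"design spec",
}
print(find_duplicates(sample))
print(reclaimable_bytes(sample))
```

A production service would hash files in chunks rather than load them whole, and would need to decide how to replace duplicates (for example, with links) without disturbing clients.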
Elements of a FAN
Six core elements make up a FAN:
Storage devices—The foundation on which a FAN is built is the storage infrastructure. This can be either a SAN or a NAS environment. The only prerequisite is that a FAN leverage a networked storage environment to enable data and resource sharing.
File-serving devices/interfaces—Whether directly integrated into the storage infrastructure (e.g., NAS) or deployed as a gateway interface (e.g., SAN), all FANs must have devices capable of serving file-level information via standard protocols such as CIFS and/or NFS.
Namespaces—A FAN is based on a file system with the ability to organize, present, and store file content for authorized end clients. This capability is referred to as the file system's "namespace," a central concept in the FAN architecture. There are several kinds of namespaces possible in a FAN (which are discussed below).
File management and control services—The other central concept in a FAN architecture is software intelligence that interoperates with namespaces. From a deployment perspective, these services might be integrated directly with file systems or in networking devices, but they may also be stand-alone services. Examples include file virtualization, classification, de-duplication, and wide-area file services (WAFS), which are covered in more detail below.
End clients—All FANs have end client machines that access the namespaces created by file systems. The clients could be on any type of platform or computing device.
Connectivity—There are many possible ways for a FAN to connect its end clients to the namespaces. Clients are commonly connected across a standard LAN, but they may simultaneously or alternatively leverage wide-area technologies as well.
Namespaces: Fabric of the FAN
The June 2006 Taneja Group survey also found that 57% of global IT users either already have deployed or are currently exploring deployment of advanced namespace technologies to improve file management. In other words, those users are assembling their first FAN.
Understanding what namespace technology means to a FAN is critical. In fact, by analogy, the namespace is to the FAN what the switching fabric is to a SAN. However, the key difference with a FAN is that we are talking about relationships of information presentation and not about physical device relationships.
The presentation, access, and general organization (i.e., the directory structure) of a file system's data is referred to as its namespace. In a FAN, there are three types of namespaces possible. Most enterprises will eventually have a combination of these to address various issues.
Non-shared namespace—This is the default when enterprises establish basic file services or traditional NAS. It is a user-level presentation of information corresponding to a file system image that is married to a given physical machine. In other words, there is no sharing of information across multiple file system images. The vast majority of file systems deployed today deliver non-shared namespaces. However, they cause IT headaches as they grow and outstrip their file system capabilities.
Shared namespace—This is when a subset of an enterprise's physical file presentation environment has been federated so that information can be shared across multiple homogeneous machines. This enables the IT team to use those homogeneous machines for a common presentation of user-level information to designated end clients. Typically, shared namespaces are platform specific and not intended for deployment across all end clients in the enterprise. Because they tightly couple multiple file systems, shared namespaces can resolve significant file visibility, collaboration, and performance issues for a targeted subset of the enterprise. Common examples include clustered NAS environments and clustered or distributed file system deployments.
Global unified namespace (GUN)—The Holy Grail for namespaces in a FAN is what Taneja Group refers to as a GUN: a truly heterogeneous, enterprise-wide abstraction of all file-level information, open to dynamic customization based on administrator-defined parameters. This is the level at which significant management control and leverage finally become possible. A wide range of software intelligence (e.g., access controls, file virtualization, classification schema, and de-duplication) can then be applied to the GUN with assurance that it will be applicable across the entire enterprise. From an architectural perspective, a GUN could be established in any number of ways, including distributed host-based software or network-resident approaches.
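The idea that a namespace describes how information is presented, not where it physically lives, can be sketched in a few lines of Python. The toy model below assumes a simple table that maps logical path prefixes to physical locations. The class name `GlobalNamespace`, its methods, and the server URLs are all hypothetical and do not correspond to any product named in this article.

```python
class GlobalNamespace:
    """Toy model: logical paths seen by clients, mapped to physical locations."""

    def __init__(self):
        self._mounts = {}  # logical prefix -> physical location

    def link(self, logical_prefix, physical_location):
        """Publish a physical location under a logical prefix."""
        self._mounts[logical_prefix.rstrip("/")] = physical_location.rstrip("/")

    def resolve(self, logical_path):
        """Translate a client-visible path into its current physical path."""
        # Longest prefix first, so nested links override their parents.
        for prefix in sorted(self._mounts, key=len, reverse=True):
            if logical_path == prefix or logical_path.startswith(prefix + "/"):
                return self._mounts[prefix] + logical_path[len(prefix):]
        raise KeyError(f"no mapping for {logical_path}")

    def migrate(self, logical_prefix, new_physical_location):
        """Repoint a prefix at new storage (the data copy itself is omitted)."""
        prefix = logical_prefix.rstrip("/")
        if prefix not in self._mounts:
            raise KeyError(f"no mapping for {logical_prefix}")
        self._mounts[prefix] = new_physical_location.rstrip("/")


# Hypothetical servers reached over different protocols.
ns = GlobalNamespace()
ns.link("/corp/engineering", "nfs://nas-a/vol1/eng")
ns.link("/corp/finance", "cifs://winsrv2/finance")

before = ns.resolve("/corp/engineering/specs/fan.doc")
ns.migrate("/corp/engineering", "nfs://nas-b/vol7/eng")
after = ns.resolve("/corp/engineering/specs/fan.doc")
print(before)
print(after)
```

The client-visible path `/corp/engineering/specs/fan.doc` is the same before and after the move; only the mapping behind it changes. That indirection is what allows file data to be relocated underneath a namespace without disrupting users.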
Control and management: Software services
The other major concept to define in a FAN is file management and control services. These are the software tools that interact with the namespaces, physical file systems, storage, and connectivity to add significant value to the FAN. These software capabilities are the brains of the FAN and encompass a range of existing technologies as well as innovations recently hitting the market.
Continuing with the SAN analogy, these software services are to a FAN what storage management software is to a SAN. The software services include the following product categories:
Migration services—Moving files non-disruptively underneath shared namespaces or global unified namespaces is one of the most powerful IT benefits of a FAN. In fact, this is part of the core "plumbing" of the FAN. This capability can be achieved at many levels, including distributed host-based software, network-based approaches, and appliance-based approaches.
Replication services—A FAN must be able to replicate any file non-disruptively between resources and geographies. This may take place through any number of technologies deployed at various layers of the infrastructure (e.g., host, NAS appliance, or network). Support for non-disruptive file-level replication is critical for the FAN architecture.
Placement services—The ability to place file-level data on a given physical device based on its attributes will be a key component of any FAN. Optimal data placement ensures that the servers and storage supporting a FAN maintain appropriate performance and utilization levels. This can be achieved through in-band, network-resident approaches such as network file management (NFM) devices, through some information classification and management (ICM) technologies, or through distributed software approaches.
Information classification services—The ICM category has gained significant momentum in the past 24 months as enterprises learn to exercise granular control over files. This software enables content-level indexing of all information, which in turn supports policy-based controls, access, and retention. ICM is an essential component of any FAN.
Access and extension services—Being able to extend access to the FAN across geographies is critical for most enterprises. As a result, a FAN must be able to support wide-area connectivity into its namespaces. The goal is not merely to connect geographies, but to connect them with near-LAN access speeds and service levels. Several technologies can accomplish this, including WAN optimization and WAFS.
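Of the services above, placement and classification lend themselves to a short illustration. The Python sketch below shows one way attribute-based placement could work: an ordered list of rules is checked against a file's attributes, and the first match decides the storage tier. Everything in it is an assumption made for the example, including the `FileInfo` fields, the tier names, and the thresholds. It does not describe how any NFM or ICM product is implemented.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    size_bytes: int
    days_since_access: int
    classification: str  # label assumed to come from a classification service


# Hypothetical policy: ordered (rule, tier) pairs; the first matching rule wins.
POLICY = [
    (lambda f: f.classification == "regulated", "compliance-archive"),
    (lambda f: f.days_since_access > 180, "capacity-tier"),
    (lambda f: f.size_bytes > 1_000_000_000, "capacity-tier"),
]
DEFAULT_TIER = "performance-tier"


def place(file_info, policy=POLICY, default=DEFAULT_TIER):
    """Return the name of the tier on which the file should be placed."""
    for matches, tier in policy:
        if matches(file_info):
            return tier
    return default


ledger = FileInfo("/corp/finance/ledger.xls", 2_000_000, 3, "regulated")
old_video = FileInfo("/corp/marketing/launch.mov", 500_000_000, 400, "general")
draft = FileInfo("/corp/engineering/draft.doc", 80_000, 1, "general")

for f in (ledger, old_video, draft):
    print(f.path, "->", place(f))
```

Keeping the policy as data rather than hard-coded logic means an administrator could reorder or extend the rules without touching the placement function. Rule order matters here: the regulated ledger goes to the compliance archive even though it was accessed recently.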
FAN technologies, vendors
Given the wide range of services that go into a FAN, it is not surprising that a large number of vendors from many new and emerging product categories are participating in this framework. The following vendor roundup encapsulates some of the up-and-coming and established vendors worthy of review for building out core capabilities in the FAN:
Network file management (NFM) vendors—These companies play a key role in the placement, and increasingly the classification, of file data. Sometimes referred to as "file virtualization" products, they are actually much more. These products are "FAN-friendly" in that they preserve the existing namespaces of the devices they are virtualizing, providing a platform against which many file-level operations can take place, including migration, replication, some classification, and optimal data placement.
Vendors that have defined this category from a network-based approach include Acopia Networks, Attune Networks, and NeoPath Networks. With a distributed software approach, Brocade's NuView technology also achieves this goal. EMC's Rainfinity also provides data-placement capabilities for heterogeneous file serving devices. Another vendor to note here is Njini, a classification software vendor that provides in-band real-time data placement based on file classification criteria.
Information classification and management (ICM) vendors—Companies in this category are critical to providing the advanced FAN file-level controls for classification and retrieval of information. They can provide a combination of metadata and content-level classification of all unstructured file content. In some cases, tradeoffs between NFM and ICM functionality (e.g., migration capabilities or data placement) are possible, depending on the workflows of the FAN. It is possible to create true enterprise-wide FAN classification leveraging ICM tools.
While they represent many flavors of file control, representative vendors in the ICM product category include Abrevity, Arkivio, Blackball, Enigma Data, Index Engines, FAST, Kazeon, Scentric, and StoredIQ. Most of the major IT vendors have already begun the process of integrating these capabilities into their solutions through OEM/reseller partnerships, strategic investments, and potential acquisitions. By the second half of 2007, we expect all major platforms to provide some type of integrated ICM functionality.
Wide area file services (WAFS) and wide-area acceleration vendors—Connecting the namespaces of the FAN in a seamless fashion across multiple geographies is an essential capability. In the most comprehensive cases, this will encompass several discrete technologies: file services, network optimization, and application acceleration. In short, file, application, and network resources need to be approached as integrated elements in a FAN strategy.
The following vendors are providers of FAN extension technologies: Availl, Expand (acquired DiskSites), Orbital Data, Packeteer (acquired Tacit Networks), Riverbed, and Silver Peak. Among the established tier-one vendors, Brocade, Cisco, Hewlett-Packard, and Juniper have made major investments or acquisitions in this product category. Users should expect the partnering activity to continue as major vendors fill out this critical element of their wide-area strategies.
Among the tier-one vendors, a handful are providing the overall FAN framework, either directly or through partners. However, as this market gains traction, both the roster of competitors and their range of capabilities will expand significantly. (Companies are listed in alphabetical order.)
Brocade—Brocade has made a clear strategic shift to address file-level management in recent years, expanding beyond its traditional strengths in block-level SAN fabric switching. With its acquisition of NuView and that company's various namespace creation and file services capabilities, as well as its partnership with Packeteer/Tacit, Brocade is positioned to drive FAN solutions that can be aligned with the company's core switching business.
EMC—With its acquisitions of Documentum and Rainfinity, EMC now has a powerful range of file management software services, as well as namespace creation and management capabilities. We expect EMC to drive into additional file services in coming quarters as the company's FAN strategy unfolds.
Hewlett-Packard—As Microsoft's largest OEM partner, HP has enjoyed a strong market presence in NAS for several years. The company has internally developed file-level ILM capabilities (classification, migration, de-duplication) to support a range of FAN software services and has OEM relationships with players such as PolyServe (Linux and Windows-based clustered file systems) and Riverbed (WAN optimization) that fit into a FAN portfolio. We expect HP to play a large role in FAN adoption because of these factors.
Microsoft—As the owner of the most pervasive server platform, Microsoft has the opportunity to both position and shape the growth of the FAN market. In addition to namespace technologies such as DFS, the latest Windows Server R2 release provides a suite of software services that can affect the degree of control customers will be able to exercise over Microsoft-centric FANs.
Network Appliance—The pioneer of the enterprise NAS market has also made significant investments to prepare for the emergence of FAN architectures. NetApp's new GX cluster-based platform will ultimately enable advanced namespaces, and the company has a range of both internally developed and partnered technologies for FAN software services, including migration, replication, classification, and de-duplication.
For the rest of this year and into 2007, we expect to see these vendors play a major role in extending the capabilities of enterprise file management through its next evolutionary stage: the file area network.
Brad O'Neill is a senior analyst with the Taneja Group research and consulting firm (www.tanejagroup.com).