A look at the potential benefits of DAFS and how it works.
BY TIM SHERBAK AND DON YENISH
The demand for storage networks that offer scalable, high-performance data management with low total cost of ownership has led to the development of the Direct Access File System (DAFS) protocol. DAFS promises a standards-based, high-performance, low-latency storage network, with features such as data sharing, file-based storage consolidation, and fine-grained data management.
Through the DAFS protocol, application servers gain reliable, high-performance access to shared file-based storage pools over Fibre Channel, Gigabit Ethernet, InfiniBand, and other Virtual Interface (VI)-based transports within data center environments. Using standard memory-to-memory interconnection networks as the transport mechanism, DAFS marries the raw performance of direct-attached storage with the consolidation and manageability benefits of network-attached, file-based storage pools.
The lightweight protocol enables applications to transfer data from application buffers directly to the network transport, bypassing the operating system, while preserving file semantics. This process significantly reduces CPU overhead by eliminating data copies, user/kernel context switches, thread context switches, interrupts, and network protocol processing.
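The copy-elimination idea can be sketched in a few lines. The following is a toy model only, assuming a hypothetical direct-access API; the names `RegisteredBuffer` and `dafs_read_direct` are invented for illustration and are not real DAFS calls:

```python
# Toy model of the direct-access idea: contrast a conventional read path,
# which stages data in a kernel buffer, with a DAFS-style path, in which the
# adapter places data straight into a pre-registered application buffer.
# All names here are hypothetical, not actual DAFS or VI APIs.

class RegisteredBuffer:
    """An application buffer pre-registered with the transport, so the
    network adapter can place incoming data into it directly."""
    def __init__(self, size):
        self.mem = bytearray(size)

def traditional_read(file_data, app_buffer):
    # Conventional path: data lands in a kernel buffer first, then is
    # copied into the application's buffer (extra copy, context switch).
    kernel_buffer = bytes(file_data)                     # copy 1: device -> kernel
    app_buffer.mem[:len(kernel_buffer)] = kernel_buffer  # copy 2: kernel -> user
    return len(kernel_buffer)

def dafs_read_direct(file_data, reg_buffer):
    # DAFS-style path: file data is placed straight into the registered
    # application buffer; no intermediate kernel copy is made.
    view = memoryview(reg_buffer.mem)
    view[:len(file_data)] = file_data                    # single placement
    return len(file_data)

buf = RegisteredBuffer(16)
n = dafs_read_direct(b"hello, storage!", buf)
```

In a real implementation the "placement" is performed by the network adapter via remote DMA into registered memory, which is what removes the data copies and context switches described above.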
In addition, DAFS is enhanced with cluster, fail-over, and data-sharing features to meet the high-availability needs of mission-critical IT environments and extend the robustness of storage network implementations.
Local file sharing
The application of network-attached storage (NAS) may be segmented into two categories: wide file sharing and local file sharing.
A wide file-sharing environment is characterized by many geographically dispersed client machines, often managed by different administrators, running different operating systems and application software that provide desktop services directly to end users. The client machines can access many different file servers and use them to share data occasionally, so that one user can access another user's document, for example. This architecture is commonly deployed as the basis for a corporation's "home directories," centralizing the data of all users into easy-to-manage, highly accessible information stores.
Figure 1: Local file sharing is characterized by a relatively small number of clients collocated in a data center.
With the success of the NAS architectural model and the rise of high-performance networking infrastructures, an emerging local file-sharing architecture has become prominent in data center environments. The local file-sharing environment is characterized by a smaller number of client machines collocated in a data center (see Figure 1).
These client machines are typically application servers that provide services (such as database and mail services) to other client machines. The application servers typically run the same operating system and application software, configured so that the end users of the application see a single service. The application servers access a small set of file servers over a separate high-speed network and can share individual data files. For example, in a scalable Web service, each application server might reference a common home page as new users connect to the service.
Within this environment, computing power, network bandwidth, and storage capacity can be independently scaled simply by adding nodes to the network. These environments also require a high degree of fault tolerance. The services hosted on the fabric of application servers, interconnects, and file servers must be able to withstand failure of an individual node and fail-over to another node in the network.
Figure 2: Linked into local file-sharing applications through an API, DAFS provides developers with direct access to all DAFS features and bypasses the operating system.
Within this environment, file access may be the preferred access method to the back-end storage infrastructure. System integration and deployment are simplified, avoiding low-level integration of devices. Data layout is virtualized so that multi-protocol data sharing can be achieved across servers from different vendors. The logical file structure of the data is maintained on the file servers so that file-level data management tools can be applied in a comprehensive way, independent of each application server environment. Data-access permissions, storage utilization, backup, and even disaster recovery can be controlled at the individual file and user level. Data management operations within the back-end storage environment affect only the specified application data, not all the blocks in the volume. Likewise, advanced file system features intrinsic to file servers can protect and restore data more efficiently than block I/O-based storage subsystems.
The foundation of DAFS is its ability to enhance these local file-sharing environments, allowing system designers, architects, and administrators to optimize their manageability, performance, and reliability.
Standard memory-to-memory interconnect technologies are emerging and are a prerequisite for DAFS implementations. The VI architecture was originally developed by Compaq, Intel, and Microsoft to serve as a transport-independent standard for interconnecting computers in clusters and to provide new capabilities not found on traditional interconnection networks.
The first capability is direct memory-to-memory transfer, which allows bulk data to bypass the normal protocol processing and be transferred directly between appropriately aligned buffers on communicating machines. The second is direct application access: application processes can queue data-transfer operations directly to VI-compliant network interfaces without operating system involvement.
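The second capability can be sketched as a work-descriptor queue that the application fills and the adapter drains. This is a simplified illustration, not the actual VI data structures; the class and field names are invented:

```python
from collections import deque

# Toy model of VI-style direct application access: the application posts
# work descriptors straight onto a queue that the network adapter drains,
# with no system call or kernel transition on the data path. Names here
# are illustrative, not taken from the VI specification.

class Descriptor:
    def __init__(self, buffer, length):
        self.buffer = buffer      # pre-registered memory region
        self.length = length
        self.done = False

class VirtualInterface:
    def __init__(self):
        self.send_queue = deque()
        self.completions = deque()

    def post_send(self, desc):
        # The application enqueues directly; the OS is not involved.
        self.send_queue.append(desc)

    def adapter_poll(self):
        # In reality the adapter hardware drains the queue asynchronously;
        # here we simulate it with an explicit poll.
        while self.send_queue:
            desc = self.send_queue.popleft()
            desc.done = True
            self.completions.append(desc)

vi = VirtualInterface()
d = Descriptor(buffer=b"payload", length=7)
vi.post_send(d)
vi.adapter_poll()
```

The key point the sketch captures is that posting work and reaping completions are plain memory operations on queues shared with the adapter, which is why the kernel can be removed from the fast path.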
Figure 3: DAFS can be implemented as a traditional file system module embedded in the operating system.
Industry excitement over the VI architecture has encouraged the pursuit of many implementations, including Fibre Channel, proprietary interconnection networks, and TCP/IP over Gigabit Ethernet. The new InfiniBand standard for I/O interconnects will also support VI capabilities.
Though the details differ, VI-based implementations share many performance-oriented characteristics. VI host adapters perform all message fragmentation, assembly, and alignment in hardware and allow data transfer directly to or from application buffers in virtual memory.
By eliminating this processing overhead, the VI architecture drastically reduces CPU utilization and latency. DAFS allows applications to transfer data directly from memory buffers to the network transport, bypassing the operating system, while preserving the underlying file-based data structures throughout the storage network.
Key attributes of the DAFS protocol include the following:
- Sessions are established between the client and server to simplify authentication and to reduce per-transaction overhead.
- Multiple I/O requests can be launched concurrently within a session, allowing multiple client processes high-performance access to data.
- Applications can continue with additional processing while an I/O request is outstanding and be notified by an interrupt when it completes.
- A series of dependent operations may be submitted and left outstanding without waiting for results, allowing pipelining without stalling.
- Multiple write operations can be batched together by the client for greater efficiency.
- Read-ahead and cache control allow file servers to be instructed to prefetch data without allocating additional client-side buffer resources.
- File servers may throttle each node to avoid file server congestion.
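Several of the attributes above, sessions, concurrently outstanding requests, batched writes, and pipelined dependent operations, can be illustrated with a small simulation. The `DafsSession` class and its methods are hypothetical, invented for this sketch; a real DAFS client speaks the wire protocol over a VI transport:

```python
# Illustrative sketch of the session and pipelining attributes. A session
# is established once (amortizing authentication), multiple requests are
# submitted without waiting, and a dependent read is pipelined behind two
# batched writes. All names are hypothetical, not real DAFS APIs.

class DafsSession:
    def __init__(self, client_id):
        self.client_id = client_id   # authenticated once per session,
        self.pending = []            # not once per transaction

    def submit(self, op, *args):
        # Issue a request without blocking; many may be outstanding at once.
        self.pending.append((op, args))
        return len(self.pending) - 1  # request handle

    def complete_all(self, store):
        # The server processes the pipeline in order, so dependent
        # operations need not stall waiting for earlier results.
        results = []
        for op, args in self.pending:
            if op == "write":
                name, data = args
                store[name] = store.get(name, b"") + data
                results.append(len(data))
            elif op == "read":
                (name,) = args
                results.append(store.get(name, b""))
        self.pending.clear()
        return results

store = {}
s = DafsSession("app-server-1")
s.submit("write", "log", b"a")   # batched writes: both submitted
s.submit("write", "log", b"b")   # before any result is awaited
s.submit("read", "log")         # dependent read pipelined behind them
results = s.complete_all(store)
```

The read returns `b"ab"` even though it was issued before either write completed, which is the stall-free pipelining the protocol attributes describe.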
The features of local sharing include the following:
- File access semantics for Unix and NT are supported.
- Authentication is performed in both directions, client to server and server to client. Individual users can also authenticate within a client-server session.
- High-speed file and data locking ensures that multiple application servers access a consistent view of the data.
- Advanced file-locking features ensure quick recovery and fast data access, even in the event of a system failure on the network.
- Failures within the network, caused by interruptions of network or file services, are automatically detected and recovered from, so that service can be maintained even in the event of node failure.
- Clustered application servers maintain their own notion of nodes that are considered a part of the cluster. DAFS prevents client systems ejected from the cluster from accessing shared data.
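The locking, recovery, and cluster-ejection behaviors above can be sketched with a toy lock manager. Everything here is hypothetical and invented for illustration; real DAFS locking is part of the wire protocol between client and file server:

```python
# Toy model of the sharing/recovery features: a lock manager that refuses
# clients ejected from the cluster and releases a failed node's locks so
# surviving nodes can recover quickly. Names are illustrative only.

class LockManager:
    def __init__(self, cluster_members):
        self.members = set(cluster_members)
        self.locks = {}   # file path -> owning client

    def acquire(self, client, path):
        if client not in self.members:
            return False              # ejected clients cannot touch shared data
        if self.locks.get(path, client) != client:
            return False              # lock held by another live client
        self.locks[path] = client
        return True

    def node_failed(self, client):
        # On failure, eject the node and release its locks so other
        # application servers regain fast access to the data.
        self.members.discard(client)
        self.locks = {p: c for p, c in self.locks.items() if c != client}

lm = LockManager({"node-a", "node-b"})
got = lm.acquire("node-a", "/data/users.db")
lm.node_failed("node-a")                          # node-a crashes
recovered = lm.acquire("node-b", "/data/users.db")
denied = lm.acquire("node-a", "/data/users.db")   # ejected node is refused
```

The two properties the features list calls out are both visible: the surviving node reacquires the failed node's lock without waiting, and the ejected node can no longer reach the shared data.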
DAFS can be implemented in a variety of ways, one of which is as a client-side file access library. Linked into the local file-sharing applications through a standard API, it provides developers with direct access to all DAFS features and completely bypasses operating system overhead (see Figure 2).
An alternative is to implement DAFS as a traditional file system module embedded in the operating system (see Figure 3). Though some operating system overhead is incurred, this type of implementation benefits from the remote DMA, data sharing, centralized data management, and fail-over features of DAFS.
The Direct Access File System protocol is being developed by the DAFS Collaborative, a group of more than 60 companies. The Revision 1.0 protocol specification is due to be published and submitted for standardization this summer.
For more information, browse the organization's Website at www.DAFScollaborative.org.
Tim Sherbak and Don Yenish are members of the Network Appliance marketing team responsible for DAFS-enabled products.