What's Next for SANs?
Going beyond the hardware infrastructure, SANs require applications, services, and management capabilities.
By Wayne Rickard
A storage area network (SAN) is an emerging interconnection model, usually based on the ANSI Fibre Channel standard. In a SAN, switches, hubs, and peripherals can be connected in a scalable, multi-vendor infrastructure. Unlike their cousins on the LAN side, SAN devices are conversant in both storage and networking protocols, allowing storage devices such as RAID controllers, disk drives, and tape subsystems to be directly connected to the storage network without intermediate servers. Through Fibre Channel host bus adapters, the entire SAN infrastructure can act as a flexible, central data repository behind traditional servers.
SANs complement emerging high-speed LAN architectures, such as Gigabit Ethernet, by providing faster data pipes from primary storage devices into servers. Fibre Channel is ideal for SAN topologies because SCSI-3 maps directly to the Fibre Channel address space without additional translation or routing. And Fibre Channel host adapters are optimized in hardware for the type of interlocked, queued I/O operations common in storage subsystems. This specialization of function assures that both LANs and SANs will survive as balanced components in data access environments.
Today, most SAN solutions are deployed as SCSI upgrades, taking advantage of the greater bandwidth and connectivity of Fibre Channel. The real promise of SANs, however, lies in new applications that could fundamentally change the way we think about data access.
To gain an understanding of how new applications will be enabled by SAN architectures, it is useful to create a relational model of data access elements (Fig. 1). This model is useful for abstracting what happens in a typical data transfer, and for suggesting logical points to create service layers. A service layer provides "hooks" and semantics to interface with third-party applications.
Typical data access takes place in the controlled environment of a server, running a particular file system and operating system. Vendors have added value only at the application level (for example, backup utilities) and at the physical (storage device) level. A SAN infrastructure provides horizontal entry to the access and logical layers of the data access model, creating new areas to add value.
In a shared-nothing architecture, or linear access model (Fig. 2), there is no "infrastructure." Drives are connected directly to a SCSI or IDE interface.
In a simple client/server model (Fig. 3), a networked file system spans clients and servers. In this data access model, raw data moves vertically over SCSI, while formatted data and control move horizontally over the LAN under the control of remote procedure calls (RPCs).
The simple client/server model is already sharing the data access space with other models. One way to visualize this is to think of a three-tier model, where each layer can act vertically, bypassing lower layers to access required services directly.
A three-tier model (Fig. 4) explodes the simple model along the lines of a presentation layer, where the data is interpreted for the user; an object layer, using data abstractions and environment-specific access mechanisms; and finally, the data layer of persistent storage.
In this three-tiered model, applications can utilize execution objects that "live" across a WAN. A Java virtual machine, for instance, does not rely on the client computer for any operating system-specific feature, file services, volume management, or storage. Similarly, a DBMS may use a proprietary mechanism for organizing and establishing locks on record objects, while using standard access abstractions such as Structured Query Language (SQL). Applications can also establish direct relations with the data layer, mounting volumes across a SAN for their exclusive formatting and use.
Extending this concept of horizontal access to the new SAN infrastructure elements allows a matrix of direct access points to be constructed. The intersection of these new access points with common applications suggests the definition of a "SAN services framework." The new services will crystallize a common set of services for data access and simplify the development of powerful new SAN applications. Three key concepts will dominate the evolution of extended SAN services:
- SAN devices, or storage objects
- SAN/NAS file systems
- An orthogonal storage management layer
Several models exist for object-oriented storage devices, network-attached devices, and file system abstractions. Each model uses different approaches, each with some degree of merit. It is possible that one or more of these theoretical approaches will emerge as the preferred method for standard adoption, or that one of the many industry-sponsored efforts will generate a new method.
For now, it is most useful to look at the applications that are driving the research into SAN services. These applications will utilize the powerful new data and storage models, providing value to organizations implementing SAN topologies.
Goal 1: SAN Device Applications
SAN device applications allow the configuration and control of the persistent storage devices in a SAN environment to acquire some anonymity from the individual operating systems on the servers. This means that RAID configuration, striping, and volume management decisions can be made on pools of global storage. SAN volumes would be allocated logically to servers, rather than physically (Fig. 5). This gives system administrators maximum flexibility in incrementally and independently scaling both storage and processing capacity.
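The idea of allocating volumes logically from a global pool can be illustrated with a minimal sketch. The class, method, volume, and server names here are hypothetical and do not correspond to any vendor's interface; the model tracks only capacity and ownership, not physical placement.

```python
from dataclasses import dataclass, field


@dataclass
class StoragePool:
    """A pool of global SAN capacity in gigabytes (hypothetical model)."""
    capacity_gb: int
    volumes: dict = field(default_factory=dict)  # volume name -> (server, size_gb)

    def free_gb(self) -> int:
        """Capacity not yet bound to any logical volume."""
        used = sum(size for _, size in self.volumes.values())
        return self.capacity_gb - used

    def allocate(self, name: str, server: str, size_gb: int) -> None:
        """Bind a logical volume to a server without naming physical disks."""
        if name in self.volumes:
            raise ValueError(f"volume {name!r} already exists")
        if size_gb > self.free_gb():
            raise ValueError("not enough free capacity in the pool")
        self.volumes[name] = (server, size_gb)

    def reassign(self, name: str, new_server: str) -> None:
        """Hand a volume to another server; only the binding changes."""
        _, size_gb = self.volumes[name]
        self.volumes[name] = (new_server, size_gb)

    def grow(self, extra_gb: int) -> None:
        """Add storage capacity independently of the attached servers."""
        self.capacity_gb += extra_gb


pool = StoragePool(capacity_gb=500)
pool.allocate("vol_mail", server="server_a", size_gb=200)
pool.allocate("vol_db", server="server_b", size_gb=250)
pool.grow(300)                          # scale storage without touching servers
pool.reassign("vol_mail", "server_c")   # scale processing without moving data
print(pool.free_gb())
```

Because servers hold only a logical binding, storage can be added with `grow` and processing capacity can be changed with `reassign` without either operation affecting the other.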
Existing backup applications can take advantage of a SAN backup server to offload data-intensive backup operations. Backup transactions are contained in the SAN, generating no LAN traffic. Scheduled image backups can occur without any interaction, or even physical connection, from the client systems.
Goal 2: SAN File-Aware Applications
Data stored in a file system is composed of two fundamental elements: the file descriptor and the actual data. The file descriptor includes attributes such as the file name, access permissions, creation and modification time stamps, etc. It also includes metadata, or a set of pointers to the logical blocks containing the actual data. An application that has access to the file attributes and metadata is therefore "file-aware" (Fig. 6).
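As a rough illustration of the two elements described above, the following sketch models a file descriptor as a record of attributes plus block pointers. The field names and the 512-byte block size are assumptions made for the example, not part of any particular file system.

```python
from dataclasses import dataclass, field
from typing import List

BLOCK_SIZE = 512  # bytes per logical block; an assumed value


@dataclass
class FileDescriptor:
    # File attributes
    name: str
    permissions: int      # for example 0o644
    created: float        # time stamps as seconds since the epoch
    modified: float
    size_bytes: int
    # Metadata: logical block addresses holding the file data, in file order
    block_pointers: List[int] = field(default_factory=list)


def blocks_for_range(fd: FileDescriptor, offset: int, length: int) -> List[int]:
    """Return the logical blocks that hold bytes [offset, offset + length)."""
    if offset < 0 or length <= 0 or offset + length > fd.size_bytes:
        raise ValueError("requested range lies outside the file")
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    return fd.block_pointers[first:last + 1]


report = FileDescriptor(
    name="report.dat", permissions=0o644,
    created=0.0, modified=0.0,
    size_bytes=1500, block_pointers=[1040, 1041, 2210],
)
# Bytes 500..599 straddle the first two blocks of the file.
print(blocks_for_range(report, offset=500, length=100))
```

An agent that can read this record knows which blocks belong to which file, which is what allows it to migrate, replicate, or defragment a file without going through the server's operating system.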
File-aware applications can run self-contained from intelligent agents in the SAN space. These applications can use their knowledge of the file descriptors and state to autonomously perform many data management, access, and tuning functions without impacting applications or LAN traffic.
Examples of file-aware applications include:
- Device access balancing
- File migration, backup, and restore
- File replication/virtualization
- Autonomous defragmentation
- Write auditing, journaling
Goal 3: SAN File Applications
It is possible to define a file system that is fully realized in the SAN space. This file system exposes a consistent, conflict-free I/O interface to both applications and higher-level servers. It would also need to incorporate some mechanism for global naming and authentication.
Some examples of applications enabled by SAN file systems include data sharing (e.g., file access across multiple operating systems) and data mining.
For conventional servers attached to the SAN, the file system could be accessed through the server operating system acting as a front-end processor. In this model (Fig. 7), the server would present a conventional file system, such as NFS, to the applications.
Alternatively, this file system could be accessed directly by applications through the I/O interface. By accessing the file data directly, through an awareness of the SAN file services, an application can directly read and write persistent data over the SAN rather than the LAN. In the case of a SAN built on Fibre Channel hubs and switches, that means direct access to application data at up to 100MBps (Fig. 8).
Some examples of applications enabled by direct SAN file system access include multimedia file serving (e.g., direct file access by player application) and database acceleration (e.g., re-sort file records on index).
Goal 4: SAN Management
SAN topologies will raise the ante on both storage and system management. It will no longer be acceptable to think of storage subsystems as self-contained components, independent of system requirements for backup, performance tuning, and allocation. Management will be necessary to both centralize the control of new services and handle the abstraction of data objects from data assets.
Storage management in a SAN topology includes configuration management, fault management, and policy-based response. Configuration management provides both asset management (a current view of who, what, and where) and mechanisms to allocate, control, and configure the resources. Fault management covers the detection of a problem, fault isolation, and restoration of normal operation. A policy-based response provides rule-based automatic recovery processes to minimize downtime. Because the response is automated, less-experienced support staff can handle faults that would otherwise require an expert.
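A policy-based response can be pictured as a table that maps each fault type to an ordered list of recovery actions. The sketch below is illustrative only: the fault names, actions, and device label are invented for the example, and a real system would invoke device operations rather than return strings.

```python
from typing import Callable, Dict, List


def fail_over_path(device: str) -> str:
    return f"{device}: I/O rerouted to redundant path"


def rebuild_on_spare(device: str) -> str:
    return f"{device}: rebuild started on hot spare"


def notify_operator(device: str) -> str:
    return f"{device}: operator notified"


# The policy table: fault type -> recovery actions, applied in order.
POLICIES: Dict[str, List[Callable[[str], str]]] = {
    "link_failure": [fail_over_path, notify_operator],
    "disk_failure": [rebuild_on_spare, notify_operator],
}


def respond(fault_type: str, device: str) -> List[str]:
    """Apply the configured policy; unknown faults fall back to notification."""
    actions = POLICIES.get(fault_type, [notify_operator])
    return [action(device) for action in actions]


for line in respond("link_failure", "switch-2/port-5"):
    print(line)
```

Keeping the rules in data rather than in code means an administrator can change the response to a fault without modifying the fault-detection logic.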
A SAN management model should facilitate system management by allowing system administrators to automate manual tasks, analyze performance, and control system cost and maintenance.
Performance management can be used to monitor key devices or links and indicate whether the system configuration is optimized, efficient, and fully utilized.
Trend analysis can be compiled and formatted in a report the system administrator can understand. How quickly is capacity diminishing? What is the ratio of application files to data files? Who are the high-demand users and what are their peak traffic periods?
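The first of these questions lends itself to a simple calculation. The sketch below fits a least-squares line to periodic capacity readings and extrapolates to the day the pool fills. The sample figures are invented, and a straight-line fit is only one of several reasonable assumptions about growth.

```python
from typing import List, Optional, Tuple


def days_until_full(samples: List[Tuple[float, float]],
                    capacity_gb: float) -> Optional[float]:
    """Estimate days remaining from (day, used_gb) samples.

    Returns None when usage is flat or shrinking, since no fill date exists.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_day = sum(day for day, _ in samples) / n
    mean_used = sum(used for _, used in samples) / n
    sxx = sum((day - mean_day) ** 2 for day, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    sxy = sum((day - mean_day) * (used - mean_used) for day, used in samples)
    slope = sxy / sxx                      # growth in GB per day
    if slope <= 0:
        return None
    intercept = mean_used - slope * mean_day
    full_day = (capacity_gb - intercept) / slope
    latest_day = max(day for day, _ in samples)
    return full_day - latest_day


# Weekly readings: usage grows by 14 GB per week on a 500 GB pool.
weekly = [(0, 400), (7, 414), (14, 428), (21, 442)]
print(days_until_full(weekly, capacity_gb=500))
```

A report built on this kind of estimate gives the administrator a date to plan against rather than a raw utilization figure.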
Downtime analysis can show what events are causing interruptions in service. SANs might legitimately go down for scheduled maintenance or upgrades, or go down unexpectedly for unscheduled maintenance, lost connectivity due to link failure, power cycle event, etc. Since a SAN is likely to be a highly available system with built-in redundancy, administrators are not likely to see the interruptions without active management.
Cost management continuously addresses the costs of providing data services, using measures such as uptime, mean time between failures, and mean time to repair. Storage costs are both direct and indirect. Direct costs are related to capacity, which includes components and infrastructure, and indirect costs are related to downtime, recovery, and loss of data. Total costs can be reduced by better utilization of resources and automation of manual work processes, such as backup or capacity increases. Cost management can provide insight into equipment upgrades, deletion of unused services, and functional tuning.
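The relationship among these measures can be made concrete with the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The article does not prescribe this formula; it is offered here as a common way to connect the terms, and the figures in the example are hypothetical.

```python
HOURS_PER_YEAR = 8760  # 365 days; leap years ignored


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time in service: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def expected_downtime_hours_per_year(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected hours out of service in one year."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


# Hypothetical subsystem: fails every 10,000 hours on average, 4 hours to repair.
a = availability(10_000, 4)
downtime = expected_downtime_hours_per_year(10_000, 4)
print(f"availability {a:.4%}, about {downtime:.1f} hours of downtime per year")
```

Multiplying the downtime figure by an hourly cost of lost service gives a first estimate of the indirect costs described above.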
The most effective way to develop and integrate SAN management capability is to build on established models used in network management. These models use Management Information Bases (MIBs) to define the managed objects and entity relationships (Fig. 9). MIBs are basically a list of information obtainable from each managed object, and are part of a configuration and control system that also includes the manager and the device agents.
The manager supports a user-configurable presentation interface such as OpenView or Java. Through the manager, administrators define which elements are pertinent to alarm reporting and establish polling intervals, alarm correlation, thresholds, priorities, and priority escalation thresholds. The manager also presents general reporting information, such as availability of devices and services. For trend analysis and performance tuning, the manager can also combine and analyze data.
Device agents interact locally with the devices being managed. Device agents collect management information, store management information in the MIB, service polls from the manager via SNMP, provide proxy services for other managed devices that may not have dedicated agents of their own, and generate SNMP traps to the configured set of SNMP management platforms and consoles.
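The division of labor between agent and manager can be sketched with a small in-memory model. This is not SNMP and uses no SNMP library: the class, the object name `link_errors`, and the device name are invented, and a plain Python callable stands in for the trap destination.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Trap = Tuple[str, str, Any]  # (device name, object name, value)


class DeviceAgent:
    """Toy agent: stores readings in a MIB-like table and emits traps."""

    def __init__(self, device_name: str, trap_sink: Callable[[Trap], None]):
        self.device_name = device_name
        self.mib: Dict[str, Any] = {}           # object name -> latest value
        self.thresholds: Dict[str, float] = {}  # object name -> alarm limit
        self.trap_sink = trap_sink              # where unsolicited alarms go

    def collect(self, name: str, value: Any) -> None:
        """Record a reading and send a trap if it exceeds its threshold."""
        self.mib[name] = value
        limit = self.thresholds.get(name)
        if limit is not None and value > limit:
            self.trap_sink((self.device_name, name, value))

    def poll(self, name: str) -> Optional[Any]:
        """Answer a manager's poll with the stored value, if any."""
        return self.mib.get(name)


received_traps = []                       # stands in for the manager console
agent = DeviceAgent("hub-1", received_traps.append)
agent.thresholds["link_errors"] = 10      # threshold set by the manager

agent.collect("link_errors", 3)           # below threshold: stored only
agent.collect("link_errors", 12)          # above threshold: stored and trapped

print(agent.poll("link_errors"))
print(received_traps)
```

The two paths shown, polling for routine data and traps for exceptions, are what let the manager keep polling intervals long without missing urgent events.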
The elements required to build a SAN infrastructure are now becoming widely available. A number of vendors currently provide Fibre Channel adapters, hubs, switches, JBODs, disk drives, and RAID arrays. While these components are immediately useful as a SCSI replacement/upgrade, their full potential will be realized with the emergence of true SAN applications built on well-defined SAN services. Proprietary solutions are already emerging, and industry groups such as the Storage Networking Industry Association (SNIA) are beginning to address the needs for open standards and a straightforward storage networking taxonomy. With the increase in data infrastructure complexity, robust management capability will become as important in the SAN space as it is today in conventional LAN topologies.
Layers map to the following functions:
Application Layer: Applications that utilize persistent storage at the file level. Data files are typically tightly coupled to the operating-system environment, so the applications need to be somewhat aware of the next level.
Operating System: At this level, files are made available as system resources to the application layer. The operating system also supports other process elements such as time and date stamping, system-specific naming conventions, user- and kernel-level security firewalls, and application launching for executable files.
I/O Subsystem: The I/O subsystem is the heart of the access layer, and provides a high-level control interface between the file system and the operating system. Read, Write, Allocate, and Deallocate commands are typical calls made by the operating system. This layer allows mountable file systems with different characteristics to be supported by a single operating system.
File System: This layer defines the relationship between file data and file control. Components include file attributes (such as creation time, access privilege, etc.), metadata, and lock management. For client/server file systems, the file system can also include a distributed network access layer (e.g., RPCs), which communicates over the LAN.
Volume System: Volumes are the logical abstraction of storage seen by the file system. Data is organized as blocks, and addressed by logical block addressing (LBA).
Storage Configuration: RAID, mirroring, striping, logical to physical binding, cache management.
Storage Device: Device driver, SCSI, IDE, Fibre Channel Arbitrated Loop (FC-AL).
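The logical-to-physical binding performed at the storage configuration layer can be illustrated with simple striping. The sketch below maps a volume's logical block address to a disk and a block on that disk, assuming plain striping with no parity; the disk count and stripe unit are example values.

```python
from typing import Tuple


def stripe_map(lba: int, n_disks: int, stripe_unit_blocks: int) -> Tuple[int, int]:
    """Map a logical block address to (disk index, physical block on that disk).

    Assumes simple striping: consecutive stripe units rotate across the disks.
    """
    if lba < 0 or n_disks < 1 or stripe_unit_blocks < 1:
        raise ValueError("invalid striping parameters")
    stripe_number = lba // stripe_unit_blocks   # which stripe unit overall
    offset = lba % stripe_unit_blocks           # position inside that unit
    disk = stripe_number % n_disks              # units rotate across disks
    row = stripe_number // n_disks              # full rows completed so far
    return disk, row * stripe_unit_blocks + offset


# Four disks, eight blocks per stripe unit.
print(stripe_map(10, n_disks=4, stripe_unit_blocks=8))
print(stripe_map(37, n_disks=4, stripe_unit_blocks=8))
```

The file system above this layer sees only the logical block address; the mapping to a particular disk is hidden, which is what allows the binding to change without the upper layers noticing.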
Storage area networks (SANs) can be configured as data repositories behind traditional servers and LANs.
A SAN infrastructure provides horizontal entry to the access and logical layers of the data access model, creating new areas to add value.
In a shared-nothing architecture, or linear access model, there is no "infrastructure."
In a simple client/server model, a networked file system spans clients and servers.
A three-tier model explodes the simple client/server model into a presentation layer, an object layer, and a data layer.
SAN virtual devices and volume management give system administrators maximum flexibility in incrementally and independently scaling both storage and processing capacity.
An application that has access to the file attributes and metadata is termed "file aware."
For servers attached to a SAN, the file system could be accessed through the operating system acting as a front-end processor. In this model, the server would present a conventional file system, such as NFS, to the applications.
In the case of a SAN built on Fibre Channel hubs and switches, direct access to application data is possible at speeds up to 100MBps.
The most effective way to develop and integrate SAN management capability is to build on established models used in network management. These models use Management Information Bases (MIBs) to define the managed objects and entity relationships.
Wayne Rickard is vice president and general manager of Gadzoox's Southern California division, headquartered in Placentia, CA. Corporate headquarters for Gadzoox Networks, Inc. are located in San Jose, CA.