It's Time for a SAN Reality Check
The hardware infrastructure is in place, but the storage area network puzzle is still missing a lot of pieces.
By Chris Wood
The hottest new acronym in the data storage community is SAN (storage area network). Driven by the emergence of gigabit-speed Fibre Channel networking, which finally allows disk and tape storage to be easily attached to multiple hosts, SANs purport to offer significant benefits to IT administrators, including data access flexibility, centralized data management, increased I/O performance, and overall reduced cost. Like many emerging technologies, however, SANs are easier to draw on paper than to actually implement.
Today, the most common method for attaching disk storage to a host system is via the workhorse SCSI bus, a ubiquitous point-to-point or daisy-chained connection between one or more disk drives and a single host computer. Cable length is generally limited to 25 meters or less, and host addressability to attached storage devices is relatively limited. In addition, the SCSI model is based on the concept of a single host transferring data to a single LUN (logical unit, e.g., disk drive) in a serial manner.
This architecture was fine when the accepted computing model was a single server running the complete application. But over the past few years, the explosive growth of networked computers has led to applications being distributed over hundreds or even thousands of systems. As a result, data associated with these distributed applications is often replicated within the enterprise, moved in whole or in part over the network as required, or "served" to client machines from one or more centralized servers.
None of the above three methods of providing data to distributed clients is completely satisfactory. Replication is expensive and hard to manage. Once data is replicated, maintaining consistency between one copy of the data and another is sometimes impossible. Compounding this problem is the fact that most system administrators do not know how many copies of certain files there may be, or even where they are.
Moving data around the network as required is slow and management-intensive. For example, a 100BaseT switched Ethernet network, while a great improvement over 10BaseT, really only delivers about 2MBps to 3MBps sustained data rate between any two FTP clients. And this speed assumes that there are no other users attempting to use the network at the same time. Comparing this speed with the speed of direct I/O to a disk array locally attached by a 40MBps UltraSCSI bus, the performance drawback of Ethernet LANs becomes readily apparent. What's more, data that has to be moved from place to place within the network is not really available all the time; it has to be moved to the application host first, then later moved back--a management- and resource-intensive process.
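To put the gap in concrete terms, a back-of-the-envelope calculation using the sustained rates quoted above (assumed figures, not benchmarks) shows how long moving a 1GB data set takes over each path:

```python
# Rough transfer-time comparison for a 1GB data set, using the
# sustained rates cited in the text (assumptions, not benchmarks).
GB = 1024  # size in MB

ethernet_rate = 2.5   # MBps, sustained FTP over switched 100BaseT
scsi_rate = 40.0      # MBps, UltraSCSI bus to a local disk array

ethernet_seconds = GB / ethernet_rate   # roughly seven minutes
scsi_seconds = GB / scsi_rate           # under half a minute

print(f"100BaseT FTP: {ethernet_seconds:.0f} s")
print(f"UltraSCSI:    {scsi_seconds:.0f} s")
```

At these rates the network path is more than an order of magnitude slower than local disk I/O, before any contention from other users is counted.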
Serving data from a centralized location eliminates many of the problems associated with moving data, but does not address the inherent performance problems associated with networked data access. Like FTP, NFS (or AFS, DFS, etc.) relies on IP-based transport infrastructures and suffers the same performance impacts mentioned above, as well as the additional hardware costs associated with purchasing large server complexes to front-end the disk farms.
The obvious solution is to attach the centralized disk farm directly to all the clients and allow them to directly access the data that they need when they need it. Get rid of all the locally attached SCSI subsystems with their multiple copies of replicated data. Deliver low-latency, high-speed direct disk I/O access to all data. Eliminate expensive servers, data moves, and copies. In other words, implement a storage area network.
It seems easy: Purchase Fibre Channel peripherals, switches, and hubs, and add Fibre Channel adapters to all your clients. Then buy a "cloud" that sits between all the peripherals and computers, which attaches all the devices together and allows them to seamlessly communicate with each other. One problem: Who sells the "cloud" that appears in almost every vendor's sales literature (Fig. 1)?
In reality, SANs don't really work yet. The good news is that many of the infrastructure components that comprise a SAN are now available from a variety of vendors. Second-generation Fibre Channel adapter cards that deliver excellent performance are available for most Unix and Windows NT platforms. FC-AL (Fibre Channel Arbitrated Loop) disk arrays are now shipping in volume from about a dozen vendors, and FC-AL-attached tape subsystems will emerge later this year. Fibre Channel switches and hubs are also available, spanning the range from low-cost, five-port unmanaged hubs to fully redundant fabric switches that offer switching on hundreds of ports. With the advent of improved optical and copper Gigabit Interface Converters (GBICs) and Gigabit Link Modules (GLMs), interoperability has vastly improved over last year and may no longer be a problem by the end of this year.
So what's missing? What's keeping IT managers from junking old server farms and installing a lean, mean SAN machine? Not surprisingly, the answer is related to the data itself. More specifically, it's a question of who "owns" the data; who (or what) manages and controls access to the data, decides how to store and retrieve data off disks or tapes in the SAN, and ensures that every client has a consistent view of the data.
What`s missing from the SAN puzzle are file systems, database managers, access methods, and volume managers. Every computer has one or more of these components. They are responsible for physically writing and reading data off of storage peripherals and presenting data records to user applications in a controlled and consistent manner. They control where and how data is written to disk; how it is indexed, stored, and retrieved; who can access the data and whether they can just read it or actually update it; and, in many cases, how and when the data is backed up to secondary media.
Every vendor's platform has a file system, for instance, but every vendor's file system is different. For example, data created on an IBM RS/6000 is managed by JFS (Journaled File System), which has a unique way of storing data. Sun Microsystems supports either the UFS file system or the Veritas file system, neither of which is compatible with the other or with IBM's file system. So, data created on an IBM system cannot be read directly by a Sun system, and vice versa. The same is true of systems from all of the other major vendors. Data structures and file systems are a tower of Babel, and no two are alike.
Even sharing common data within a single vendor environment is problematic. File systems were designed based on the traditional single computer model; they are not aware of, nor can they communicate directly with, any other computer's file system. While communication may not be necessary for read-only data (assuming that the data is being read by the same vendor's file system), whenever you write or update data it is necessary to inform the other computer that there is new (or changed) data out on the disk, where it is, who can access it, etc.
Collectively, these deficiencies in current file systems represent part of the magical "cloud" component of SANs. While SANs may allow you to physically attach peripheral storage to a network of computers, without the "cloud" functions, SANs offer little improvement over SCSI-based storage.
At a high level, there are four general ways to provide multi-host data sharing within a SAN:
- Global (shared) file systems
- File system emulators
- Third-party transfer
- Controlled requesting
Each has its own unique set of benefits and drawbacks, and all of the approaches are probably going to vie for acceptance in the marketplace.
Global File Systems
A global file system is essentially a common file system that executes across multiple hosts--usually, but not exclusively, the same type of host. These file systems employ a communications structure--generally called a lock manager--between hosts to allow the different hosts to coordinate access to a shared data pool (Fig. 2). Each host in the domain of a global file system can be directly connected to the same set of physical disks and share the same data.
A separate communications path between the sharing hosts is usually employed by the lock manager to set and test semaphores in order to give controlled access to the shared data. For example, if one host needs to update a data record, all the other hosts must be restricted from trying to update the same record simultaneously. And, the hosts must be informed that the record has changed so that if they have a copy of the "old" record in their cache, they can invalidate it and fetch a copy of the "new" record.
In an active shared-data environment, more compute and I/O cycles are often spent managing data access than actually reading and writing data. This constant updating of the access permissions in the global semaphore arrays places a significant burden on the hosts. In heavily cached environments, the constant need to invalidate cached data records can severely compromise the benefits of caching data in the first place.
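The lock-and-invalidate protocol described above can be sketched in a few lines. This is a deliberately minimal, hypothetical model (real lock managers such as Cray's SFS are far more elaborate): each host must acquire a record's semaphore before updating it, and every peer's cached copy of that record is invalidated on release.

```python
# Minimal sketch of a global file system lock manager (hypothetical;
# illustrates the coordination described in the text, not any product).

class Host:
    def __init__(self, name):
        self.name = name
        self.cache = {}             # record_id -> locally cached data

class LockManager:
    def __init__(self, hosts):
        self.hosts = hosts          # all hosts sharing the data pool
        self.locks = {}             # record_id -> host holding the lock

    def acquire(self, host, record_id):
        """Grant the update lock only if no other host holds it."""
        if self.locks.get(record_id) not in (None, host):
            return False            # another host is updating this record
        self.locks[record_id] = host
        return True

    def release_and_invalidate(self, host, record_id):
        """Release the lock and invalidate stale copies in peer caches."""
        del self.locks[record_id]
        for peer in self.hosts:
            if peer is not host:
                peer.cache.pop(record_id, None)   # drop the "old" record

# Two hosts sharing one record: B is blocked while A updates,
# and B's cached copy must be dropped afterward.
a, b = Host("A"), Host("B")
mgr = LockManager([a, b])
b.cache["rec1"] = "old value"

assert mgr.acquire(a, "rec1")
assert not mgr.acquire(b, "rec1")
mgr.release_and_invalidate(a, "rec1")
assert "rec1" not in b.cache        # B must re-fetch the new record
```

Even in this toy version, every update generates lock traffic and cache invalidations on all peers, which is exactly the overhead the article describes.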
However, global file systems are farther along the development curve than most other shared-access methods. Cray Research (a division of Silicon Graphics), for example, has offered the Shared File System (SFS) for several years and has partially addressed the semaphore traffic issue by locating the semaphores outboard in the disk drives. Vendors such as MaxStrat provide network storage systems that support Cray's outboard semaphore architecture.
Some interesting work on reducing control overhead and improving the effective performance of global file systems has come out of the University of Minnesota's Global File System (GFS) project, which placed a lock function in the disk drives to help reduce host-to-host control traffic and ensure that all write update requests are correctly serialized. In addition, several point solutions that support specific applications and data structures are available from companies such as Mercury Computer, Retrieve Inc., and Transoft. And vendors such as Tricord Systems are developing global, or distributed, file systems, although Tricord's is not expected to hit the market until next year.
File System Emulators
While global file systems allow data sharing between hosts of the same type that understand the same physical data structures, file system emulators attempt to go one better. In addition to the lock manager and semaphore structures employed by global file systems, file system emulators allow heterogeneous hosts running different file systems to each view the same collection of data so that it appears to be written in their own proprietary format (Fig. 3).
This is accomplished by storing the data in a common file format known only to the file system emulator. Unique and separate files store all the vendor-specific file information (metadata) that each proprietary host requires in order to access the data. The data view that each client host sees is that of a disk volume with data apparently written in its native file format.
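The layout described above can be illustrated with a toy model. Everything here is invented for illustration (the file names, metadata fields, and format are hypothetical, not any vendor's actual on-disk structures): the data is stored once in a common format, and per-host metadata files give each client the view its native file system expects.

```python
# Illustrative sketch of a file system emulator's storage layout
# (hypothetical structures; actual products differ).

# The user data is stored exactly once, in the emulator's common format.
common_store = {
    "payroll.dat": b"record data in the emulator's common format",
}

# One metadata view per client file system type. Field names are
# invented for illustration; each view maps to the same stored blocks.
metadata_views = {
    "aix_jfs": {"payroll.dat": {"inode": 812,  "extents": [(0, 64)]}},
    "sun_ufs": {"payroll.dat": {"inode": 3071, "extents": [(0, 64)]}},
}

def read_as(host_fs, name):
    """Resolve a host's native-looking metadata to the shared data."""
    meta = metadata_views[host_fs][name]   # host-specific view
    return common_store[name], meta        # same bytes for every host

# An AIX host and a Sun host see different metadata but identical data.
aix_data, aix_meta = read_as("aix_jfs", "payroll.dat")
sun_data, sun_meta = read_as("sun_ufs", "payroll.dat")
assert aix_data == sun_data
assert aix_meta["inode"] != sun_meta["inode"]
```

The cost of this indirection is visible even in the sketch: every access goes through an extra metadata lookup, and the emulator must keep all the per-host views consistent.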
A storage node on a SAN that offers file system emulation is far more than a simple disk array. It is a full-fledged computer, with its own physical file systems, complex emulation code, complete lock management and semaphoring subsystems, as well as multiple metadata storage locations where host-specific file information is stored.
Like global file systems, file system emulators still suffer from high inter-host control traffic, and they introduce potentially severe performance impacts to the host's ability to effectively cache data. Moreover, the complex emulation functions can add significant cost and data access latency to disk subsystems.
But the benefits of file system emulators are significant: The ability to completely centralize and share data across heterogeneous hosts is one of the Holy Grails of clustered computing. File system emulation allows optimal host application platforms to be utilized when and where required, without the constraints of being locked in to a single vendor`s file system and/or architecture. A few companies, such as Impactdata and Retrieve, are forging ahead in the development of file system emulators.
Third-Party Transfer
Of the installed base of shared data SANs, third-party transfer represents the majority of installations. ("Third party" refers to a dedicated control server.) Originally developed as a function of the Unitree hierarchical file management system, and subsequently improved by the National Storage Laboratory at Lawrence Livermore National Laboratory, third-party transfer data networking is fully supported by IBM's High Performance Storage System (HPSS) data management software. The third-party transfer method is based on the IEEE Mass Storage Model.
Third-party transfer architectures address the data "ownership" and access control issues by consolidating all data ownership and file system knowledge in a centralized server (Fig. 4). Unlike NFS-style architectures, third-party transfer allows for direct disk I/O access to the central data store by clients. This architecture eliminates the burden of heavy inter-host lock manager and semaphore traffic and presents a well understood, NFS-like application interface. User data flows at local disk speeds (vs. network speeds) over dedicated high-speed disk channels while control traffic flows over a separate control network. The goal is to deliver data at optimal speeds with no interruptions for read/write commands and flow-control handshaking.
Unlike most disk access methods, third-party transfer does not require clients to issue disk I/O (read/write) commands. The client makes an NFS-like request to the third-party disk server, which in turn instructs the disk subsystem that it is to read or write data on behalf of the client. Using a read request as an example, a client machine would issue a read request to the third-party server over an Ethernet connection. Unlike NFS serving, the data is not returned over Ethernet; instead, the third-party server instructs a disk device, usually via a network interface built into the device, to read the data and deliver it directly to the client over a disk I/O channel connected to the requesting client. This type of architecture is called a "push" architecture, because the disk subsystem in effect "pushes" data to clients rather than accepting a traditional read command and delivering data as a result.
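The "push" read sequence just described can be sketched as a message flow. This is a hypothetical model with invented names (the classes, LUN identifiers, and message format are illustrative only): the request goes to the control server over the LAN, the server instructs the disk subsystem over the control network, and the data travels to the client over its disk channel.

```python
# Sketch of a third-party ("push") read, modeling the message flow
# described in the text (hypothetical; names invented for illustration).

log = []   # records (path, sender, receiver, message) in order

class Client:
    def __init__(self, name):
        self.name = name
        self.received = None

    def receive(self, data):
        # Data arrives over the high-speed disk channel, not the LAN.
        log.append(("disk-channel", "disk", self.name, "data"))
        self.received = data

class DiskSubsystem:
    def push(self, lun, block, client):
        """The disk pushes data straight at the client's disk channel."""
        log.append(("control-net", "server", "disk", f"push {lun}:{block}"))
        client.receive(b"data-from-" + lun.encode())

class ThirdPartyServer:
    def __init__(self, disk):
        self.disk = disk
        self.catalog = {"results.dat": ("lun7", 4096)}  # name -> location

    def read_request(self, client, name):
        """NFS-like request over Ethernet; no data flows back this way."""
        log.append(("lan", client.name, "server", f"read {name}"))
        lun, block = self.catalog[name]
        self.disk.push(lun, block, client)   # instruct the disk subsystem

disk = DiskSubsystem()
server = ThirdPartyServer(disk)
c = Client("host1")
server.read_request(c, "results.dat")

assert c.received == b"data-from-lun7"
# Control flowed over the LAN and control network; data over the channel.
assert [m[0] for m in log] == ["lan", "control-net", "disk-channel"]
```

The split is visible in the log: the client issues its request on one interface and receives data on another, which is exactly why unmodified NFS and SCSI stacks cannot participate without special client code.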
Because clients are required to make the disk read/write requests over a network interface, and then receive the resulting data over a different disk interface, special code is required in each client in order to implement this split command and control. Unmodified NFS and SCSI protocol stacks do not support this. In addition, third-party transfer requires a dedicated control server. Coordinating this split command/control architecture is complex and very time sensitive. What's more, high load conditions can introduce excessive latencies in the SAN.
While the centralized data management of third-party transfer is excellent in reducing inter-client lock manager and semaphore traffic, the "push" architecture does not translate well in a Fibre Channel environment. What's more, the complex Transfer Notification Response (TNR) protocol required does not exist in the SCSI specification and, with a few exceptions, most disk subsystems do not implement the necessary embedded data-mover code required to "push" data at clients. These challenges have led to a fourth type of SAN implementation: controlled requesting.
Controlled Requesting
Controlled requesting retains the centralized data management function of third-party transfer but eliminates the requirement for complex "push" protocols. It is designed to be media independent (i.e., it requires no special fabric functions) and transparent to user applications.
As with third-party transfer, all ownership and physical management of the data reside in a central server. Data requests are made to this server as if it were a generic NFS file server (Fig. 5). Unlike third-party transfer, or traditional NFS file serving, the controlled requesting server responds to clients with a disk I/O command sequence that clients can issue directly to the SAN-attached disk subsystem, just as they would issue a disk command to any locally attached disk.
This hybrid architecture allows for coordinated, shared-disk I/O; SAN-wide managed data caching under control of the controlled requesting server; no inter-client semaphoring; and no timing-sensitive third-party "push" protocols.
In addition, controlled requesting allows for easy implementation of brokered data access tokens to ensure that a "rogue," or unauthorized, client cannot access the shared data. A client protocol stack (Fig. 6) can be implemented to make it appear to the client application that the shared SAN storage pool is an NFS-exported file system image, thus allowing existing applications to run unchanged.
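A toy model makes the contrast with the "push" architecture concrete. Everything here is hypothetical (the article describes controlled requesting as experimental, and the command format below is invented): the server answers with disk commands rather than data, and the client performs the I/O itself.

```python
# Sketch of controlled requesting (hypothetical model of the
# architecture described in the text; command format is invented).

class ControlServer:
    """Owns all metadata; hands out I/O commands, never the data."""
    def __init__(self):
        # name -> list of (op, lun, start_block, block_count) commands
        self.catalog = {"results.dat": [("read", "lun7", 0, 64)]}

    def request(self, name):
        # Respond like an NFS server would, but with a command
        # sequence instead of the file contents.
        return self.catalog[name]

class SanDisk:
    """SAN-attached disk that accepts ordinary read commands."""
    def __init__(self):
        self.blocks = {("lun7", 0): b"shared data"}

    def execute(self, op, lun, start, count):
        assert op == "read"            # sketch handles reads only
        return self.blocks[(lun, start)]

server = ControlServer()
disk = SanDisk()

# The client asks the server where and how, then does the I/O
# itself, exactly as it would to a locally attached disk.
commands = server.request("results.dat")
data = b"".join(disk.execute(*cmd) for cmd in commands)
assert data == b"shared data"
```

Note what is absent compared with the third-party model: no inter-client semaphoring, and no timing-sensitive path in which the disk must push data at a waiting client; the client pulls the data with an ordinary disk command.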
Controlled requesting is currently an experimental architecture that is receiving increasing interest within the storage community. It is indicative of industry efforts to deliver the final "cloud" component to SANs so that they can deliver on the promise of centralized, universally accessible shared storage. Vendors such as IBM and Sun have demonstrated controlled requesting methodologies in their laboratories.
The basic SAN infrastructure is rapidly becoming commercially available over a wide range of vendor platforms. Interoperability among the various hubs, switches, and adapter cards is improving rapidly with the advent of next-generation Fibre Channel chip sets and the formation of the Fibre Channel Loop Community (FCLC). IT managers should seriously consider implementing the necessary SAN infrastructure today so that they will be ready to take advantage of true shared data access as it becomes available over the next several years.
Fig. 1: A SAN "cloud" attaches servers, clients, and shared storage devices, and allows them to communicate with each other.
Fig. 2: In a global file system configuration, a separate communications path between hosts is used by the lock manager to set and test semaphores in order to give controlled access to the shared data.
Fig. 3: File system emulators allow heterogeneous hosts running different file systems to view the same collection of data so that it appears to be written in their own proprietary format.
Fig. 4: Third-party transfer architectures address the data "ownership" and access control issues by consolidating all data ownership and file system knowledge in a centralized, "third-party" server.
Fig. 5: In a controlled requesting configuration, data requests are made to a control server as if it were a generic NFS file server. The control server responds to clients with a disk I/O command sequence that clients can issue directly to the SAN-attached disk subsystem.
Fig. 6: A client protocol stack can be implemented to make it appear to client applications that the shared SAN storage pool is an NFS-exported file system image.
Chris Wood is a vice president at MaxStrat Corp., in Milpitas, CA.