A beginner's guide to the concepts, components, and benefits of SANs.
BY RALPH H. THORNBURGH AND BARRY J. SCHOENBORN
What is a storage area network? There is no single simple answer, but there are several simple answers, and they don't conflict with each other. According to various sources, a storage area network (SAN) is
- A fast, reliable, highly scalable mass storage solution designed to provide enormous amounts of storage to an enterprise.
- A network infrastructure that connects computers and devices, transports device commands, and runs value-added software.
- Identified by servers connected to multiple storage devices by means of Fibre Channel hubs, switches, and bridges.
- A topology with three distinct features: storage is not directly connected to network clients; storage is not directly connected to the servers; and storage devices are interconnected.
Figure 1 shows a SAN and its common components. No server is connected to any one storage device, and all storage devices are potentially available to all servers. Connections between devices are made using hubs, switches, and bridges. The "loop" in Figure 1 is intended only to suggest the interconnection of the devices; it isn't an actual connection scheme.
What a SAN is not
- A SAN is not embedded storage, in which disk drives are resident in the server and the amount of storage is limited by the server's capacity to accommodate it. For smaller environments, there's nothing wrong with filling a server's drive bays with high-capacity drives, but there is a physical limit. Also, this arrangement amounts to putting all your eggs in one basket: a server failure would make data unavailable.
In contrast, SANs are scalable. Theoretically, thousands of devices can be added to a SAN. In practice, however, SAN scalability is limited by performance issues and the capabilities of hubs and switches.
- A SAN is not direct-attached storage, which is an extension of embedded storage, with one or more disk arrays connected by SCSI or Fibre Channel directly to a server.
The scalability of direct-attached storage is limited by the number of host bus adapters (HBAs) and addresses available to the server.
- A SAN is not network-attached storage (NAS), which is highly useful in many file-serving applications. It's very easy to bring additional storage onto the LAN, and some NAS vendors tout it as a simple three-step process: Attach the network cable, plug in the power, and turn on the storage device. In addition, NAS servers usually feature RAID technology for data protection, and a tape drive for backup.
NAS is scalable, but attaching storage devices to a LAN can degrade overall performance. While NAS works well in small and medium operations, in larger operations performance may be a limiting factor.
Figure 1: In a SAN, storage resides "behind" the servers, and multiple servers can share multiple storage devices.
Client access to data requires both client-server and server-NAS interaction. A client request for a record travels like this: the client requests the file over the LAN; the server requests the file from the NAS device over the LAN; the NAS device serves up the file over the LAN; and the server delivers the file to the client over the LAN.
Because both client-access and storage-access interactions use the LAN, there is a quick buildup in traffic and performance penalties on the network. In general, each new client adds a small traffic burden (its own I/Os) to the LAN, and each new NAS device adds a large traffic burden to the LAN.
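The traffic arithmetic behind this point can be put in a back-of-the-envelope model. The function name and figures below are illustrative, not from the article; the model only captures the rule that with NAS the file's data crosses the LAN twice, while with storage behind the server it crosses only once:

```python
def lan_bytes_per_read(file_size_bytes, storage_on_lan):
    """Bytes that cross the LAN for one file read.

    With NAS, the file crosses the LAN twice: NAS -> server,
    then server -> client.  With SAN storage behind the server,
    only the server -> client delivery uses the LAN.
    """
    crossings = 2 if storage_on_lan else 1
    return crossings * file_size_bytes

file_size = 10 * 1024 * 1024  # a 10MB file
print(lan_bytes_per_read(file_size, storage_on_lan=True))   # NAS path: 20971520
print(lan_bytes_per_read(file_size, storage_on_lan=False))  # SAN path: 10485760
```

The small request messages are ignored here; for large file transfers the data payload dominates, which is why each added NAS device imposes a disproportionately large LAN burden.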
NAS sits "in front of the server." In his book, Designing Storage Area Networks, Tom Clark describes a SAN as being located "behind the server." A SAN puts storage where I/Os don't impact the clients. The significant contribution of NAS is the idea of networking storage devices. However, it takes a SAN to put the storage behind the server.
What a SAN is
The major distinguishing marks of a SAN are the following: storage is located behind the servers, not on the LAN; and multiple servers can share multiple storage devices. In Figure 1, the storage devices are not on the LAN. They have their own independent connection scheme: the SAN. In the figure, this is shown as a "loop" to suggest that the storage devices are connected together, although the devices are not always connected in a loop. Multiple pools of connected devices are possible as well.
The interconnected group of storage devices might include disk arrays, tape libraries, and optical storage devices. The devices are accessible to all servers through hubs, switches, and bridges.
Figure 2: LUNs can be zoned to prevent interaction with unauthorized servers.
Client-server interaction occurs over the LAN, and server-storage interaction takes place over the SAN. Neither group of devices needs to share bandwidth with the other.
Multiple servers participate in the SAN storage pool. The number of servers is limited only by the physical capabilities of the connecting devices. The same is true of the number of storage devices that can be attached.
There has been a predictable pattern in the evolution of client-server configurations. With a single server, you'd begin by running applications and storing data for applications on embedded disk drives. As storage requirements grew, you'd add attached storage. Eventually, there would be a need for another server, which would likely run different applications and access its own data. That simple model didn't last long for a number of reasons:
- The storage requirements of different applications grow at different rates. Despite the best planning efforts, it's hard to avoid configurations where one server has disk space to burn and another is hurting for space.
- Databases contain a lot of data to be shared. A highly integrated, highly shareable database is one of the Holy Grails of IT. However, because of the size and value of the data to multiple applications, a big database is better placed on an external storage device than on a server.
- Servers can fail, so it's not a good idea to risk data becoming unavailable by placing it on only one server.
A SAN pools the data and offers relatively easy access to the data by multiple servers. This practice lessens the dependence on any one server. Furthermore, it's very easy to add more servers and storage devices, which are immediately accessible by all servers.
In an enterprise with servers running a common operating system, connectivity from any server to any storage device attached to the SAN is relatively easy. This is a homogeneous server environment.
Figure 3: Fibre Channel SAN devices can be connected at distances up to 10km.
But what about mixed servers from different manufacturers and with different operating systems? With the right equipment and design, a SAN can support heterogeneous servers. The goal of a heterogeneous-server SAN is to share data between servers running different operating systems.
Data sharing is possible to the extent that different operating systems can understand and use each other's file systems. However, this promise is not yet fulfilled because Windows NT, Unix, and mainframe file systems are intolerant of each other. In time, however, true data sharing will become possible.
If data can't be shared directly, it can be converted. For example, a number of vendors have software for "mainframe-to-open" and "open-to-mainframe" data conversions. This software typically operates on disk arrays that can emulate both mainframe volumes and open system logical unit numbers (LUNs). However, even if servers don't share or can't convert different data types stored on a SAN, there are still equipment cost and management benefits in sharing different disks on the same physical device.
With appropriate software, servers can "own" LUNs (see Figure 2). And LUNs can be "zoned" to prevent interactions with unauthorized servers.
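The zoning idea reduces to a simple access rule, which can be sketched as follows. The table and server names here are hypothetical; real zoning is enforced by switch or array software, not application code:

```python
# Hypothetical zoning table: each LUN maps to the set of servers
# authorized to see it.  This only illustrates the access rule
# that zoning enforces in hardware or firmware.
zones = {
    "lun0": {"server_a"},
    "lun1": {"server_a", "server_b"},  # a LUN shared by two servers
    "lun2": {"server_c"},
}

def may_access(server, lun):
    """True if the zoning table authorizes this server for this LUN."""
    return server in zones.get(lun, set())

print(may_access("server_a", "lun1"))  # True: server_a is in lun1's zone
print(may_access("server_c", "lun0"))  # False: server_c is zoned out
```

A server that is not in a LUN's zone simply never sees that LUN, which is what keeps unauthorized servers from interacting with storage they don't own.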
SAN storage devices
A distinguishing mark of a SAN is the wealth of storage devices that can be attached to the storage network. The number of devices that can be connected is limited only by the hubs, switches, and bridges that interconnect servers and storage devices.
Any SAN-ready storage device can participate in the storage pool because no matter what its purpose, it will be identified by address to the servers that interact with it. Today, a SAN-ready device is a Fibre Channel device. SCSI devices can participate in the SAN by means of Fibre Channel-to-SCSI bridges or routers.
Figure 4: Disaster-recovery sites can be set up by linking local and remote SANs over WANs.
High-end, high-performance, highly managed disk arrays typify the direction in which SAN storage devices are going. High-end arrays are fully redundant, with multiple fans, power supplies, and controllers. They also support multiple paths to the SAN, eliminating single points of failure.
A SAN permits communication directly between storage devices with minimum server interaction. This means that direct disk-to-disk and disk-to-tape backups are possible, which enables "LAN-free" and "serverless" backup.
Today, the dominant connection technology for a SAN is Fibre Channel, which provides high speed and long connection distances. Fibre Channel currently moves data at 1Gbps (approximately 100MBps), and 2Gbps devices began shipping last year.
Fibre Channel data transport is supported over fiber-optic or copper cable. However, copper is seen mainly in intra-cabinet connections between devices, and fiber cable is used far more widely. Fibre Channel can connect devices over relatively long distances (up to 10km), while SCSI's maximum connection distance is 25 meters without extenders.
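The relationship between "1Gbps" and "approximately 100MBps" is worth spelling out. A 1G Fibre Channel link actually signals at 1.0625 Gbaud and uses 8b/10b encoding, so only 8 of every 10 bits on the wire carry payload:

```python
# Why "1Gbps" Fibre Channel moves roughly 100MBps of data.
line_rate_baud = 1.0625e9           # 1G Fibre Channel signaling rate
payload_bits_per_sec = line_rate_baud * 8 / 10  # 8b/10b encoding overhead
payload_mb_per_sec = payload_bits_per_sec / 8 / 1e6  # bits -> megabytes

print(round(payload_mb_per_sec))  # ~106, marketed as "100MBps"
```

The same arithmetic doubles for the 2Gbps generation mentioned above, yielding roughly 200MBps of payload per link.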
SAN interconnection devices
SAN interconnection devices include hubs, switches, bridges, and HBAs. Fibre Channel-Arbitrated Loop (FC-AL) hubs can be used to form a SAN, and there are cascading options to increase distances between devices and the number of ports available for connecting devices. In theory, up to 126 devices can participate in an FC-AL loop, but in reality the number of devices deployed on a hub will be limited. One limitation is the number of ports (10 on a typical hub and 18 when two hubs are cascaded). Another limitation is a performance decline when too many devices contend for bandwidth in a loop (which is a shared polling environment).
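The bandwidth-contention limitation can be made concrete with a rough best-case model. The figures below are nominal 1G rates, and the even split is an idealization; real loop arbitration is less tidy:

```python
# In an FC-AL loop, the ~100MBps of loop bandwidth is shared by
# every active device, so adding devices divides it.  On a switched
# fabric, by contrast, each port gets a dedicated link.
loop_bandwidth_mbps = 100.0

def per_device_loop_bandwidth(active_devices):
    """Rough best-case share each device gets on a busy loop."""
    return loop_bandwidth_mbps / active_devices

print(per_device_loop_bandwidth(2))   # 50.0 MBps each
print(per_device_loop_bandwidth(10))  # 10.0 MBps each
```

This is why a loop that is nowhere near its theoretical 126-device limit can still be saturated in practice.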
Fabric switches are gaining prominence and are replacing hubs in many SAN implementations. In theory, a Fibre Channel switch allows more than 16 million simultaneous connections to the SAN. At this time, even though a switch typically costs about four times as much as a hub, the performance of switches makes them the better interconnection device.
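The "more than 16 million" figure is not arbitrary: a Fibre Channel fabric address (the N_Port ID) is 24 bits wide, as the arithmetic below shows:

```python
# Where the "16 million" fabric figure comes from: a Fibre Channel
# fabric address (N_Port ID) is 24 bits wide.
fabric_address_bits = 24
max_fabric_addresses = 2 ** fabric_address_bits
print(max_fabric_addresses)  # 16777216, i.e. "more than 16 million"

# An FC-AL loop, by contrast, is limited to 126 attachable
# devices by its smaller arbitrated-loop address space.
```

The contrast between a 24-bit fabric address space and a 126-device loop is a large part of why switches are displacing hubs despite the price premium.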
A Fibre Channel bridge, or router, connects SCSI equipment to the SAN. In addition, since few tape libraries are SAN-ready, bridges may be required for tape backup.
Fibre Channel HBAs reside in the servers and provide the connection to SAN hubs and switches. They come with host interfaces, single or dual ports, and replaceable Gigabit Link Modules (GLMs) or Gigabit Interface Controllers (GBICs).
SAN components may be located close to, or relatively far away from, servers. Typically, a SAN has most or all of its storage devices in one room or in separate rooms on one floor of a building. The LAN connects workstations in different departments to the servers.
Tape storage may be in a different room than disk storage or servers, and the data center may be on a different floor of the building from other departments. Using even the relatively limited distances available with short-wave hubs (500 meters), there are many data-center configuration options.
In some enterprises, "local" means "on the same campus"; in others, it means across town. The servers can be located in different buildings, but they are connected to the same SAN (see Figure 3).
Fibre Channel long-wave hubs provide connection distances of up to 10km. Many large corporations have campuses resembling small towns and have worked through the engineering challenges of providing wiring connections between buildings. For crosstown connections, there are a number of leased-line options available.
A cross-country SAN, as in Figure 4, is accomplished with additional WAN hardware. A cross-country SAN is sometimes essential. For example, a company in earthquake-prone Los Angeles may find it prudent to mirror its data in Arizona. In addition, there are cost-saving advantages in centralizing data in regional or national data centers. It takes fewer administrators to manage large amounts of data, maintain equipment, expand capabilities, and protect data.
Why are SANs needed?
Virtually every sector of commerce has an immense requirement for capacity, speed, and reliability in storing data. Government and education, although restrained by budget considerations, have the same requirements. And every enterprise is concerned with containing storage management costs.
In traditional business applications, growth of customer and manufacturing data, business consolidation, and worldwide commerce drive the need for capacity. More data means more disk drives, higher-capacity drives, and more densely populated enclosures.
In newer businesses, applications need enormous capacity and speed from their inception. E-commerce activity can ramp up from conception to reality in just months. Online retail sales, online business-to-business transactions, and online auctions can attract millions of customers. And e-commerce demands high capacity, speed, and reliability in storage.
Video production, video-on-demand, and other imaging applications also have intense storage requirements. Some of these applications are not possible without SANs. Conventional storage architectures can no longer satisfy these requirements. The scale and complexity of applications are driving the demand for SANs.
Reliability is also a key consideration. In all business sectors, the cost of downtime is larger than ever. A SAN is IT's best answer to avoiding downtime catastrophes, whether the catastrophe amounts to an hour less of selling or a week of non-operation. A SAN lends itself to disaster-recovery scenarios better than previous storage strategies.
Reduction of IT operating costs is a dominant factor in driving IT purchasing decisions. Businesses are always interested in a lower price per gigabyte, and hardware manufacturers are accommodating them. Businesses are also interested in lowering the labor costs for storage maintenance, management, and expansion, which can be intensive. Fortunately for IT management, a SAN can be maintained, managed, and expanded with relative ease by fewer people than are required by previous architectures.
Ralph H. Thornburgh is a training engineering consultant at Hewlett-Packard Company in Roseville, CA, and Barry J. Schoenborn is an independent technical writer and owner of Willow Valley Software in Nevada City, CA.
This article is excerpted, with permission, from Storage Area Networks: Designing and Implementing a Mass Storage System, by Ralph H. Thornburgh and Barry J. Schoenborn. The book is available from a variety of online sites, including Amazon, Fatbrain, Borders, and Barnes & Noble.