An excerpt from Designing Storage Area Networks examines some of the applications that can benefit most from SANs.
BY TOM CLARK
Although storage area networks (SANs) share common components in the form of servers, storage, and interconnect devices, the configuration of a storage network is determined by the application problems it resolves. The requirements for a full-motion video application differ from those for high-availability online transaction processing (OLTP).
LAN-free tape-backup applications may use unique hardware and software products that would not appear in a SAN designed around server-clustering requirements. Because SANs offer the flexibility of networking, however, it is possible to satisfy the needs of multiple applications within a single networked configuration, just as a LAN backbone may service disparate applications for an enterprise.
Figure 1: A peer video-editing SAN via a switched fabric
The following application studies examine SAN installations that were designed to meet specific requirements. In some instances, the deployment of a new networked infrastructure has provided additional opportunities for resolving unrelated issues. A SAN designed for a high-bandwidth application, for example, also facilitates a more efficient tape-backup solution. Although SANs are not a panacea for every storage application need, the building blocks that SANs provide can be used to construct a wide range of viable solutions unattainable by other means.
In one of the first applications of Fibre Channel technology, full-motion video editing and broadcast companies have leveraged the bandwidth, distance, and shared resources that SANs enable. Digitized video has several unique requirements, including the sustained transmission of multiple 30MBps streams and intolerance for disruption or delays, which exceed the capabilities of legacy data transports. Most SAN-based video applications use the SCSI-3 protocol to move data from disk to workstations, although custom configurations have been engineered using Internet Protocol (IP) for multicast and broadcast distribution.
Figure 2: Video SAN for sports training
Some of the first video SANs used arbitrated loop for the underlying topology. An efficiently designed loop will support three video streams but is susceptible to the potential disruption of loop initialization primitives (LIPs) or loss of all streams if a node on the shared transport misbehaves. The dedicated bandwidth that a fabric provides is more suitable for video applications but requires fabric services that were not originally available for host bus adapters (HBAs) and disks. A number of installations in use today are therefore based on private loop switching. Loop switching accommodates the various levels of private loop HBAs in workstations (e.g., NT, Mac, and SGI) while offering the connectivity and per-port throughput of a fabric switch. As HBA and disk vendors have developed fabric service support on their products, fabrics have gradually displaced loop switching for these operations.
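As a rough check on the three-stream figure, the arithmetic can be sketched as follows. The 100MBps loop rate is taken from the link speeds quoted elsewhere in this article and the 30MBps stream rate from the video requirements above; the function name is illustrative, and protocol overhead is ignored.

```python
def max_streams(link_mbytes_per_s: float, stream_mbytes_per_s: float) -> int:
    """Whole number of sustained streams that fit on a shared link."""
    return int(link_mbytes_per_s // stream_mbytes_per_s)

# Assumed figures: a 100MBps arbitrated loop shared by 30MBps video streams.
loop_streams = max_streams(100, 30)
print(loop_streams)  # 3
```

Because the loop is a shared transport, all three streams draw on the same 100MBps, which is why dedicated per-port bandwidth is more comfortable for video.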
Video applications have common transport requirements but vary considerably in content. A video-editing application may center on a workgroup configuration, as shown in Figure 1, allowing peer workstations to access and modify video streams from one or more disk arrays. In addition to the physical SAN topology, any application that allows data sharing must have software support for file access and locking by multiple users. A video broadcast application that serves up content from a central data source to multiple feeds must have the means to support IP multicast across the Fibre Channel network. Video used for training applications may support both editing workstations and user stations, with random access to shared video clips or instructional modules digitized on disk.
Figure 3: Prepress SAN with switched and loop segments
Video over Fibre Channel has appeared in some surprising locations, such as on the desktops of football coaches for major university and professional teams. Although it has been common practice to use video tapes of major games to analyze player performance and the strategy of the opposing teams, the mechanical limitations of video tape make it difficult to access individual plays quickly for analysis. Access to archived games is also difficult, since tapes must be cataloged, stored, and manually mounted for playback. These limitations are overcome by digitizing video to disk via editing workstations and then marking the play sequences with software pointers. A coach can then pull up any desired portion of a game for playback, using recorder-type controls for slow motion, rewind, and stop motion. The storage requirements for such an application are quite high, potentially terabytes of data, as is the bandwidth required to drive multiple coach workstations and training rooms for players. Distance is also a factor, since workstations may be spread across an entire floor or multiple floors of a facility.
Figure 2 depicts a small SAN configuration for shared access to digitized video stored on disk. Since the retrieved plays are relatively short and are called up at random, video streaming to multiple coach stations is sporadic, bursty traffic. This prevents the 100MBps cascade link between the storage/editing switch and the coach/training switch from becoming a bottleneck, as it might if the streams were persistent. The video-editing workstations are used to load the digitized video to disk and to place software markers for plays and so are best positioned on the same fabric as the disk arrays. This simple configuration is expanded to support additional coach stations by cascading more fabric switches from the root storage/editing switch. Some installations of this type may have 20 or more coach stations interconnected by the SAN.
Figure 4: Tape backup across a departmental network
Although the Fibre Channel fabric switches, HBAs, and disk arrays enable this application to be implemented, the SAN-specific components are the least expensive items in the configuration. The software required to convert and catalog the digitized data and to create a user interface that facilitates play analysis represents the major portion of the investment. According to the coaches who use these systems, the return on investment is amply demonstrated by the games they have won.
"Prepress" refers to the creation and modification of graphical images for advertisements, catalogs, and posters. Graphics can be as simple as low-resolution black-and-white newspaper ads or as sophisticated as large, four-color, high-resolution images applied to billboards or city buses.
Unlike full-motion video applications, computerized prepress data traffic is always bursty in nature. A single graphics image is read from disk, rendered for several hours by a graphics artist at a workstation and written back to disk. The file may pass through multiple revisions, and therefore multiple workstations, as other detail, titles, and legends are added. When graphical editing is complete, the file is then read by a preprint processor for conversion from digital format to hardcopy or print negative.
Since a graphical image must go through a series of editing steps, each by a different artist, as it passes through the production process, file ownership is critical for maintaining data integrity. If the same image is inadvertently opened and modified by two artists at the same time, hours of work can be lost. Software companies that specialize in prepress operations resolve this potential problem by providing file-sharing middleware. This software resides on each workstation and, by intercepting calls from the operating system to the file system, allows file ownership to be transferred serially from one user to another as the file is read from and written back to disk.
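The idea of serial ownership transfer can be illustrated with a minimal, in-memory sketch. This is not how any particular prepress product works; the class, method names, and file names are hypothetical, and a real implementation would intercept file-system calls and coordinate across workstations.

```python
class FileOwnership:
    """Tracks a single owner per file (hypothetical, in-memory only)."""

    def __init__(self):
        self._owners = {}  # file path -> name of the current owner

    def acquire(self, path, user):
        """Grant ownership if the file is free or already held by this user."""
        current = self._owners.get(path)
        if current is None or current == user:
            self._owners[path] = user
            return True
        return False

    def release(self, path, user):
        """Give up ownership so the next artist can open the file."""
        if self._owners.get(path) == user:
            del self._owners[path]

    def owner(self, path):
        """Return the current owner, or None if the file is free."""
        return self._owners.get(path)


locks = FileOwnership()
print(locks.acquire("ad.tif", "artist_a"))  # True
print(locks.acquire("ad.tif", "artist_b"))  # False: artist_a still owns it
locks.release("ad.tif", "artist_a")
print(locks.acquire("ad.tif", "artist_b"))  # True
```

The second artist is refused until the first writes the file back and releases it, which is the serial handoff described above.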
In addition to file ownership, the amount of time it takes to read a large graphics image from disk for editing and write back the modified version is an important issue for prepress. Read/write time is, for the user, downtime, and the accumulated downtime between edits can impact the entire production process. For larger prepress operations in particular, the bandwidth supplied by a LAN is insufficient for concurrent file transfers of image files that are often in the hundreds of megabytes.
Like video applications, prepress has a voracious appetite for storage. A catalog production, for example, may require hundreds of gigabytes of storage for high-resolution photographic images and formatting information. A major brand's consumer catalog may have three or more editions per year, with revisions of some product images and introduction of new ones. All of this data must be maintained and accessible for updates. File compression helps reduce the overall storage requirement but is less effective for high-resolution images.
Figure 5: Transitional LAN-free backup implementation
SANs were introduced into prepress operations primarily by vendors of file-access software. As a total solution, the combination of file-access middleware, higher bandwidth and shared storage via Fibre Channel, and increased storage capacity provided by Fibre Channel disk arrays addresses most of the data infrastructure issues prepress operators face.
In Figure 3, graphics artists are segmented into smaller, shared 100MBps loops, whereas RAID disk enclosures reside on dedicated 100MBps links via a fabric switch. The distribution of users is scalable, since additional users may be accommodated with other loop segments and the population of each loop adjusted according to workload and bandwidth requirements. Shared storage in this configuration is also scalable, both by the addition of drives into the RAID enclosures and by attachment of new arrays over time. Specialized preprint processors are brought into the SAN via Fibre Channel-to-SCSI bridges. And finally, file-access software on each graphics workstation ensures a file can be modified by only one user at a time and that the identity of the current owner is known. This SAN solution also provides greater efficiency by transporting graphics files with SCSI protocol, as opposed to IP or IPX overhead required by a LAN transport.
For IT operations, tape backup poses a number of problems, none of which are easily addressed by traditional parallel SCSI or LAN-based methods. As long as disk arrays are bound to individual servers, tape-backup options are limited to server-attached tape subsystems or transport of backup data across the messaging network. Provisioning each server with its own tape-backup system is an expensive solution and requires additional overhead for administration of scheduling and tape rotation on multiple tape units. Performing backups across the production LAN allows for the centralization of administration to one or more large tape subsystems but burdens the messaging network with much higher traffic volumes during backup operations. In addition, scheduling backups for multiple servers to a central tape resource creates an inherent contradiction between the time required to back up all servers and the time available for nondisruptive access to the network. Scheduling backups during nonpeak hours, such as 8:00 pm to 6:00 am, may not provide sufficient time to back up all data and is not an option for enterprises that operate across multiple or international time zones.
Figure 6: LAN-free and server-free tape-backup installation
In Figure 4, four departmental servers share a common tape-backup resource across the production LAN. Even with switched 100Mbps Ethernet and no competing user traffic, the maximum sustained throughput from server to tape is approximately 25GB per hour. If each server supports a very moderate 100GB of data, a full backup of the department's data would require 16 hours. Backups, however, are normally scheduled for incremental backup of changed files on a daily basis, with full disk backups occurring only once a month or quarter. To accommodate both full and incremental backups, the full-backup routines would have to be rotated among different servers on different days and then only during periods when full LAN bandwidth was available.
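The 16-hour figure follows directly from the numbers in this example, as the short sketch below shows. The function name is illustrative, and the 10-hour window is an assumption borrowed from the 8:00 pm to 6:00 am nonpeak period mentioned earlier.

```python
def full_backup_hours(servers: int, gb_per_server: float, gb_per_hour: float) -> float:
    """Hours needed to back up every server in sequence at a sustained rate."""
    return servers * gb_per_server / gb_per_hour

hours = full_backup_hours(servers=4, gb_per_server=100, gb_per_hour=25)
window_hours = 10  # assumed nightly window, 8:00 pm to 6:00 am

print(hours)                  # 16.0
print(hours <= window_hours)  # False: a full backup overruns the window
```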
As the volume of data exceeds the allowable backup window and stresses the bandwidth capacity of the messaging network, either the bandwidth of the messaging network must be increased or the backup data must be removed from the messaging network altogether. Installing a high-speed LAN transport such as switched Gigabit Ethernet can alleviate the burden on the production network but leaves the server/storage relationship unchanged. Just as the user saturation of 10Mbps Ethernet engendered 100Mbps Ethernet and the saturation of 100Mbps Ethernet begot Gigabit Ethernet, opening larger pipes on the LAN may not provide a long-term solution. If you build bandwidth, user data will come. Resolving the potential conflict between user traffic and storage-backup requirements is accomplished, therefore, only by isolating each onto separate networks. A storage network removes backup data from the production network, provides an equivalent high-speed transport to Gigabit Ethernet, and, by separating servers from storage, allows other backup and storage technologies to emerge.
Figure 7: A fully redundant server cluster using arbitrated loop for shared access
As a transitional configuration, the SAN in Figure 5 is installed solely to offload the production network. Existing parallel SCSI-attached drives are left intact, and the new components include only Fibre Channel HBAs, a loop hub or fabric switch, and a Fibre Channel-to-SCSI bridge. Since the tape subsystem appears to each server as another SCSI device on a separate SCSI bus, it is accessible to the tape-backup client residing on each server. The backup scheduler instructs each server when and what kind of backup to perform on a sequential basis. Since the backup data path is now across a dedicated SAN, the constraints of the messaging network are removed from the backup process, and the burden of backup traffic is removed from the LAN.
The 100MBps bandwidth provided by Fibre Channel and the flexibility of moving backup data on its own transport, however, do not resolve every issue associated with this backup implementation. Although the SAN transport may allow backup data to move at high speed, other limiting factors include server performance, data rate of the parallel SCSI drives, the type of data being backed up, performance of the FC-SCSI bridge, and the throughput of the tape subsystem itself. The slowest component in a backup configuration is usually the tape drive, which is limited by its sustained streaming rate. A tape unit may be able to stream only 10MBps to 15MBps and so cannot fully use the bandwidth Fibre Channel makes available. The overall time required for full backups is thus improved only moderately by Fibre Channel, although the scheduling itself is no longer dependent on or interferes with LAN traffic patterns.
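To see why the improvement is only moderate, consider a back-of-the-envelope sketch in which the end-to-end rate is that of the slowest stage. Only the Fibre Channel and tape rates come from the text; the 400GB total is the four-server example above, 1GB is assumed to equal 1,000MB, and the function names are illustrative.

```python
def bottleneck_rate(stage_rates):
    """The sustained rate of a pipeline is that of its slowest stage (MBps)."""
    return min(stage_rates.values())

def backup_hours(total_gb, rate_mbytes_per_s):
    """Hours to move total_gb at the given rate, assuming 1GB = 1,000MB."""
    return total_gb * 1000 / rate_mbytes_per_s / 3600

# Rates in MBps; the tape figure is the low end of the 10MBps-15MBps range.
stages = {"fibre_channel": 100, "tape": 10}
rate = bottleneck_rate(stages)

print(rate)                               # 10
print(round(backup_hours(400, rate), 1))  # 11.1
print(round(backup_hours(400, 15), 1))    # 7.4
```

Under these assumptions a full backup still takes roughly 7 to 11 hours, compared with 16 hours across the LAN, even though the transport itself could carry the data far faster.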
Since each of the four servers is now provisioned with a Fibre Channel HBA, other options are available for reducing backup times. Some Fibre Channel-to-SCSI bridges offer two Fibre Channel interfaces. If the SAN interconnect is a fabric switch, two servers can perform concurrent backups to two bridge-attached tape subsystems, thus cutting the overall backup time in half.
Figure 8: A small ISP implementation using NAS
Optimizing the backup routine further requires several additional SAN components. Moving disk storage from parallel SCSI to Fibre Channel-attached arrays offers, among other things, the ability to remove the server from the backup data path. This is the most significant improvement from the standpoint of performance and nondisruptive backup operations. If server resources are freed from backup tasks, the servers are always available for user access. And if the backup process itself does not interfere with user access to data, the backup window is no longer defined by users or the relatively slow performance of the tape subsystem.
Backups may be performed at any time, provided that the backup software handles file permissions and updates and that a Fibre Channel-attached backup agent exists to buffer data from disk to tape. The backup agent may exist as a Network Data Management Protocol (NDMP) or Third Party Copy utility resident on the interconnect, on a dedicated Fibre Channel-attached backup server, or in a Fibre Channel-to-SCSI bridge or native Fibre Channel tape subsystem.
Figure 6 demonstrates an extension of the departmental tape-backup solution that incorporates Fibre Channel-attached disk arrays and a Third Party Copy or NDMP utility resident on a Fibre Channel-to-SCSI bridge. In this configuration, backup data is read directly from disk by the copy agent and written to tape, bypassing the server. Whereas the SAN provides the vehicle to move the backup data, the backup software must control when and where to move it. Concurrent backup and user access to the same data are possible if the backup protocol maintains metadata (file information about the actual data) to track changes that users may make to data, such as records, as it is being written to tape. As higher-performance native Fibre Channel tape subsystems become available, the ability to back up and restore over the SAN will better accommodate the growing volume of data that enterprises generate.
As enterprise applications have shifted from mainframe and midrange systems to application and file servers, the reliable access to data that the legacy systems provided, and that required decades of engineering to accomplish, has been compromised. To make their products acceptable for enterprise use, server manufacturers have responded with more-sophisticated designs that offer dual power supplies, dual LAN interfaces, multiple processors, and other features to enhance performance and availability. The potential failure of an individual component within a server is thus accommodated with redundancy, which typically implies hardware features but may also include redundant software components, including applications. Extending this strategy, redundancy may also be provided simply by duplicating the servers, with multiple servers running identical applications. The failure of a hardware or software module within a server is accommodated by shifting users from the failed server to one or more servers in a cluster.
Figure 9: ISP configuration using storage networking
The software required to reassign users from one server to another with minimal disruption to applications is very complex. Clustering software written for high-availability implementations may trigger on the failure of a component of the hardware, protocol, or application. The recovery process must preserve user network addressing, login information, current status, open applications, open files, and so on. This is no small task, which may in part account for the delays in embedding clustering into the operating system. Clustering software may also include the ability to load-balance between active servers, so that in addition to providing fail-over support, the servers in a cluster can be used to increase overall performance.
Small clusters can be deployed with traditional parallel SCSI cabling for shared data but are generally limited to two servers. Fibre Channel allows server clusters to scale to very large shared data configurations, with more than a hundred servers in a single cluster. Whether this is implemented with arbitrated loop or a combination of fabrics and loop depends on the traffic volumes required by user applications.
Since the focus of clustering is to facilitate availability, deploying a server cluster on a SAN typically includes redundant paths from multiple servers to data. Software on each server must monitor the health of hardware components and applications and be able to inform other servers in the cluster if a failure or loss of service occurs. This heartbeat status is normally propagated over a dedicated, and sometimes redundant, LAN interface. If redundant paths to data are provided, each server must also monitor the status of each SAN connection and redirect storage traffic if a loop or switch segment fails. In addition, the data itself may be secured via local or remote RAID mirroring, which provides a duplicate copy if a primary storage unit fails. This tiered strategy helps ensure the availability of servers, access to data, and the data itself.
Figure 10: Campus storage network
In the site represented in Figure 7, a cluster of 10 servers is supported by arbitrated loop in a redundant, shared data scheme. Two 12-port loop hubs are configured as primary and backup paths between the clustered servers and RAID disk arrays. For this installation, the status heartbeat is also configured with redundant Ethernet links between each server, so that the failure of an Ethernet link will not falsely trigger a condition in which each server would attempt to assume services for others. Since the clustering software determines what components or applications on each server should be covered by a failure, subsets of recovery policies can be defined within the 10-server cluster. In this configuration, all servers share a common database application, whereas subsets of three servers are configured for fail-over for specific user applications. The example configuration can also be scaled to accommodate additional servers or storage by either cascading additional hubs on each loop or, depending on bandwidth requirements, subdividing primary and backup loops into smaller segments, using switching hubs or fabrics.
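The reason for redundant heartbeat links can be illustrated with a small sketch: a peer is declared failed only when every link has gone silent, so the loss of one Ethernet link does not trigger a takeover. The class, link names, and timeout value are hypothetical and do not represent any clustering product's interface; timestamps are passed in explicitly to keep the example deterministic.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from one peer on each redundant link."""

    def __init__(self, links, timeout_s, start):
        self.timeout_s = timeout_s
        # Treat the start time as the first heartbeat on every link.
        self.last_seen = {link: start for link in links}

    def record(self, link, now):
        """Note that a heartbeat arrived on the given link at time `now`."""
        self.last_seen[link] = now

    def peer_failed(self, now):
        """True only when every link has been silent longer than the timeout."""
        return all(now - seen > self.timeout_s for seen in self.last_seen.values())


monitor = HeartbeatMonitor(["eth0", "eth1"], timeout_s=3.0, start=0.0)
monitor.record("eth0", 10.0)
monitor.record("eth1", 10.0)
monitor.record("eth1", 12.0)          # eth0 has stopped delivering heartbeats

print(monitor.peer_failed(now=14.0))  # False: eth1 is still current
print(monitor.peer_failed(now=16.0))  # True: both links are silent
```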
Internet service providers
Internet service providers, or ISPs, that provide Web-hosting services have traditionally implemented servers with internal or SCSI-attached storage. For smaller ISPs, internal or direct-attached disks are sufficient as long as storage requirements do not exceed the capacity of those devices. For larger ISPs hosting multiple sites, storage requirements may exceed the SCSI-attached capacity of individual servers. Network-attached storage (NAS) and SANs are both viable options for supplying additional data storage for these configurations.
In addition to storage needs, maintaining availability of Web services is critical for ISP operations. Because access to a Web site's URL, or uniform resource locator, is based on Domain Name System (DNS) rather than physical addressing, it is possible to deploy redundant Web servers as a fail-over strategy. If a primary server fails, another server can assume access responsibility via a round-robin DNS address resolution. For sites that rely on internal or SCSI-attached storage, this implies that each server and its attached storage must maintain a duplicate copy of data. This is a workable solution as long as the data itself is not dynamic, that is, consists primarily of read-only information. It is a less attractive option, however, for e-commerce applications, which must continually update user data, online orders, and inventory tracking information. The shift from read-mostly to more dynamic read/write requirements encourages the separation of storage from individual servers. With NAS or SAN-attached disk arrays, data is more easily mirrored for redundancy and is made available to multiple servers for fail-over operation. As Figure 8 illustrates, NAS provides common data access over shared or switched Ethernet, allowing multiple Web servers to exist in a fail-over configuration.
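The round-robin resolution mentioned above can be sketched as a rotating list of server addresses that skips any server marked as down. Plain round-robin DNS has no health checking of its own, so the mark_down step here stands in for whatever monitoring removes a failed server's record. The class name and the addresses (taken from a documentation-only range) are assumptions for illustration.

```python
from collections import deque

class RoundRobinResolver:
    """Hands out server addresses in rotation, skipping those marked down."""

    def __init__(self, addresses):
        self._ring = deque(addresses)
        self._down = set()

    def mark_down(self, address):
        self._down.add(address)

    def mark_up(self, address):
        self._down.discard(address)

    def resolve(self):
        """Return the next available address, rotating the list each time."""
        for _ in range(len(self._ring)):
            address = self._ring[0]
            self._ring.rotate(-1)
            if address not in self._down:
                return address
        raise LookupError("no Web server available")


resolver = RoundRobinResolver(["192.0.2.10", "192.0.2.11"])
print(resolver.resolve())  # 192.0.2.10
print(resolver.resolve())  # 192.0.2.11
resolver.mark_down("192.0.2.10")
print(resolver.resolve())  # 192.0.2.11
print(resolver.resolve())  # 192.0.2.11
```

Fail-over of this kind is only useful if the surviving server can reach the same data, which is the argument for shared NAS or SAN storage.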
Figure 11: Fibre Channel-based disaster-recovery implementation
SAN architecture brings additional benefits to ISP configurations by freeing up bandwidth on the provider's LAN segments, providing high-speed data access between servers and storage, and facilitating tape-backup operations. As shown in Figure 9, storage traffic is isolated from the LAN data path, which helps ensure data integrity even if problems occur on the LAN transport. At the same time, read/write operations to disk do not burden the LAN with additional traffic, which allows the LAN to be designed around external access requirements alone. LAN-free or server-free backup is enabled by a SAN-attached tape subsystem and NDMP or Third Party Copy software utilities, which further frees LAN bandwidth for users. Expansion of storage and growth of Web servers are accommodated by extending the SAN with additional fabric switches or loop hubs. This small configuration can scale to hundreds of servers and terabytes of data, with no degradation of service.
Campus storage networks
Server-based applications and storage present several contradictions for IT management of extended networks. Decentralized servers and storage provide the convenience and higher speed of local user access but require higher administrative overhead for maintaining and backing up multiple sites. Centralizing servers and storage to a data center reduces administrative requirements and allows consolidation of server resources but restricts remote users to the bandwidth available via the WAN. Traditionally, centralizing resources has meant provisioning multiple high-speed WAN links to each site just to achieve 1MBps to 5MBps bandwidth, which is often insufficient to supply the response time users demand. Even with high-performance routers and data-compression techniques, the WAN may become a bottleneck for both peer traffic and file retrieval between remote sites and a data center.
Fibre Channel's support of 10km links facilitates the search for a compromise between distributed and centralized data access. Using longwave lasers and singlemode cabling, multiple sites in a campus or metropolitan area network can be brought together in an extended SAN. As shown in Figure 10, each building has a local SAN, which, depending on traffic requirements, is based on arbitrated loop or a departmental fabric switch. The local SAN provides high-speed access and storage sharing for the users at each site. By linking remote sites to a central data center via singlemode fiber, servers at each remote location also have access to centralized storage. This configuration also allows each remote site to be backed up to large tape subsystems maintained by the data center. In the example shown, the development building is provisioned with two fiber-optic links. This is to accommodate retrieval of engineering drawing files archived on data center RAIDs. By load-balancing across multiple switch links, an effective throughput of up to 200MBps can be achieved.
This extended SAN helps resolve data-security issues via backup and sharing of centralized storage by multiple remote locations but still requires software to control volume assignment and file locking if data is to be shared among remote servers. Particularly in NT environments, it is essential to administer which storage devices an NT server can access. Common access to a shared tape subsystem likewise requires scheduling software and the ability to alter ownership of the tape resource dynamically, such as via zoning on a fabric switch.
Similar to campus SANs, disaster-recovery implementations are leveraging Fibre Channel's support for 10km 100MBps links to meet remote disk-mirroring and tape-backup requirements. If the disaster-recovery site is more than 10km away, Fibre Channel extenders make it possible to achieve distances of more than 60km. Enterprise networks that invest in disaster recovery will normally deploy additional safeguards, including high-availability server clustering, RAID, and redundant data paths via dual loops or fabric switches.
Figure 11 illustrates a disaster-recovery solution that uses long-wave, singlemode fiber cabling between the production and disaster-recovery sites, with fully redundant data paths for each location. To avoid propagation delays for every transaction, each site is configured with fabric switches instead of arbitrated loop hubs. This provides higher-speed access at the production site, with only disaster recovery-specific traffic traversing the long haul. In the example shown, the primary application at the production site is a relational database. To keep the disaster-recovery site current, only updated records are required, which further reduces the burden on the 6-mile link. Periodic tape backup can be performed against the disaster-recovery disks, which achieves the goal of data security without incurring additional overhead on the production servers. Redundant data paths at each location prevent the failure of a link or a switch from disrupting either production or data-copying applications, whereas dual Fibre Channel connections to the Fibre Channel-to-SCSI bridge ensure a path is always available for the tape subsystem. This configuration could be further optimized, at some expense, by deploying two fibers for each 6-mile link, thus increasing the capacity to 200MBps, if desired.
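The point about sending only updated records across the long-haul link can be illustrated with a deliberately simplified sketch. Real database mirroring typically works at the log or block level; the record layout, timestamps, and function name below are hypothetical.

```python
def records_to_replicate(records, last_sync):
    """Return only the records modified after the previous synchronization."""
    return {key: rec for key, rec in records.items() if rec["updated_at"] > last_sync}

# Hypothetical production records with integer modification timestamps.
production = {
    "order-1": {"updated_at": 100, "status": "shipped"},
    "order-2": {"updated_at": 250, "status": "open"},
}

delta = records_to_replicate(production, last_sync=200)
print(sorted(delta))  # ['order-2']
```

Only the record changed since the last synchronization crosses the link, which is what keeps the traffic on the long haul small relative to the size of the database.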
Tom Clark is director of technical marketing at Nishan Systems. He is also a board member of the Storage Networking Industry Association (SNIA), co-chair of the SNIA Interoperability Committee, and the author of Designing Storage Area Networks (Addison Wesley Longman).
This article is excerpted with permission from Designing Storage Area Networks, A Practical Reference for Implementing Fibre Channel SANs, by Tom Clark (Addison-Wesley, 0-201-61584-3, copyright 1999, Addison Wesley Longman).