Storage area networks demand connectivity options, high availability for shared environments, and storage resource management software.
BY MICHAEL P. KOCLANES
With the emerging market for storage area networks (SANs) there has been a great deal of attention paid to the enabling network technologies, such as host bus adapters (HBAs), switches, hubs, and bridges/routers. Many of the early challenges have focused on resolving the interoperability issues between these new components and the target storage devices such as disk subsystems and tape libraries.
With the networking hardware in place, data-management software vendors have made the necessary changes to dynamically allocate devices and media based on server requests for the Fibre Channel-attached shared storage. In 1999, data-management software vendors succeeded in making this a reality. Software such as Computer Associates' Enterprise Library Option, Legato's SmartMedia, Veritas' Shared Storage Option, and various solutions from IBM's Tivoli unit now enable dynamic allocation.
And seamless interoperability is well on the way to reality. Many vendors are now providing matrices of compatible SAN components with their products, and industry standards are being defined by consortia such as the Storage Networking Industry Association (SNIA) and the FibreAlliance group.
However, despite this progress, there are still many failed attempts at bringing SANs into production for data management and tape backup. Performance is often less than anticipated, and the tools to ensure high availability are often lacking. Potential end users should demand proof of integrated solutions and testing of all SAN components and software.
Implementing SANs for shared backup and recovery requires a different approach to storage resource management and the architecture of networked devices such as tape libraries. This article defines some of these new requirements.
First, consider the impact of a SAN on the storage device: in this case, a tape library. The target device must function as part of a network system and become a well-behaved network citizen. A tape library or disk array can no longer function simply as a captive device on a dedicated bus.
A SAN configuration changes the design requirements for network storage devices. First of all, the device must support the connectivity interfaces and protocols of the networks and/or busses to which it is attached. For example, a tape library may have to support SCSI, Fibre Channel, and Internet Protocol (IP) network connections. For data, the path will be SCSI in a server-attached environment and will be Fibre Channel in an arbitrated loop, switched fabric, or point-to-point Fibre Channel configuration. For control, an IP connection may be required. IP is an excellent "out-of-band" control path, in part because control information tends to be in smaller message packets than the streaming backup data.
For device management, the control path should not interfere with the data path. It should be "out of band" from the "in-band" data path. In contrast, sensing status through the SCSI connection means that library status and drive status are unknown when a server has the SCSI bus reserved for backup, recovery, or data access. Since network-management protocols such as SNMP are well-established for network resources in LANs and WANs, SNMP is a logical choice for SAN resource management.
Large IP networks of routers, hubs, servers, and printers are managed across IP networks with SNMP. CA Unicenter, HP OpenView, Tivoli, and BMC Patrol are examples of such tools in the LAN/WAN management arena. The same methodology can be applied to SAN-based storage resource management. This requires an IP stack on the tape library controller, a definition of a Management Information Base (MIB), and firmware on the controller that senses conditions in the library and drives.
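The idea can be sketched in a few lines. This is an illustrative mock only: the OIDs, status strings, and `snmp_get` stand-in below are invented for the example, not drawn from any vendor MIB. A real library controller would publish such a table in its vendor MIB and answer SNMP GET requests over its IP management port.

```python
# A mock MIB table: OID -> current value, as library firmware might expose it.
# All OIDs and statuses here are hypothetical.
LIBRARY_MIB = {
    "1.3.6.1.4.1.9999.1.1.0": "online",    # library status
    "1.3.6.1.4.1.9999.1.2.1": "idle",      # drive 1 status
    "1.3.6.1.4.1.9999.1.2.2": "writing",   # drive 2 status
    "1.3.6.1.4.1.9999.1.2.3": "fault",     # drive 3 status
}

def snmp_get(mib, oid):
    """Stand-in for an SNMP GET against the library's IP management port."""
    return mib.get(oid, "noSuchObject")

def poll_drives(mib, drive_oids):
    """Collect drive statuses without touching the SCSI/Fibre Channel data path."""
    return {oid: snmp_get(mib, oid) for oid in drive_oids}

def faulted(statuses):
    """Return the OIDs of any drives reporting a fault condition."""
    return [oid for oid, status in statuses.items() if status == "fault"]

statuses = poll_drives(LIBRARY_MIB, [
    "1.3.6.1.4.1.9999.1.2.1",
    "1.3.6.1.4.1.9999.1.2.2",
    "1.3.6.1.4.1.9999.1.2.3",
])
assert faulted(statuses) == ["1.3.6.1.4.1.9999.1.2.3"]
```

The point of the sketch is the separation of paths: status is gathered over IP even while a server holds the SCSI bus reserved for a backup stream.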
In addition to the requirements for network connectivity, SANs require other design changes in devices such as tape libraries. For example, sharing libraries across many servers requires high-availability features. In a SAN environment, the impact of library failure goes well beyond the loss of storage protection for a single server. A high-availability design should incorporate redundancy in drives, power supplies, and network interfaces.
In addition, tape libraries should be field-upgradable. Administrators should be able to add drives and media without taking the library offline and without impacting currently operating drives and backup/recovery or data-access operations.
Finally, other high-availability features, such as RAIT (the tape analog of RAID in disk subsystems), can be implemented. By striping data across four tapes in parallel, with a fifth tape holding parity, the failure of any single drive or piece of media during recovery does not impact the ability to restore data.
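The 4+1 parity scheme can be demonstrated in a few lines of Python. This is a minimal sketch of the principle, not a tape-drive implementation; the block size and zero-padding scheme are assumptions made for the example.

```python
# 4+1 RAIT sketch: data is split across four stripes, and a fifth parity
# stripe holds the byte-wise XOR of the other four. Losing any one stripe
# (a drive or media failure) leaves enough information to rebuild it.

BLOCK = 4  # bytes written to each data stripe per rotation (illustrative)

def stripe(data: bytes):
    """Split data round-robin across 4 stripes and compute a parity stripe."""
    padded = data + b"\x00" * (-len(data) % (4 * BLOCK))
    stripes = [bytearray() for _ in range(4)]
    for i in range(0, len(padded), 4 * BLOCK):
        for d in range(4):
            stripes[d] += padded[i + d * BLOCK : i + (d + 1) * BLOCK]
    parity = bytearray(len(stripes[0]))
    for s in stripes:
        for j, b in enumerate(s):
            parity[j] ^= b
    return [bytes(s) for s in stripes], bytes(parity)

def reconstruct(stripes, parity, lost: int):
    """Rebuild the lost data stripe by XOR-ing parity with the survivors."""
    rebuilt = bytearray(parity)
    for d, s in enumerate(stripes):
        if d == lost:
            continue
        for j, b in enumerate(s):
            rebuilt[j] ^= b
    return bytes(rebuilt)

stripes, parity = stripe(b"backup data for the shared tape library pool")
recovered = reconstruct(stripes, parity, lost=2)
assert recovered == stripes[2]  # the failed stripe is fully restored
```

Because XOR parity recovers any single missing stripe, the set survives one drive or media failure; a second simultaneous failure, as with RAID level 3/5, would require the mirrored copy described below.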
Some libraries allow the ability to add RAIT controller cards as an option and store media in 4+1 RAIT-ready magazines. A single RAIT set can be easily exported and imported. The combination of effective storage resource management and a pair of libraries with mirrored sets of RAIT media and dual data paths can achieve 99.9999% data availability.
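The six-nines figure follows from redundancy arithmetic. Assuming, purely for illustration, that each complete copy (library, RAIT media set, and data path) is independently available 99.9% of the time, the chance of both mirrored copies being down at once is the product of their individual downtimes:

```python
# Back-of-envelope availability arithmetic. The 99.9% per-copy figure and the
# independence assumption are illustrative, not vendor specifications.
per_copy = 0.999                 # availability of one library + media set + path
both_down = (1 - per_copy) ** 2  # both mirrored copies down simultaneously
combined = 1 - both_down         # data available if either copy is up
assert abs(combined - 0.999999) < 1e-9  # "six nines"
```

Two three-nines copies thus combine to six nines only if their failures really are independent, which is why dual data paths matter as much as the mirrored media.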
RAIT hardware capability, however, is not always supported by backup/recovery software. Although a tape library may support RAIT's parallel streaming and parity, the backup application may be unaware of the RAIT set and will not reflect its media in the catalog. As SANs and high availability become more critical, third-party software will add support for these features.
Another SAN requirement is storage resource management software to aid in the implementation, tuning, and proactive management of this more-complex environment. Assuming the tape library is designed for "out-of-band" management, the other elements of the SAN (e.g., bridges, hubs, and switches) should also have an IP connection and a MIB for "out-of-band" management.
Three levels of management are needed in this environment, the first of which is device-level management. The operations, configuration, and status of the individual elements of the SAN should be accessible through remote management consoles. The distance advantages of Fibre Channel are largely lost if the only way to operate and manage a library or switch is from its front console or a local serial connection. One of the key end-user benefits of Fibre Channel over SCSI is the connectivity distance, device count, and flexibility it enables without sacrificing bandwidth or performance.
Furthermore, the cost of systems administration of backup exceeds the cost of the software and hardware to support the backup-and-recovery process. Studies show that 76% of the surveyed end users want to manage SANs remotely, using browser-enabled tools. Browser-enabled monitors are available from most SAN device manufacturers.
The next level is a resource management framework for overall SAN management. Trying to monitor multiple browser windows for all of the elements of a switched Fibre Channel environment is awkward, at best. The resource manager should automatically discover the elements of the SAN and display all the devices as icons in a single monitor view. The resource management architecture should be able to poll the status of, and/or receive alerts from, the devices. To ensure 24/7 management of the SAN, these processes should be independent of the browser-based monitors: the status of the devices is constantly monitored whether or not a browser-based device monitor is active. Upon detection of a problem, the resource manager should record it in an event-logging database.
In addition, the resource manager should highlight a failing component or warning from the master console. The user can then click on the highlighted icon, and the individual device monitors are launched from the resource manager screen. It is important that the resource manager include a database for gathering trend statistics on the health of SAN resources.
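The framework layer described above can be sketched as a polling loop. The device names, probe functions, and status strings below are invented for illustration; a real resource manager would discover devices automatically and persist events to a database.

```python
# Sketch of the framework layer: devices are polled on a schedule, problems
# land in an event log, and failing devices are flagged so the master console
# can highlight their icons. All names and statuses are hypothetical.
import time

class ResourceManager:
    def __init__(self, devices):
        self.devices = devices   # name -> callable returning a status string
        self.event_log = []      # persisted to a database in practice
        self.flagged = set()     # icons to highlight on the master console

    def poll_once(self):
        """One polling pass over every discovered SAN element."""
        for name, probe in self.devices.items():
            status = probe()
            if status != "ok":
                self.event_log.append((time.time(), name, status))
                self.flagged.add(name)

devices = {
    "fc-switch-1": lambda: "ok",
    "tape-library-1": lambda: "drive-fault",
}
mgr = ResourceManager(devices)
mgr.poll_once()
assert "tape-library-1" in mgr.flagged
assert "fc-switch-1" not in mgr.flagged
```

Because the loop runs on its own, not inside a browser session, problems are caught and logged around the clock; the browser monitors become views onto this log rather than the detection mechanism.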
Trend analysis is also important. For example, the resource manager should flag a power supply whose output has become marginal, so that it can be replaced before it impacts a backup or recovery. Without these tools, the problem may go unnoticed until a hard failure occurs.
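A trend check of this kind is simple once health statistics are being gathered. The voltage samples and the 5% threshold below are assumptions made for the example, not real specifications.

```python
# Sketch of trend detection on gathered health statistics: a power supply
# whose output drifts steadily downward is flagged before it fails outright.
def drifting(readings, nominal, warn_fraction=0.05):
    """Flag a supply whose readings trend downward over the sample window
    and whose latest reading has drifted more than 5% below nominal."""
    trending_down = all(a >= b for a, b in zip(readings, readings[1:]))
    return trending_down and readings[-1] < nominal * (1 - warn_fraction)

# Daily 12V-rail samples gathered by the resource manager (hypothetical).
samples = [12.0, 11.9, 11.7, 11.4, 11.3]
assert drifting(samples, nominal=12.0)           # marginal: replace proactively
assert not drifting([12.0, 11.9, 12.0], 12.0)    # noise, not a trend
```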
With proper proactive resource management, the problem is detected before hard failure, and the failing component is replaced with no impact to data availability. Furthermore, the resource manager should be flexible in its ability to deliver alerts to higher-level system tools such as Unicenter, OpenView, or Tivoli. Also, alerts should be sent through e-mail or pager as defined by the end user.
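That flexibility amounts to pluggable alert destinations. In the sketch below the handlers merely record messages; a real deployment would wire in an SMTP client, a pager gateway, or an SNMP trap forwarder to a framework such as Unicenter or OpenView. All names here are invented for illustration.

```python
# Sketch of flexible alert delivery: destinations are handlers the end user
# configures, and every alert fans out to all of them.
class AlertRouter:
    def __init__(self):
        self.handlers = []

    def register(self, handler):
        """Add a delivery destination (e-mail, pager, framework, ...)."""
        self.handlers.append(handler)

    def alert(self, message):
        """Fan one alert out to every configured destination."""
        for handler in self.handlers:
            handler(message)

sent = []
router = AlertRouter()
router.register(sent.append)                          # stand-in for e-mail
router.register(lambda m: sent.append("PAGE:" + m))   # stand-in for pager
router.alert("tape-library-1: drive fault")
assert sent == ["tape-library-1: drive fault",
                "PAGE:tape-library-1: drive fault"]
```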
Finally, the resource manager should support more-sophisticated storage resource management functions: bandwidth management, load balancing, capacity planning, performance tuning, asset management, and automatic fail-over. As switched fabric networks grow in complexity, dynamic routing and load balancing will become increasingly important.
In the selection and implementation of a SAN for shared tape backup, there are a number of elements to consider. First, interoperability of SAN components is a critical issue. The integrator or manufacturer should demonstrate tested end-to-end configurations of all necessary hardware and software components. Second, you should consider the design elements of the tape library for shared tape backup applications, including connectivity to necessary network protocols, "out-of-band" management, a high-availability design, and scalability with minimal impact on availability.
Finally, having the proper hardware components only ensures you have the right plumbing in place. Storage resource management and device management enable the benefits of centralized management of SAN resources, lowering the total cost of ownership and gaining the benefits of the connectivity distance enabled by Fibre Channel.
Michael Koclanes is president and CEO of Creek Path, a division of Exabyte Corp. (www.exabyte.com), in Boulder, CO.
Checklist: Tape libraries for SANs
- Connectivity (SCSI, Fibre Channel, IP)?
- High availability (redundant drives, power supplies, network interfaces)?
- Storage resource management software (device level, framework level, bandwidth management, load balancing, capacity planning, performance tuning, asset management, automatic fail-over)?