Multi-vendor testing proves that the industry has come a long way in solving interoperability and compatibility challenges.
By Dave Deming
As with any relatively new interface, Fibre Channel poses interoperability challenges among products from different vendors. Over the last few years, however, vendors and third-party organizations have ironed out most of the problems, making it possible to build complex multi-vendor storage area networks (SANs).
This article documents a variety of multi-vendor interoperability experiments conducted by Solution Technology. The tests were endorsed by the Fibre Channel Industry Association (FCIA). The objective of the testing was to explore Fibre Channel device interoperability issues and to demonstrate the progress being made in this critical area. The primary methodology was to construct different Fibre Channel SANs using products from as many vendors as possible.
In the demonstrations, the majority of the hardware was configured without debugging; in some instances, minor code changes were required. This testing showed that while there are still some interoperability issues, the task of integrating different vendors' Fibre Channel products into the same SAN is achievable. Ultimately, this means that OEMs, system integrators, and end users that choose to build Fibre Channel SANs will have a much easier task when it comes to multi-vendor component availability and integration.
Compatibility vs. interoperability
Interoperability and compatibility are sometimes confused. A technology like Fibre Channel can have compatible components, but not have components that are 100% interoperable with each other.
Interoperability ensures that all products have been designed to an industry standard and that they interoperate with all other devices designed to that standard. The Fibre Channel standards are so immense and convoluted that it is not easy to make all products 100% interoperable. If all devices were capable of obeying all the rules in the Fibre Channel standards, you would have interoperability.
Compatibility ensures that similar products such as host bus adapters (HBAs) can be replaced with other vendors' HBAs. If two HBAs are interchangeable, they are compatible in that application. Compatibility can be taken one step further: the ability of similar products from different vendors to interact within the same configuration (e.g., on the same loop).
Most Fibre Channel SANs today are restricted to a relatively small set of qualified components. Because of potential interoperability problems, integrators can't add different vendors' components that haven't been qualified. However, our demonstrations prove that devices from a wide variety of vendors can, in fact, be integrated in the same SAN. Integrators and end users can combine different vendors' products with relatively little effort. That said, this study in no way proves or guarantees interoperability among all Fibre Channel products.
What is being done
The Fibre Channel community and FCIA have taken many steps to resolve interoperability issues. There are numerous mechanisms in place that provide manufacturers, integrators, and OEMs with ample opportunities to resolve interoperability issues that may arise during product development. These include product Plugfests; SANMark, a Fibre Channel protocol verification document; and university involvement.
Hosted by the FCIA, Plugfests are typically held three times a year at Interphase Corp.'s facilities in Dallas. University of New Hampshire (UNH) faculty and students organize, lead, and assist in these efforts. The opportunity to participate in these events is vital to the success of a product's ability to interoperate in multi-vendor Fibre Channel SANs. The Plugfests offer a safe, cost-effective, and productive environment to resolve interoperability issues with other Fibre Channel vendors.
A joint effort by FCIA and the NCITS T11 Fibre Channel standards communities, SANMark's objective is to test protocol rules that are vital to the interoperability of Fibre Channel lower (or link) layer protocols. The SANMark document details proper timing settings for certain initialization protocols and defines test procedures that verify addressing issues. Devices that pass these minimum requirements are more likely to interoperate with other devices that pass these tests. SANMark is an important mechanism that all vendors can use to eliminate many causes of non-interoperability.
For many years, the University of New Hampshire has taken a leading role in testing Fibre Channel products. UNH is instrumental in defining numerous test scripts and procedures and is currently responsible for facilitating the SANMark effort by working with FCIA to define the SANMark tests. UNH students coordinate all link-layer protocol testing efforts and provide Plugfest attendees with the opportunity to resolve interoperability issues. UNH also provides laboratory facilities and tests products using UNH-developed test suites.
Solving compatibility problems
Device-level compatibility issues can usually be discovered and fixed relatively quickly, assuming a sound knowledge of Fibre Channel protocols. All of the problems we encountered were fixed by changing firmware or device drivers. None of the problems were hardware-related.
In our tests, relatively few vendors had to troubleshoot or change firmware or device-driver code. Of the six participating disk-drive vendors, for example, four had to make changes (two because new prototypes were sent; factory models worked properly). Of the five HBA vendors, one had to make a code change and another had to find the right device driver. The eight switch/hub vendors didn't have to make any changes, although one switch vendor had to configure switch ports for public or private loop operation.
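The tally above can be restated as a quick calculation. The following is a minimal sketch that only re-expresses the counts reported in this article; the dictionary layout and the function name are illustrative assumptions, not part of the test procedure.

```python
# Counts as reported in the text: (vendors tested, vendors that needed a
# firmware or driver code change). All names here are illustrative only.
REPORTED = {
    "disk drives": (6, 4),
    "HBAs": (5, 1),           # a second HBA vendor only had to find the right driver
    "switches/hubs": (8, 0),  # one switch needed port configuration, not new code
}

def change_rate(tested, changed):
    """Return the share of vendors needing a code change, as a percentage."""
    return 100.0 * changed / tested

for category, (tested, changed) in REPORTED.items():
    rate = change_rate(tested, changed)
    print(f"{category}: {changed} of {tested} ({rate:.0f}%)")

total_tested = sum(tested for tested, _ in REPORTED.values())
total_changed = sum(changed for _, changed in REPORTED.values())
print(f"overall: {total_changed} of {total_tested}")
```

The overall figure adds the per-category counts as stated, so a company that supplied products in more than one category is counted once per category.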
We relied on real applications to test the configurations. This meant each device had to be properly installed and formatted, thus making it available to the NT operating system. Installation included loading device drivers for HBAs and any drivers associated with special hardware such as tape libraries and RAID controllers.
After the hardware was installed and configured, we copied AVI files onto the devices and ran the AVI clips off each device. In some instances, this included multi-host access to the same files. Backups were performed using Microsoft backup and Veritas Backup Exec software.
We assembled many SAN configurations, ranging from simple single-host configurations to complex switched fabric environments. The list below summarizes the configurations that were tested:
- A single host connected through hubs to a shared storage enclosure. This SAN is the most basic configuration.
- Multiple hosts connected to a shared storage enclosure. This demonstrates a multi-client SAN where all users can easily connect and share the same storage.
- Dual hosts connected to shared storage using different paths. This is accomplished using two different communication paths to the same dual-loop enclosure. This SAN demonstrates a high-availability application that can be easily scaled. A tape library was added to demonstrate LAN-free backup.
- Single and multiple hosts in a switched fabric, with LAN-free backup. Multiple hosts were attached to four vendors' switches, which were connected to different storage devices. Although complex, this SAN demonstrates the scalability of a switch-based SAN.
- Multiple hosts, segmented hub, switched fabric, and LAN-free backup. Multiple hosts were attached to a segmented hub, which was connected to four vendors' switches connected to different storage devices. This included various public and private devices with a remote library capable of LAN-free backup. This SAN was the most complex, covering all aspects of a large, heterogeneous SAN.
Plugability: the ability to interchange like components. These tests verified the ability to replace a SAN component with one that performs identical functions. For instance, we configured a SAN with a QLogic HBA and replaced it with Agilent, Emulex, Interphase, and JNI HBAs, and then verified proper system operation.
Compatibility: the ability to properly interact with other Fibre Channel vendors' products. These tests verified the ability of multiple vendors' products to interact within the same SAN. For instance, we configured a SAN with multiple HBAs from different vendors on the same loop, and configured a SAN with all the different vendors' disk drives in the same enclosure.
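The difference between the two kinds of test can be shown with a short sketch. It is an illustration under stated assumptions, not the procedure we followed: the function names are hypothetical, and the vendor list is simply the set of HBAs named in the plugability example above. A plugability test swaps one part at a time against a baseline, while a compatibility test puts every part on the loop together, so every pair of vendors has to coexist.

```python
from itertools import combinations

# HBA vendors named in the plugability example in the text.
HBAS = ["QLogic", "Agilent", "Emulex", "Interphase", "JNI"]

def plugability_plan(baseline, candidates):
    """Swap tests: replace the baseline part with each other vendor's part."""
    return [(baseline, other) for other in candidates if other != baseline]

def compatibility_pairs(parts):
    """Every pair of vendors that must coexist when all parts share one loop."""
    return list(combinations(parts, 2))

swaps = plugability_plan("QLogic", HBAS)
pairs = compatibility_pairs(HBAS)
print(len(swaps), "swap tests;", len(pairs), "vendor pairs sharing the loop")
```

With five vendors, a single baseline yields four swaps, whereas one shared loop implies ten vendor pairs, which is one reason the compatibility experiments took more work.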
Overall results
In all of our experiments, after the proper firmware and driver changes were made, 100% of the HBAs, disk drives, switches, and hubs were interchangeable (i.e., any vendor's device could be swapped in to perform the required tasks).
In addition, all HBAs were capable of properly interacting on the same loop. All of the disk drives tested not only worked on the same loop, but were also compatible with every enclosure we tested. A variety of hub and enclosure combinations were assembled with no problems, and all cables and GBICs, both copper and optical, were interchangeable.
- Disk drives: Fujitsu, Hitachi, IBM, Quantum, Seagate (2 models), Western Digital
- HBAs: Emulex (2 models), HP, Interphase, JNI, QLogic (2 models)
- Hubs: Atto, Emulex, Gadzoox, Vixel
- Switches: Ancor, Brocade, Gadzoox, Vixel
- GBICs: Cielo, Finisar, Fujikura America, IBM, Vixel
- Cables: AMP, Amphenol, Berg, Gore, Molex
- Enclosures: Consan, Eurologic, JMR, nStor (2 models), Xyratex
- Tape libraries: Quantum/ATL
- Bridges: Atto, Chaparral
- RAID controllers: ICP Vortex
- SES devices: Vitesse
- Analyzers: Ancot, Finisar, Xyratex
- Servers: Dell (donated by Agilent) and Dolch (donated by Finisar)
- Total: 33 vendors, 52 different products
Each HBA was installed and tested to verify that it could recognize devices and access AVI files loaded by other HBAs. A hub plugability test included switching one vendor's hub for another until all hubs were verified. GBIC and cable plugability tests included swapping out different copper cables, optical cables, and GBIC combinations, and verifying proper operation.
We connected additional hosts to the Test #1 configuration. We installed numerous HBAs into the same workstation to simulate a multi-host environment. Though this may not typically happen in a normal SAN design, the outcome would be the same. We tested the configuration by playing the AVI files from each device-HBA combination. This included 18 video feeds on the Dell workstation and 6 video feeds on the Dolch computer. The AVI clips ran for two days without any interruption or errors.
We also connected additional storage to the SAN via an Atto Fibre Channel-to-SCSI bridge. Bridges allow users to capitalize on current SCSI investments by enabling them to connect any SCSI device (tape libraries, disk arrays, JBOD enclosures, etc.) to a Fibre Channel SAN.
We then connected a Vitesse SCSI Enclosure Services (SES) device to the SAN. SES devices are used to monitor enclosure components and conditions such as power supplies, fans, and temperature.
To demonstrate hub compatibility, numerous different hubs were cascaded. However, typical SAN designs may not cascade hubs because system performance may be affected due to the loop architecture.
This configuration also demonstrated multi-generation HBA compatibility on the same loop. In this SAN, there were two generations of HBAs: first-generation Emulex LP6000 and QLogic QLA2100 HBAs and second-generation Emulex LP850 and QLogic QLA2200 HBAs.
The main objective of this experiment was to see how many different HBA and disk-drive vendors' products could be integrated on the same loop. It took a little work to accomplish the task. Four disk-drive vendors had to change firmware or troubleshoot, and one HBA vendor had to modify a device driver. Once those issues were resolved and the proper patches were in place, the experiment demonstrated the ultimate in multi-vendor compatibility within the same SAN.
This SAN includes HBAs from five vendors and disk drives from six vendors, representing the majority of suppliers. HBAs and disk drives are the backbones of a SAN, so compatibility among these devices is essential.
All four hubs were verified, and a variety of optical/copper cables and GBICs were used throughout the SAN. It was our objective to use as many vendors' components as possible.
Because we had already completed the previous experiments, this high-availability SAN was the easiest to put together. We simply had to reconfigure the hardware into a multi-vendor dual-loop configuration and add remote LAN-free backup capability as an extra dimension to the application.
This configuration is significant because of its fault-tolerant design, which is easy to install, maintain, and manage. Any component could fail in either path (i.e., HBA, hub, cable, etc.), and the data could still be accessed.
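The fault-tolerance claim amounts to a simple rule: data stays reachable as long as one complete path survives. The sketch below illustrates that rule only. The component labels are hypothetical, and it deliberately leaves out anything shared by both paths (the enclosure and the drives themselves).

```python
def path_is_up(path, failed):
    """A path works only if none of its components has failed."""
    return not any(component in failed for component in path)

def data_accessible(paths, failed):
    """Data stays reachable while at least one complete path survives."""
    return any(path_is_up(path, failed) for path in paths)

# Hypothetical component labels for the two loops of a dual-loop enclosure.
PATH_A = ["hba_a", "cable_a", "hub_a", "loop_a_port"]
PATH_B = ["hba_b", "cable_b", "hub_b", "loop_b_port"]
PATHS = [PATH_A, PATH_B]

# Any single component failure leaves the other path intact.
single_failures_ok = all(
    data_accessible(PATHS, {component}) for component in PATH_A + PATH_B
)
# One failure in each path at the same time takes both paths down.
double_failure_ok = data_accessible(PATHS, {"hub_a", "cable_b"})

print(single_failures_ok, double_failure_ok)
```

The second check is a reminder that the design tolerates a failure in either path, not simultaneous failures in both.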
However, there were two missing links in this experiment:
- Middleware to handle a high-availability application and multi-host access to the same shared storage. In an NT environment, middleware software is required for a host to change or add a file to the same storage device and to make that file available to other hosts without file corruption. This is a common problem in multi-host environments.
- Hardware link between hosts, which would be used to mirror (synchronize) the host memory to support server failover.
The ICP Vortex RAID controller in this configuration includes dual ports; only one port was used and was attached to the Eurologic storage enclosure with five different disk-drive vendors configured in a RAID-5 stripe. The Chaparral Fibre Channel-to-SCSI bridge was attached to a SCSI tape drive to perform LAN-free backup from either server.
We ran AVI video clips from all hosts, simulating three clients accessing a shared data mart. This experiment verifies that the disk drives can handle multi-host activity (multi-initiator/clients) and can be accessed on both loops simultaneously. Also, while the AVI clips were playing, LAN-free backup was performed every 30 minutes.
A dual-loop hot-plug experiment was run to determine if loop initialization on one loop affected the application running on the other loop. This was done by running AVI clips from all devices from one host while the other loop initialized with another host. There was no noticeable interruption of video clips on the original host (loop A) when the other loop (loop B) was initialized.
Though this configuration is not suitable for standard production environments, its goal was to demonstrate the considerable progress on the part of Fibre Channel vendors toward creating a plug-and-play environment.
The Dell server was configured with four PCI-based HBAs. Fabric-ready drivers for the HBAs were loaded to support the Windows NT 4.0 Service Pack 5 operating system. Vixel's 2100 Fibre Channel hub was zoned to provide four loops and therefore four independent paths to either switch-attached storage or the tape library. The Vixel hub was zoned as follows:
- Ports 1 & 2 = zone 1
- Ports 3 & 4 = zone 2
- Ports 5 & 6 = zone 3
- Ports 7 & 8 = zone 4
Loop initialization primitives (LIPs) could occur within a zone, but not between zones. Because of zoning, each HBA could see only its own storage when viewed from the SCSI drivers in the NT Control Panel. Disk Administrator, of course, could see all storage devices because it has a view of all four HBAs.
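The zoning described above can be captured in a small lookup table. This is a minimal sketch, assuming only the port-to-zone assignments listed in the text; the function names are illustrative and have nothing to do with the hub's actual management interface.

```python
# Port-to-zone map for the eight hub ports, as listed in the text.
ZONE_OF_PORT = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3, 7: 4, 8: 4}

def ports_hit_by_lip(source_port):
    """Ports that see a loop initialization started at source_port."""
    zone = ZONE_OF_PORT[source_port]
    return sorted(port for port, z in ZONE_OF_PORT.items() if z == zone)

def same_loop(port_a, port_b):
    """Two ports share a loop only if they are in the same zone."""
    return ZONE_OF_PORT[port_a] == ZONE_OF_PORT[port_b]

print(ports_hit_by_lip(3))
print(same_loop(1, 2), same_loop(2, 3))
```

A loop initialization that starts on port 3 reaches only ports 3 and 4, which is why each HBA saw only the storage behind its own zone.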
All fabric-ready targets logged in properly to the simple name servers (SNSs) of the Fibre Channel switches, and the switches could be moved to any switch position without affecting SAN performance/operation.
Zone 1. Port 1 was connected to an Agilent 5121 HBA, port 2 to a Vixel 8100 eight-port Fibre Channel switch. The Vixel 8100 was configured for fabric mode; any ports on the switch could be used for this configuration. The Atto fabric-ready bridge was connected to the Vixel 8100, and a 9GB Quantum Atlas SCSI drive was connected to the Atto bridge. Several AVI videos were loaded on the drive.
Zone 2. Port 3 was connected to an Emulex LP8000 HBA, port 4 to an Ancor MKII 16-port Fibre Channel switch. The Ancor switch auto-detects and configures ports for fabric-ready or private-loop connections. In this configuration, the Ancor switch was connected to a Xyratex JBOD enclosure with nine fabric-ready drives. Again, several AVIs were loaded on each of the nine drives.
Zone 3. Port 5 was connected to a JNI 116x HBA, port 6 to a Brocade Silkworm 2400 16-port Fibre Channel switch. The Brocade switch auto-detects and configures ports for either a fabric-ready connection or a private-loop connection. In this configuration, the Brocade switch was connected to a Consan RAID controller and a Fibre Channel controller that supports only private loop. Again, several AVIs were loaded on each of the nine drives.
Zone 4. Port 7 was connected to a QLogic QLA2200 HBA, port 8 to a Gadzoox Capellex switch. The Gadzoox switch was connected to a Quantum/ATL tape library running a private loop.
Testing was accomplished by loading Veritas Backup Exec for NT on the Dell server. We then started the AVIs on each of the storage devices and performed a complete LAN-free backup of all AVIs on all storage devices to the Quantum/ATL library every half hour. While the backup was running, GBICs were pulled on the Vixel 2100 hub in zones that weren't being backed up at the time, and although the AVIs stopped until the GBICs were reinserted, the tape backup operation was not interrupted.
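The choice of which GBICs to pull can be expressed as a small rule. The sketch below rests on an assumption that is our reading of the setup rather than something measured separately: a backup job occupies the zone holding the storage being read plus zone 4, where the tape library was attached. The names are illustrative.

```python
ALL_ZONES = [1, 2, 3, 4]
TAPE_ZONE = 4  # the tape library sat behind the zone 4 path

def zones_safe_to_pull(backup_source_zone):
    """Zones whose GBICs can be pulled without touching the running backup.

    Assumption: the backup uses only the source storage zone and the
    tape zone; every other zone is idle as far as the backup is concerned.
    """
    busy = {backup_source_zone, TAPE_ZONE}
    return sorted(zone for zone in ALL_ZONES if zone not in busy)

# While zone 2 storage is being backed up, zones 1 and 3 are candidates.
print(zones_safe_to_pull(2))
```

Pulling a GBIC in one of the returned zones would still stall video playback in that zone, as we observed, but under this assumption it leaves the backup path untouched.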
After the tape backup was completed, several reboots of the Dell system were exercised to ensure that all storage was available. Disk Administrator was consistent in this application and all previously assigned drive letters held through the reboots. All HBAs exhibited consistent behavior.
This test showed that it is possible to mix a wide variety of vendors' products and get consistent results. This experiment also demonstrated the adaptability of the switches, which not only accommodated public devices (as expected) but also automatically switched to some type of special mode (e.g., stealth mode) to allow access to private devices not capable of fabric protocol.
We also performed hot-plugging experiments, adding devices to already functional and running configurations. We were surprised by how well the current set of devices and drivers handled hot-plug situations.
Whenever a new device is added to an arbitrated loop, a LIP occurs, a problem that has plagued the FC-AL architecture since its inception. When a LIP occurs, all application activity is suspended until initialization is completed, much like resetting a SCSI bus during application activity.
In one hot-plug experiment, we added a new device to an operating loop while video clips were playing from disk drives. We monitored the bus and verified that the initialization procedure occurred and that the new device was added to the topology map and then logged in to by the adapters.
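The sequence we watched for on the analyzer can be laid out as a toy model. This sketch is an illustration only: the class and device names are hypothetical, and it does not implement the real FC-AL initialization primitives. It simply records the order of events described above.

```python
class ArbitratedLoop:
    """Toy model of a loop; it does not implement real FC-AL primitives."""

    def __init__(self, devices, adapters):
        self.topology_map = list(devices)
        self.adapters = list(adapters)
        self.io_active = True
        self.events = []

    def hot_plug(self, new_device):
        """Record the steps that follow adding a device to a running loop."""
        self.io_active = False
        self.events.append("LIP: application I/O suspended")
        self.topology_map.append(new_device)
        self.events.append(f"{new_device} added to topology map")
        for adapter in self.adapters:
            self.events.append(f"{adapter} logged in to {new_device}")
        self.io_active = True
        self.events.append("initialization complete: I/O resumed")
        return self.events

loop = ArbitratedLoop(devices=["disk_1", "disk_2"], adapters=["hba_1", "hba_2"])
for event in loop.hot_plug("disk_3"):
    print(event)
```

What the application does during the suspended interval is the part that varied from system to system in our tests.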
Depending on a number of factors (mainly the system and HBAs we used), we noticed three different responses, listed here in order of desirability.
- The AVI clips continued with a minor, but noticeable blip in the video (sometimes no effect on video at all).
- The AVI clips stopped or played until the end and required a start command to continue.
- NT notified us that the device was off-line. After clicking "OK" numerous times, the device eventually became available and the video clips continued once the start button was pressed. At no time did we notice any data-file corruption.
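The three responses above form an ordered scale, which makes them easy to record and compare across system and HBA combinations. The following sketch shows one way to do that; the enum and its member names are our own labels, and the sample list is made up for illustration rather than taken from the test logs.

```python
from enum import IntEnum

class HotPlugOutcome(IntEnum):
    """Observed responses, ordered from most to least desirable."""
    BRIEF_BLIP = 1       # video continued, at most a minor blip
    MANUAL_RESTART = 2   # clip stopped or ran out and needed a start command
    OFFLINE_PROMPT = 3   # NT reported the device off-line before it recovered

def worst(outcomes):
    """Return the least desirable outcome in a set of runs."""
    return max(outcomes)

# Hypothetical sample of runs, for illustration only.
sample_runs = [
    HotPlugOutcome.BRIEF_BLIP,
    HotPlugOutcome.OFFLINE_PROMPT,
    HotPlugOutcome.MANUAL_RESTART,
]

print(worst(sample_runs).name)
print([outcome.name for outcome in sorted(sample_runs)])
```

None of the three outcomes involved data-file corruption, so the scale ranks inconvenience to the user, not data integrity.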
Another hot-plug experiment involved playing AVI clips and removing the storage from one path and hot plugging into an alternate path. This was accomplished in several ways, e.g., by cascading two hubs and hot-plugging storage from one hub to the other or by completely moving the HBA and storage connection from one hub to the other. In this experiment, we typically had to restart the video clips.
While hot plugging may be a requirement of some applications, operating systems will require some type of notification that the configuration has changed and additional resources are available. In NT, Disk Administrator must be executed, and if the device had not been previously formatted, then that function would also have to be executed.
While these issues do not affect the Fibre Channel architecture, they need to be addressed to accommodate dynamic configuration changes. Middleware vendors, not operating system vendors, will most likely resolve these issues.
David A. Deming is the president and chief technical officer of Solution Technology (www.soltechnology.com), a leading provider of high technology education. He has more than 13 years' experience with I/O interfaces, including Fibre Channel, SCSI, SSA, ATA/IDE, and P1394. He has been involved in testing Fibre Channel components since 1994. Solution Technology is a member of the FCIA, and Mr. Deming is a voting member of the NCITS T11 Fibre Channel standards committee. He has also been involved with and coordinated application-level testing at the FCIA-sponsored Plugfests since 1998, and he is responsible for maintaining the Interoperability Center for the FCIA.