How to go from boxed HBAs and switches to a fully functional, fault-tolerant SAN in 30 minutes.
By Jack Fegreus
At any small to medium-sized business (SMB), the search for a high-throughput storage solution will very likely begin with the use of dedicated RAID arrays using Ultra320 SCSI technology. In InfoStor Labs' tests of Ultra320-based subsystems in 32-bit Xeon-based servers, we saw throughput levels of 150MBps to 175MBps using internal four-drive RAID arrays. However, the strict cabling requirements needed to maintain proper signaling strength place hard constraints on the ability to grow these internal arrays.
The purpose of a storage area network (SAN) is to share physical storage by distributing logical devices to multiple systems. The key to taking advantage of disk sharing within a SAN is the ability to create high-capacity disk arrays.
SAN storage systems are usually based on Serial ATA (SATA) or Fibre Channel Arbitrated Loop (FC-AL) internal drive interconnects. Externally, each storage system is connected via Fibre Channel to a port on a SAN switch.
SATA-based storage devices can provide a spectacular 300% price-performance edge over FC-AL devices. Their limitation, compared to FC-AL systems, is a restriction of 12 drives per device. Nonetheless, with 250GB SATA drives readily available, that 12-drive limitation still translates into a rather hefty capacity of 3TB.
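The capacity arithmetic above is simple enough to sketch directly. The function name below is illustrative, and the figure is raw capacity before RAID or formatting overhead:

```python
# Illustrative sketch: raw capacity of a SATA enclosure at the article's
# stated limits (12 drives per device, 250GB drives). RAID parity and
# formatting overhead would reduce the usable figure.
def raw_capacity_gb(drives: int, drive_gb: int) -> int:
    """Raw, pre-RAID capacity in GB."""
    return drives * drive_gb

max_sata_gb = raw_capacity_gb(drives=12, drive_gb=250)
print(f"Maximum raw SATA enclosure capacity: {max_sata_gb} GB (~3TB)")
```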
For large sites, a 3TB limit per storage system is a very real constraint, which is tied to the issue of the availability of data ports on switches. With a storage system having a FC-AL architecture, it is theoretically possible to configure as many as 64 arrays, each having 16 drives. How well the controllers in such a maxed-out configuration would be able to handle the potential I/O is another matter.
Nonetheless, from a financial perspective, the answer is quite clear: The solution to reducing storage costs is to share storage devices. Unfortunately, the costs associated with the configuration and maintenance of a traditional SAN topology have made it very difficult for a minimally staffed IT organization to realize any operational savings.
To this end, QLogic has introduced the SANbox 5200 stackable Fibre Channel switch. In essence, QLogic has extended the idea of stackable Ethernet switches into the world of Fibre Channel SAN fabrics by adding 10Gbps ports to link SANbox 5200 switches. As Ethernet speeds went from 10- to 100- and then 1,000Mbps, the new high-speed ports were first used to link hubs and switches in order to increase the number of LAN access points with minimal additional overhead. In this vein, QLogic's switches have four 10Gbps ports to complement the 16 2Gbps ports on each SANbox 5200 switch.
To simplify installation and maintenance of the SAN fabric, QLogic has forgone the traditional approach of switch-based software. Instead, the company includes SANbox Manager, a host-based software package that runs on Windows, Linux, or Solaris. By making SANbox Manager host-based, QLogic was able to make the software much more robust and to add a number of wizard-based modules that significantly ease configuration.
Simplification has also extended to the purchasing process. There is no special licensing for switch features. All licensing is based on the number of ports that are activated.
Once a port is activated, all features are available. Licenses are available for eight ports, 12 ports, 16 ports, or 16 ports plus the four 10Gbps ports for inter-switch links (ISLs).
What's more, the 10Gbps ports are really built on four 3.1875Gbps channels over which data is striped. For the SAN administrator—especially a newly designated one—this provides a simple mechanism for creating configuration-free ISLs.
To test the SANbox 5200 stackable switches, we started with the same fabric configuration that we used in our initial SMB SAN configuration (see part 1 of the series, InfoStor, May 2004, p. 34); however, we substituted two QLogic SANbox 5200 switches for the two Brocade 3200 switches. In addition, we used QLogic QLA2340 host bus adapters (HBAs) in place of the Emulex LightPulse 9802 HBAs in the PCI-X slots of the HP ProLiant ML 350 G3 and Appro 2400Xi servers.
As in the previous test, shared disk storage was handled by an nStor 4500F Series storage system. The 4500F was populated with four IBM and four Seagate Fibre Channel drives, which were formatted as two independent RAID-5 arrays. For robust performance and to eliminate the possibility that a disk controller would be a single point of failure, the nStor 4500F array was set up with two RAID controllers in an active-active configuration.
On the two Intel Xeon-based servers used to test performance, we ran Windows Server 2003 on the HP ProLiant ML 350 G3 and SUSE Linux Professional 9.0 on the Appro 2400Xi. Once again, the business constraint for our SMB SAN test scenario was the need to provide guaranteed 24x7 access to all systems.
To complement the inherent fault tolerance provided by the active-active dual-controller configuration of the storage system, we implemented a dual-switch topology on our SAN fabric to prevent the switch from being a single point of failure. Given the number of devices being networked into our SMB SAN environment, a single SANbox 5200 would have been more than sufficient to support the current structure and leave plenty of room for future expansion.
To this end, QLogic has introduced the SAN Connectivity Kit 3000, a bundled SAN that includes one SANbox 5200 licensed for eight ports, four QLA 2340 HBAs, four fiber-optic cables, eight small-form-factor pluggable (SFP) transceivers, SANbox Manager software for the switch, and SANsurfer software for the HBAs. The cost of this "SAN in a box" is $6,999.
Figure 2: The Linux kernel attempts to bundle I/O requests into 128KB blocks. As a result, when nStor's StorView software was used to monitor our oblDisk benchmark (running on an Appro server under SuSE Linux) with the Emulex LightPulse 9802 HBA installed, nearly all of the read-and-write requests were 128KB in size. With Windows Server 2003, I/O requests were spread over a range from 4KB to 64KB. When we ran oblDisk on SuSE Linux with the QLogic QLA 2340 HBA installed, the QLogic driver further bundled requests when possible into 512KB blocks. As a result, I/O requests at the nStor array were spread over 128KB, 256KB, and 512KB block sizes.

The working assumption for our evaluation of the QLogic-based SAN was that all of the servers and the storage system would be in place and in working order prior to the arrival of the HBAs and switches from QLogic. In particular, the operating systems were installed and running on the servers, and the RAID arrays on the shared storage system were initialized.
Given the constraints on capital and operating costs for our test SMB SAN, a key evaluation criterion was how quickly we could get our initial two-switch SAN fabric up and running. Remarkably, we were able to go from boxed HBA and switch equipment to a fully functional, fault-tolerant SAN in just 30 minutes.
Upon opening the box containing a QLogic QLA234x HBA, system administrators will discover a card included with the driver installation CD with instructions to check the QLogic Website for the latest drivers. In all likelihood, such a visit will be unnecessary because the QLogic drivers are included in all of the latest operating system distributions.
Both Windows Server 2003 and SuSE Linux Professional recognized the new hardware and immediately installed the proper drivers. More importantly, on the Linux server the QLogic driver was recognized as a disk controller and not simply as a Fibre Channel interface. As a result, the drivers were automatically added to the initrd ram disk, which is not the case with all Fibre Channel HBAs.
Once the Fibre Channel HBAs are installed on the servers, the next step is to configure the switches and create a SAN fabric. The SANbox 5200 switch box includes a CD containing versions of SANbox Manager for Windows, Linux, and Solaris. Also in the box is a sheet of paper entitled "Quick Start Guide."
For anyone who has previously installed a SAN switch, the brevity of an eight-step "Quick Start Guide" will likely cause some serious consternation. Starting with Step 5, "Apply Power to the Switch," moving to "Install SANbox Manager," and then "Run the Configuration Wizard—after first connecting the switch to the Ethernet," we found the lack of documentation disconcerting to say the least. There was nary a word about configuring a serial port through which to launch Telnet and run a complex menu-driven configuration program in a terminal window.
With trepidation, we proceeded to follow the instructions in the "Quick Start Guide." Not surprisingly, the installation of SANbox Manager on Windows Server 2003 was a trivial and near-instantaneous task. Quite surprisingly, however, the installation of SANbox Manager on SuSE Linux was an equally instantaneous and trivial task. Marking the time, we launched SANbox Manager on our SuSE Linux server.
Immediately upon launching the wizard associated with configuring a new switch, we noticed that QLogic, like network-attached storage (NAS) appliance vendors, has concentrated on solving a lot of the annoying problems that plague systems/storage administrators. Thanks to some installation wizardry, there is no need for serial connections or the specialized cables that such connections can require. An administrator simply plugs an Ethernet patch cord into the SANbox 5200 switch and lets the wizard's network-discovery module go to work.
The SANbox Manager wizard begins by asking for a temporary IP address that it will assign the new switch during the configuration process. Once that address is entered, the wizard requests that the administrator cycle power on the target switch. The wizard then quickly discovers the power-cycled switch and puts it on the LAN using the given temporary address.
With LAN communications established, what follows is a simple series of configuration options for the switch:
- A SAN domain ID number;
- A SAN symbolic name;
- A permanent means for LAN address configuration;
- The date and time for the switch; and
- A new password for the administrator.
That's it! Within ten minutes of starting the process, we had configured our first switch and were ready to commit our first fabric drive.
Normally, adding a second switch to a SAN fabric is a more complicated task than installing and configuring the first switch. An administrator must first decide how many ports on each switch will be dedicated to the creation of an inter-switch link. Since each port on a switch is independently capable of delivering data at 2Gbps to or from a device, two questions must be answered:
- How many ports on each switch might be communicating with ports on the other switch at the same time?
- How fast are the devices connected to each port?
Alternatively, a good rule of thumb is to dedicate four ports on each 16-port switch to provide a bandwidth of 8Gbps for inter-switch traffic. This provides a healthy 3-to-1 subscription ratio—the number of data ports vs. ISL ports. Nonetheless, for a two-switch fabric this leaves only 24 ports available to connect devices.
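That rule of thumb can be sketched as a quick calculation. The function name is illustrative, and it assumes every port runs at the same 2Gbps rate:

```python
# Sketch of the ISL rule-of-thumb math for a conventional 16-port switch:
# dedicate four ports to the inter-switch link and see what remains.
def isl_plan(total_ports: int, isl_ports: int, gbps_per_port: float = 2.0) -> dict:
    data_ports = total_ports - isl_ports
    return {
        "data_ports": data_ports,                          # ports left for devices
        "isl_bandwidth_gbps": isl_ports * gbps_per_port,   # aggregate ISL bandwidth
        "subscription_ratio": data_ports / isl_ports,      # data ports vs. ISL ports
    }

plan = isl_plan(total_ports=16, isl_ports=4)    # 12 data ports, 8Gbps ISL, 3-to-1
two_switch_device_ports = 2 * plan["data_ports"]  # only 24 device ports remain
```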
Once the decision is made concerning the number of ports to divert, an administrator must configure each of the ports on both switches to handle their new job. For the switch to load balance across the independent connections, the corresponding ports on each switch must be logically linked into a single trunk.
Simply diverting a group of ports to inter-switch traffic rather than to the task of connecting devices, however, does nothing to make those circuits act as a single high-bandwidth logical connection. There is nothing to stop most of the traffic from moving down one of the port-to-port connections and creating a bottleneck. To prevent that from happening, additional trunking firmware must be enabled on the switch to provide load balancing over the group of adjacent ports designated to form the single logical trunk. Unfortunately, load balancing in this fashion adds overhead to switch processing. This added overhead is not necessary with 10Gbps ports, which are designed around four channels that are automatically balanced.
Adding another switch into a fabric built on QLogic's SANbox 5200 stackable switches is just as easy as configuring the first switch. Each switch comes with four 10Gbps load-balanced ISL ports. By simply interconnecting two ISL ports on each switch with the appropriate copper cables, our test SMB SAN fabric had a completely configured redundant trunk without losing a single data port.
We were then set to run the wizard that would add the second switch and complete the fabric. Just 30 minutes from starting with the installation of the QLogic HBAs, we were ready to start configuring SAN-based disks on our Windows and Linux servers.
Getting a SAN fabric up and running is only a starting point. Equally important is the ease with which a SAN can be monitored and maintained. Once again, the host-based SANbox Manager software outpaces the capabilities of switch-based tools. In fact, for advanced performance monitoring, SANbox Manager provided critical information that was not available from switch-based tools. What's more, sampling of data could be set to much finer granularity.
Robust monitoring and analysis tools are especially useful for maximizing RAID performance: Remember that each port on a 2Gbps SAN switch is capable of delivering a full 2Gb of data per second. That translates into about 200MBps to 220MBps. Using our oblDisk benchmark on an Intel Xeon-based server, we measured higher I/O throughput over the SAN than when using a local Ultra320 SCSI disk array. As a result, we were able to saturate a 2Gbps port with little difficulty. Taking account of this fact is an essential step on the way to properly designing a SAN fabric that will scale.
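The 200MBps-to-220MBps figure follows from how Fibre Channel encodes data on the wire: 2Gb Fibre Channel signals at 2.125 Gbaud with 8b/10b encoding, so every data byte costs 10 line bits. A back-of-the-envelope sketch:

```python
# Why a "2Gbps" port tops out near 200MBps: 2Gb Fibre Channel signals at
# 2.125 Gbaud, and 8b/10b encoding spends 10 line bits per data byte.
LINE_RATE_BAUD = 2.125e9        # 2Gb FC signaling rate
LINE_BITS_PER_DATA_BYTE = 10    # 8b/10b encoding overhead

payload_mbps = LINE_RATE_BAUD / LINE_BITS_PER_DATA_BYTE / 1e6
print(f"Theoretical payload rate: ~{payload_mbps:.1f} MBps")  # ~212.5 MBps
```

Protocol and framing overhead pull the achievable number down toward the 200MBps end of that range.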
While the QLogic 5200 switches are completely operating system-neutral, achieving a consistently high level of throughput from a single 32-bit server is distinctly bolstered using a QLogic QLA 2340 HBA on a server running Linux. The Linux kernel attempts to bundle I/O requests into 128KB blocks. The QLogic Fibre Channel HBA driver for Linux capitalizes on this behavior by further bundling the 128KB Linux requests in order to deliver very large 512KB data requests to the storage subsystem.
A different picture arises when the traffic profile switches from large sequential data transfers to a simulation of a transaction-processing environment. In this scenario, the server issues I/O requests that are typically 8KB in size with a significant portion of the requests clustered around index tables. This is the scenario modeled by our oblLoad benchmark. While running oblLoad the server was able to successfully process more than 2,000 I/O requests per second while maintaining an average response time of less than 100ms. Clearly this transaction load stressed the server and disk array—but what about the fabric?
In classical terms of data throughput (i.e., MBps) the transaction-processing load benchmark had not come close to stressing the fabric. Expressed in terms of data throughput, the fabric was operating on somewhat less than a 10% load.
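A rough check using the article's own figures shows why the fabric was loafing. The variable names are illustrative:

```python
# Rough check of the "<10% load" observation: 2,000 I/O requests per second
# at 8KB each, measured against a port's ~200MBps payload capacity.
iops = 2_000
request_kb = 8
port_capacity_mbps = 200.0   # conservative end of the 200MBps-220MBps range

throughput_mbps = iops * request_kb / 1024   # ~15.6 MBps of actual data
load_fraction = throughput_mbps / port_capacity_mbps   # ~8% of one port
```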
While data throughput is the natural measure for a system administrator to focus on, that metric ignores the fact that the natural metric in a SAN is the frame rate. That's because the real work of a SAN switch is to handle frames. During an I/O request on a SAN, data must be segmented into optimally sized frames, encoded, and then sent across the fabric. Frames can be as small as 60 bytes and as large as 2,148 bytes. As the frames reach their destination, they are unpacked and decoded.
As a result, there is a world of difference between transferring a megabyte of data in 500 large frames and transferring a megabyte of data in 17,000 small frames. The 2Gbps specification per port limits peak performance; however, the adaptability of a SAN to bundle data into a few large frames makes it relatively easy to reach this level of performance without stressing the fabric.
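Those frame counts fall directly out of the frame-size limits quoted above, at the two extremes:

```python
import math

# Frames needed to move 1MB at the article's frame-size extremes
# (60-byte minimum, 2,148-byte maximum), counting whole frames.
MB = 1_048_576
large_frames = math.ceil(MB / 2_148)   # roughly 500 frames
small_frames = math.ceil(MB / 60)      # roughly 17,500 frames
print(f"{large_frames} large frames vs. {small_frames} small frames per MB")
```

Since each frame must be encoded, switched, and decoded, the small-frame case represents roughly 35 times as much per-frame work for the same megabyte of payload.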
While running the oblLoad benchmark, the average frame carried about half the data that was packaged into frames during our oblDisk benchmark. As a result, frame traffic was about 50% greater than what we would have predicted based on the behavior of oblDisk. As I/O requests at the operating system level become smaller and more difficult to bundle, the SAN fabric is forced to handle more and more small frames of data. The bottom line for a Fibre Channel switch comes down to how well it handles traffic when flooded with small frames of data.
Our benchmark tests demonstrate the ability of Fibre Channel devices to work within a fabric to adapt to the current conditions and optimize the flow of data. This dynamic environment makes it all the more important to provide sufficient headroom for traffic that will be traveling across switch boundaries.
This explains the importance of the behavior of the QLogic HBA in a system running Linux and the positive ramifications for overall SAN performance. First, the I/O requests arrive at the switch in very large blocks, which allows the switch to use large frames. This puts minimal stress on the switch and potentially on an ISL if a switch hop is required. Second, the data arriving at the storage server appears as if it were coming from a high-end Unix server. As a result, the RAID array can be configured with large 256KB chunks. This minimizes the possibility that a chunk will be split across drives because of alignment issues and maximizes the probability that a write operation will consume a full stripe on the array. So not only is the switch operating at full efficiency; the storage system is, too.
As a result, the scalability of complex multi-switch fabrics using anything other than stackable switches becomes a precarious balancing act. First, enough ports must be taken out of service as device ports and dedicated to function as ISLs. This task is further complicated by the requirement to keep all of the ports dedicated to function as a particular ISL in a contiguous block in order to provide load balancing over the group. Nonetheless, even if the distribution of ports between data and ISL functionality is done perfectly, the scalability of the fabric will still be a function of the frame sizes that characterize the traffic on the fabric and not the amount of data traveling on the fabric.
Using the QLogic SANbox 5200 stackable switches, these issues are no longer a problem. First, there is no need to allocate any of the 16 data ports to function as ISLs. In addition to the 16 2Gbps data ports, there are four 10Gbps ports on the switch to perform this function. These ports actually support a throughput rate of 12Gbps as they are designed around four 3.1875Gbps channels. Every 4-byte Fibre Channel word is "striped" across the four channels at one byte per channel. This automatically provides load balancing.
Given the throughput specification of these ports, a simple two-switch fabric will have its full collection of 32 ports available for device connections. By employing two ISL ports—each of which is the equivalent of six ordinary data ports—for a backbone, the switches will have highly effective 2.7-to-1 subscription ratios.
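The 2.7-to-1 figure follows from treating each 12Gbps ISL port as six 2Gbps data-port equivalents:

```python
# Deriving the 2.7-to-1 subscription ratio for the two-switch fabric.
device_ports = 2 * 16        # two SANbox 5200s, all 16 data ports free
isl_links = 2                # redundant 10Gbps backbone, one link per direction
equiv_ports_per_isl = 12 / 2 # 12Gbps ISL throughput / 2Gbps per data port = 6

ratio = device_ports / (isl_links * equiv_ports_per_isl)  # ~2.67, i.e. ~2.7-to-1
```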
Jack Fegreus is technology director at Strategic Communications (www.stratcomm.com). He can be reached at JFegreus@StratComm.info.
InfoStor lab scenario
2Gbps SAN switches and HBAs
What we tested
Two QLogic SANbox 5200 stackable switches
- 16 2Gbps ports
- Four 10Gbps ISL ports
- SAN Manager host-based configuration and monitoring software
Three QLogic QLA 2340 HBAs
- Full duplex, 2Gbps Fibre Channel
- Automatic topology configuration
- Automatic speed negotiation
- 133/100/66MHz PCI-X and PCI compatibility
How we tested
nStor 4520 Storage System
- Two WahooXP RAID controllers
- 1GB cache
- Dual Fibre Channel ports
- Active-active configuration
- RAID-10 and RAID-50 arrays
- Expandable capacity LUNs
nStor StorView management software
- Host-based HTML software
- Automatic discovery of server HBAs
- Comprehensive performance monitoring
Four Hitachi GST UltraStar disk drives
- 2Gbps FC-AL
Four Seagate Cheetah disk drives
- 2Gbps FC-AL
HP ProLiant ML350 G3 server
- Dual 2.4GHz Intel Xeon CPUs
- 1GB PC2100 DDR memory
- Four 100MHz PCI-X expansion slots
Appro 1224Xi 1U server
- Dual 2.4GHz Intel Xeon CPUs
- 1GB PC2100 DDR memory
- 133MHz PCI-X expansion slot
Dell PowerEdge 2400 server
- 800MHz Intel PIII CPU
- 512MB SDRAM memory
- Four 66MHz PCI expansion slots
SuSE Linux 9.0 Professional
- Linux Kernel 2.4.21
Windows Server 2003
- .NET Framework 1.1
- oblLoad v2.0
- oblDisk v2.0
- oblFileLoad v1.0
- Switch configured with four 10Gbps ports for inter-switch links
- Each 10Gbps ISL port equivalent to six load-balanced 2Gbps data ports
- No issues of port allocation or trunking
- Wizard-based configuration
- All necessary drivers included in both SuSE Linux and Windows Server 2003
- Complete SAN setup in 30 minutes