Building a SAN for SMBs, part I

Should small to medium-sized businesses (SMBs) move to a SAN fabric to satisfy their storage needs? The advantages are significant, but have the costs in money and labor diminished sufficiently?

By Jack Fegreus

Technology innovations along with a new demand-driven consumer market radically altered the way large enterprises architect IT operations. These drivers are now emerging as equally powerful forces for change in the small to medium-sized business (SMB) arena. Key factors for change are the following:

  • Rising use of the Web as a 24×7 transactional environment;
  • Increasing reliance on customer relationship management (CRM) software to direct sales efforts;
  • Growing awareness by corporate officers of the value of company data and risks posed by backup and security vulnerabilities; and
  • Utilization of 1U rack-mounted servers and server blades for single-task functions.

Facing the same pressures from the new business environment, IT departments in SMB enterprises are also frequently structuring their IT services around three or four servers dedicated to single strategic tasks (i.e., a Windows domain controller to provide file, print, and single login services for desktop and laptop PCs; a Linux Web server; and a Linux or Windows e-mail server).

Out of this new IT environment, the dominant cost driver has become data storage. While the capacity of disk drives has increased exponentially, this has not translated into savings for storage. The traditional method of employing dedicated SCSI-based storage devices on each host in concert with the need to provide enhanced data security has weighed against any savings. In fact, current storage implementation schemes have actually driven costs steadily upward.


The active-active controller configuration posed an interesting challenge for both SuSE Linux and Windows Server 2003. Both operating systems reported two volumes for each logical array presented to them by the nStor 4520F system, one volume for each controller. While we were able to access the second image under Linux, Windows Server 2003 prevented access to it. From a management perspective, this approach simplified a number of issues left to the administrator to resolve; however, it also prevented any fail-over without rebooting the system.

The escalation in storage costs arises out of IT's near-religious use of RAID arrays to provide servers with added data security. While RAID arrays provide mathematical assurance in eliminating storage volumes as single points of failure, RAID schemes require multiple drives, incur added storage overhead because of the need to maintain parity bits, use extra drives to act as hot spares, and may require duplicate drives for mirror sets (e.g., RAID-10 and RAID-50 configurations).

Adding to those costs, most sites implement RAID functionality in hardware via expensive RAID controllers that typically incorporate onboard cache and battery backup. Worse yet, RAID performance is optimized by increasing the number of drives: When it comes to performance, drive capacity is irrelevant. This turns large-capacity drives into a source of over-capacity rather than a source of savings.
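
To put rough numbers on that overhead, here is a quick sketch of how much raw capacity common RAID levels actually deliver. The eight-drive, 200GB figures are illustrative assumptions, not our test configuration:

```python
# Illustrative sketch: usable capacity of an array of identical drives
# under common RAID levels (hot spares would reduce these further).

def usable_gb(drives: int, drive_gb: int, level: str) -> int:
    """Return usable capacity in GB for a given RAID level."""
    if level == "RAID-0":      # pure striping, no redundancy
        return drives * drive_gb
    if level == "RAID-5":      # one drive's worth of capacity holds parity
        return (drives - 1) * drive_gb
    if level == "RAID-10":     # every drive is mirrored
        return (drives // 2) * drive_gb
    raise ValueError(level)

for level in ("RAID-0", "RAID-5", "RAID-10"):
    print(level, usable_gb(8, 200, level), "GB")
# RAID-0 delivers 1600GB, RAID-5 1400GB, and RAID-10 only 800GB
```

The mirrored configurations that provide the strongest protection consume half the raw capacity outright, which is why large-capacity drives alone do not translate into storage savings.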

At large enterprise data centers where total disk capacity is measured in terabytes, the storage savings potential can be orders of magnitude greater than at SMB sites. As a result, the notion of virtualizing physical disk storage over a Fibre Channel storage area network (SAN) proved particularly appealing to large enterprises.

While the costs and complexity of those early SANs were staggering, the volume of storage in use at large enterprises made it financially attractive to be a SAN pioneer.

Today, a number of technology innovations and lower prices are opening up opportunities for SMB IT shops to take advantage of a new generation of SANs, just as large data centers did with first-generation SANs. The first technology key helping to open the doors to SMB SAN implementation is the introduction of low-cost disk drives with capacities of about 200GB. These drives make it easy to build a terabyte RAID array. While only a few SMB servers individually require a terabyte of storage, many SMB sites are finding that overall storage requirements are rapidly moving beyond the terabyte level.

Today's large-capacity drives provide an opportunity for cost savings; however, their utilization by no means is sufficient to ensure that the installation of a SAN will provide the desired cost benefits in an SMB environment.

The costs associated with the configuration and maintenance of a traditional SAN topology made it nearly impossible for small IT organizations to realize any operational savings.

With the introduction of technologies that simplify the deployment and management of a SAN throughout its entire life cycle, the equation in favor of SAN deployment is beginning to change at SMB sites.

We stress "beginning to change," because many of the costs associated with building a Fibre Channel SAN are still stratospheric when compared to other networking technologies such as Ethernet.

In the classic vein of "batteries not included," the ports on SAN switches and many SAN devices do not have a direct connection for a fiber-optic cable. The rationale for this seeming oversight is that the switch can then be used with numerous types of fiber-optic cables having different modes and wavelengths. These differences are the reason that SAN connections can range up to 120km. As a result, switch ports are dubbed "universal" and you'll need small-form-factor pluggable (SFP) transceivers, which can cost from $200 to $300, to connect fiber-optic cables to the switch.

For our assessment of SAN applicability in an SMB setting, we concentrated primarily on the ease with which we could configure an initial working SAN fabric and then extend the fabric. The target user was considered to be a seasoned systems administrator who had no previous experience with SAN design or management.

The target environment consisted of three servers running a mix of SuSE Linux Professional version 9 and Windows Server 2003. Our plan called for each of the servers to access multiple logical disk drive partitions from a shared central RAID storage system. The business constraint for our test SMB scenario was the need to provide guaranteed 24×7 access to all systems.

For those new to SANs, what's important to note is that the things that are shared are low-level physical devices, which in our case were RAID arrays. We accomplished this by providing each server with its own set of logical partitions from the arrays. This is very different from sharing a file system via NFS or CIFS.

On opening Brocade's WebTools, an administrator is presented with a fabric tree of icons for all of the switches in the network and a menu of fabric-wide management functions. Clicking on a switch brings up a real-time view of the switch with status lights. From the real-time view of the switch, an administrator can bring up a properties page to manage the switch, including license keys.

There is no native intelligence associated with the storage devices in a SAN that supports any of the functionality associated with file sharing. All of these details are handled by the operating system. On that front, currently neither Linux nor Windows has the means within their respective file systems to support any such endeavor. In fact, the simple act of recovering from any sort of path failure in the SAN fabric is more of a traumatic, rather than transparent, event for both operating systems.

We set up a test environment using three servers. Two servers were Intel Xeon-based systems with PCI-X expansion slots: an HP ProLiant ML350 G3 and an Appro 2400Xi. In each of these servers we installed an Emulex LightPulse 9802 Fibre Channel host bus adapter (HBA), which supports PCI-X at up to 133MHz. Our third server was a Dell PowerEdge 2400 with an Intel Pentium III CPU and 64-bit 66MHz PCI slots. For SAN connectivity, we installed an Emulex LightPulse 9002 HBA.

Shared-disk storage on our SAN was handled by an nStor 4520F Series disk array. The 4520F comes with two RAID controllers in an active-active configuration. Each controller is powered by a 600MHz Intel RISC chip and includes a 1GB cache, onboard battery backup card, and two SAN ports.

As a result, a SAN topology can be configured where up to four switches have direct access to the nStor array. This provides a system administrator with considerable flexibility to optimize I/O across servers. Just as important, this configuration eliminates the possibility of a disk controller's being a single point of failure. Nonetheless, it also adds a level of complexity to disk configuration on each server. Every logical disk unit presented by the nStor system will appear multiple times—once for every nStor SAN connection.

We initially set up our nStor system with four Hitachi UltraStar Fibre Channel disk drives and four Seagate Cheetah Fibre Channel drives formatted as two independent RAID-0 arrays yielding a total storage capacity of nearly 1TB. We then created three logical units (LUNs) on each of the arrays. Each server was then given access to one LUN from each of the arrays.

To complement the inherent fault tolerance provided by the active-active dual-controller configuration of the nStor array, we chose to implement a basic dual-switch topology for our SAN fabric. With the small number of devices being networked into our SMB SAN environment, a single 8-port Brocade SilkWorm 3200 switch would have been sufficient to support the current structure. However, a single-switch topology would leave the switch as a single point of failure and violate our goal of providing 24×7 availability. Given those business requirements on our test SMB environment, a key evaluation criterion centered on how quickly we would be able to get our initial two-switch SAN fabric up and working.

Installation of both the Emulex HBAs and Brocade switches was relatively fast and simple; however, given our SMB test scenario, the installations were not without a few trying incidents.

Thanks to the inclusion of HBA drivers in our operating systems, the initial installation of HBAs on both Windows Server 2003 and Linux was painless. The alternative under Linux would have involved a compilation of the driver and a rebuilding of the kernel. That's a scenario that many small sites avoid like the plague. Nonetheless, the presence of an Emulex LightPulse module in the SuSE Linux distribution was not quite the panacea we hoped it would be.

Being very rigorous about their software, Emulex registers its driver module as controlling a Fibre Channel interface and not a disk controller. As a result, SuSE Linux did not automatically include the Emulex driver module in initrd. The fallout from this exclusion will be obvious to seasoned Linux administrators: Assuming that SAN-based drives have been mounted, a reboot of the system will fail. That's because the Emulex driver will not be loaded at boot time when the system runs an fsck to check each disk listed in /etc/fstab. While the solution is simple—add the Emulex module in /etc/sysconfig/kernel and run mk_initrd, which can be done painlessly using YaST2—the question remains: Does this add too much complexity for an SMB site that may just be moving to Linux?
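
The fix described above amounts to one line in a configuration file plus one command. The sketch below shows the edit programmatically; the module name "lpfcdd" is illustrative (check the actual Emulex module name on your system with lsmod), and the file layout assumed is SuSE 9.0's /etc/sysconfig/kernel:

```python
# Sketch of the initrd fix: append the HBA driver module to the
# INITRD_MODULES line of /etc/sysconfig/kernel, then rebuild the initrd.
import re

def add_initrd_module(config_text: str, module: str) -> str:
    """Append a driver module to the INITRD_MODULES line if absent."""
    def patch(match):
        modules = match.group(1).split()
        if module not in modules:
            modules.append(module)
        return 'INITRD_MODULES="%s"' % " ".join(modules)
    return re.sub(r'INITRD_MODULES="([^"]*)"', patch, config_text)

print(add_initrd_module('INITRD_MODULES="reiserfs"', "lpfcdd"))
# prints: INITRD_MODULES="reiserfs lpfcdd"
```

After editing the file, run mk_initrd (or let YaST2 perform both steps) so that the rebuilt initrd loads the HBA driver before the boot-time fsck runs.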

We used the performance monitoring feature of Brocade's WebTools to demonstrate the effects of ISL trunking. On our Dell server we mounted a LUN served by a primary controller connected to a different switch. When we ran oblFileLoad using 64KB I/O reads, throughput peaked at 188MBps (eight-drive array). With trunking enabled, data throughput across ports 6 and 7, which formed the ISL trunk to the switch connecting the nStor array, was almost perfectly balanced.

Similarly, the out-of-box experience for a system administrator with our Brocade switch begins with a specialized serial cable, telnet, and a command-line interface. When it's time to set up the Brocade switch, system administrators will not find the kind of simplified network discovery mechanism that is fast becoming a check-off requirement in network-attached storage (NAS) appliances.

Once all of the SilkWorm switches have been assigned an appropriate network address, they can then be graphically managed and monitored using Brocade's WebTools software, a Java-based Web application that resides on individual switches. These switches can be accessed out-of-band via an Ethernet connection or in-band via IP over Fibre Channel via a master switch that has an Ethernet connection.

Brocade's documentation for WebTools generically states that a Java-enabled browser is needed to run the application; however, the official system requirements call for the use of Internet Explorer on a flavor of Windows or Netscape Navigator on Solaris. Once again, it's the old issue of Java—compile once, debug many. So we were not shocked when some of the Java applets failed to run correctly with Mozilla on SuSE Linux 9.0. Fortunately, we were able to run WebTools on SuSE using Konqueror, although the Java classes loaded slowly.

WebTools offers an intuitive interface to administer the entire fabric and configure characteristics of individual switches such as IP address, switch name, and Simple Network Management Protocol (SNMP) settings. Using WebTools, an administrator can identify the devices connected to a fabric, update the firmware on a switch, and manage the license keys for optional functionality.

The classic SAN topology is a mesh in which every switch is linked to every other switch. Such a scheme minimizes the number of hops between switches that any frame of data will have to make when traveling between an initiator and a target device.

nStor's StorView provides an intuitive, easy-to-use interface. Critical status information on all system components is readily available, and accessing detailed management and configuration functions is a trivial task.

The task of configuring ISLs in a SAN reduces to the problem of robbing Peter to pay Paul. The systems administrator must decide how many ports on a switch to reserve for data connections and how many to reserve for links to other switches. This decision defines the "subscription ratio," which compares the total number of ports used for connecting devices to the total number of ports used for creating ISLs. On an 8-port switch, reserving six ports for devices and two ports for ISLs yields a 3-to-1 subscription ratio, which is considered to be the rule of thumb for good performance.
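
The arithmetic behind the rule of thumb is simple enough to sketch directly:

```python
# Subscription ratio: device-facing ports relative to ISL ports on a switch.
def subscription_ratio(total_ports: int, isl_ports: int) -> float:
    """Ports left for devices divided by ports reserved for ISLs."""
    device_ports = total_ports - isl_ports
    return device_ports / isl_ports

# The 8-port example from the text: six device ports over two ISL ports.
print(subscription_ratio(8, 2))  # -> 3.0, the 3-to-1 rule of thumb
```

The same 3-to-1 ratio on a 16-port switch would mean reserving four ports for ISLs, which illustrates how quickly ISLs eat into port counts as a fabric grows.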

Unfortunately, when it comes to SANs, rules of thumb are often not as easy as they seem. Simply connecting a group of ports between two switches does not automatically make those circuits act as a single high-bandwidth logical connection. There is nothing to stop most of the traffic from moving down one of the port-to-port connections and creating a bottleneck. To prevent that from happening, Brocade switches can implement special trunking firmware on the switch to provide load balancing over the group of adjacent ports designated to form the single logical trunk. The option to add load-balanced trunking, however, does not come cheaply. In the case of the Brocade switch purchased through Hewlett-Packard, the trunking option adds $3,800 to the price of the switch, which brought the total switch cost up to $10,000. (Editor's note: This review was conducted in late March and reflects pricing at that time.)

Once our fabric was functioning, we turned to the configuration of our storage array. The setup process for an nStor system can start independently of the SAN via the traditional method: Connect a serial cable and run VT100 terminal emulation. Far more interesting, however, is nStor's host-based client/server software, StorView, which runs on both Linux and Windows platforms.

The StorView server module runs as a background process. One of its tasks is to use multi-casting over in-band (Fibre Channel) and out-of-band (LAN) connections to automatically discover all of the installed nStor storage systems. To communicate with the StorView GUI, the server component uses Apache 2.0, which we installed on a Windows server. Apache will not conflict with an existing IIS installation unless IIS is listening on port 9292. As a result, administrators can run StorView's GUI on any system that can access the system running the nStor server.

LUN mapping is a critical task for SAN administrators. To ease the process, StorView presents administrators with the unique WWN of every HBA discovered in the SAN. To identify the physical server and HBA to which the WWN corresponds, administrators can use Brocade's WebTools. The fabric name server lists the WWN of every initiator (such as an Emulex LightPulse HBA) and every target device (such as an nStor Wahoo controller) connected to any switch port for the entire fabric.

Any RAID management function required by a high-end system can be found in StorView. All of the high-end RAID levels, including RAID 10 and RAID 50 (which are not always found in host-based controllers) are supported. Storage objects such as arrays, LUNs, and HBAs can be assigned friendly names. More importantly, for sites experiencing rapid data growth, any existing LUN can be expanded using any available free space.

The use of Fibre Channel loops to connect drives enables the nStor storage system to be configured with up to 64 arrays of 16 drives each. What's more, there is no limit to a drive's capacity. However, there is a hard limit to the number of LUNs that can be created: 512. Furthermore, 32-bit addressing limits each LUN to a maximum storage capacity of 2,198GB.
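
That per-LUN ceiling falls straight out of the addressing arithmetic: 2^32 block addresses times the standard 512-byte block. A quick sketch (assuming 512-byte blocks; the small difference from the 2,198GB figure above presumably reflects rounding or reserved blocks):

```python
# Where the ~2,198GB LUN ceiling comes from: 32-bit block addressing
# with conventional 512-byte blocks.
BLOCK_SIZE = 512            # bytes per block, the SCSI convention
max_blocks = 2 ** 32        # addressable blocks with a 32-bit LBA
max_bytes = max_blocks * BLOCK_SIZE

print(max_bytes // 10**9, "GB")  # -> 2199 GB, i.e. roughly 2TB per LUN
```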

SANs add a significant new wrinkle to LUN management. With no intervention, every LUN will be seen by every system multiple times—once for every physical connection between the storage system and the fabric. With two dual-port nStor Wahoo controllers, each of the LUNs created on the nStor 4520 system could potentially appear four times. In our SMB test scenario, we used only two switches in our testing so an optimal topology called for connecting just one port on each of the controllers to a different switch. Configured in this manner, a port, controller, or switch could fail in our fabric and we could recover.

This fail-safe feature of a SAN, however, poses significant configuration problems for an administrator when the operating systems in use are not very SAN-savvy—a category that includes both Windows Server 2003 and Linux. Locally, both Windows Server 2003 and Linux will see all of the images of any LUN that it can access. Windows Server 2003, however, treats these images in a different manner from Linux. Once one of the images is formatted and mounted, the operating system generates a system error on any attempt to access one of the other images. There is no such mechanism in Linux, and a system administrator is free, in theory, to mount any and all of the images. In practice, this is a bad idea.

The problem lies in the way file system metadata is handled for disk volumes. Case in point: Mount both copies of the LUN presented from the Hitachi-drive array, /dev/sda1 and /dev/sdc1, on a Linux system, and neither mount point will reflect the true status of the LUN's contents. Both mount points will only show the disk as a function of events that have taken place through their respective mount points. The only way to get a mount point in sync with the actual contents resident on the LUN is to dismount it and run a manual fsck with the rebuild-tree option invoked. Naturally, any reboot of the system will fail until an administrator intervenes and issues an fsck command to rebuild the inode structures for both volumes.

The problem of inconsistent metadata has profound implications for any attempt to share LUNs between systems. While Windows Server 2003 will prevent multiple mounts of the same volume, there is no such mechanism that prevents mounting the same volume simultaneously on different servers. When this happens, the results can be curious.

Since deleting a file simply means taking it out of the master file table (MFT), a file deleted from the volume on one server will not automatically be deleted from any other server. All of the other servers will continue to point to the proper disk blocks and easily find the file. At this point, what will become of the physical state of the LUN is left to a game of cyber roulette as we wait for a disk check to be issued from one of the servers. When that happens, the logical view of the server on which the command was issued will be reconciled with the physical volume. That state of affairs is all but guaranteed to play havoc with the view of other servers.

I/O command size data tells a lot about the differences in the way Linux and Windows issue I/O requests. After running oblDisk on servers running Linux (where I/O commands were almost exclusively bundled into 128KB requests for both reads and writes), we repeated the same I/O pattern on Windows Server 2003. Windows Server did not bundle I/Os. As a result, the nStor disk system did more work clustering I/Os to optimize performance.

Clearly, with neither Windows nor Linux able to deal with all of the complexities of LUN sharing, it is critical that two servers never mount the same LUN. The crudest way to do this is through fabric zoning. Through this scheme, switch ports can be grouped and communications between ports can be restricted to just group members. That plan works nicely as long as the connections to ports don't get reconfigured. A sophisticated scheme would restrict a LUN to a specific HBA based on the HBA's unique node WWN. That is precisely the scheme invoked by nStor's LUN mapping.

To this end, the StorView GUI lists the unique WWN of every HBA that it discovers on the SAN. Unfortunately, it is highly unlikely that any administrator will be able to identify precisely which HBA any of the WWNs correspond to. Fortunately, that problem is easily solved with Brocade's WebTools, which includes a simple name server.

While nStor provides a powerful tool to prevent LUNs from being corrupted through improper sharing among multiple systems, there remains the fundamental issue of fail-over. The problem for fail-over is that it is logically an instance of local LUN sharing. As a result, fail-over is more traumatic than it is transparent.

Windows Server 2003 handles the problem in a draconian manner. The volume in question simply disappears from view as all communications are terminated. Only a reboot will restore a new instance of the failed volume with communications through a different path.

Linux, on the other hand, allows multiple mount points. This capability appears to be part of the equation that makes fail-over semi-transparent. The lion's share of applications will keep on running after a disruptive fail-over. For example, we unplugged the port through which we had mounted the drive. We then attempted to run a number of applications. All worked with two notable exceptions: our oblDisk benchmark, which is explicit in calling a device, and SuSE's YaST2 disk partition utility, which consistently hung. That was not a good sign.

On rebooting the system, the rationale for the draconian Windows Server scheme became evident. Just as if we had explicitly mounted the drive twice, a manual intervention was necessary to rebuild the drive's directory tree to successfully reboot SuSE Linux.

From these facts and figures, it should be obvious that the nStor system is in no way limited by the constraints imposed by a 32-bit WinTel architecture. In particular, to improve disk-striping efficiency, the nStor system has a minimum starting chunk size of 64KB, which is the maximum I/O size for Windows. From there the chunk size moves up to 128KB, which is the preferred I/O size of Linux, and finally tops off at 256KB.

Naturally, this level of sophistication extends to the storage system's performance monitoring, which is crucial for providing information that can be used in tuning disk arrays and server operating systems for optimum I/O throughput. Performance data for LUNs includes read-and-write throughput rates; I/O read-and-write command sizes; read-and-write byte alignment; the frequency that the nStor array generates read-ahead requests; and the frequency that the array clusters write commands.

The goal is to avoid splitting write chunks across disks. The ideal situation is to issue write commands such that a chunk of data is placed as a whole on each disk and that the number of chunks equals the number of disks comprising a full stripe. This is especially important for write performance with RAID-5 and RAID-50 arrays.
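
As a rough sketch of that goal (the eight-drive RAID-5 example and chunk sizes are our assumptions, using the nStor chunk sizes cited above), the ideal write size is one chunk per data disk:

```python
# Full-stripe write size: one chunk lands whole on each data disk.
# For RAID-5, one disk's worth of each stripe holds parity,
# so data disks = drives - parity_drives.
def full_stripe_kb(drives: int, chunk_kb: int, parity_drives: int = 1) -> int:
    """KB of data in one full stripe, excluding parity."""
    return (drives - parity_drives) * chunk_kb

# An eight-drive RAID-5 array with the nStor minimum 64KB chunk:
print(full_stripe_kb(8, 64))   # -> 448 (KB per full-stripe write)
# The same array with the Linux-friendly 128KB chunk:
print(full_stripe_kb(8, 128))  # -> 896
```

Writes sized and aligned to the full stripe let a RAID-5 or RAID-50 controller compute parity without first reading back old data, which is why chunk size matters so much for write performance.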

We will be taking a closer look at SAN performance in upcoming issues of InfoStor as we delve deeper into both the performance and cost issues of building a SAN in an SMB environment.

Jack Fegreus is technology director at Strategic Communications (www.stratcomm.com). He can be reached at JFegreus@StratComm.info.

Lab scenario

Under examination
2Gbps SAN infrastructure

What we tested

  • Two Brocade SilkWorm 3200 switches
    • Eight 2Gbps ports (SFP)
    • ISL port trunking
    • WebTools for fabric monitoring
    • Port-based SAN zoning
  • nStor 4520 Storage System
    • Two WahooXP RAID controllers
      • 1GB cache
      • Dual Fibre Channel ports (SFP)
    • Active-active configuration
    • RAID level 10 and 50 arrays
    • Expandable capacity LUNs
  • nStor StorView Management Software
    • Host-based HTML software
    • LAN and in-band Fibre Channel IP connections
    • Automatic discovery of server HBAs
    • Comprehensive performance monitoring
  • Four Hitachi GST UltraStar disk drives
    • FC-AL, 2Gbps
    • 10,000 rpm
    • 147GB
  • Four Seagate Cheetah disk drives
    • FC-AL, 2Gbps
    • 10,000 rpm
    • 73GB
  • Two Emulex LightPulse 9802 HBAs
    • Full-duplex 2Gbps Fibre Channel
    • Automatic topology configuration
    • Automatic speed negotiation
    • 133/100/66MHz PCI-X and PCI compatibility
  • One Emulex LightPulse 9002L HBA
    • Full duplex 2Gbps Fibre Channel
    • 66/33MHz PCI compatibility

How we tested

  • SuSE Linux 9.0 Professional
    • Linux Kernel 2.4.21
  • Windows Server 2003
    • .NET Framework 1.1
  • HP ProLiant ML350 G3 server
    • Dual 2.4GHz Intel Xeon CPUs
    • 1GB PC2100 DDR memory
    • Four 100MHz PCI-X expansion slots
  • Appro 1224Xi 1U server
    • Dual 2.4GHz Intel Xeon CPUs
    • 1GB PC2100 DDR memory
    • 133MHz PCI-X expansion slot
  • Dell PowerEdge 2400 server
    • 800MHz Intel PIII CPU
    • 512MB ECC registered SDRAM memory
    • Four 66MHz PCI expansion slots
  • Benchmarks
    • oblLoad v2.0
    • oblDisk v2.0
    • oblFileLoad v.1.0

Key findings

  • All necessary drivers included in both SuSE Linux and Windows Server 2003.
  • Throughput on four-drive array equivalent to local Ultra320 SCSI performance.
  • Better scaling in large arrays with Fibre Channel.
  • SAN fail-over is not a transparent process on either Windows Server 2003 or SuSE Linux.

This article was originally published on May 01, 2004