Simple SANs can be configured with server clusters and Fibre Channel controllers, hubs or switches, and RAID arrays.
BY FRANK W. POOLE
With ever-increasing requirements for continuous streams of information and zero downtime, storage area networks (SANs) have become the focus of much attention. However, the cost of implementing a SAN can be prohibitive. The combination of RAID and fail-over server clustering can make the entry into SANs much more feasible.
Fibre Channel-Arbitrated Loop (FC-AL) provides outstanding data-transfer rates and supports many devices and long cable lengths. Host-based Fibre Channel RAID offers an affordable way into the entry-level SAN field, providing interoperability and scalability.
With all the hype around SANs, one would think they're a panacea for all storage problems. But according to a survey by Enterprise Management Associates, approximately 46% of the 187 IT professionals surveyed currently had no plans to implement any kind of SAN.
The major reasons cited for not having implementation plans in place were high costs and difficulty confirming a need for a SAN, which is understandable considering that the average cost of implementing a SAN can run anywhere from $200,000 to more than $1 million.
In addition, interoperability and scalability are very real concerns when deploying a SAN. It is important to find compatible equipment that will not be obsolete in a year or so. It is equally important that, if more storage is needed later, even obsolete equipment can still scale within the existing infrastructure.
The Storage Networking Industry Association (SNIA) defines a SAN rather vaguely as "a network whose primary purpose is the transfer of data between computer systems and storage elements, and among storage elements." In other words, a SAN is like a LAN designed for storage. According to this definition, it could simply be a server on a LAN with a backup device.
That said, there are a number of ways to set up basic Fibre Channel storage solutions using existing interoperable technologies such as RAID arrays and high-availability clusters. Cluster configurations would also require software such as Microsoft Cluster Server (MSCS) for Windows NT/2000 or Apptime's Watchdog for Linux.
Single vs. dual loop
RAID is relatively inexpensive and easy to set up. Simply install a single-port intelligent Fibre Channel RAID controller into a server, and then connect a hub that has a connection to a disk enclosure. Next, attach this setup to a LAN via the server and you have a very basic SAN "island." The hub gives the SAN island an easy way to scale into a larger SAN environment when resources permit. For example, adding more storage is as simple as adding another Fibre Channel disk enclosure to the existing hub. The intelligent RAID controller should be able to configure the new storage on the fly. A dual-port Fibre Channel controller and another hub (or split hub) can carry this one step further by adding cable redundancy and automatic fail-over. The dual-loop configuration of the controller can also increase bandwidth.
A single-loop topology is ideal for systems where the speed and cable length of Fibre Channel are important. A simple physical setup would include an intelligent, single-port Fibre Channel RAID controller located in a host server with Fibre Channel hard disks in a separate enclosure (see Figure 1a). This setup allows up to 125 hard disks to be connected to the server with one Fibre Channel controller. Using inexpensive copper cables, the enclosure can be placed up to 30 meters away from the server, making it possible to have the server and enclosure in separate rooms.
The drawback of a single-loop topology is that there is only one connection between the server and the hard disks. This connection is not redundant, so if it fails, the server can no longer access the data. To avoid this situation, a dual-loop configuration can be implemented using a dual-port Fibre Channel RAID controller and parallel cables between the controller and the hard disks (see Figure 1b). If one loop has problems in this configuration, the system simply routes all I/Os to the other loop. This setup also improves the data transfer rate: when both loops are up and running, the bandwidth doubles to as much as 400MBps. In addition, the extra redundancy of the dual-loop configuration makes it a good choice for high-security, high-performance systems.
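The dual-loop behavior described above can be sketched in a few lines of Python. This is only an illustration, not controller firmware: the 200MBps per-loop rate is an assumption inferred from the 400MBps dual-loop figure, and the function names and loop labels are hypothetical.

```python
# Assumed per-loop rate, inferred from the 400MBps dual-loop figure.
LOOP_MBPS = 200


def available_bandwidth(loop_status):
    """Return the aggregate bandwidth (MBps) of all loops that are up.

    loop_status maps a loop name to True (up) or False (down).
    """
    return LOOP_MBPS * sum(1 for up in loop_status.values() if up)


def route_io(loop_status):
    """Return the loops that can carry I/O; fail if none remain."""
    healthy = [name for name, up in loop_status.items() if up]
    if not healthy:
        # Single point of failure reached: the server cannot see its data.
        raise RuntimeError("no loop available: storage unreachable")
    return healthy


if __name__ == "__main__":
    both_up = {"loop_a": True, "loop_b": True}
    one_down = {"loop_a": False, "loop_b": True}
    print(available_bandwidth(both_up), route_io(both_up))
    print(available_bandwidth(one_down), route_io(one_down))
```

With a single-loop dictionary such as {"loop_a": False}, route_io raises an error, which mirrors the non-redundant case: once the only connection fails, there is nowhere left to send I/O.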
Figure 2: Polling relies on each server not only being connected to the network and to a common storage device, but also to each other through some sort of interconnect device, usually a secondary network card.
Hard disks connected to the Fibre Channel controller can be configured to form one large RAID 5 array or several smaller arrays depending on specific needs (e.g., RAID 1 for operating system, RAID 5 for critical user data, and RAID 0 for high-performance, non-redundant configurations).
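To make the trade-off between these RAID levels concrete, the following Python sketch computes usable capacity using the standard formulas: RAID 0 stripes across all disks with no redundancy, RAID 1 mirrors a pair, and RAID 5 gives up one disk's worth of space to parity. The function name and the 36GB disk size in the usage lines are hypothetical, and the sketch assumes identical disks and a simple two-disk mirror for RAID 1.

```python
def usable_capacity_gb(raid_level, disks, disk_gb):
    """Return usable capacity in GB for an array of identical disks.

    Assumptions for this sketch: RAID 1 is a two-disk mirror, and
    RAID 5 needs at least three disks.
    """
    if raid_level == 0:
        if disks < 2:
            raise ValueError("RAID 0 needs at least 2 disks")
        return disks * disk_gb          # striping only, no redundancy
    if raid_level == 1:
        if disks != 2:
            raise ValueError("this sketch models a 2-disk mirror only")
        return disk_gb                  # second disk holds the mirror copy
    if raid_level == 5:
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_gb    # one disk's worth of parity
    raise ValueError("unsupported RAID level in this sketch")


if __name__ == "__main__":
    # Hypothetical layout: mirrored OS pair, RAID 5 for user data,
    # RAID 0 for scratch space, all on 36GB disks.
    print("RAID 1:", usable_capacity_gb(1, 2, 36), "GB")
    print("RAID 5:", usable_capacity_gb(5, 8, 36), "GB")
    print("RAID 0:", usable_capacity_gb(0, 4, 36), "GB")
```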
Another attractive application for Fibre Channel is server clustering. In a small cluster, two servers share one redundant storage system. If one server goes down, all resources and tasks are switched over to the remaining server. Since the storage system must be accessible to both servers, the storage interface must support high-performance, dependable connections over long cable runs. The redundant dual-loop topology of Fibre Channel makes this possible. When two active loops are used, the available bandwidth is doubled for accessing the hard disks. Even when standard twisted-pair copper cables are used, the distance between the storage system and the servers may be as much as 30 meters, which is enough for each of the three systems (two servers and the storage system) to be in different rooms.
How clusters work
A fail-over server cluster basically works through polling. Each server in the cluster continually polls every other server in the cluster to ensure all servers are still operational. This polling relies on each server not only being connected to the network and to a common storage device, but also to each other through some sort of interconnect device, usually a secondary network card (see Figure 2). This interconnection, sometimes called a private network, is merely a "heartbeat" connection.
Figure 3: This configuration is based on dual-channel Fibre Channel RAID controllers with dual-loop disk arrays, redundant hubs, and cabling.
For a fail-over to take place, all three connections get involved. If a failed server is no longer available over the heartbeat connection, the other servers are aware of its absence and poll it again over the LAN. If there is still no reply, the polling server has to take over the failed server's assignment of connecting users to the common data. One downside to this type of redundancy is that the heartbeat connection requires another cable run. However, this is usually not a problem if the servers are located in the same room.
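The two-stage check described above (heartbeat first, then the LAN, then takeover) can be sketched as follows. This is a simplified model rather than the logic of MSCS or any other cluster product: the Node class, the function names, and the way the probes are passed in as callables are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node with the services it currently owns."""
    name: str
    services: set = field(default_factory=set)


def peer_is_down(peer, heartbeat_ok, lan_ok):
    """Declare a peer failed only if it is silent on BOTH paths.

    heartbeat_ok and lan_ok are callables that take a Node and return
    True when the peer answered on that connection.
    """
    if heartbeat_ok(peer):
        return False
    # No reply on the private network: poll again over the LAN.
    return not lan_ok(peer)


def poll_and_fail_over(local, peer, heartbeat_ok, lan_ok):
    """Take over the peer's services if it is down; return True if so."""
    if not peer_is_down(peer, heartbeat_ok, lan_ok):
        return False
    local.services |= peer.services   # surviving node serves the users
    peer.services = set()
    return True


if __name__ == "__main__":
    a = Node("server_a", {"file_share"})
    b = Node("server_b", {"mail"})
    # Simulate server_b failing on both the heartbeat link and the LAN.
    took_over = poll_and_fail_over(a, b, lambda n: False, lambda n: False)
    print(took_over, sorted(a.services))
```

Checking the LAN before taking over guards against a false alarm when only the heartbeat cable has failed; in that case the peer is still serving users and no takeover should occur.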
There are different ways of setting up a high-availability cluster server. The simplest way is to begin with a two-node cluster. Storage and servers can be added as needed or as budget permits. Single-channel Fibre Channel RAID controllers can be used in these simple configurations.
Using Fibre Channel technology, two dual-channel FC-AL controllers can be used with redundant Fibre Channel hubs in a dual-loop (redundant-cable) configuration. In this situation, the Fibre Channel hubs let you disconnect one node from the cluster and still maintain a certain level of redundancy. For maximum security, the same configuration could use redundant RAID enclosures (see Figure 3).
It is commonly thought that implementing such solutions in a SAN environment requires that external RAID controllers be used with the storage device itself and then routed back to the server through high-speed switches via a host bus adapter. While this is certainly a viable solution, it is not the only one.
Figure 4: SAN "islands" can be configured into a larger SAN environment through Fibre Channel hubs or switches if segmentation is needed.
By offloading RAID control from the storage device and mounting it to the server, throughput increases because the bottleneck of having all data passing through one single point in the SAN is eliminated. Using a dual-port RAID controller in such a scenario not only increases bandwidth but also eliminates a single point of failure. Of course, more servers can be added to the mix, but it is much simpler and more cost-effective to begin with a dual-port controller because it is like having two servers in one box. Dual ports also increase the bandwidth at the server and allow less-expensive hubs to be used instead of switches.
These so-called SAN "islands" can be configured into a larger SAN environment, as in Figure 4, through Fibre Channel hubs or switches if segmentation is needed.
While the cost of implementing a SAN has scared off many IT professionals, there are creative ways of using less-expensive, host-based FC-AL RAID technology to create safe and effective SAN islands. The combination of RAID scalability and fail-over clusters can be an effective choice for low-cost entry into more-complex SAN configurations.
Frank W. Poole is technical services manager at ICP vortex Corp. (www.icp-vortex.com) in Phoenix, AZ.