Storage area networks and server clusters are complementary architectures that provide combined benefits.
By Erik Ottem
Clusters are collections of servers aggregated to achieve high availability, scalability, and/or high performance. Much of the clustering market is focused on high availability: if one server fails, the applications remain available on an alternate server. High scalability in a cluster usually means load balancing, so the workload can be shared efficiently among servers. High performance in a cluster usually refers to a high throughput rate.
Storage area networks (SANs) provide the storage infrastructure to enhance clustering technologies. A SAN is easier to manage and set up than traditional SCSI architectures. SANs based on Fibre Channel technology allow hot plugging, longer distances, more devices, and better performance than most SCSI implementations.
Clusters of servers allow users to consolidate servers to save labor, hardware, software, and space. Clusters also provide the platform for a comprehensive, mainframe-like management structure. The natural companion to server consolidation into clusters is storage consolidation. And one way to implement storage consolidation is through SANs.
Building a cluster
A cluster consists of a group of servers linked together on a LAN, often some variety of Ethernet (see Figure 1). Ethernet acts as the conduit for a heartbeat among servers so that if one fails, the failover mechanism will kick in. This failover mechanism varies by operating system and/or application.
Figure 1: A cluster consists of a group of servers linked together on a LAN, often some variety of Ethernet.
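The heartbeat mechanism described above can be sketched in a few lines. This is an illustrative model only; the class name, interval, and failure threshold are hypothetical and not taken from any particular clustering product:

```python
HEARTBEAT_INTERVAL = 1.0   # seconds between heartbeats (hypothetical value)
FAILURE_THRESHOLD = 3      # missed beats before a node is declared failed

class HeartbeatMonitor:
    """Track the last time each peer server was heard from over the LAN."""

    def __init__(self, nodes, now=0.0):
        # Timestamps are passed in explicitly to keep the sketch testable.
        self.last_seen = {node: now for node in nodes}

    def beat(self, node, now):
        """Record a heartbeat received from a peer."""
        self.last_seen[node] = now

    def failed_nodes(self, now):
        """Peers whose heartbeats stopped arriving; these trigger failover."""
        deadline = HEARTBEAT_INTERVAL * FAILURE_THRESHOLD
        return [n for n, t in self.last_seen.items() if now - t > deadline]
```

In a real cluster the heartbeat travels over the Ethernet link shown in Figure 1, and the failover action taken for a failed node depends on the operating system or application.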
In this configuration, the storage is linked through each server. If one server fails and the application must migrate according to the failover scenario, the storage must likewise be available to the new application. The traditional connection to the server is through a SCSI cable, which is limited in four ways:
- Connectivity: 15 nodes
- Performance: depends on the SCSI version, but commonly 80MBps
- Distance: depends on the SCSI version, but typically 10 meters
- Hot plugging: not commonly supported
In many cases, a better way of clustering is to create a SAN that supports the server failover. A SAN is based on Fibre Channel technology, which provides greater connectivity (126 devices per loop), performance (100MBps), distance (10km standard and 80km or more with extenders), and hot pluggability. These features provide a more flexible storage configuration architecture. In terms of clustering, SANs provide the tools to create a high-availability network for storage that mirrors the LAN capability to failover server applications. A SAN creates a network parallel to the LAN for storage failover to accommodate server failover (see Figure 2).
Figure 2: A SAN creates a network parallel to the LAN for storage failover to accommodate server failover.
In this configuration, a server failure doesn't affect access to storage. A SAN can be built with redundant components to ensure connectivity even if any single component fails.
Building a SAN
The configuration of servers and storage must reflect application requirements. In most clusters, the primary requirement is high availability, which requires redundancy. In this case, the servers will have a failover scenario dictated by the operating system or application. The storage will be available through the SAN infrastructure by placing redundant Fibre Channel switches and redundant connections in the servers and storage. In this way, no single failure will eliminate access from the application to the storage.
Building a SAN usually starts with small configurations called SAN islands, often created simply to consolidate backup procedures, which are notoriously inefficient. A group of servers, perhaps a departmental cluster, is often connected with an inexpensive loop switch in a redundant configuration.
There are two varieties of Fibre Channel switches: loop and fabric. The key differences between the two are addressing, services provided, and cost. A loop switch makes switching decisions on the first 8 bits of the Fibre Channel address, while a fabric switch makes switching decisions on the full 24-bit address. Just as in the LAN world, high-function switching generally occurs at the core of the network, while low-cost switching occurs at the periphery (edge). In a SAN, the fabric switch, like a router, is best used at the core of the network as a backbone, and the low-cost edge switch is often a loop switch. For cost effectiveness and functionality, LANs often use a two-tier approach, and the same is possible with SANs: fabric and loop switches can be used together for maximum effect.
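The addressing difference between loop and fabric switches can be made concrete with a small sketch. The function names are illustrative; the 24-bit address layout (domain, area, and port bytes, with the low byte serving as the AL_PA on an arbitrated loop) follows the Fibre Channel addressing scheme:

```python
def split_fc_address(d_id: int):
    """Split a 24-bit Fibre Channel destination ID into its three bytes.

    In a fabric address the bytes are conventionally Domain, Area, and
    Port; on an arbitrated loop only the low byte (the AL_PA) is used.
    """
    assert 0 <= d_id <= 0xFFFFFF, "Fibre Channel addresses are 24 bits"
    domain = (d_id >> 16) & 0xFF
    area = (d_id >> 8) & 0xFF
    port = d_id & 0xFF          # the AL_PA on a loop
    return domain, area, port

def loop_switch_key(d_id: int) -> int:
    """A loop switch keys its switching decision on the low 8 bits."""
    return d_id & 0xFF

def fabric_switch_key(d_id: int) -> int:
    """A fabric switch keys on the full 24-bit address."""
    return d_id & 0xFFFFFF
```

The smaller key space is part of why loop switches are cheaper: they track far fewer possible addresses than a full fabric switch.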
Once the loop switch has been deployed for the SAN island, or departmental cluster, the failover scenarios must be put into place. Cluster software combined with the zoning capabilities of the switch will allow a failover scenario to be configured for the specific environment.
Clusters in different departments may then be linked to data centers with fabrics to take advantage of fabric addressing capability. In this case, the extra expense of the fabric is justifiable due to the requirements for more extensible addressing. Loop switches in the departments can typically support up to 32 nodes. Fabric switches then link departments into an enterprise storage architecture.
As Fibre Channel continues to evolve, this architecture will be a sound basis for expansion. For example, as 2Gbps Fibre Channel speeds arrive, local departments built on existing 1Gbps devices, switches, routers, or hubs can be integrated into a higher-speed backbone switching platform. This protects initial investments and is not prone to performance blocking in the same way as mesh configurations.
Configuring a SAN
Storage arrays and tape libraries are shared through SAN configuration. Even in a diverse vendor or operating system environment, a SAN can provide for asset sharing. For example, by sharing a tape library, a company does not need a tape drive on each server. Instead of collecting and delivering tapes to each server, the backup operation can be centralized. Even in environments with different operating systems and file systems, a SAN can be configured to allow sharing of the tape library through a technique known as zoning.
Zoning provides a selective connection capability among elements in the SAN and allows the integration of different operating systems or file structures into one common storage infrastructure. This provides the mechanism to back up data with one infrastructure, instead of one for each environment. For instance, NT backup might use certain tape media and tape drives during one period, while Solaris might use the tape library with different media during a different period. Zoning also allows different file types to be mirrored through the same switch to different areas in two storage arrays.
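Conceptually, zoning behaves like a table of named member sets, where two devices may communicate only if some zone contains them both. A minimal sketch, with hypothetical zone and device names:

```python
# Hypothetical zone table: zone name -> set of member devices (or ports).
zones = {
    "nt_backup":      {"nt_server_1", "nt_server_2", "tape_library"},
    "solaris_backup": {"solaris_server", "tape_library"},
}

def can_communicate(a: str, b: str, zones: dict) -> bool:
    """Two SAN members may talk only if some zone contains both of them."""
    return any(a in members and b in members for members in zones.values())
```

Here both operating environments share the one tape library, yet the NT and Solaris servers remain isolated from each other, which is exactly the selective connectivity the article describes.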
The most common benefit of a cluster is high availability. Clusters can provide much higher availability at a lower cost than was previously possible, because the cluster is aware of the failure of one server node and will follow a failover scenario to ensure continued operation. This is achieved in two basic ways: either as a function of the operating system or via a standalone application. In some circumstances, a combination of the two is required.
Figure 3: Pairs of failover clusters can be zoned into a switch sharing a common storage resource.
Windows NT/2000 clusters have a load balancing capability that accommodates up to 32 nodes. One way a SAN can help reduce costs and improve usability for Windows failover clusters is through the use of zoning. Pairs of failover clusters can be zoned into a switch sharing a common storage resource (see Figure 3).
Windows NT/2000 clustering currently does not include mirroring inside a failover cluster. To achieve this, an additional layer of software can be used to mirror inside the cluster.
Unix clustering has typically been used in large, complex configurations. Unix clusters can use SANs to share their storage infrastructure and simplify a failover scenario by providing a high-availability storage infrastructure available to all servers. Similarly, NetWare also has sophisticated clustering failover capabilities that are enhanced with a high-availability SAN to provide storage to all servers in the cluster.
Scalability can be viewed in two ways. The first is the ability to add new devices to a cluster without significantly disrupting its operation. This is a limitation of traditional SCSI that is overcome by Fibre Channel's hot-plugging capability. The second aspect is load balancing. Microsoft, for example, differentiates its load-balancing capability from its failover capability. Server load balancing can be handled by the operating system, by a standalone application, or by a dedicated device.
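The load-balancing aspect of scalability can be illustrated with a simple round-robin dispatcher that skips failed nodes. This is a sketch of the concept only, not any vendor's implementation:

```python
def round_robin_dispatch(requests, nodes, healthy):
    """Assign each request to the next healthy cluster node in rotation."""
    alive = [n for n in nodes if n in healthy]
    if not alive:
        raise RuntimeError("no healthy nodes available")
    assignment = {}
    for i, request in enumerate(requests):
        assignment[request] = alive[i % len(alive)]
    return assignment
```

Real load balancers weigh node capacity and current load rather than rotating blindly, but the principle of spreading work across surviving nodes is the same.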
In the same way that a SAN provides a common storage infrastructure for high-availability clusters, it provides the same benefit for load-balancing clusters. In most cases, this is used in conjunction with SAN performance-monitoring tools, so that storage can be reconfigured to optimize the load in the storage domain to complement the load on the server clusters.
For some environments, performance is the highest priority. Unlike load-balancing or high-availability clusters, the goal of this type of cluster is throughput, typically delivered through multiple paths for the highest aggregate data rate.
The multiple pathways in the SAN provide multiple 100MBps links between servers and storage for high performance. Each server can see all the storage, and high availability can be added by providing multiple paths to the storage. If a lower-cost, high-performance configuration is required, a single path to the storage suffices.
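As a back-of-the-envelope illustration, ideal aggregate throughput scales with the number of independent paths, and a simplistic multipath driver can spread I/Os across them in rotation. The 100MBps figure is the per-link rate discussed above; the function names are illustrative:

```python
PATH_BANDWIDTH_MBPS = 100  # one 1Gbps Fibre Channel link, roughly 100MBps

def aggregate_throughput(n_paths: int) -> int:
    """Ideal aggregate data rate with n independent Fibre Channel paths."""
    return n_paths * PATH_BANDWIDTH_MBPS

def pick_path(io_number: int, paths):
    """Spread I/Os across paths round-robin (a simplistic multipath policy)."""
    return paths[io_number % len(paths)]
```

Four paths thus give an ideal aggregate of 400MBps; real-world rates will be lower once protocol overhead and contention are accounted for.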
An efficient way to manage a SAN is through a networked storage pool (see Figure 4). The standard way of managing a SAN is through zoning, and zoning does provide considerable advantages over traditional SCSI configurations. However, to get the most out of the physical SAN infrastructure, storage virtualization may be required. Virtualization aggregates physical devices on a SAN into a storage pool that is managed from one platform.
Figure 4: Virtualization aggregates physical devices on a SAN into a storage pool that is managed from one platform.
Just as a SAN allows aggregation of storage into one physical infrastructure that can manage each unique view or zone, a storage pool provides one view to manage all zones. This improves management by reducing the labor required to manage storage in an enterprise: ten storage devices, for example, can be managed as one in the storage pool. It also enables devices from different vendors, running under different operating systems, to be managed on a single platform, which helps combat the high cost and low availability of MIS labor in today's market.
Heterogeneous platforms can share a networked storage pool spanning different devices, even different vendors, and serve up different files depending on the application, while using a common management platform.
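The pooling idea can be sketched as a class that aggregates per-device free capacity and carves logical volumes out of extents on the underlying physical devices. The device names and the largest-free-first allocation policy are illustrative assumptions, not a description of any specific virtualization product:

```python
class StoragePool:
    """Aggregate physical devices into one logical pool (illustrative)."""

    def __init__(self, devices):
        # devices: mapping of device name -> free capacity in GB
        self.free = dict(devices)

    @property
    def capacity(self):
        """Total free capacity across all devices in the pool."""
        return sum(self.free.values())

    def allocate(self, size_gb):
        """Carve a logical volume out of extents on the physical devices.

        Fills from the device with the most free space first; returns the
        list of (device, extent_size) pairs backing the logical volume.
        """
        if size_gb > self.capacity:
            raise ValueError("pool exhausted")
        extents, needed = [], size_gb
        for dev in sorted(self.free, key=self.free.get, reverse=True):
            take = min(needed, self.free[dev])
            if take:
                self.free[dev] -= take
                extents.append((dev, take))
                needed -= take
            if needed == 0:
                break
        return extents
```

The administrator sees one pool of capacity; the fact that a single logical volume may span two arrays is hidden by the virtualization layer, which is what reduces the management labor described above.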
In many cases, SANs are the best way to support new clustering technologies. As an enterprise grows and moves into clustering, it creates SAN islands based on Fibre Channel loop switching for low cost and ease of use. These SAN islands are then tied together with a Fibre Channel fabric backbone, and storage management is performed through a networked storage pool for low cost and high manageability. The storage pool brings the power of the physical infrastructure to the enterprise by treating several physical devices as one logical storage device, which reduces costs through better utilization of both storage hardware and management resources.
The result of combining loop switching for clusters linked by fabric backbones and managed in a networked storage pool is a scalable, manageable infrastructure to support different kinds of clustering.
Erik Ottem is director of solutions marketing at Gadzoox Networks Inc. (www.gadzoox.com), in San Jose.