Replication software can extend fail-over protection in Microsoft Cluster Server environments.
BY DAVID DEMLOW
Continuous availability of mission-critical applications is vital to sustaining a competitive edge in today's global marketplace. Microsoft Cluster Server (MSCS) provides crucial application fail-over capabilities to Windows 2000 and Windows NT platforms. Software that creates geographically distributed clusters with replicated data can extend MSCS capabilities over a distance for added protection against hardware failures and site disasters. Such software can provide efficient replication while maintaining maximum application performance.
MSCS was introduced in Windows NT Server, Enterprise Edition 4.0. It supports the connection of two servers in a cluster and provides an architecture for developing and deploying highly available applications. A future version of MSCS will support larger clusters. Two major requirements for a clustered application are location-independence and shared data access.
Location-independence lets users access an application or service at any location within a cluster. MSCS achieves location-independence by creating virtual servers with unique server names and Internet Protocol (IP) addresses that can be moved from one node to the other as required, without changing the physical identity of each node. Any node can create and simultaneously host multiple virtual servers, and a single virtual server can be moved without affecting other servers in the cluster. MSCS organizes cluster resources such as disks, network names, IP addresses, and applications into cluster groups, which can be brought online on any node in the cluster.
Another cluster requirement is that application data and metadata be available to each node in the cluster. For application metadata, MSCS provides an application programming interface (API) for cluster-aware applications to write to a shared portion of the registry that is made available to all nodes of the cluster. This API eliminates the need for manual synchronization of registry entries between nodes.
For application data, MSCS provides all cluster nodes with access to data by giving each node in the cluster the ability to access a shared disk via a common SCSI or Fibre Channel bus. Although a shared disk is actually available to all nodes, MSCS is considered a shared-nothing cluster architecture since the cluster software prevents data corruption by allowing only one node at a time to access a disk. A true shared-disk cluster would allow multiple nodes to access the same disk simultaneously using a distributed-lock manager or other mechanism to control access.
Figure 1: A standard MSCS cluster has physical disk resources that control the ownership of each shared disk within the cluster.
A standard MSCS cluster, as illustrated in Figure 1, has physical disk resources that control the ownership of each shared disk within the cluster. For example, if the cluster group SQL that contains the physical disk resources S: and L: is brought online on node 1, drive S: and drive L: will become accessible to node 1 but will be inaccessible from node 2 at that time.
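The ownership rule in this example can be modeled in a few lines of Python. This is only an illustrative sketch, not the MSCS API: the `ClusterGroup` class and its method names are hypothetical, and the node names are placeholders.

```python
class ClusterGroup:
    """Toy model of a cluster group: a named set of disk resources
    that is online on at most one node at a time."""

    def __init__(self, name, disks):
        self.name = name
        self.disks = set(disks)
        self.owner = None  # node currently hosting the group, if any

    def bring_online(self, node):
        # Moving the group moves ownership of every disk it contains.
        self.owner = node

    def can_access(self, node, disk):
        # Shared-nothing rule: only the owning node may use the disk,
        # even though the bus physically connects both nodes to it.
        return disk in self.disks and self.owner == node


sql = ClusterGroup("SQL", ["S:", "L:"])
sql.bring_online("node1")
print(sql.can_access("node1", "S:"))  # True
print(sql.can_access("node2", "S:"))  # False
```

Bringing the group online on node 2 would reverse both answers, which is what happens during a fail-over.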
The architecture of a standard MSCS shared-disk cluster protects against the failure of an individual server by allowing the other cluster node to take control of the disk and restart the application. But it leaves the cluster vulnerable to other failures that could leave the shared data unavailable. Specifically, because there is only one copy of the data in this architecture, both the storage of that data and the connection to the storage are potential points of failure.
In addition, the distance between the servers and storage is constrained in two ways. Physical media impose hard limits, ranging from 75ft for SCSI to 10km for Fibre Channel. Performance imposes another: signals cannot travel faster than the speed of light, so the round-trip latency of every storage access grows with distance. Even with massive amounts of bandwidth, reading from and writing to a disk 10km away add a measurable and significant delay to each I/O operation, which may be unacceptable for many I/O-intensive applications.
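A rough calculation shows the scale of the distance penalty. The sketch below assumes light travels through optical fiber at about 200,000 km per second (roughly two-thirds of its speed in a vacuum) and counts a single round trip per I/O. Real storage protocols usually need more than one exchange per operation, and switches and disks add their own delays, so these figures are a floor rather than a prediction.

```python
SPEED_IN_FIBER_KM_PER_S = 200_000  # assumption: about 2/3 of c in glass


def round_trip_delay_us(distance_km):
    """Propagation delay of one request/response pair, in microseconds."""
    return 2 * distance_km / SPEED_IN_FIBER_KM_PER_S * 1_000_000


def max_serial_ios_per_second(distance_km):
    """Ceiling on strictly sequential I/Os if propagation were the only cost."""
    return 1_000_000 / round_trip_delay_us(distance_km)


for km in (1, 10):
    delay = round_trip_delay_us(km)
    ceiling = max_serial_ios_per_second(km)
    print(f"{km:>3} km: {delay:7.1f} us per round trip, "
          f"at most {ceiling:,.0f} serial I/Os per second")
```

Under these assumptions, 10km adds about 100 microseconds to each round trip no matter how much bandwidth is available, because bandwidth does not shorten the path.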
Figure 2: Administrators can protect against site-level failures by locating the other node of the cluster with its own data at an alternative facility.
Third-party software enhances MSCS by allowing each node to maintain its own storage with an independent, synchronized copy of the cluster data. Because multiple copies of the data exist, administrators can protect against even catastrophic site-level failures by locating the other node of the cluster with its own data at an alternative facility, as shown in Figure 2. The nodes are connected and data is replicated over an IP connection that can be a LAN for short distances or a Fiber Distributed Data Interface (FDDI) or virtual LAN (VLAN) connection for greater distances. Typically, a private IP connection is created to isolate replication traffic from production networks.
Just as MSCS provides a physical disk resource that controls ownership and access to specified disks, distributed geographic data-redundant cluster software can provide a replicated disk resource that performs the same functions. If a group containing the replicated disk S: is brought online on node 1, all changes made to drive S: on node 1 are automatically and continuously replicated to drive S: on node 2. In addition, drive S: on node 2 is protected against accidental modifications since it is an offline replica. If that replicated disk is moved to node 2, the process continues in the opposite direction.
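The behavior of a replicated disk resource can be sketched as follows. This is a toy in-memory model under stated assumptions, not the interface of any replication product: the `ReplicatedDisk` class is hypothetical, and a real implementation would send changes over the IP link rather than copy them in memory.

```python
class ReplicatedDisk:
    """Toy model of a replicated disk: one writable source copy and a
    write-protected replica, with the roles swapped when the disk moves."""

    def __init__(self, nodes):
        self.copies = {node: {} for node in nodes}  # node -> {block: data}
        self.source = None

    def bring_online(self, node):
        # The node that hosts the group becomes the replication source.
        self.source = node

    def write(self, node, block, data):
        if node != self.source:
            # The other node holds an offline replica and may not modify it.
            raise PermissionError(f"{node} holds an offline replica")
        # Apply the change locally and mirror it to every other copy.
        for copy in self.copies.values():
            copy[block] = data


drive_s = ReplicatedDisk(["node1", "node2"])
drive_s.bring_online("node1")
drive_s.write("node1", 0, "orders")
print(drive_s.copies["node2"])  # {0: 'orders'}

drive_s.bring_online("node2")   # fail-over: replication reverses direction
drive_s.write("node2", 1, "invoices")
print(drive_s.copies["node1"])  # {0: 'orders', 1: 'invoices'}
```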
Although an application not specifically designed for cluster operation can often run in a clustered environment with certain modifications, the real benefits of clustering are achieved when the application communicates its status to the cluster software and monitors the status of the cluster itself. MSCS provides a simple API that enables developers to add this level of cluster awareness to their applications.
Fail-over of non-cluster-aware applications typically requires the installation of an idle copy of the application on the second node to ensure the correct executable files, dynamic link libraries (DLLs), and registry settings are present. The second copy of the application must remain in an idle state until fail-over. Another approach is to reverse-engineer the application, determine what registry keys and DLLs are required to move the application to another location, and manually copy those keys and DLLs.
Both approaches require careful attention to detail and testing. A single configuration change, service pack, or patch applied on one node that is not matched on the other may prevent a successful fail-over. An approach that worked for one version of an application may not work at all after a simple patch.
Creating cluster-aware applications allows developers to consider and control the behavior of their applications in a cluster environment. Many cluster-aware applications allow multiple instances of the application to run on one cluster or even on a single cluster node. With Microsoft SQL Server 7 or later versions, for example, fail-over can occur at the individual database level. For instance, the sales and accounting databases might run on node 1 in normal operations, while inventory and a Web back-end run on node 2.
During end-of-month accounting operations, however, the sales database could be temporarily shifted to a different node to balance the workload more evenly. In addition, application developers can provide their own application-monitoring algorithms to ensure their applications are active, healthy, and responsive. These algorithms can also trigger fail-over or other corrective actions as necessary.
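An application-monitoring algorithm of this kind might look like the sketch below. It does not use the MSCS API. The `HealthMonitor` class, the three-strike threshold, and the callbacks are all assumptions chosen for illustration; a real cluster-aware application would report its status through the cluster software.

```python
class HealthMonitor:
    """Calls on_failure once after max_failures consecutive failed checks."""

    def __init__(self, check, on_failure, max_failures=3):
        self.check = check            # callable returning True when healthy
        self.on_failure = on_failure  # corrective action, e.g. request fail-over
        self.max_failures = max_failures
        self.failures = 0

    def poll(self):
        """Run one health check; return True if the application looks healthy."""
        try:
            healthy = bool(self.check())
        except Exception:
            healthy = False  # a check that crashes counts as a failed check
        if healthy:
            self.failures = 0  # any success resets the streak
            return True
        self.failures += 1
        if self.failures == self.max_failures:
            self.on_failure()  # fires once when the threshold is reached
        return False


events = []
responses = iter([True, False, False, False])
monitor = HealthMonitor(lambda: next(responses),
                        lambda: events.append("failover"))
for _ in range(4):
    monitor.poll()
print(events)  # ['failover']
```

Requiring several consecutive failures is a design choice that keeps one slow response from triggering an unnecessary fail-over.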
With a tight level of application integration and the use of one standard, high-availability platform for development and testing, cluster-aware technology can be used for more than emergency fail-over situations. It can even help eliminate scheduled downtime, which is often a major cause of server and application downtime. Many applications provide a rolling upgrade process in which the idle node of a cluster is upgraded while offline and then becomes the active node while the original active node is being upgraded. Microsoft provides a process to perform a rolling upgrade of the nodes in a Windows NT 4.0 cluster to Windows 2000.
Cluster quorum resource
In any cluster, one resource is designated as the cluster quorum resource and is required for cluster operation. The quorum resource is used as an arbitrator if the nodes lose communication with each other or otherwise challenge ownership of a given resource, as shown in Figure 3. The quorum resource performs special roles within the cluster and is designed to prevent the cluster from operating as a split-brain cluster (in which both nodes try to bring the same resources online at the same time, potentially causing data corruption and conflicts).
For example, if a node comes online and is unable to communicate with the other node in the cluster to determine whether it is "live," the node that is coming online will force the cluster to arbitrate by trying to access the quorum resource. In a standard MSCS cluster, one physical disk acts as the quorum resource. For arbitration, the challenging node performs a SCSI bus reset and then waits 10 seconds to see whether the other node of the cluster issues another SCSI reserve command for the quorum disk.
Figure 3: The quorum resource is used as an arbitrator if the nodes lose communication with each other or otherwise challenge ownership of a given resource.
If the other node does reserve the quorum disk again, the challenging node knows the other node is "live." If the other node does not reserve the quorum disk, the challenging node reserves it, becomes the new owner of the cluster quorum resource, and forms the cluster. If the quorum disk becomes unavailable, the cluster cannot continue to operate and will shut down.
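The challenge-and-defend sequence can be sketched as a small simulation. It is a simplified model, not real SCSI handling: the `QuorumDisk` class and the `arbitrate` function are hypothetical, the delay is replaced by a placeholder callable so the example runs instantly, and the defending node is represented by a callback.

```python
class QuorumDisk:
    """Toy stand-in for a shared disk that supports reserve and bus reset."""

    def __init__(self):
        self.reserved_by = None

    def bus_reset(self):
        self.reserved_by = None  # a reset clears any existing reservation

    def reserve(self, node):
        # A reservation succeeds only if the disk is free or already ours.
        if self.reserved_by in (None, node):
            self.reserved_by = node
            return True
        return False


def arbitrate(disk, challenger, owner_defends, wait=lambda: None):
    """Run one challenge and return the node that ends up owning the quorum.

    owner_defends is called during the wait; a live owner uses it to
    reserve the disk again, while a failed owner does nothing.
    """
    disk.bus_reset()
    wait()                    # stands in for the fixed arbitration delay
    owner_defends(disk)
    disk.reserve(challenger)  # succeeds only if nobody defended
    return disk.reserved_by


disk = QuorumDisk()
disk.reserve("node1")
print(arbitrate(disk, "node2", lambda d: d.reserve("node1")))  # node1
print(arbitrate(disk, "node2", lambda d: None))                # node2
```

In the first call the live owner defends its reservation, so the challenger backs off; in the second the owner is silent, so the challenger takes the quorum and can form the cluster.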
To provide the arbitration function in a distributed geographic data-redundant cluster, one replicated disk resource is specified as the cluster quorum and is configured to monitor multiple locations on the network, called arbitration shares. These arbitration shares are standard Universal Naming Convention (UNC) file shares that are accessible by both nodes in the cluster over the network. For a node to take ownership of the quorum resource, it must be able to access a majority of the configured arbitration shares. Therefore, multiple arbitration shares should be located throughout the network for redundancy.
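The majority rule can be expressed as a short check. This sketch covers only the counting step; the share names are hypothetical, and the `reachable` test is passed in as a function because the article does not say how a product would probe a share.

```python
def can_own_quorum(reachable, configured_shares):
    """Return True if a strict majority of the configured shares is reachable.

    reachable: callable that takes a share path and returns True or False.
    """
    total = len(configured_shares)
    reached = sum(1 for share in configured_shares if reachable(share))
    return total > 0 and reached > total // 2


# Hypothetical arbitration shares at three locations.
shares = [r"\\siteA-fs\arb", r"\\siteB-fs\arb", r"\\hq-fs\arb"]

up = {r"\\siteA-fs\arb", r"\\hq-fs\arb"}
print(can_own_quorum(lambda s: s in up, shares))  # True: 2 of 3 reachable

up = {r"\\siteB-fs\arb"}
print(can_own_quorum(lambda s: s in up, shares))  # False: 1 of 3 reachable
```

An odd number of shares avoids ties, and spreading them across locations means losing one site does not by itself cost a node its majority.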
MS Exchange and SQL support
Although many applications can be run on a cluster, two of the most common and critical are Microsoft Exchange Server and Microsoft SQL Server. Both are ideal candidates for "stretch" clustering using a combination of MSCS and distributed geographic data-redundant cluster software.
The Exchange Server 5.5 engine itself is not fully cluster-aware, but Microsoft has released a cluster-compatible version of Exchange 5.5 with services that configure within the cluster. It provides active/passive fail-over of the Exchange services. Fully cluster-aware and capable of active/active fail-over, Exchange 2000 adds the ability of an Exchange server to host multiple message stores and allows individual message stores to be moved independently among cluster nodes.
SQL Server 7.0 and later versions are fully cluster-aware applications. An active/active configuration of SQL Server can be supported for fail-over, and individual virtual servers can be moved independently of any other virtual servers to enable manual load balancing.
MSCS has brought greatly enhanced availability to applications based on Windows 2000 and Windows NT. It gives developers a standard platform on which to develop cluster-aware applications, but because it relies on shared storage subsystems and maintains a single copy of cluster data in a single geographic location, MSCS remains vulnerable to certain types of failures.
Distributed geographic data-redundant cluster software extends MSCS by providing the ability to create an MSCS cluster with replicated data volumes. With real-time data replication that enables local copies of clustered data to be stored on each node of the cluster, such software eliminates the potential for a single point of failure. Because data is synchronized using standard TCP/IP connectivity, nodes can be located almost anywhere. Companies that are considering adoption of Microsoft clustering technology or have already invested in it should also consider using distributed geographic data-redundant cluster software for an added layer of site-level disaster protection.
David Demlow is vice president of product management at NSI Software (www.nsisoftware.com) in Hoboken, NJ.