NT Clusters: New Demands on Storage
Clusters offer high availability, but they may make you rethink your approach to storage.
Long talked about under the code name Wolfpack, Microsoft's Windows NT clustering is finally here and it promises to propel NT into the enterprise arena. Clustering will enable organizations that deploy NT for critical production systems to get closer to the level of reliability and availability offered by other enterprise operating systems, such as Unix, OpenVMS, and OS/390. In the process, however, system administrators will have to rethink their approach to storage.
Clustering ensures high availability by compensating for failed servers. It does so by joining two or more independent servers. If a server fails, another server automatically and quickly picks up the workload of the failed server. Clustering differs from multiprocessing, where a single server uses multiple processors to enhance performance and scalability. A single multiprocessing server can fail just as readily as any other server, bringing applications to a complete halt; a cluster's separate servers protect against exactly that.
An NT cluster is transparent, appearing as a single system to users. In fact, users see the clustered storage simply as storage. In the event of a failure, there may be a slight delay (about 30 seconds) before the system picks up exactly where it left off.
Microsoft announced its clustering solution--Microsoft Cluster Server (MSCS)--last September. It currently provides automatic failover recovery between two servers. The ability to cluster more than two servers requires a mechanism to ensure file integrity. Distributed lock management, or some other mechanism, could be used to control concurrent writing and accessing of data. Microsoft promises support for more than two nodes in its next clustering technology release.
The benefits of clusters are expected to attract a large following. The Gartner Group consulting firm, in Stamford, CT, expects the market for Windows NT to exceed the market for Unix by the end of the year, driven in large part by the availability of NT clustering.
Typical applications for Windows NT clustering will be large database systems. Through clustering, users and applications will be able to access data even if a server fails.
With clustering, data will be accessible through multiple servers. Clustering will have major implications for Windows NT storage, which is growing at almost 22% annually, according to International Data Corp., a market research firm in Framingham, MA. The arrival of Windows NT clusters will force system administrators to rethink their storage strategies because it changes the way organizations deploy, plan for, and back up storage. To begin, system managers will need to familiarize themselves with the concepts of clustering and high availability. Then, they will be able to decide which applications and data need the kind of high availability that clustering can provide and which applications may require a different approach, such as fault tolerance.
High availability and fault tolerance are not the same. Fault-tolerant systems (e.g., an air-traffic control system) do not experience any downtime in the event of a failure. Typically, dual systems run in parallel, with one system ready to pick up instantly should another system fail--without any disruption to the application.
High-availability applications, on the other hand, can withstand a short stoppage--a few seconds or even a few minutes. Email is an often overlooked example of a high-availability application. If the system goes down, workflow and other business processes quickly stop. Other high-availability (24x7) applications include Web servers or systems that directly affect customers.
The key to high availability is eliminating single points of failure. Multiple servers, multiple communications paths, and RAID storage ensure that the loss of a server, link, or disk drive does not cause a system failure. A high-availability system automatically senses a failure and switches the workload over to another server.
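The sense-and-switch behavior described above can be illustrated with a minimal sketch. It is not how MSCS is implemented; the `Node` class, the `fail_over` function, and the five-second heartbeat timeout are all hypothetical names and values chosen for illustration. Each server reports a heartbeat timestamp, and any server whose heartbeat is stale has its workloads handed to the least-loaded surviving server.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical cluster member; not an MSCS data structure."""
    name: str
    last_heartbeat: float = 0.0          # time the node last reported in (seconds)
    workloads: list = field(default_factory=list)


def fail_over(nodes, now, timeout=5.0):
    """Move workloads off nodes with stale heartbeats onto surviving nodes.

    Returns a list of (workload, from_node, to_node) tuples.
    The timeout value is an assumption made for this sketch.
    """
    alive = [n for n in nodes if now - n.last_heartbeat <= timeout]
    failed = [n for n in nodes if now - n.last_heartbeat > timeout]
    if not alive:
        raise RuntimeError("no surviving node to take over")

    moved = []
    for dead in failed:
        # Pick the surviving node that currently carries the fewest workloads.
        target = min(alive, key=lambda n: len(n.workloads))
        moved.extend((w, dead.name, target.name) for w in dead.workloads)
        target.workloads.extend(dead.workloads)
        dead.workloads = []
    return moved


# Two-node example: server B has not reported for 8 seconds.
server_a = Node("A", last_heartbeat=100.0, workloads=["database"])
server_b = Node("B", last_heartbeat=92.0, workloads=["mail"])
moves = fail_over([server_a, server_b], now=100.0)
print(moves)
print(server_a.workloads)
```

In a real cluster the heartbeat would travel over a dedicated link and the surviving server would also have to take ownership of the shared disks, which this sketch leaves out.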
Once Windows NT system managers have considered the implications of high availability in their systems and business processes, they need to look at systems and storage deployment. Clustering involves channel connections between servers and storage subsystems. Today, that connection is usually SCSI or Ultra SCSI, which currently limits the physical distance between the components of the cluster to 25 meters (50 meters with a SCSI hub). In the future, Fibre Channel will extend that distance to 10 km.
Optimal performance in a clustered environment requires external RAID storage, which greatly improves a system's reliability. The majority of NT storage today is in a JBOD (just a bunch of disks) configuration, which does not offer any redundancy. In addition, the internal storage that is common among NT servers today represents a single point of failure: If the server goes down, access to the stored data is blocked. However, if data is kept external to the server, it can be accessed by a second server in the cluster.
Windows NT administrators will also have to evaluate their storage infrastructure for deploying cluster storage. Specifically, they need to make sure that the various components of the clustered system are equipped with the appropriate SCSI adapters, controllers, external storage cabinets, etc.
NT managers will also have to revise their storage capacity requirements. With multiple servers sharing the same storage, additional storage will probably be needed. Similarly, the use of RAID storage will increase the amount of storage capacity required by anywhere from 25% (using a RAID parity reconstruction approach) to 100% (using a RAID-mirroring approach). As nodes are added to the cluster, storage capacity requirements will increase commensurately. And as organizations start to implement enterprise transaction systems and large data warehouses on Windows NT server clusters, administrators will have to consider the challenge of managing terabytes of storage.
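The capacity arithmetic can be worked through with a short sketch. The function name `raw_capacity_gb` and the 100GB figure are hypothetical. The 25% parity overhead quoted above is assumed here to correspond to a group of four data disks plus one parity disk; other group sizes give different overheads.

```python
def raw_capacity_gb(usable_gb, scheme, data_disks=4):
    """Estimate the raw disk capacity needed to deliver usable_gb of storage.

    scheme: "jbod" (no redundancy), "parity" (one parity disk per group of
    data_disks data disks), or "mirror" (every disk duplicated).
    The default of four data disks per parity group is an assumption chosen
    to match a 25% overhead.
    """
    if scheme == "jbod":
        overhead = 0.0
    elif scheme == "parity":
        overhead = 1.0 / data_disks      # one extra disk per group of data disks
    elif scheme == "mirror":
        overhead = 1.0                   # a full second copy of the data
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return usable_gb * (1.0 + overhead)


# Raw capacity required for 100GB of usable storage under each approach.
for scheme in ("jbod", "parity", "mirror"):
    print(scheme, raw_capacity_gb(100, scheme))
```

Under these assumptions, 100GB of usable storage calls for 125GB of raw disk with parity and 200GB with mirroring.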
Cost Is Key
NT clustering needn't be expensive. Intel-based servers offer numerous low-cost entry points, and organizations can acquire external RAID subsystems with sufficient initial capacity for as little as $5,000. A typical small cluster storage system can be acquired for less than $10,000.
As more and more organizations turn to Windows NT, clustering's role will increase. It will allow managers to deploy NT where high-availability requirements previously precluded its use. NT clustering will also enable organizations to replace aging mission-critical systems with systems that are in compliance with Year 2000 standards.
To use NT clustering effectively and to take advantage of the high availability it allows, system managers will have to reevaluate their storage strategy. The old desktop and work group approach of hanging gigabytes of JBOD off each individual server, or simply embedding it in the server, won't allow for the kind of high availability required by critical enterprise applications. System managers will have to adopt an enterprise approach to storage.
Dave Coombs is vice president of storage sales and marketing at Digital Equipment Corp., in Maynard, MA.