Cluster appliances are expected to emerge with a number of approaches, ranging from shared-nothing architectures to shared-everything configurations.
By Joan Wrabetz
Server appliances have generated a great deal of interest recently as a way to make complex technology easier to integrate and use. They support a variety of applications, including file services, print services, fax services, remote access services, thin-client services, and Web caching. Industry analysts predict strong growth for server appliance markets, viewing them as a necessity for overloaded IT staffs dealing with mounting workloads.
While effective today in providing basic services to client systems, server appliances will certainly continue to evolve as network devices that function as a part of larger, integrated network systems. However, they also will develop into higher-level robust servers that are capable of supporting mission-critical, high-availability applications. In general, mission-critical computing requires configurations comprising two or more appliances working together as a networked "cluster appliance." While employing the same concepts that have been used in fault-tolerant computing for many years, networked cluster appliances translate the implementation of these concepts from cost-prohibitive, proprietary technologies to affordable, open-systems products.
Figure 1: The back channel is a private network shared by the cluster appliances.
The key is to use the appliance as a granular network element, replacing traditional proprietary, high-availability system components such as storage and bus technologies. With appliances as the primary building block, communications between cluster elements can use any standards-based networking technology.
Some of the more difficult aspects of achieving high availability, such as seamlessly switching to secondary storage, can be dramatically simplified by replacing inflexible storage device and subsystem technologies with easily adaptable network-attached storage (NAS).
In the near future, IT professionals will be able to deploy cluster appliances that outperform today's standalone appliances in all critical areas, including I/O performance, reliability, and scalability. Enabling technologies include:
- Switched networking
- Virtual networks
- Flow control technology
- Network processors
- Gigabit and multi-gigabit transmissions
- Fiber-optic media
- Serial system bus technology
- Low-latency protocols
- Storage networking
- Data sharing
- Object-based storage
Integration of these technologies into tightly coupled cluster appliances is occurring in some high-end appliance products. As these technologies mature and costs are reduced, cluster appliances will likely be accepted for the significant benefits they deliver.
Examples of cluster server appliances
Cluster appliances are made from two or more "cluster nodes" that work together to provide a service to a group of clients. There are many architectural possibilities for cluster appliances, which in general closely resemble the architectures of clustered systems, except that the integration of these products is implemented using an appliance plug-and-play model.
Figure 2: In a shared-nothing configuration, the primary and secondary appliance nodes have separate storage I/O subsystems and channels.
One common aspect of a cluster appliance is an optional high-speed "back-channel" network that ensures the various nodes in the cluster can communicate with each other. The back channel is a private network shared by the cluster appliances (see Figure 1).
The back-channel network, which is distinct from the client/server network, implies a design that incorporates plug-and-play connectivity to the legacy client/server network, with the possibility of incorporating new networking technologies as part of the back channel. This way, new technology can be implemented without forcing expensive upgrades to an existing client/server network. In other words, the cluster appliance could include both the cluster appliance nodes and networking equipment for the back-channel network, as shown in Figure 1.
Several designs for cluster appliances, ranging from basic to more complex, are discussed below.
Shared-nothing cluster appliance
The most basic model for cluster server appliances takes a hot spare, shared-nothing approach. In this configuration, a primary appliance node provides all services until it fails, at which time a secondary appliance node assumes the workload.
A key design point of this elementary cluster appliance is the fact that the primary and secondary appliance nodes have separate storage I/O subsystems and channels. This means that data written by the primary node must also be mirrored onto the secondary node to enable it to assume the tasks of the primary node on failure. This is referred to as a "shared-nothing" configuration because the primary and secondary appliance nodes operate as independent systems supplying the same function (see Figure 2).
Figure 3: In the shared-data approach, the primary and secondary appliances are connected to the same storage subsystem(s). A cluster appliance that has a secondary "hot spare" node can share data with the primary node.
There are several important details in Figure 2. First, the primary and secondary appliance nodes have different IP addresses prior to a primary failure. This allows both nodes to operate normally and be available for management purposes.
However, if the primary node fails, the secondary node assumes the IP address of the primary node. The secondary node has to be able to make this decision on its own.
For that reason, several mechanisms may be used to validate a primary node failure. One way is to use a back-channel network, which in its simplest form is a point-to-point link over an Ethernet crossover cable.
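The failover decision described above can be sketched in a few lines of Python. This is only an illustration, not a description of any shipping product: the class names, the one-second heartbeat interval, and the three-missed-beats threshold are all assumptions introduced for the example, and the IP takeover is modeled as a simple field change rather than a real network reconfiguration.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Runs on the secondary node and tracks heartbeats from the primary."""
    interval_s: float = 1.0   # assumed heartbeat period on the back channel
    missed_limit: int = 3     # assumed number of missed beats before failover
    last_seen_s: float = 0.0  # time of the most recent heartbeat

    def record_heartbeat(self, now_s: float) -> None:
        self.last_seen_s = now_s

    def primary_failed(self, now_s: float) -> bool:
        # The primary is presumed failed once more than
        # interval_s * missed_limit seconds pass without a heartbeat.
        return (now_s - self.last_seen_s) > self.interval_s * self.missed_limit


@dataclass
class SecondaryNode:
    """Hypothetical hot-spare node that decides on its own when to take over."""
    own_ip: str
    primary_ip: str
    monitor: HeartbeatMonitor
    serving_ip: str = ""

    def __post_init__(self) -> None:
        # Before any failure, the secondary answers on its own address.
        self.serving_ip = self.own_ip

    def tick(self, now_s: float) -> str:
        # Assume the primary's IP address once its failure is validated.
        if self.serving_ip != self.primary_ip and self.monitor.primary_failed(now_s):
            self.serving_ip = self.primary_ip
        return self.serving_ip
```

Calling `tick()` periodically with the current time keeps the secondary on its own address while heartbeats arrive, and switches it to the primary's address once the threshold is exceeded.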
The connection used to monitor the "heartbeat" of the primary node also can be used to transfer mirrored data from the primary node to the secondary node. Alternatively, separate networks could be used for the various functions, including the client/server network, if needed. However, the client/server network is not likely to be reliable enough to support the low-latency, high-speed links required of both heartbeat status checks and data mirroring. If the mirroring link is fast enough, it is possible that a journaling or transaction system could be used to provide precision synchronization between primary and secondary appliance nodes. If the mirroring link is not fast enough, it may not be possible to provide immediate failover to the secondary node, and it is possible that some data will be lost. This is likely unacceptable for database or transaction systems.
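The effect of mirroring-link speed on data loss can be illustrated with a small journaling sketch. It assumes a sequence-numbered journal on the primary and models link speed as the number of entries that can be shipped per mirroring pass; the class and function names are hypothetical and chosen only for this example.

```python
class PrimaryJournal:
    """Primary node: applies writes locally and records them in a journal."""

    def __init__(self):
        self.entries = []  # list of (sequence number, key, value)
        self.data = {}

    def write(self, key, value):
        seq = len(self.entries) + 1  # sequence numbers start at 1
        self.entries.append((seq, key, value))
        self.data[key] = value
        return seq

    def entries_after(self, seq):
        # Sequence numbers are contiguous, so slicing skips the first `seq` entries.
        return self.entries[seq:]


class SecondaryReplica:
    """Secondary node: replays journal entries strictly in order."""

    def __init__(self):
        self.applied_seq = 0
        self.data = {}

    def apply(self, entries):
        for seq, key, value in entries:
            if seq == self.applied_seq + 1:  # ignore gaps and duplicates
                self.data[key] = value
                self.applied_seq = seq


def mirror(primary, secondary, max_entries):
    """Ship at most max_entries journal entries; return the remaining lag."""
    batch = primary.entries_after(secondary.applied_seq)[:max_entries]
    secondary.apply(batch)
    return len(primary.entries) - secondary.applied_seq
```

A lag of zero means the secondary can take over without losing data. Any nonzero lag at the moment of failure corresponds to writes that exist only on the failed primary.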
The shared-nothing model can provide high availability of the appliance's service, but otherwise does not offer the benefits of clustering technology. Neither the processing power nor the storage capacity of the secondary node contributes to extend the overall capabilities of the appliance.
Hot-spare cluster appliance
An upgrade to the shared-nothing approach is to connect both the primary and secondary appliances to the same storage subsystem(s). Using this model, all data that is written by the primary node is also immediately available to the secondary node, without the need to mirror data. This shared-data approach requires a storage interconnect where both primary and secondary appliances have equal access to the storage of the cluster server appliance. This storage interconnect is an excellent application of a back-channel network that includes data sharing.
Figure 4: A network switch can provide load balancing across multiple appliance nodes. The back channel provides the connection between the switch and appliances.
One possible interconnect could be a Fibre Channel network, but the physical characteristics of the back channel are less important than its data-sharing capabilities. One approach uses a gateway-based or file access manager-based SAN file system that manages data sharing. Alternatively, a NAS appliance on an Ethernet back-channel network also could be used.
In a shared-data configuration, a cluster appliance that has a secondary "hot-spare" node can share data with the primary node (see Figure 3). The data sharing in this example is provided by a separate NAS cluster node, connected to a switched Ethernet back-channel network.
An interesting aspect of this example is how a standalone server appliance (the NAS system) has been integrated into the larger cluster appliance. Compared to the shared-nothing approach, this model provides high availability with improved data integrity by reducing the likelihood of data corruption when the primary node fails during a mirroring operation.
However, because a single appliance provides client services, the processing power of this design does not scale well. In turn, storage capacity scalability is dependent on the type of data-sharing method used. In Figure 3, where the data sharing is done with a single NAS system, the capacity is limited by the maximum configuration of the NAS system. For today's NAS systems, this is essentially the maximum capacity of a single NAS box. In the future, there will be NAS designs that will easily add storage capacity without interrupting storage operations in the cluster.
Many Web sites today are designed with an approach that balances the processor load across several servers, each with its own copy of the data. This design depends on the presence of layer 7 load-balancing network switches that spread Internet HTTP protocol requests among all servers. In the case of Web servers today, each server is in effect an independent system that has its own copy of the data to service requests (see Figure 4).
One interesting aspect of this type of cluster appliance is the load-balancing network switch that feeds client requests to the various appliance members. It is possible that the cluster appliance might not include this switch, but its presence enables the application design to include other technology extensions.
In Figure 4, the switch provides the boundary between the cluster appliance and the client/server network, while the appliance back channel is the connection between the switch and the appliance nodes.
This type of cluster appliance works well for reading data, providing a viable solution for the intensive read nature of Internet access. Yet, it is not ideal when writing data: Updating data across multiple independent servers and all data copies in storage is extremely difficult. For that reason, the usefulness of this approach is far from optimal for general-purpose computing.
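The asymmetry between reads and writes in this design can be seen in a minimal sketch. It assumes a simple round-robin policy in place of a real layer 7 switch, and it counts one "node operation" each time a replica is touched; the class name and counter are hypothetical.

```python
from itertools import cycle


class ReplicatedCluster:
    """Load-balanced nodes, each holding its own full copy of the data."""

    def __init__(self, node_count):
        self.replicas = [dict() for _ in range(node_count)]
        self._next_node = cycle(range(node_count))  # assumed round-robin policy
        self.node_operations = 0

    def read(self, key):
        # A read is served by exactly one node, chosen by the balancer.
        node = next(self._next_node)
        self.node_operations += 1
        return self.replicas[node].get(key)

    def write(self, key, value):
        # A write must reach every copy to keep the nodes consistent.
        for replica in self.replicas:
            replica[key] = value
            self.node_operations += 1
```

With four nodes, a read costs one node operation while a write costs four. The sketch also ignores the harder problem of a write that reaches only some of the copies.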
Figure 5: In a shared-everything approach, all client requests are serviced by any available appliance node, each of which has direct access to all data.
Furthermore, as the amount of data behind each appliance grows and/or the number of appliances grows, the overhead cost of keeping a complete copy of the data behind each appliance can get prohibitive. Compared to other cluster appliance designs, this approach yields good processor scalability, but does not provide similar I/O performance benefits or storage capacity scalability.
Cross-mounted data access
A variation on the load-sharing cluster appliance is to spread the ownership of data across all the appliances in the cluster. Then, whenever one of the appliances needs to access data on another node, it makes a second, intra-cluster request to cross-mount the data. If the load-balancing network switch is included as a component of the cluster appliance, as shown in Figure 4, there are no physical design changes needed to implement cross-mounting. This configuration provides a back-channel network through the load-balancing switch that can be used for all cross-mounting operations between appliance nodes.
However, the change from the shared-nothing model to the cross-mounting model requires a data-sharing distributed file system running on each appliance member. The distributed file system ensures that all appliance nodes know precisely what data each individual appliance node manages. Each appliance node is now able to handle any user request by accessing it locally or by cross-mounting it.
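A rough sketch of this arrangement follows. It assumes ownership is assigned by hashing the file path, which is only one of many ways a distributed file system could track ownership, and it models a cross-mount as a direct lookup on the owning node; all names are hypothetical.

```python
import zlib


class CrossMountNode:
    """Appliance node that owns part of the data and cross-mounts the rest."""

    def __init__(self, node_id, cluster):
        self.node_id = node_id
        self.cluster = cluster
        self.local = {}                # data this node owns
        self.cross_mount_requests = 0  # second requests sent over the back channel

    def handle_read(self, path):
        owner = self.cluster.owner_of(path)
        if owner is self:
            return self.local.get(path)  # served from local storage
        self.cross_mount_requests += 1   # served by cross-mounting from the owner
        return owner.local.get(path)


class CrossMountCluster:
    """Stand-in for the shared ownership map kept by the distributed file system."""

    def __init__(self, node_count):
        self.nodes = [CrossMountNode(i, self) for i in range(node_count)]

    def owner_of(self, path):
        # Assumed placement rule: hash the path to pick the owning node.
        return self.nodes[zlib.crc32(path.encode()) % len(self.nodes)]

    def store(self, path, data):
        self.owner_of(path).local[path] = data
```

Any node can answer a request for any file, but every node other than the owner pays for an extra hop across the back channel.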
While processor scaling using the load-balancing, cross-mounted cluster appliance is high, the storage I/O performance is susceptible to bottlenecks caused by contention for cross-mounted data. The storage capacity scaling of this approach also would be improved, although it would be limited by the aggregate capacity of all appliance nodes.
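The capacity difference between the two load-balanced designs comes down to simple arithmetic, sketched below. The function names and the terabyte figures are illustrative assumptions, and the calculation ignores file system overhead.

```python
def usable_capacity_full_copy(node_capacities_tb):
    """Every node keeps a complete copy, so the data set must fit on the smallest node."""
    return min(node_capacities_tb)


def usable_capacity_cross_mounted(node_capacities_tb):
    """Ownership is spread across nodes, so usable capacity is the aggregate."""
    return sum(node_capacities_tb)


# Hypothetical cluster of four nodes with 2 TB of storage each.
nodes_tb = [2.0, 2.0, 2.0, 2.0]
full_copy_tb = usable_capacity_full_copy(nodes_tb)
cross_mounted_tb = usable_capacity_cross_mounted(nodes_tb)
```

In this hypothetical cluster the full-copy design can hold a 2 TB data set, while the cross-mounted design can hold up to 8 TB on the same hardware.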
Shared-everything cluster appliance
The final example of a cluster appliance uses a "shared-everything" approach, where all client requests are serviced by any available appliance node, each of which has direct access to all data. The shared-everything approach can provide the optimal solution in terms of scalability, performance, and reliability.
Load-balanced access to any appliance node in the cluster is still employed. However, in this architecture the storage behind all of the appliance nodes in the cluster is pooled and becomes one shared pool of data, accessible and shared by all appliances in the cluster (see Figure 5).
This architecture differs from cross-mounting in several ways. First, this approach provides the best cumulative I/O because each appliance processor node has unencumbered access to all data: There are no gateways or cross-mounting control impediments in the I/O path.
Second, files are striped across the pool of storage rather than being located entirely on a single appliance, resulting in better performance. Because large files are striped across a number of nodes, they do not cause a bottleneck at any one appliance, regardless of which node is used to access them.
Finally, files are striped in such a way that the cluster becomes fault tolerant: data is reconstructed from the other appliances if a single appliance fails.
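One way to obtain this kind of fault tolerance is to store an XOR parity stripe alongside the data stripes, as in the sketch below. The article does not specify a redundancy scheme, so single parity is an assumption made purely for illustration, as are the function names and the zero-byte padding.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe(data, data_nodes):
    """Split data into one chunk per data node and append a parity chunk."""
    chunk = -(-len(data) // data_nodes)  # ceiling division
    chunks = [
        data[i * chunk:(i + 1) * chunk].ljust(chunk, b"\0")  # pad the last chunk
        for i in range(data_nodes)
    ]
    return chunks + [xor_blocks(chunks)]


def reconstruct(stripes, failed_index):
    """Rebuild the stripe held by a failed node from all surviving stripes."""
    survivors = [s for i, s in enumerate(stripes) if i != failed_index]
    return xor_blocks(survivors)
```

Each element of the list returned by `stripe()` would live on a different appliance node. Because the parity stripe is the XOR of all data stripes, any one missing stripe equals the XOR of the remaining ones, so a single node failure loses no data.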
As is the case with the cross-mounted appliance, the shared-everything cluster appliance requires a data-sharing distributed file system. In addition, this architecture creates a higher requirement for a separate back-channel network because that network is now used for distributed file access and striping files across appliance nodes, and for reconstructing data during failures within the cluster.
Separate processing/storage appliances
An interesting modification of the shared-everything cluster appliance separates the processing from the storage and creates dedicated processing and storage appliances, both of which are part of a cluster (see Figure 6). While this approach provides the ideal solution in terms of optimized application and storage performance, it also requires the highest level of integration of new networking technologies. Figure 6 illustrates a shared-everything cluster appliance with a single cluster back-channel network connecting all appliance storage nodes in addition to appliance processing nodes.
In this configuration, the cluster appliance can be extended by separately adding additional appliance storage nodes or processing nodes on an as-needed basis. In Figure 6, there are four appliance processor nodes to handle client requests and five appliance storage nodes to handle the capacity and I/O throughput requirements. This architecture provides complete independence of processor and storage nodes, allowing any combination of each as required by the application.
The performance scaling of this design is excellent because performance increases with each additional appliance added to the cluster. Capacity scaling is also optimal, and is increased by adding appliance storage nodes as needed. While it is possible for appliance storage nodes to service multiple concurrent data requests from multiple processor nodes, these storage nodes would be optimized for storage I/O operations to allow the overall cluster appliance to function optimally. Simply put, this is a cluster appliance made up of specialized intra-cluster appliances.
Figure 6: In this configuration, a shared-everything cluster appliance has a single cluster back-channel network connecting all processing and storage appliance nodes.
While it would be possible to use dual back-channel networks (one for processor connections and the other for storage), the implication in Figure 6 is that a single physical back-channel network provides all intra-cluster connectivity and supports both client and storage traffic.
A number of potential alternatives will exist for this type of traffic, including native storage traffic on TCP/IP networks, optimized protocols such as the Virtual Interface (VI) architecture and the Direct Access File System (DAFS), and optimized networks such as InfiniBand.
A difficult issue facing IT organizations today is the lack of skilled talent to manage corporate intranet and Internet networks. Server appliances alleviate the pressure of network administration by providing plug-and-play services with a minimum of installation and configuration effort.
Built on the widespread success of many types of server appliances, the appliance industry will expand and take on other roles spanning both networking and computing. Some of these devices will be subservient, equivalent to simple devices running on an open-systems network as opposed to a system bus. Other devices, such as cluster appliances, will elevate server appliances and provide higher levels of performance, scalability, and reliability.
The Internet-driven introduction of new networking technologies is giving the appliance industry the tools it needs to develop these new cluster appliances. The idea of a back-channel network to carry intra-cluster communications makes it relatively easy to introduce cluster appliances into established legacy networks. As the demand for 24x7 availability continues to grow, the market will turn to specialized cluster appliances to provide significant advantages at lower cost.
Several models of cluster server appliances are likely to evolve, following the path of existing clustered server technologies. The shared-nothing model is the simplest to design and most likely will be offered by numerous vendors, providing reasonable availability, a minimum of new technology, and lower cost. Other iterations of cluster appliances will offer varying degrees of sharing, with modular designs that incorporate load balancing and data sharing between appliances to optimize scalability, performance, and reliability.
In a shared-everything cluster appliance, CPU loads are balanced among cluster nodes, with all CPUs having access to all data. Network load-balancing switches are a requirement for the shared-everything model. Another crucial element is the use of a data-sharing distributed file system that facilitates direct access to stored data. While fully realizing shared-everything appliance clusters with dedicated processing and storage nodes requires the highest level of integration and the latest developments in low-latency network technology, it will be achievable in the near future.
Joan Wrabetz is president and chief operating officer at Tricord Systems Inc. (www.tricord.com) in Plymouth, MN.