Access Control Methods for SAN-based Data
Here are five ways to achieve heterogeneous host attachment across storage area networks, with advantages and disadvantages.
By Paul Danahy
Although still an evolving technology, storage area networks (SANs) promise to revolutionize the way systems and storage interact. Today, host systems are most often directly attached to storage. With storage area networking, storage and hosts are not so closely coupled, allowing systems architects to engineer configurations that fulfill requirements previously met only by proprietary storage solutions. This article explores how RAID arrays can be employed in SANs to support multiple hosts and discusses some of the underlying issues driving SAN development.
SANs offer system designers a new way of approaching storage, especially with the advent of the Fibre Channel interconnect. The SCSI protocol was not developed for heterogeneous storage. In addition, SCSI is relatively slow and cannot address a large amount of disk capacity. In fact, the only commonly employed configurations in which SCSI storage is attached to multiple hosts typically involve CPU clustering.
The Fibre Channel storage/network interconnect is different. With the ability to deliver data at very high throughput rates (100MBps) and to connect a large amount of capacity on a single loop (let alone a switched environment), Fibre Channel enables one of the most interesting storage-related developments in the last few years--heterogeneous host attachment to storage across a SAN.
Heterogeneous storage attachment is important for several reasons. It allows users to amortize investments in storage across multiple hosts. This is an important consideration, since the expense of storage often outweighs system installation costs. This is particularly true in cases where multiple small-capacity hosts require highly available storage (e.g., RAID).
Heterogeneous storage attachment also enables users to consolidate storage. Attaching a variety of hosts to a single large array reduces management and maintenance costs, positively impacting the total cost of storage ownership.
And, lastly, it allows for standardization. Instead of forcing system administrators to work with storage arrays from multiple vendors, this approach can provide a single centralized management system capable of managing storage throughout the SAN.
Impediments to Multi-Host SANs
Before we investigate how vendors are approaching the challenge of building multi-host SANs, it is important to know some of the traditional impediments to heterogeneous storage attachment. The primary difficulty in building SANs to support heterogeneous hosts is the SCSI protocol running on top of Fibre Channel. Unlike TCP/IP, SCSI was not designed to support multiple hosts on the same network simultaneously. Hosts that support TCP/IP fully expect to encounter other hosts on a local area network.
In contrast, SCSI was designed as a server-to-storage interface--a significant problem in heterogeneous environments.
Since the SCSI protocol is not broad enough to encompass heterogeneous server attachment, problems can easily arise when a host performs any kind of unexpected operation. Events such as a SCSI RESET or the rebooting of a host can cause a SCSI-based heterogeneous environment to fail without warning. And, in most cases, non-clustered homogeneous servers suffer in the same manner as heterogeneous hosts. Employing a number of homogeneous hosts on a SAN, without some form of software or hardware access control to shared storage, can result in less than satisfactory data availability.
As a consequence, vendors have accommodated the shortcomings of the SCSI protocol with a variety of approaches. Each gives users a mechanism to share capacity on a SAN. They are listed below.
Shared Storage, Shared Data
The distinction between shared data and shared storage is an important one. In a shared-data environment, multiple hosts can access the same data. This is typically accomplished through the use of a common file system across homogeneous or heterogeneous hosts. Examples of this architecture in the TCP/IP networking world include NFS and CIFS. The advantage of shared data environments is that multiple users can access and edit the same information, whether it is digitized video, databases, or even executables (applications).
Shared storage, on the other hand, enables multiple hosts to access data within a single array, but typically data is not available to other hosts in the network. If a system consists of homogeneous hosts (for example, clustered servers) in a shared-storage SAN, data can be shared; in heterogeneous configurations, the non-uniformity of file systems precludes this function. This type of solution is excellent for environments in which multiple small servers are used to meet distinct requirements (for example, a mail server, a file and print server, and an OLTP server). Instead of investing in three arrays to meet the servers' capacity requirements, a shared-storage solution amortizes the investment in RAID technology across all hosts.
Operating Systems and File Systems
Two of the easiest approaches to implement are using a host's operating system to manage file access and creating a distributed file system. Both configurations can be used to build a shared-data environment. The first has only limited applicability (because it requires homogeneous hosts), and a very limited number of operating systems are robust enough to manage some of the inconsistencies of the SCSI protocol. This type of solution is generally not recommended for robust, high-availability configurations.
Creating a distributed file system is a more flexible solution and allows homogeneous and heterogeneous hosts to be configured. In this case, each host translates data into a common file format, much like NFS or CIFS. However, instead of relying on the corporate network for data traffic, this solution uses a high-bandwidth "behind-the-scenes" network.
On the upside, these configurations allow users not only to share storage capacity, but also to share data among multiple hosts. As such, they are suitable for video editing environments or for high-bandwidth video-on-demand servers. On the downside, all I/Os must be converted and re-converted in software, exacting a toll on overall performance, and these relatively new file systems are typically weak in terms of security and lock management.
I/O Filtering Access Control
A third solution is the use of device driver filters to either enable or prevent host access to storage on a LUN-by-LUN (Logical Unit Number) basis. With filtered access control, each I/O request from the host is examined to ensure that the information being requested or delivered is allowed for that particular host. This filtering ensures that hosts interface only with the storage for which they have been configured.
This type of access control can effectively operate in both homogeneous and heterogeneous configurations. Like the distributed file system approach, this type of configuration requires a high level of security to prevent unwanted accesses. In addition, access control at this layer requires vigilance on the part of the software provider to ensure that future operating system releases do not impact the solution. I/O filtering does not provide for shared-data access, except in a homogeneous host configuration.
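The per-request check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not any vendor's driver code: the host names, the ALLOWED_LUNS table, and the IORequest and filter_request names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical configuration: the LUNs each host has been set up to use.
ALLOWED_LUNS = {
    "mail-server": {0, 1},
    "oltp-server": {2},
}


@dataclass(frozen=True)
class IORequest:
    host: str  # name of the host issuing the request (assumed identifier)
    lun: int   # Logical Unit Number the request targets
    op: str    # "read" or "write"


def filter_request(request, allowed=ALLOWED_LUNS):
    """Return True if the request may pass through the filter.

    A host with no entry in the table is denied access to every LUN.
    """
    return request.lun in allowed.get(request.host, set())


requests = [
    IORequest("mail-server", 0, "read"),    # configured LUN: allowed
    IORequest("mail-server", 2, "write"),   # another host's LUN: blocked
    IORequest("unknown-host", 0, "read"),   # unconfigured host: blocked
]
results = [filter_request(r) for r in requests]
print(results)  # [True, False, False]
```

Because a filter of this kind lives in each host's driver stack, every host on the SAN needs a correct copy of the table, which is one reason this approach depends on careful configuration and security.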
Fibre Channel Switches, Zoning
A fourth approach to SAN access control is the use of "zoned" Fibre Channel switches. With zoning, certain ports in a fabric are mapped to communicate with each other, but all other ports in the fabric are excluded. Essentially, traffic within a zone does not get passed to any other ports that are not part of that zone. Multiple zones can exist within the same fabric, creating the possibility for a number of subnetworks within a single SAN.
The value of creating a zoned traffic area is its ability to prevent unrecognizable traffic from reaching the hosts--one of the key difficulties in multiple-host Fibre Channel implementations. Once a configuration has been created, individual hosts only see the traffic that they expect to see. In addition, the use of switches creates a series of high-bandwidth (100MBps) connections.
This segregation of the storage environment is a powerful tool and can be used to remove one of the significant impediments to heterogeneous storage attachment, but on its own it does not go far enough. Without some form of array-level access control, all the storage in an array would be visible to every host on the SAN. This is a similar situation to the one described earlier--the SCSI protocol (whether running natively or on Fibre Channel) has limitations that affect the LUN ownership model in SANs.
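As a rough illustration of port-based zoning, the following Python sketch models a fabric as a set of named zones, each listing the switch ports that belong to it. The zone names, port numbers, and function names are assumptions made for the example; real switches are configured through their own management tools.

```python
# Hypothetical zone configuration: zone name -> switch ports in that zone.
ZONES = {
    "zone_a": {1, 2, 8},  # e.g. two host ports and one array port
    "zone_b": {3, 4, 9},
}


def can_communicate(port_x, port_y, zones=ZONES):
    """Two ports may exchange traffic only if some zone contains both."""
    return any(
        port_x in members and port_y in members
        for members in zones.values()
    )


def visible_ports(port, zones=ZONES):
    """Return every other port that shares at least one zone with `port`."""
    seen = set()
    for members in zones.values():
        if port in members:
            seen |= members
    seen.discard(port)
    return seen


print(can_communicate(1, 8))  # True: both ports are in zone_a
print(can_communicate(1, 3))  # False: no zone contains both ports
print(visible_ports(1))       # ports 2 and 8
```

Note that the sketch works only at the port level. A host in zone_a still sees every LUN behind array port 8, which is the gap that array-level access control is meant to close.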
Array-Level Access Control
A fifth approach uses specially developed access-control software on the array itself to control host access to storage within the array on a LUN-by-LUN basis.
Essentially table-driven access control, this solution is able to map host access requests to storage associated with that host exclusively, or inclusively, depending on the configuration and application requirements. In essence, the host only sees what the access control software on the array allows it to see. For example, it is possible to share a 1TB RAID array across 10 hosts, with each host believing the array only has 100GB of capacity. This solution provides complete access control with full security. By implementing management of software control in the array's storage management utilities, the solution becomes fully host independent.
On the downside, because this is a generic SAN, the interaction of each of the hosts in the network is visible to all other hosts. This can have unpredictable results when unexpected signals from one host are transmitted on the SAN and are seen by one or more unsuspecting hosts. However, zoned Fibre Channel switches can be used to prevent one host from seeing another host's I/O traffic. With this combination of technologies, users can build the most resilient form of a shared SAN.
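The table-driven model and the 1TB example above can be sketched as follows. The sketch assumes the array is carved into ten 100GB LUNs (treating 1TB as 1,000GB) and that "inclusive" access means a LUN listed for more than one host. The host names and function names are hypothetical.

```python
LUN_SIZE_GB = 100  # assumed: 1TB array split into ten 100GB LUNs

# Hypothetical access table held on the array: LUN -> hosts allowed to see it.
# Here each LUN is mapped exclusively to one host (host0 .. host9).
access_table = {lun: {f"host{lun}"} for lun in range(10)}


def visible_luns(host, table=access_table):
    """Return the LUNs the array presents to the given host."""
    return sorted(lun for lun, hosts in table.items() if host in hosts)


def visible_capacity_gb(host, table=access_table):
    """Capacity the host believes the array has, in GB."""
    return len(visible_luns(host, table)) * LUN_SIZE_GB


print(visible_luns("host3"))         # [3]
print(visible_capacity_gb("host3"))  # 100

# Inclusive mapping (assumed meaning): LUN 3 is also presented to host4.
shared = {lun: set(hosts) for lun, hosts in access_table.items()}
shared[3].add("host4")
print(visible_luns("host4", shared))         # [3, 4]
print(visible_capacity_gb("host4", shared))  # 200
```

Since the table lives on the array rather than on the hosts, no host-side software is involved in the decision, which is what makes this approach host independent.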
One of the most important requirements when considering the combination of multiple hosts and one or more RAID arrays in a SAN is whether the goal is shared access to a common data set or shared access to one or more common arrays. Once that goal is determined, there are a number of options that can be employed to meet the needs of the hosts and the applications.
[Figure: A shared-storage configuration enables multiple hosts to access data within a single array.]
[Figure: With a zoned Fibre Channel switch, certain ports in a fabric are mapped to communicate with each other, but all other ports in the fabric are excluded.]
Paul Danahy is the marketing manager at Data General's CLARiiON business unit, in Southboro, MA.