Three approaches to storage virtualization

Posted on November 01, 2000

There are three main methods to virtualize storage, each with advantages and disadvantages.

By Claudia Chandra

The introduction of Fibre Channel has paved the way for distributed storage systems that can be shared among heterogeneous hosts. Some corporations are adopting storage area network (SAN) technology and are pushing the envelope toward making storage a "utility." The storage utility model requires centralized management of storage systems, yet must allow data access from any distributed location without users having to know how the storage is laid out, what types of storage systems are being used, or how resources are being allocated. At the same time, users need to feel secure that their data is protected from unwanted access.

Evolving storage into a utility imposes the following requirements on storage systems:

  • Support for transparent access from heterogeneous hosts to heterogeneous storage systems. Servers running heterogeneous operating systems should be able to share storage capacity on different vendors' storage systems.
  • Continuous data availability to support 24x7 operations.
  • High-performance data access.
  • Data security to allow access only to those users that have access rights.
  • Non-disruptive storage capacity expansion. Adding storage devices to the storage network should be transparent to users and should not require any server downtime.
  • Full data protection and recovery.
  • Transparent data migration. Migration of data caused by failure or storage reconfiguration should not change the way users access the data.
  • Online storage re-provisioning. Requests from users for additional storage quotas or access rights modifications should not interrupt data access.

Storage virtualization supports the storage utility model, providing secure and dynamic pooling of diverse storage equipment across heterogeneous servers and clients. Virtualization can provide the following functionality:

  • Translation from one storage protocol to another (e.g., SCSI to Fibre Channel or SSA to Fibre Channel) to support heterogeneous storage and server environments.
  • SAN storage configurations to support high availability and high-performance access, such as designating primary, secondary mirror, and spare drives, and creating composite drives to concatenate multiple storage subsystems into a single drive for ease of management and flexible capacity expansion.
  • Visualization and monitoring of the SAN, with the ability to notify administrators when a critical event occurs for timely correction and recovery.
  • Replication through n-way mirrors, snapshots, and remote (asynchronous) copy over a TCP/IP network.
  • Automated failover (e.g., through mirrors or hot spares) in the event that a storage device, or some other device on the path between the host and the storage subsystems (e.g., routers, host adapters, or switches), fails.
  • Automated backup and recovery.
  • Data caching.
  • Zoning to control host access to various storage devices.
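
The composite drive mentioned in the list above, which concatenates multiple storage subsystems into a single drive, comes down to address translation. The following is a minimal Python sketch of that idea, not any vendor's implementation. The class name, the device names, and the block counts are all assumptions made for illustration.

```python
from bisect import bisect_right
from itertools import accumulate


class CompositeDrive:
    """Concatenates member devices into one virtual block range (illustrative)."""

    def __init__(self, members):
        # members: list of (device_name, size_in_blocks); names are hypothetical
        self.members = list(members)
        # ends[i] is the first virtual block number past member i
        self.ends = list(accumulate(size for _, size in self.members))

    @property
    def size(self):
        return self.ends[-1] if self.ends else 0

    def resolve(self, virtual_block):
        """Map a virtual block number to (device_name, block_on_device)."""
        if not 0 <= virtual_block < self.size:
            raise IndexError("block outside composite drive")
        i = bisect_right(self.ends, virtual_block)
        start = self.ends[i - 1] if i else 0
        return self.members[i][0], virtual_block - start

    def extend(self, device_name, size_in_blocks):
        """Append a device; existing virtual addresses keep their mapping."""
        self.members.append((device_name, size_in_blocks))
        self.ends.append(self.size + size_in_blocks)


drive = CompositeDrive([("array_a", 100), ("array_b", 50)])
first_on_b = drive.resolve(100)   # ("array_b", 0)
drive.extend("array_c", 25)       # capacity grows without remapping old blocks
```

Because new members are appended at the end of the virtual range, capacity can grow without changing how hosts address existing data, which is the property the storage utility model asks for.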

Three approaches

Architecturally, there are three main approaches to storage virtualization.

1. HOST-BASED APPROACH

This approach relies on an agent or management software installed on one or more host systems to implement the control and administrative functionality of storage virtualization. It may be less scalable and may deliver lower performance than other methods because the control functions run on the host and consume host-processing cycles.

This approach may also be more prone to errant hosts causing inadvertent access to protected data. To guard against this, appropriate control software must be installed on each host. This approach may also be less flexible because the software controlling storage virtualization may not interoperate with various storage software and hardware.


Figure: The host-based approach relies on agent or management software that is implemented in one or more hosts. Storage-based implementations rely on the storage subsystem to provide the necessary functionality. The network-based approach integrates the functionality in network components.

On the other hand, the host-based approach is the easiest and least costly method to implement because it doesn't require any additional hardware. Vendors of storage management software tend to follow this approach because they already have mature software products, which have easy-to-use graphical interfaces for monitoring and virtualizing the SAN.

With good load balancing among hosts, the host-based approach can be a cost-effective approach to storage virtualization, particularly in relatively small SAN configurations.

2. STORAGE-BASED APPROACH

This approach to virtualization relies on the storage subsystem to provide the functionality. It may be necessary to supplement it with third-party SAN virtualization software. In addition, it may not work well in SANs with multi-vendor storage systems and could lead to single-vendor lock-in.

Advantages of this approach include:

  • Because it is implemented within the storage system and tuned to that specific system, it can provide optimal performance.
  • It is easier to manage because everything is done transparently within the storage system.

3. NETWORK-BASED APPROACH

This approach implements the storage virtualization functionality within equipment on the network (e.g., appliances, switches, or routers).

The appliance-based approach can be symmetric, where both the control and data go through the same path, or asymmetric, where there are separate paths for control and data. In symmetric implementations, the appliance can become a bottleneck. However, using multiple appliances to manage the SAN, and load balancing between them, relieves the bottleneck. Using multiple appliances to manage the SAN also allows for server fail-over when one of the appliances is down. However, this creates multiple SAN islands because each appliance server controls only the storage systems to which it is connected. The asymmetric appliance-based approach is more scalable than the symmetric approach because the control and data paths are separate.

Appliances can be run on a dedicated server using standard operating systems, such as Windows or Unix, or on proprietary appliances with a vendor-proprietary operating system. When the appliance software runs on a standard operating system, the approach has many of the advantages of the host-based approach (e.g., ease of adoption and low cost). Performance may be better than the host-based approach if it's an asymmetric implementation.

A number of appliance vendors also offer additional functionality (e.g., data caching) to improve performance. Proprietary appliances can offer better performance and functionality, but hardware costs are higher.

The appliance-based approach, however, may also have some of the disadvantages of the host-based approach, because it still requires either agent software or a host-bus adapter on each host to route storage requests to the appliance server. Any failure on the host or improper host configuration can cause unprotected data access. Also, interoperability with heterogeneous operating systems may be an issue.

With switches, storage virtualization functionality is either embedded in the switch firmware or it runs on a separate server attached to the switch. The server can run proprietary or standard operating systems. Switch-based vendors provide the management functionality in software.

With switches, an agent does not have to run on each host. Switch-based implementations therefore avoid the security issue of the appliance and host-based approaches, and they provide greater interoperability in a heterogeneous environment. However, switches can still be a bottleneck and a point of failure, although redundant switches can be added, at additional cost, for fail-over in the data path.

As for routers, storage virtualization functionality is implemented in the firmware. Vendors usually provide management capability through additional software that runs on the host. A router sits in the data path between each host and the storage network. The router intercepts commands from the hosts to the storage systems on the network. Because, potentially, one router exists to serve each host, this approach scales well.

Since most of the control function exists in router firmware, routers can potentially provide better performance than host-based approaches and some appliance configurations. Because the router approach doesn't rely on an agent running on each host, it offers a high level of security. A failing router connecting a host to the storage network can still cause data to be inaccessible from that host. However, only the host connected to the failing router is affected; the storage system is still accessible from other hosts through the other functioning routers. Router redundancy with support for dynamic multi-path fail-over remedies this problem.
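
Multi-path fail-over across redundant routers can be pictured as trying each path in order until one succeeds. The sketch below is a simplified Python illustration under that assumption; `MultipathHost`, `make_router`, and `AllPathsFailed` are hypothetical names, and a real implementation would live in driver or firmware code rather than in application code.

```python
class AllPathsFailed(Exception):
    """Raised when no router path could deliver the request."""


def make_router(name, healthy=True):
    """Return a callable standing in for one router path (hypothetical helper)."""
    def send(request):
        if not healthy:
            raise ConnectionError(f"router {name} is down")
        return f"{request} via {name}"
    return send


class MultipathHost:
    def __init__(self, paths):
        # paths: ordered list of callables, one per router between host and SAN
        self.paths = list(paths)

    def submit(self, request):
        """Try each path in order; fall through to the next one on failure."""
        errors = []
        for path in self.paths:
            try:
                return path(request)
            except ConnectionError as exc:
                errors.append(exc)
        raise AllPathsFailed(errors)


host = MultipathHost([make_router("r1", healthy=False), make_router("r2")])
result = host.submit("read block 7")   # served by r2 after r1 fails
```

With a single path, the failure of that router isolates the host; with two paths, the same request is served through the surviving router.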

Routers provide high interoperability in heterogeneous operating systems and multi-vendor storage environments because they often also serve as bridges for protocol-to-protocol conversion.

Pros and cons

Each of the above-described approaches has advantages and disadvantages. The host-based and storage-based approaches may be the most appealing to early adopters because no additional hardware is required, but they may not work well with heterogeneous storage subsystems and operating systems.

For organizations requiring maximum interoperability, approaches using switches or routers may be more appropriate. For high scalability, routers may provide advantages. Appliance implementations lie somewhere in between. They are not as secure as switch and router approaches, but they scale well and take the load off individual hosts by performing storage virtualization functions.

Different implementations of storage virtualization also vary in their support for functions such as replication, backup and recovery, and access control. Storage management software vendors tend to provide the most complete management suites; however, multi-platform support and feature capabilities may not be as strong.

In the replication arena, a variety of mirroring capabilities has been developed. Many vendors offer triple-mirror capability. Some virtualization vendors even provide four-way mirroring.

Mirroring also comes in a number of flavors. A full mirror creates a complete copy of another drive. An incremental copy, sometimes called a snapshot, stores only the changes from a previous copy. Sometimes the copy drive exists in a different location that is connected over an IP connection, in which case a remote or asynchronous copy is required.
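
The difference between a full mirror and an incremental copy can be shown with a small Python sketch. It models a drive as a dictionary from block number to contents, which is an assumption made purely for illustration, and it assumes the set of blocks is fixed (no blocks are removed between copies). The function names are hypothetical.

```python
def full_copy(drive):
    """A full mirror: a complete copy of every block."""
    return dict(drive)


def incremental_copy(drive, previous):
    """Store only the blocks whose contents differ from the previous copy."""
    return {blk: data for blk, data in drive.items() if previous.get(blk) != data}


def restore(base, *increments):
    """Rebuild a drive image by overlaying increments, oldest first, on a base."""
    state = dict(base)
    for inc in increments:
        state.update(inc)
    return state


drive = {0: b"boot", 1: b"data", 2: b"logs"}
base = full_copy(drive)                  # three blocks stored
drive[1] = b"DATA"                       # one block changes afterwards
delta = incremental_copy(drive, base)    # only block 1 is stored
```

The incremental copy is much smaller than the full mirror, but it is only useful together with the copy it was taken against.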

Zoning

Storage access control usually comes in the form of separating hosts and storage systems within the storage network into different zones. Only hosts belonging to the same zone as the storage device can access that storage device. Both hosts and storage devices can usually be members of multiple zones.
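
The zone rule described above can be written as a one-line membership test. This is a minimal Python sketch; the zone names, host names, and device names are invented for the example.

```python
def can_access(host, device, zones):
    """A host may reach a device only if some zone contains both of them."""
    return any(host in members and device in members for members in zones.values())


# Hypothetical zone table: zone name -> set of member hosts and storage devices
zones = {
    "finance": {"host_a", "array_1"},
    "engineering": {"host_b", "array_1", "array_2"},
}

shared = can_access("host_a", "array_1", zones)     # True: both are in "finance"
isolated = can_access("host_a", "array_2", zones)   # False: no common zone
```

Here `array_1` belongs to two zones, so it is visible to both hosts, while `array_2` is visible only to `host_b`.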

Zoning can be implemented in a number of ways. The various approaches vary in the granularity of storage sharing allowed and ease of management.

Port-based zoning usually implements the functionality in a switch, where access is controlled by restricting connections between prescribed ports. The downside of this approach is that zones cannot overlap, and when the port designation of a zone member changes, the zone needs to be reconfigured. Zoning by world-wide name or network address does not have the latter disadvantage of port-based zoning because the identity of the zone member doesn't change even if its switch port is modified.
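
The reconfiguration problem with port-based zoning can be seen in a short Python sketch. The fabric is modeled as a mapping from switch port to the world-wide name attached to it; the port numbers and world-wide names are made up for illustration, and the function names are hypothetical.

```python
def port_zone_allows(zone_ports, fabric, wwn_a, wwn_b):
    """Port-based zoning: both members must sit on ports listed in the zone."""
    port_of = {wwn: port for port, wwn in fabric.items()}
    return port_of.get(wwn_a) in zone_ports and port_of.get(wwn_b) in zone_ports


def wwn_zone_allows(zone_wwns, wwn_a, wwn_b):
    """WWN-based zoning: membership follows the device, not the port."""
    return wwn_a in zone_wwns and wwn_b in zone_wwns


HOST = "10:00:00:00:00:00:00:01"    # made-up world-wide names
ARRAY = "10:00:00:00:00:00:00:02"

fabric = {1: HOST, 2: ARRAY}        # switch port -> attached WWN
port_zone = {1, 2}
wwn_zone = {HOST, ARRAY}

before_move = port_zone_allows(port_zone, fabric, HOST, ARRAY)   # True

fabric = {1: HOST, 5: ARRAY}        # the array is re-cabled to port 5
after_move = port_zone_allows(port_zone, fabric, HOST, ARRAY)    # False
still_ok = wwn_zone_allows(wwn_zone, HOST, ARRAY)                # True
```

After the move, the port-based zone must be edited to include port 5, whereas the zone defined by world-wide name needs no change.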

Subsystem zoning, or volume mapping, is usually implemented by internal controllers in disk subsystems. It allows entire or partial drives to be exported, and makes a single storage subsystem appear as multiple drives to multiple hosts.

Zoning based on LUN masking is typically implemented in a host I/O controller, host software, or router. LUN masking acts as a filter to allow host access only to specific storage resources. LUN masking also allows individual drives within a storage subsystem to belong to different zones, creating device-level zoning. Zoning based on LUN masking can be difficult to administer because the firewall control exists in each host I/O controller, software, or router, and must be coordinated across the network to maintain the integrity of the data.
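
LUN masking as a filter can be sketched in a few lines of Python. The class below is a simplified illustration, not a model of any particular host I/O controller or router; the class name, the initiator names, and the LUN numbers are assumptions.

```python
class LunMask:
    """Per-initiator filter over the LUNs a storage subsystem exposes."""

    def __init__(self, all_luns):
        self.all_luns = set(all_luns)
        self.visible = {}  # initiator id -> set of LUNs it may see

    def grant(self, initiator, luns):
        luns = set(luns)
        unknown = luns - self.all_luns
        if unknown:
            raise ValueError(f"unknown LUNs: {sorted(unknown)}")
        self.visible.setdefault(initiator, set()).update(luns)

    def report_luns(self, initiator):
        """What the initiator sees when it scans the subsystem."""
        return sorted(self.visible.get(initiator, set()))

    def allows(self, initiator, lun):
        return lun in self.visible.get(initiator, set())


mask = LunMask(all_luns=range(4))   # subsystem exposes LUNs 0..3
mask.grant("host_a", [0, 1])
mask.grant("host_b", [1, 2])
seen_by_a = mask.report_luns("host_a")   # [0, 1]
```

Each host sees only its granted LUNs, and a host with no entry sees nothing. When such a table exists separately in every host I/O controller or router, the copies have to be kept consistent, which is the administrative burden described above.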

In the storage virtualization market, many technology integration partnerships are likely to occur among vendors. Data center managers need to have a good knowledge of the technology in order to clearly understand their choices.

In evaluating storage virtualization solutions, ask the following questions:

  1. Does the architecture support heterogeneous operating systems and storage subsystems?
  2. Can it scale to meet your future storage requirements?
  3. Is it easy to manage?
  4. Does it provide fine-grained zone management?
  5. Does it support flexible capacity expansion?
  6. Is it reliable?

Claudia Chandra heads software product marketing at Vicom Systems Inc. (www.vicom.com) in Fremont, CA.

