Storage virtualization: An overview

Virtualization, or abstraction, can be implemented in many ways, and in many different locations.

By Greg Schulz

Storage virtualization, or abstraction, can be defined in general terms and in product-specific terms. From a general standpoint, virtualization (or abstraction) of storage can be described as hiding or masking a physical storage device from a host server or application. This broad description can be refined to describe how physical storage devices (disk drives, RAID controllers, LUNs, etc.) are accessed in a transparent manner. Volume managers, RAID controllers, and operating systems can be used to mask the physical devices.

The second description is product-specific, based on a model using external software. This can be accomplished in two ways. In one approach, a volume manager running on a host system provides virtualization functions in conjunction with some level of storage or data management. Another approach moves some intelligence and data management from the storage device and, possibly, the host volume manager to a middle layer on an external system. This middle layer of specialized software is sometimes referred to as a virtual storage server, domain controller, or storage appliance.

Without storage virtualization, applications access storage using physical names or addresses, a process known as "hard addressing." As systems become larger and more complex, hard addressing makes portability and management more difficult. On a Windows-type operating system, an application may refer to a file on the "C:" drive. Other operating systems, like Unix, refer to devices in the form of /dev/dsk/c0t0d0s6. These device names are treated as physical addresses.

Figure 1: Without storage virtualization, applications access storage using physical names or addresses.

Host operating systems add a layer of virtualization between applications and the I/O subsystem by masking physical addresses. In Figure 1, Physical Volume A (PVOLA) has an underlying SCSI address, while an application may see it as the "C:" drive or /dev/dsk/c0t0d0s6. For small systems, this may be sufficient. As systems grow, however, the effort to manage data storage increases, so additional storage virtualization and enhanced data management functionality are needed.
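To make this masking concrete, here is a minimal Python sketch, assuming a simple in-memory table: the application always refers to a name such as "C:", and only the table knows the physical address behind it. The `NameMap` class and the (bus, target, LUN) tuples are hypothetical illustrations, not an operating system API or the actual addresses in Figure 1.

```python
from typing import Dict, Tuple

# Hypothetical physical address: (SCSI bus, target ID, LUN). Illustrative only.
PhysicalAddress = Tuple[int, int, int]


class NameMap:
    """Hypothetical OS-level table translating device names to physical addresses."""

    def __init__(self) -> None:
        self._table: Dict[str, PhysicalAddress] = {}

    def bind(self, name: str, address: PhysicalAddress) -> None:
        # Rebinding a name points it at new hardware without changing the name.
        self._table[name] = address

    def resolve(self, name: str) -> PhysicalAddress:
        # Applications only ever look up the name they already know.
        return self._table[name]


names = NameMap()
names.bind("C:", (0, 1, 0))
print(names.resolve("C:"))  # (0, 1, 0)

# Replace the underlying disk: the application still opens "C:".
names.bind("C:", (1, 4, 0))
print(names.resolve("C:"))  # (1, 4, 0)
```

The name the application uses never changes; only the binding does. One table like this per host is exactly the single layer that becomes hard to manage as device counts grow.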

Some of the benefits of storage virtualization include the ability to:

  • Isolate applications from underlying physical devices.
  • Improve availability and maintainability of systems.
  • Expand storage on the fly, including volumes and files.
  • Reduce downtime for backups and other maintenance functions.
  • Migrate data from systems and applications.
  • Support large, high-performance storage applications.
  • Mix and match various storage devices for investment protection.
  • Support advanced storage applications such as replication.
  • Support failover and other functions.
  • Present volumes larger than any single physical storage device.

Volume managers

These features and others are available today using volume managers built into, or added to, operating systems. A volume manager is a specialized application or software utility. Some operating systems include integrated file systems and volume managers for enhanced data integrity, performance, and functionality.

Database systems provide a form of storage virtualization similar to that of file systems and volume managers. Some database systems virtualize storage by presenting a unified virtual address space to administrators. Database programs and applications refer to locations in this address space rather than to disk locations, and the database system handles the translation from virtual addresses to physical storage. Some databases support "raw," or direct, I/O to a storage device; however, many systems use an underlying file system to provide an additional level of virtualization.
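The translation a database performs can be sketched as a lookup from one linear virtual address space onto the containers that back it. This is a minimal Python sketch, assuming the containers (files or raw devices) are simply concatenated in order and measured in blocks; the `AddressSpace` class and the container names are hypothetical, and real database systems use their own, more elaborate mappings.

```python
from bisect import bisect_right
from typing import List, Tuple


class AddressSpace:
    """Hypothetical linear virtual address space built by concatenating containers."""

    def __init__(self, containers: List[Tuple[str, int]]) -> None:
        # containers: (name, size in blocks), laid out end to end in order.
        self._names: List[str] = []
        self._starts: List[int] = []
        total = 0
        for name, size in containers:
            self._names.append(name)
            self._starts.append(total)
            total += size
        self.size = total

    def translate(self, virtual_block: int) -> Tuple[str, int]:
        """Return (container name, block offset within that container)."""
        if not 0 <= virtual_block < self.size:
            raise ValueError("block is outside the virtual address space")
        # The last container whose start is <= virtual_block holds the block.
        i = bisect_right(self._starts, virtual_block) - 1
        return self._names[i], virtual_block - self._starts[i]


space = AddressSpace([("datafile_1", 1000), ("raw_device_1", 500)])
print(space.translate(999))   # ('datafile_1', 999)
print(space.translate(1200))  # ('raw_device_1', 200)
```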

Volume managers provide a layer of transparency between applications, file systems, and physical disks. Most Unix operating systems have volume managers built in, or layered on using third-party products such as Veritas Volume Manager. Windows NT has a rudimentary volume manager in the form of NT Disk Administrator, which can create mirror sets (RAID 1), stripe sets, and volume sets. With Windows 2000, a more robust volume manager developed by Veritas will provide functions similar to those found on Unix systems.

Storage area network (SAN), network-attached storage (NAS), and other RAID storage devices provide virtualization with LUNs, virtual volumes, and network file systems. Some products limit virtualization to fixed-size mirrors or to individual physical volumes partitioned into fixed-size virtual volumes. Most disk arrays support LUNs of variable size, with hardware RAID for redundancy. RAID arrays also provide virtualization features that improve performance and present LUNs larger than a single physical disk drive.
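One common way an array presents a LUN larger than a single disk is striping (RAID 0), where consecutive chunks of the LUN rotate across the member disks. Below is a minimal Python sketch of that address mapping, assuming equal-sized disks and a fixed stripe unit measured in blocks. `stripe_map` and `lun_capacity` are hypothetical helpers, not any vendor's firmware logic, and they omit the redundancy that RAID 0+1 or RAID 5 would add.

```python
from typing import Tuple


def stripe_map(lba: int, disks: int, stripe_blocks: int) -> Tuple[int, int]:
    """Map a block of a striped LUN to (disk index, block number on that disk)."""
    unit = lba // stripe_blocks    # which stripe unit the block falls in
    offset = lba % stripe_blocks   # position of the block inside that unit
    disk = unit % disks            # stripe units rotate across the disks
    row = unit // disks            # complete rows of units stored before this one
    return disk, row * stripe_blocks + offset


def lun_capacity(disks: int, blocks_per_disk: int) -> int:
    # A striped LUN without parity or mirroring exposes every member's capacity.
    return disks * blocks_per_disk


print(stripe_map(64, disks=4, stripe_blocks=64))   # (1, 0)
print(stripe_map(300, disks=4, stripe_blocks=64))  # (0, 108)
print(lun_capacity(4, 1_000_000))                  # 4000000
```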

Figure 2: Volume managers take physical disk drives and create logical volume groups.

A combination of volume managers and advanced storage devices provides storage virtualization and other features, such as open replication or remote mirroring, point-in-time backups or snapshots, dynamic volume or file system expansion, load balancing, physical disk migration, and clustering. For performance and redundancy, multiple LUNs can be presented to a volume manager over different SCSI or Fibre Channel interfaces.
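Presenting storage over more than one interface lets host software spread I/O across paths and keep working when one path fails. The sketch below is a minimal Python model, assuming simple round-robin selection. `MultipathLUN` and the path names "hba0" and "hba1" are hypothetical, and real multipathing software uses its own policies and failure detection.

```python
from typing import List, Set


class MultipathLUN:
    """Hypothetical round-robin path selector that skips failed paths."""

    def __init__(self, paths: List[str]) -> None:
        self.paths = list(paths)
        self.failed: Set[str] = set()
        self._next = 0

    def fail(self, path: str) -> None:
        self.failed.add(path)

    def restore(self, path: str) -> None:
        self.failed.discard(path)

    def pick_path(self) -> str:
        # Try each path at most once, starting after the last one used.
        for _ in range(len(self.paths)):
            path = self.paths[self._next]
            self._next = (self._next + 1) % len(self.paths)
            if path not in self.failed:
                return path
        raise RuntimeError("no working path to the LUN")


lun = MultipathLUN(["hba0", "hba1"])
print([lun.pick_path() for _ in range(4)])  # ['hba0', 'hba1', 'hba0', 'hba1']
lun.fail("hba1")                            # e.g. a cable or adapter failure
print([lun.pick_path() for _ in range(2)])  # ['hba0', 'hba0']
```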

Volume managers take physical disk drives (or the virtual volumes and LUNs presented by RAID arrays) and create logical volume groups. These logical volume groups can span multiple LUNs and host bus adapter interfaces for performance and capacity improvements. Logical volume groups can include physical disk drives of different sizes, RAID devices, and host bus interfaces. Logical volume groups can be configured with different performance and availability attributes to meet specific application requirements. For example, in Figure 2, Physical Volume A (PVOLA) might be an 18GB SCSI disk drive, PVOLB a 36GB Ultra SCSI disk drive, PVOLC a 100GB RAID 0+1 (hardware RAID) LUN, and PVOLD an 18GB Fibre Channel disk drive.
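Because a volume group can mix device sizes, the usable capacity of a logical volume depends on the layout chosen for it. This minimal Python sketch uses the Figure 2 sizes given above and assumes three simplified layouts: concatenation, mirroring, and striping across equal-sized slices. `usable_gb` is a hypothetical helper, and it ignores the metadata space real volume managers reserve.

```python
from typing import Dict, List

# Physical volume sizes in GB, as described for Figure 2.
PVOLS: Dict[str, int] = {"PVOLA": 18, "PVOLB": 36, "PVOLC": 100, "PVOLD": 18}


def usable_gb(members: List[str], layout: str) -> int:
    """Usable capacity of a logical volume built from the given physical volumes."""
    sizes = [PVOLS[m] for m in members]
    if layout == "concat":   # space is appended end to end; no redundancy
        return sum(sizes)
    if layout == "mirror":   # each member holds a full copy of the data
        return min(sizes)
    if layout == "stripe":   # an equal-sized slice is taken from every member
        return min(sizes) * len(sizes)
    raise ValueError(f"unknown layout: {layout}")


print(usable_gb(["PVOLA", "PVOLD"], "mirror"))  # 18
print(usable_gb(["PVOLB", "PVOLC"], "concat"))  # 136
print(usable_gb(["PVOLA", "PVOLB"], "stripe"))  # 36
```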

A volume manager is used to create multiple logical volume groups containing PVOLA, PVOLB, PVOLC, and PVOLD. One logical volume group contains the two 18GB disk drives (PVOLA and PVOLD) mirrored together for high availability, and supports snapshot copies (point-in-time backups). The storage in this logical volume group is allocated to the host as a single 18GB logical volume (LVOLA) with a file system on it. A second logical volume group contains the 36GB disk drive (PVOLB), which is partitioned into two logical volumes (LVOLB being 4GB and LVOLC being 32GB). To expand the storage capacity of LVOLB and LVOLC, and to support additional logical volumes, the 100GB RAID 0+1 LUN (PVOLC) is added to the second logical volume group.
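The history of the second volume group can be modeled with simple capacity bookkeeping. This is a minimal Python sketch, assuming whole-gigabyte accounting with no extent or metadata detail. The `VolumeGroup` class, the growth amounts, and the "LVOL_NEW" volume are hypothetical, chosen only to show that adding the 100GB RAID 0+1 LUN to a full group creates room to expand existing logical volumes and create new ones.

```python
from typing import Dict


class VolumeGroup:
    """Hypothetical volume group tracked in whole GB (no extent-level detail)."""

    def __init__(self) -> None:
        self.pvs: Dict[str, int] = {}  # physical volume -> size in GB
        self.lvs: Dict[str, int] = {}  # logical volume -> size in GB

    def free_gb(self) -> int:
        return sum(self.pvs.values()) - sum(self.lvs.values())

    def add_pv(self, name: str, size_gb: int) -> None:
        self.pvs[name] = size_gb

    def _check_free(self, size_gb: int) -> None:
        if size_gb > self.free_gb():
            raise ValueError("not enough free space in the volume group")

    def create_lv(self, name: str, size_gb: int) -> None:
        self._check_free(size_gb)
        self.lvs[name] = size_gb

    def extend_lv(self, name: str, extra_gb: int) -> None:
        self._check_free(extra_gb)
        self.lvs[name] += extra_gb


vg2 = VolumeGroup()
vg2.add_pv("PVOLB", 36)          # the 36GB drive
vg2.create_lv("LVOLB", 4)
vg2.create_lv("LVOLC", 32)
print(vg2.free_gb())             # 0: the group is full
vg2.add_pv("PVOLC", 100)         # the 100GB RAID 0+1 LUN
vg2.extend_lv("LVOLB", 6)        # hypothetical growth amounts
vg2.extend_lv("LVOLC", 18)
vg2.create_lv("LVOL_NEW", 50)    # hypothetical additional volume
print(vg2.lvs, vg2.free_gb())
# {'LVOLB': 10, 'LVOLC': 50, 'LVOL_NEW': 50} 26
```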

With the increase in storage in the second logical volume group, existing file systems can be expanded, new logical volumes created, and data migrated off the 36GB disk drive transparently to host applications. The 36GB disk drive can then be used to create a new logical volume group along with other unused disk devices. Another possibility would be to add the 36GB drive to the first logical volume group as a data availability volume or snapshot device.
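Transparent migration works because a logical volume is a list of extent references that the volume manager can repoint. The following minimal Python sketch assumes fixed-size extents, an in-memory dictionary standing in for disk contents, and a first-fit choice of destination. The `migrate_off` and `read` functions and the extent numbers are hypothetical. Logical extent numbers never change, so an application reading through the map sees the same data before and after the move.

```python
from typing import Dict, List, Tuple

Extent = Tuple[str, int]  # (physical volume name, extent number on that volume)


def migrate_off(lv_map: List[Extent], source: str,
                free: Dict[str, List[int]], disk: Dict[Extent, bytes]) -> None:
    """Move every extent of a logical volume off `source`, updating lv_map in place."""
    for logical, (pv, ext) in enumerate(lv_map):
        if pv != source:
            continue
        # First fit: any other physical volume that still has a free extent.
        dest_pv = next((name for name, exts in free.items()
                        if name != source and exts), None)
        if dest_pv is None:
            raise RuntimeError("no free extents outside the source volume")
        dest = (dest_pv, free[dest_pv].pop(0))
        disk[dest] = disk.pop((pv, ext))  # copy the data, release the old extent
        lv_map[logical] = dest            # same logical extent, new physical home
        free[source].append(ext)


def read(lv_map: List[Extent], disk: Dict[Extent, bytes], logical: int) -> bytes:
    return disk[lv_map[logical]]


lv_map = [("PVOLB", 0), ("PVOLB", 1), ("PVOLC", 0)]
disk = {("PVOLB", 0): b"a", ("PVOLB", 1): b"b", ("PVOLC", 0): b"c"}
free = {"PVOLB": [], "PVOLC": [1, 2, 3]}

before = [read(lv_map, disk, i) for i in range(3)]
migrate_off(lv_map, "PVOLB", free, disk)
after = [read(lv_map, disk, i) for i in range(3)]
print(before == after, lv_map)
# True [('PVOLC', 1), ('PVOLC', 2), ('PVOLC', 0)]
```

Once every extent has moved, the source drive holds no live data and can be removed from the group or reused, as the paragraph above describes.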

Figure 3: The host system and volume manager see four LUNs presented by an external RAID array.

Figure 3 shows an example of a host system and a volume manager that sees four LUNs presented by an external RAID array. Because the RAID array virtualizes the individual physical disks, the volume manager and operating system need to be configured for fewer devices. The RAID device provides further virtualization through automatic disk drive rebuild and replacement with hot-spare disk drives. On storage devices lacking hardware RAID and support for large volumes, specialized software or volume managers are needed for virtualization.
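Automatic rebuild onto a hot spare can be illustrated with a two-way mirror. This minimal Python sketch assumes in-memory lists standing in for disks and a single spare. `MirrorWithSpare` is hypothetical, and real arrays rebuild in the background while continuing to serve I/O, which this sketch does not model.

```python
from typing import List, Optional

Disk = List[bytes]


class MirrorWithSpare:
    """Hypothetical two-way mirror (RAID 1) with one hot-spare disk."""

    def __init__(self, blocks: int) -> None:
        self.members: List[Optional[Disk]] = [[b""] * blocks, [b""] * blocks]
        self.spare: Optional[Disk] = [b""] * blocks

    def write(self, lba: int, value: bytes) -> None:
        # Every surviving member receives every write.
        for disk in self.members:
            if disk is not None:
                disk[lba] = value

    def read(self, lba: int) -> bytes:
        for disk in self.members:
            if disk is not None:
                return disk[lba]
        raise OSError("both mirror members have failed")

    def fail(self, index: int) -> None:
        self.members[index] = None
        survivor = self.members[1 - index]
        if self.spare is not None and survivor is not None:
            # Rebuild: copy every block from the survivor onto the spare,
            # then swap the spare into the failed member's slot.
            self.spare[:] = survivor
            self.members[index] = self.spare
            self.spare = None


mirror = MirrorWithSpare(blocks=8)
mirror.write(3, b"payroll")
mirror.fail(0)                               # rebuilds onto the hot spare
print(mirror.read(3), mirror.spare is None)  # b'payroll' True
mirror.fail(1)                               # no spare left; rebuilt disk serves
print(mirror.read(3))                        # b'payroll'
```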

Distributed volume management

The next step for enhanced volume management is to support distributed volume management between different host systems with the same operating system, as shown in Figure 4. Distributed volume management, particularly in SAN environments, enables host systems to coordinate volume access and storage management functions. In this model, shared storage is accessed by different host systems, and access is coordinated between their volume managers and file systems. The step after that is to support mixed operating system types accessing the same common distributed volume manager and file system.
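At its simplest, coordinating shared access means agreeing which hosts may read a volume and which single host may write it. The sketch below is an in-process Python model of that rule only, assuming a many-readers/one-writer policy. `VolumeLockTable` and the host and volume names are hypothetical; a real distributed volume manager would exchange these requests over the network and handle host failures, neither of which is shown.

```python
from typing import Dict, Set


class VolumeLockTable:
    """Hypothetical coordinator: many readers or one writer per shared volume."""

    def __init__(self) -> None:
        self.readers: Dict[str, Set[str]] = {}
        self.writers: Dict[str, str] = {}

    def acquire_read(self, host: str, volume: str) -> bool:
        # Reads are refused while another host holds the volume for writing.
        if self.writers.get(volume, host) != host:
            return False
        self.readers.setdefault(volume, set()).add(host)
        return True

    def acquire_write(self, host: str, volume: str) -> bool:
        # Writes need the volume free of other readers and other writers.
        other_readers = self.readers.get(volume, set()) - {host}
        if other_readers or self.writers.get(volume, host) != host:
            return False
        self.writers[volume] = host
        return True

    def release(self, host: str, volume: str) -> None:
        self.readers.get(volume, set()).discard(host)
        if self.writers.get(volume) == host:
            del self.writers[volume]


locks = VolumeLockTable()
print(locks.acquire_read("hostA", "vol1"))   # True
print(locks.acquire_read("hostB", "vol1"))   # True
print(locks.acquire_write("hostA", "vol1"))  # False: hostB is still reading
locks.release("hostB", "vol1")
print(locks.acquire_write("hostA", "vol1"))  # True
print(locks.acquire_read("hostB", "vol1"))   # False: hostA is writing
```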

Most NAS solutions provide integrated file systems and volume management, some with fully journaled file systems for data integrity. These NAS solutions provide yet another form of storage virtualization at the higher file system level. With NAS-type access, storage physically sits behind a host system or specialized server running file-sharing software. A user accessing a file or storage via NAS is not aware of where it is physically located, what type of server the file-sharing software is running on, or what type of network is being used. Most high-end and departmental NAS solutions provide storage virtualization capabilities to expand volumes and file systems on the fly, perform on-line snapshots for backup, and support other features.
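On-line snapshots are often implemented with copy-on-write: the snapshot starts empty and saves a block's old contents only when that block is first overwritten. This minimal Python sketch assumes an in-memory block list. The `SnapshotVolume` class is hypothetical, and the article does not say which snapshot technique any particular NAS product uses.

```python
from typing import Dict, List


class SnapshotVolume:
    """Hypothetical block volume with copy-on-write point-in-time snapshots."""

    def __init__(self, blocks: int) -> None:
        self.blocks: List[bytes] = [b""] * blocks
        self.snapshots: List[Dict[int, bytes]] = []  # saved originals per snapshot

    def snapshot(self) -> int:
        # Creating a snapshot copies no data, so it completes almost instantly.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, lba: int, value: bytes) -> None:
        for saved in self.snapshots:
            # Preserve each snapshot's view on the first overwrite only.
            saved.setdefault(lba, self.blocks[lba])
        self.blocks[lba] = value

    def read_snapshot(self, snap: int, lba: int) -> bytes:
        # Blocks not overwritten since the snapshot are read from the live volume.
        return self.snapshots[snap].get(lba, self.blocks[lba])


vol = SnapshotVolume(blocks=4)
vol.write(0, b"v1")
snap = vol.snapshot()        # point-in-time image for backup
vol.write(0, b"v2")
vol.write(0, b"v3")
print(vol.read_snapshot(snap, 0), vol.blocks[0])  # b'v1' b'v3'
```

A backup job can read the snapshot while applications keep writing to the live volume, which is what makes the backup "on-line."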

Hosts can access NAS devices using NFS, CIFS, or proprietary protocols over standard networks, including networks running on Fibre Channel interfaces. In this model, storage access and management are virtualized behind a file server, or storage appliance, that accesses the storage devices on behalf of other systems. Vendors such as Veritas have taken a first step toward enhanced virtualization capabilities with volume managers and storage replication products.

Shared file systems

The next step in storage virtualization will be shared file systems and volume managers that span systems, allowing different operating systems to access data directly. Today, NAS and Fibre Channel SANs support storage and data sharing. New products, called storage domain servers, SAN appliances, or virtual storage servers, are emerging as another option. This new type of technology uses servers with specialized storage management software and disk devices to provide virtualization. In its basic form, a storage appliance has specialized software performing volume management, RAID, mirroring, and other functions.

This model enables existing storage to be migrated into a SAN or RAID environment by placing the devices behind a storage server that is attached to host systems via Fibre Channel, SCSI, or other interfaces. These solutions provide storage virtualization where volume managers do not exist, for storage devices that do not support virtualization or advanced data management features. This approach also helps end users to avoid vendor lock-in.

Figure 4: Distributed volume managers communicate across multiple host systems to coordinate access and support replication, data movement, and other functions.

In Figure 4, distributed volume managers communicate across multiple host systems to coordinate access and support replication, data movement, and other functions. As distributed file systems mature, operating systems and volume managers will evolve to support enhanced functions.

Figure 5: Storage virtualization or abstraction can be put in a "virtual storage server."

A new approach, shown in Figure 5, is to put storage virtualization or abstraction in a "virtual storage server." Most storage vendors today provide the ability to mask physical disk drives from host systems. Some approaches combine hardware RAID, including mirroring, RAID 0+1, and RAID 5, with variable volume sizes, SAN interfaces, NAS support, and volume mapping to ensure data integrity.
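Masking physical drives behind a virtual storage server comes down to two tables: one mapping each virtual volume to the back-end devices that hold it, and one listing which virtual volumes each host may see. This minimal Python sketch assumes exactly that. `VirtualStorageServer` and the device and host names are hypothetical, and the sketch leaves out the RAID and data path a real appliance would provide.

```python
from typing import Dict, List, Set


class VirtualStorageServer:
    """Hypothetical appliance that maps virtual volumes and masks them per host."""

    def __init__(self) -> None:
        self.volumes: Dict[str, List[str]] = {}  # virtual volume -> back-end devices
        self.masks: Dict[str, Set[str]] = {}     # host -> virtual volumes it may see

    def create_volume(self, name: str, backend: List[str]) -> None:
        self.volumes[name] = list(backend)

    def map_volume(self, host: str, name: str) -> None:
        if name not in self.volumes:
            raise KeyError(f"no such virtual volume: {name}")
        self.masks.setdefault(host, set()).add(name)

    def visible_volumes(self, host: str) -> List[str]:
        # Hosts see only virtual volume names, never the back-end devices.
        return sorted(self.masks.get(host, set()))


server = VirtualStorageServer()
server.create_volume("vvol1", ["array1:lun3", "jbod2:disk7"])
server.create_volume("vvol2", ["array1:lun4"])
server.map_volume("hostA", "vvol1")
server.map_volume("hostB", "vvol2")
print(server.visible_volumes("hostA"))  # ['vvol1']
print(server.visible_volumes("hostC"))  # []
```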

Figure 6 shows a similar approach, with distributed volume managers and distributed file systems working with intelligent storage devices that support volume manager primitives. In the future, distributed volume managers and storage devices may be integrated without the need for bridge technology, which would eliminate the need for an extra layer of specialized software or appliances.

Figure 6: Distributed volume managers and file systems can work with intelligent storage devices supporting volume manager primitives.

Volume managers should be used to provide virtualization on host systems and simplify storage management. Windows 2000 will soon have a new volume manager for increased functionality. While features differ from platform to platform, most Unix systems have volume managers to provide virtualization and abstraction of storage.

Fibre Channel SANs combined with local volume managers and integrated SAN/NAS solutions provide a new level of shared storage. New storage appliance-type products running on Windows NT and other systems can complement these approaches and provide additional virtualization to storage devices. Volume managers are being enhanced to further utilize RAID virtualization and support other features.

Volume managers combined with advanced storage devices provide solutions to virtualization and abstraction. To reach the holy grail of shared data and storage access from multiple systems over Fibre Channel and other SAN technologies, advanced volume managers, file systems, and storage management interfaces will be needed. Some of these interfaces and standards are being defined by organizations such as the Storage Networking Industry Association (SNIA) to help hosts off-load additional storage management to external storage devices.

Greg Schulz is a senior technologist at MTI Technology Corp. (www.mti.com), in Anaheim, CA.

This article was originally published on August 01, 2000