An in-depth look at virtualization approaches, including in-band, out-of-band, and other solutions.
By Frank Bunn
Storage virtualization is the latest buzzword in the storage segment, if not in the IT industry as a whole, even though the concept isn't new. Mainframes have been using memory virtualization technologies for almost 30 years to present more memory to applications than actually exists. And for years now in the Unix environment, storage virtualization has been implemented in many host systems in the shape of logical volume managers, although the IT trade press has never paid much attention.
The growing acceptance of storage area network (SAN) solutions in particular has contributed to the current popularity of storage virtualization. SANs have changed the storage landscape drastically through enhanced networking technology, but they've also produced their own set of challenges. Hundreds of heterogeneous servers and storage systems, multi-redundant Fibre Channel switch connections, and mutual interdependencies of applications must all frequently be "tamed." In this heterogeneous landscape, there's a strong need for global storage management, and storage virtualization seems to be the answer for makers and users.
WHAT ARE THE REQUIREMENTS?
Storage capacity requirements are continuing to grow at rates of 50% to 100% per annum at some companies despite the economic slump. However, the cost of managing storage environments is rising even faster than the cost of hardware. Many companies are already talking about a management nightmare as a roughly constant number of administrators cope with increasing numbers of servers of different types and storage systems that are incompatible with each other. For this reason alone, data administrators need better tools to manage the growing volume of data.
Independent studies have found that average utilization is 35% to 50% for disk systems and, in some cases, even lower for tape systems. Since many storage units are tied to specific servers, free capacity on one host cannot simply be made available to another. Companies sometimes purchase new hardware to provide storage capacity even though free capacity is available elsewhere in the company, because that free capacity is in the wrong place and cannot be used. The result can be a collection of isolated storage systems that is economically unsound. Administrators need ways of allocating storage capacity across servers, and IT managers hope that this will cut acquisition and operating costs.
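To make the arithmetic behind stranded capacity concrete, here is a minimal sketch in Python. The host names and terabyte figures are invented for illustration and do not come from any study; the function names are likewise hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical figures in terabytes: host -> (installed capacity, capacity in use).
dedicated: Dict[str, Tuple[int, int]] = {
    "hostA": (10, 8),
    "hostB": (10, 2),
    "hostC": (10, 3),
}


def purchase_without_pooling(storage: Dict[str, Tuple[int, int]], host: str, extra: int) -> int:
    """Capacity that must be bought when a host can only use its own storage."""
    installed, used = storage[host]
    return max(0, used + extra - installed)


def purchase_with_pooling(storage: Dict[str, Tuple[int, int]], extra: int) -> int:
    """Capacity that must be bought when free capacity can be allocated across hosts."""
    installed = sum(i for i, _ in storage.values())
    used = sum(u for _, u in storage.values())
    return max(0, used + extra - installed)


total_installed = sum(i for i, _ in dedicated.values())
total_used = sum(u for _, u in dedicated.values())
utilization = total_used / total_installed

# hostA needs 4 TB more: its own array is nearly full, although the company as a whole is not.
buy_isolated = purchase_without_pooling(dedicated, "hostA", extra=4)
buy_pooled = purchase_with_pooling(dedicated, extra=4)
```

With these invented numbers, the isolated case forces a purchase even though more than half of the installed capacity is idle, while the pooled case does not.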
The need for constant availability with electronic business is another problem. The planned and unplanned downtime that results when storage is expanded or configurations are modified has become less and less acceptable.
IT departments are urgently searching for solutions to these problems: solutions that meet requirements for simplified storage management through better availability, greater flexibility, and a lower total cost of ownership.
VIRTUALIZATION IS THE SOLUTION
The concept of virtualization is simple. The figure on the left examines the management and provisioning of disk storage systems (also called block devices) on the basis of two terms: storage consumer and storage administrator.
Storage consumers are individual applications, groups of applications, individual users, or an entire organizational unit with certain storage system quality requirements. Storage consumers require sufficient capacity, performance, and availability for applications. The physical aspects of storage such as disk size and the number of parallel disk units are irrelevant. Storage consumers don't want to deal with technical details; they just want to define the storage services they need.
Storage administrators have the task of meeting the requirements of storage consumers. To do this efficiently in a dynamic, heterogeneous landscape, they need powerful tools, flexibility, and scalability. Storage administrators want to use the same tools to help them manage as many different servers and storage systems as possible: Their goal is simplified management. The toolbox that can help them meet this goal is storage virtualization and the data services based on it, which help storage administrators guarantee the quality of storage services that storage consumers require.
A CLOSER LOOK AT VIRTUALIZATION TECHNOLOGY
Instead of providing consumers with several physical storage devices, storage administrators give them one or more logical volumes. These logical volumes are presented to the host as normal SCSI disks. The host does not recognize that logical or virtual units are involved. A logical volume points to physical storage areas such as disk drives or LUNs in disk arrays with intelligent controllers. Virtualization (to be more exact, block virtualization) turns physical disks into virtual storage units that are as big as, as fast as, and as reliable as the consumer needs them to be. In addition, the capacity of these storage devices can be adjusted later without downtime.
When storage consumers need more disk capacity, additional volumes are created or logical volumes that have already been allocated to these consumers are enlarged. Mapping to other free physical disk units (block aggregation) takes place invisibly in the background.
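As a minimal sketch of block aggregation, assuming the simplest possible layout (plain concatenation of extents), the mapping from logical to physical blocks might look as follows. The names `Extent` and `ConcatVolume` are hypothetical and not taken from any product.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Extent:
    """A contiguous run of blocks on one physical disk or LUN (hypothetical model)."""
    disk: str
    start: int   # first physical block of the extent
    length: int  # number of blocks in the extent


class ConcatVolume:
    """Logical volume built by concatenating extents (block aggregation)."""

    def __init__(self, extents: List[Extent]) -> None:
        self.extents = list(extents)

    @property
    def size(self) -> int:
        return sum(e.length for e in self.extents)

    def grow(self, extent: Extent) -> None:
        # Appending an extent enlarges the volume; existing mappings stay unchanged,
        # which is why the consumer does not notice the change.
        self.extents.append(extent)

    def resolve(self, logical_block: int) -> Tuple[str, int]:
        """Translate a logical block number to (disk, physical block)."""
        if logical_block < 0:
            raise IndexError("negative block number")
        offset = logical_block
        for e in self.extents:
            if offset < e.length:
                return e.disk, e.start + offset
            offset -= e.length
        raise IndexError("block beyond end of volume")


vol = ConcatVolume([Extent("disk0", 0, 100), Extent("disk1", 50, 100)])
first = vol.resolve(10)     # falls into the first extent
second = vol.resolve(150)   # falls into the second extent
vol.grow(Extent("disk2", 0, 200))
third = vol.resolve(250)    # falls into the newly added extent
```

The host only ever sees logical block numbers; which disk ends up serving a block is decided entirely by the extent list.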
When storage consumers need better performance, the layout structure of the logical volumes allocated to them is changed. Data blocks are split up and written in parallel to several physical disks (striping).
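Striping can be sketched as pure address arithmetic. The function below is an illustrative assumption, not any vendor's layout: `stripe_unit` (a hypothetical parameter) is the number of consecutive blocks placed on one disk before the layout moves on to the next disk.

```python
from typing import List, Tuple


def stripe_map(logical_block: int, disks: List[str], stripe_unit: int) -> Tuple[str, int]:
    """Map a logical block to (disk, block on that disk) for a striped layout."""
    if logical_block < 0 or stripe_unit <= 0 or not disks:
        raise ValueError("invalid striping parameters")
    # Which stripe unit the block belongs to, and its position inside that unit.
    unit_index, within = divmod(logical_block, stripe_unit)
    # Stripe units are dealt out round-robin across the disks.
    row, column = divmod(unit_index, len(disks))
    return disks[column], row * stripe_unit + within


disks = ["disk0", "disk1", "disk2"]
# First block of each of the first six stripe units.
layout = [stripe_map(b, disks, stripe_unit=4) for b in range(0, 24, 4)]
```

Because neighboring stripe units land on different disks, a large sequential transfer keeps several disks busy at once, which is where the performance gain comes from.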
When storage consumers need enhanced resilience, the same logical volume can be written multiple times to physically independent disk units (mirroring).
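A toy model of mirroring, under simplifying assumptions: each copy is a Python dictionary standing in for an independent disk unit, and resynchronizing a repaired copy is not modeled. The class name `MirroredVolume` is hypothetical.

```python
from typing import Dict, List, Set


class MirroredVolume:
    """Toy mirror: every write goes to all healthy copies, reads use any healthy copy."""

    def __init__(self, copies: int) -> None:
        if copies < 2:
            raise ValueError("a mirror needs at least two copies")
        self.copies: List[Dict[int, bytes]] = [dict() for _ in range(copies)]
        self.failed: Set[int] = set()

    def _healthy(self) -> List[int]:
        return [i for i in range(len(self.copies)) if i not in self.failed]

    def write(self, block: int, data: bytes) -> None:
        healthy = self._healthy()
        if not healthy:
            raise IOError("all copies have failed")
        for i in healthy:
            self.copies[i][block] = data

    def read(self, block: int) -> bytes:
        for i in self._healthy():
            return self.copies[i][block]
        raise IOError("all copies have failed")

    def fail(self, index: int) -> None:
        """Simulate the loss of one physical copy."""
        self.failed.add(index)


mirror = MirroredVolume(copies=2)
mirror.write(7, b"payload")
mirror.fail(0)                 # one disk unit is lost
survivor = mirror.read(7)      # the data is still served from the other copy
```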
These configuration changes are generally performed online while applications and servers are operating. Storage consumers can continue to work uninterrupted using their logical volumes and the data stored there. Storage virtualization thus satisfies important high-availability requirements.
Storage systems are accessed on different levels: from the host with applications, databases, file systems, and logical volume managers, to SANs and storage controllers, to physical storage systems. The storage infrastructure is a stack of technologies that build on each other. Vendors have responded by positioning their virtualization solutions at different points in this infrastructure, so that the market operates with three approaches: host-, storage-, and network-based virtualization.
Host-based virtualization
This type of virtualization is generally associated with logical volume managers. These have been available on mainframes and Unix servers for years and are now increasingly being offered for Windows platforms as well. As is the case with storage-based virtualization, logical volume managers are not necessarily associated with SANs. Yet they are still the most popular method of virtualization because of their history and the fact that direct-attached storage (DAS) is still very widespread.
A logical volume manager is either part of the operating system or can be implemented as a separate software product. Logical volume managers provide a virtualization layer in the host. Using this layer, physical disk storage and LUNs can be presented to applications as logical disk groups and logical volumes.
The great advantages of host-based virtualization are its stability after years of use in practice and its openness to heterogeneous storage systems. The proximity to the file system, which is also on the host, makes it possible to join these two components tightly for efficient capacity management (see figure below). Volumes and the file systems on them can be enlarged or reduced jointly without having to stop applications.
Logical volume managers are server-centric, i.e., their storage resources are configured for the specific host.
Storage-based virtualization
Block aggregation takes place in storage systems with intelligent RAID controllers that often come with additional functions such as LUN masking, caching, point-in-time snapshots, and data-replication solutions. Storage-based virtualization (see below) delivers optimum performance in relation to the storage system in use. These solutions have been available for some time and do not require a SAN.
Since this type of virtualization is not tied to a specific host, one implementation can support many heterogeneous hosts. Yet the solution is proprietary to each storage subsystem and cannot be applied to other storage resources beyond that subsystem's boundaries.
Network-based virtualization
The advantages of the two technologies described above can be combined in a storage network using virtualization. This approach supports data center-wide storage management as well as heterogeneous servers and heterogeneous storage systems.
Special devices called SAN appliances handle virtualization. They combine the physical disk systems on the SAN (block aggregation) or split them into smaller units and make them available to storage consumers on the host as volumes that have the required size, performance, and availability.
SAN appliances are available on the market either as proprietary solutions or as standard Windows, Unix, and Linux servers with corresponding virtualization software. Implementations with a single point of failure must be avoided, because all access operations on the SAN are channeled through this virtualization instance. It is therefore important in projects to make sure that the appliances are integrated redundantly and resiliently in the SAN.
SAN appliances can be integrated in the storage environment in two different ways:
- They are directly in the data path (in-band) between the servers and storage devices; or
- They have been implemented outside the data path (out-of-band) and pass on metadata on the physical structure and location of logical volumes to the host, for example via LAN connections.
The terms symmetric and asymmetric are used as synonyms for in-band and out-of-band, respectively. The Storage Networking Industry Association (SNIA) is trying to standardize terminology to improve understanding and therefore recommends the exclusive use of the terms in-band and out-of-band virtualization.
In-band appliance
An in-band appliance, as shown below, is located in the data path between hosts and storage systems. Control information (metadata) and user data pass through it. An in-band appliance, with its logical volumes, presents itself to consumers as a storage system on the host.
Since all data passes through an in-band appliance, there is a high degree of security when volumes are accessed. The appliance acts as a type of storage firewall: Attempts to access storage systems on a SAN that are not explicitly permitted by defined host volume allocations are rejected.
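The storage firewall idea can be sketched in a few lines, assuming a deliberately simplified model: the appliance keeps a table of host-to-volume allocations and checks it on every request, which it can do only because every request passes through it. `InBandAppliance` and its methods are hypothetical names.

```python
from typing import Dict, Set


class InBandAppliance:
    """Toy in-band appliance: all I/O passes through it, so it can enforce access."""

    def __init__(self) -> None:
        self.allocations: Dict[str, Set[str]] = {}       # host -> volumes it may use
        self.volumes: Dict[str, Dict[int, bytes]] = {}   # volume -> block -> data

    def create_volume(self, volume: str) -> None:
        self.volumes[volume] = {}

    def allocate(self, host: str, volume: str) -> None:
        if volume not in self.volumes:
            raise KeyError(f"unknown volume {volume}")
        self.allocations.setdefault(host, set()).add(volume)

    def _check(self, host: str, volume: str) -> None:
        # Anything not explicitly allocated to the host is rejected.
        if volume not in self.allocations.get(host, set()):
            raise PermissionError(f"{host} is not allowed to access {volume}")

    def write(self, host: str, volume: str, block: int, data: bytes) -> None:
        self._check(host, volume)
        self.volumes[volume][block] = data

    def read(self, host: str, volume: str, block: int) -> bytes:
        self._check(host, volume)
        return self.volumes[volume][block]


appliance = InBandAppliance()
appliance.create_volume("vol1")
appliance.allocate("hostA", "vol1")
appliance.write("hostA", "vol1", 0, b"payload")

try:
    appliance.read("hostB", "vol1", 0)   # hostB has no allocation for vol1
    rejected = False
except PermissionError:
    rejected = True
```

Note that the host needs nothing beyond its normal disk access path here, which matches the point that in-band solutions require no special host drivers.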
This virtualization concept needs no special drivers on the host and is therefore easier to implement than an out-of-band appliance. Interaction with a great number of heterogeneous host systems is supported.
The drawback is that an in-band appliance is one more element in the data path through which all data must pass, and it can therefore add latency. In-band appliances generally feature caching functions to minimize or eliminate performance problems.
Out-of-band appliance
An out-of-band appliance is not implemented physically between the host and storage system. The system is outside the data path and communicates with the host systems via other connections (see figure on p. 27). Hosts therefore require a virtualization client or an agent in the shape of software or special host bus adapter drivers. This virtualization client receives information on the structure and properties of the logical volumes as well as the corresponding logical/physical block mapping information from the appliance. The appliance alone is responsible for storage configuration and control information. The host uses this information to address the physical blocks of the storage systems on the SAN.
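A rough sketch of this division of labor, under stated assumptions: `MetadataAppliance` and `HostAgent` are hypothetical names, plain dictionaries stand in for the disks on the SAN, and a direct method call stands in for the separate connection between agent and appliance.

```python
from typing import Dict, List, Tuple

Extent = Tuple[str, int, int]  # (disk name, first physical block, length in blocks)


class MetadataAppliance:
    """Out-of-band appliance: owns configuration and mappings, never sees user data."""

    def __init__(self) -> None:
        self._maps: Dict[str, List[Extent]] = {}

    def define_volume(self, volume: str, extents: List[Extent]) -> None:
        self._maps[volume] = list(extents)

    def get_map(self, volume: str) -> List[Extent]:
        # A real appliance would deliver this over a separate connection such as a LAN.
        return list(self._maps[volume])


class HostAgent:
    """Virtualization client on the host: fetches the map, then does I/O directly."""

    def __init__(self, appliance: MetadataAppliance, disks: Dict[str, Dict[int, bytes]]) -> None:
        self.appliance = appliance
        self.disks = disks  # stands in for physical devices reachable over the SAN
        self.maps: Dict[str, List[Extent]] = {}

    def import_volume(self, volume: str) -> None:
        self.maps[volume] = self.appliance.get_map(volume)

    def _resolve(self, volume: str, block: int) -> Tuple[str, int]:
        if block < 0:
            raise IndexError("negative block number")
        offset = block
        for disk, start, length in self.maps[volume]:
            if offset < length:
                return disk, start + offset
            offset -= length
        raise IndexError("block beyond end of volume")

    def write(self, volume: str, block: int, data: bytes) -> None:
        disk, physical = self._resolve(volume, block)
        self.disks[disk][physical] = data

    def read(self, volume: str, block: int) -> bytes:
        disk, physical = self._resolve(volume, block)
        return self.disks[disk][physical]


san_disks: Dict[str, Dict[int, bytes]] = {"disk0": {}, "disk1": {}}
appliance = MetadataAppliance()
appliance.define_volume("vol1", [("disk0", 0, 100), ("disk1", 500, 100)])

agent = HostAgent(appliance, san_disks)
agent.import_volume("vol1")
agent.write("vol1", 120, b"payload")   # goes straight to disk1, not through the appliance
```

The sketch also shows why an agent is needed on every host: the logical-to-physical translation that an in-band appliance would perform centrally happens on the host itself.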
It is a bit more complicated to implement out-of-band solutions than in-band solutions because the hosts must each have an agent, which may be the reason why most available SAN appliance solutions are based on in-band technology. However, less software is needed on the host than for a fully host-based logical volume manager, and this is useful for the continued spread of these solutions on the market and for porting virtualization clients to heterogeneous servers.
STORAGE APPLICATIONS AND DATA SERVICES
Storage virtualization, with its logical presentation of volumes and simplified presentation of complex physical storage structures, is not an end in itself. Rather, it is a toolbox that storage administrators can use to provide data services and storage management applications much more easily. These applications might include backup and restore, clustering, replication, point-in-time copies/snapshots, migration, transformation, caching, security, and quality of storage services.
Storage virtualization makes clustering easier. When an application fails over to another server, the associated disk group with its volumes is logically transferred (imported) to another server. The cluster remains unaware of the physical structure of the volume, which may be very complex. This simplifies all import operations. Virtualization makes it possible to set up shared data clusters and shared file systems and lets multiple host systems simultaneously access shared volumes in a controlled manner.
Replication solutions transfer complete logical volumes without needing to know the complex physical structure. The destination volume can thus reside on a disk system of a different type and structure, and the replication software can operate with a relatively simple interface to the logical storage.
Virtualization enables storage administrators to define the specific quality of storage services (such as application performance, storage-on-demand, and shorter recovery windows after system failures) for certain applications and users. This makes it easier to set up storage system pools with similar attributes and to allocate them to certain storage consumers on an event-driven basis.
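As an illustration of allocating from pools with similar attributes, the sketch below picks a pool for a request. The attributes (`mirrored`, `max_latency_ms`), the pool names, and the selection policy are all assumptions made for the example; real products define their own service attributes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pool:
    """A storage pool whose members share the same (hypothetical) attributes."""
    name: str
    free_gb: int
    mirrored: bool
    max_latency_ms: float


def pick_pool(pools: List[Pool], size_gb: int, need_mirror: bool,
              latency_ms: float) -> Optional[Pool]:
    """Return a pool that satisfies the request, or None if no pool does."""
    candidates = [
        p for p in pools
        if p.free_gb >= size_gb
        and (p.mirrored or not need_mirror)
        and p.max_latency_ms <= latency_ms
    ]
    # Assumed policy: use the slowest pool that still meets the request,
    # so that faster pools stay free for more demanding consumers.
    return max(candidates, key=lambda p: p.max_latency_ms, default=None)


pools = [
    Pool("gold", free_gb=500, mirrored=True, max_latency_ms=5),
    Pool("silver", free_gb=2000, mirrored=True, max_latency_ms=15),
    Pool("bronze", free_gb=5000, mirrored=False, max_latency_ms=30),
]

database = pick_pool(pools, size_gb=100, need_mirror=True, latency_ms=20)
archive = pick_pool(pools, size_gb=100, need_mirror=False, latency_ms=50)
impossible = pick_pool(pools, size_gb=100, need_mirror=True, latency_ms=2)
```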
OTHER KINDS OF STORAGE VIRTUALIZATION
In addition to the block virtualization of disk systems described above, there are various solutions for file system, file, and tape storage virtualization.
File system and file virtualization
File system virtualization, as used in network-attached storage (NAS) systems, combines multiple file systems to form one large virtual file system that is completely transparent for users. Users access their files in the normal way using the familiar NFS or CIFS protocol and remain unaware of the physical implementation or internal adaptations of this storage.
One example of file virtualization is a Hierarchical Storage Management (HSM) solution for automatic migration of rarely used data to inexpensive secondary storage media such as optical discs and tape drives. Users know nothing about this migration and assume that the data is still on the primary medium. Virtualization results in location transparency in this case. A pointer in the file system and other metadata ensure that the migrated file can be rapidly re-loaded onto the online storage medium and accessed there, whenever a consumer requests it.
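The pointer mechanism can be sketched as follows. This is a toy model, not how any particular HSM product works: `HsmFileSystem` is a hypothetical name, dictionaries stand in for the primary and secondary media, and time is a plain integer supplied by the caller.

```python
from typing import Dict, List


class HsmFileSystem:
    """Toy HSM: idle files move to secondary storage and leave a stub behind."""

    def __init__(self) -> None:
        self.primary: Dict[str, bytes] = {}     # data on the online medium
        self.secondary: Dict[str, bytes] = {}   # data on tape or optical media
        self.stubs: Dict[str, str] = {}         # path -> pointer to the migrated copy
        self.last_access: Dict[str, int] = {}

    def write(self, path: str, data: bytes, now: int) -> None:
        self.primary[path] = data
        self.secondary.pop(path, None)
        self.stubs.pop(path, None)
        self.last_access[path] = now

    def migrate_idle(self, now: int, max_idle: int) -> None:
        idle = [p for p in self.primary if now - self.last_access[p] > max_idle]
        for path in idle:
            self.secondary[path] = self.primary.pop(path)
            self.stubs[path] = "secondary"

    def read(self, path: str, now: int) -> bytes:
        if path in self.stubs:
            # Transparent recall: reload the file before serving the request.
            self.primary[path] = self.secondary.pop(path)
            del self.stubs[path]
        self.last_access[path] = now
        return self.primary[path]

    def listing(self) -> List[str]:
        # Users see the same names whether or not a file has been migrated.
        return sorted(set(self.primary) | set(self.stubs))


fs = HsmFileSystem()
fs.write("/reports/2001.dat", b"old data", now=0)
fs.write("/reports/2002.dat", b"new data", now=90)
fs.migrate_idle(now=100, max_idle=30)       # only the 2001 file has been idle long enough
names_after_migration = fs.listing()
recalled = fs.read("/reports/2001.dat", now=101)
```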
Tape storage virtualization
One of the goals of virtualization is better use of storage capacity. Many mainframe users suffer from the fact that, in many cases, only 15% to 30% of their tape capacities are used, owing to specific applications and access methods. Tape media virtualization by means of upstream disk cache systems and integrated tape emulation improves the level of utilization.
These systems also accelerate access to tape units because they avoid the time-consuming loading and unloading of tapes. Instead, the operations are emulated via fast access to the disk cache.
Open systems users do not have these problems. Their biggest challenge does not lie in optimizing tape media, but in sharing tape drives in tape libraries among a great number of host systems. Previously, tape drives were firmly allocated to individual servers. Fibre Channel SANs allow tape drives to be shared physically by various servers, but it takes storage virtualization to set up tape drive pools with a guarantee of data integrity. The virtualization layer dynamically allocates drives to a given server for a given period of time and denies other servers access during that time. It also replaces defective tape drives with other physical drives from the pool, reducing application downtime. All these functions run in the background, and storage consumers are unaware of them.
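A minimal sketch of such a drive pool, assuming a single coordinating instance that tracks ownership. `TapeDrivePool` and its methods are hypothetical; real implementations would also have to deal with reservations on the devices themselves.

```python
from typing import Dict, List


class TapeDrivePool:
    """Toy tape drive pool: a drive belongs to at most one server at a time."""

    def __init__(self, drives: List[str]) -> None:
        self.free: List[str] = list(drives)
        self.owner: Dict[str, str] = {}   # drive -> server currently holding it

    def acquire(self, server: str) -> str:
        if not self.free:
            raise RuntimeError("no free drive in the pool")
        drive = self.free.pop(0)
        self.owner[drive] = server
        return drive

    def may_access(self, server: str, drive: str) -> bool:
        # Only the current holder may use the drive; this protects data integrity.
        return self.owner.get(drive) == server

    def release(self, server: str, drive: str) -> None:
        if not self.may_access(server, drive):
            raise PermissionError(f"{server} does not hold {drive}")
        del self.owner[drive]
        self.free.append(drive)

    def replace_defective(self, server: str, drive: str) -> str:
        """Retire a failed drive and hand the server a spare from the pool."""
        if not self.may_access(server, drive):
            raise PermissionError(f"{server} does not hold {drive}")
        if not self.free:
            raise RuntimeError("no spare drive in the pool")
        del self.owner[drive]             # the defective drive is not returned to the pool
        return self.acquire(server)


pool = TapeDrivePool(["drive0", "drive1", "drive2"])
held = pool.acquire("backupA")
other_allowed = pool.may_access("backupB", held)    # another server is denied
spare = pool.replace_defective("backupA", held)     # backupA continues on a spare
```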
THE FUTURE OF STORAGE VIRTUALIZATION
Storage virtualization resolves the complexity of large heterogeneous SAN environments and is the ideal tool to help storage administrators meet the requirements of storage consumers.
Virtualization has already simplified storage management, and new development levels will improve it even more. Unified management of storage resources will be possible, beginning with their detection, visualization, and application-controlled allocation to certain hosts.
File systems and databases will request new storage capacity when they reach a certain utilization threshold and can be enlarged without operations being interrupted. Intelligent data movers will transfer data on the SAN to the most suitable storage locations automatically and without user knowledge, thereby ensuring optimum performance, security, availability, and cost structure. Heterogeneous servers and applications throughout the company will access the same logical volumes and files. A comprehensive file system with automatic adjustment of the different formats is now within reach.
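Threshold-driven growth can be sketched as below. The threshold of 80%, the 10 GB growth step, and the class names `StoragePool` and `AutoGrowFileSystem` are assumptions chosen for the example; in this variant the file system requests capacity before a write would push utilization past the threshold.

```python
class StoragePool:
    """Free capacity that file systems can draw on (hypothetical model)."""

    def __init__(self, free_gb: int) -> None:
        self.free_gb = free_gb

    def allocate(self, gb: int) -> int:
        if gb > self.free_gb:
            raise RuntimeError("pool exhausted")
        self.free_gb -= gb
        return gb


class AutoGrowFileSystem:
    """File system that enlarges itself when utilization would exceed a threshold."""

    def __init__(self, pool: StoragePool, size_gb: int,
                 threshold: float = 0.8, step_gb: int = 10) -> None:
        self.pool = pool
        self.size_gb = pool.allocate(size_gb)
        self.used_gb = 0
        self.threshold = threshold
        self.step_gb = step_gb

    def store(self, gb: int) -> None:
        if gb <= 0:
            raise ValueError("amount must be positive")
        # Request more capacity until the write fits below the threshold;
        # the caller is never asked to stop and resize by hand.
        while (self.used_gb + gb) / self.size_gb > self.threshold:
            self.size_gb += self.pool.allocate(self.step_gb)
        self.used_gb += gb


pool = StoragePool(free_gb=100)
fs = AutoGrowFileSystem(pool, size_gb=20)
fs.store(15)                      # 75% utilization, no growth needed
size_before = fs.size_gb
fs.store(5)                       # would reach 100%, so the file system grows first
size_after = fs.size_gb
```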
Hardly any other topic currently receives the same intense attention as storage virtualization. The goals and advantages are undisputed, but the term virtualization itself has spread confusion because a precise definition has been difficult to find. Consequently, hardly a week goes by without a new press announcement of another virtualization solution. At the beginning of the virtualization boom, it was primarily smaller companies that attracted attention. But now nearly all major players in the storage market have caught up and are moving into this market, whether with their own solutions, OEM and reseller agreements, or company acquisitions.
Many users have been disconcerted by the terminology, the different ways vendors present virtualization, and the various technical implementations of storage virtualization. In addition, the two alternative approaches to virtualization on the network level (in-band and out-of-band technology) have resulted in controversial public discussions between proponents of the two technologies about which of them is superior. Needless to say, this situation does not contribute to the clarity end users need or address their fears and concerns.
SNIA has undertaken to clarify terminology in general and to convey information to vendors and users. The SNIA Shared Storage Model (SSM), as shown on the left, was developed at the very beginning of this project and has turned out to be a great help. The model offers a general structure for describing various architectures for accessing storage systems and can be considered an architecture vocabulary. It shows differences between different approaches without evaluating them per se. As a result, it lets vendors present their solutions, including any architectural differences, more clearly and gives users a better understanding of vendor offers.
Frank Bunn is a member of the Storage Networking Industry Association Education Committee (www.snia.org). He is also solutions marketing manager at Veritas Software Germany (www.veritas.com/de).