"Virtualization" is one of the most confusing and misapplied terms in the storage industry. Complicating the matter is that it can be referred to as either storage area network (SAN) virtualization or storage virtualization. And virtualization can reside at the host, network, or disk-array level. In addition, each vendor says its method is best and will win out over all of the others.
Making the wrong choice on virtualization technology could cost you significant time and money. Surprisingly, to choose the right virtualization technology, you might want to use a relational database design technique called normalization.
The most common types of virtualization are host-, network-, and array-based. In host-based virtualization, a software agent runs on each server in the SAN. This software, in theory, manages the storage resources and allows sharing of those resources between the different servers on the SAN.
Network-based virtualization is accomplished by putting a hardware device with virtualization software between the server and the storage. This approach is referred to as in-band virtualization. All data that travels on the SAN between servers and the storage must pass through this device, which recognizes all of the storage and servers and presents the storage in logical volumes to the servers. Originally championed by a number of small companies such as DataCore, FalconStor, and StorageApps, this approach has since been adopted by a variety of vendors, including Fujitsu Softek, Hewlett-Packard, Hitachi Ltd., and IBM.
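To make "presents the storage in logical volumes" concrete, here is a minimal sketch of the kind of address translation an in-band device might perform. It is not any vendor's implementation: the `Extent` and `LogicalVolume` names, the array names, and the block counts are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Extent:
    """A contiguous run of blocks on one backend LUN (hypothetical layout)."""
    array: str        # made-up backend array name
    lun: int          # LUN number on that array
    start_block: int  # first physical block of the run
    num_blocks: int   # length of the run in blocks


class LogicalVolume:
    """A volume shown to servers, built by concatenating backend extents."""

    def __init__(self, name: str, extents: List[Extent]):
        self.name = name
        self.extents = extents

    @property
    def size(self) -> int:
        # Total number of blocks the server sees.
        return sum(e.num_blocks for e in self.extents)

    def resolve(self, logical_block: int) -> Tuple[str, int, int]:
        """Translate a server-visible block to (array, lun, physical_block)."""
        if logical_block < 0 or logical_block >= self.size:
            raise IndexError(f"block {logical_block} is outside {self.name}")
        offset = logical_block
        for extent in self.extents:
            if offset < extent.num_blocks:
                return (extent.array, extent.lun, extent.start_block + offset)
            offset -= extent.num_blocks
        raise AssertionError("unreachable: range was checked above")


# One logical volume spanning two different (made-up) arrays.
vol = LogicalVolume("vol0", [
    Extent("array_a", lun=0, start_block=1000, num_blocks=100),
    Extent("array_b", lun=3, start_block=0, num_blocks=50),
])
```

In this sketch the server addresses only blocks 0 through 149 of `vol0`. Calling `vol.resolve(100)` returns a location on `array_b`, but the server never needs to know that the volume crosses an array boundary.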
The third, and most common, approach is array-based virtualization. With this technique, an external disk array uses internal software to virtualize the disks. This approach was originally popularized by EMC, although other vendors such as IBM and Hitachi Data Systems now have many of the same features.
Which of the three virtualization approaches is the "right" choice? In other words, which is the most cost-effective approach that allows you to use the investments you already have, scale for future growth, and simplify storage administration?
A little history
To answer that question, consider the history of SANs in data-center environments. SANs have their roots in the mainframe environment, where storage eventually became too unwieldy to manage because it was internal to the system. To address this problem, storage was attached externally. That solved only part of the problem, because the storage was still logically associated with a single mainframe. This led to the development of ESCON directors, which are physical devices that sit in the data path and allow a storage array to be shared among multiple mainframes (or a single mainframe to access multiple storage arrays).
The open systems model has followed the same path to this point. However, although this model generally works fine in the mainframe world, it breaks down in the open systems world. The explanation for this is best given using terms borrowed from the relational database realm.
In the mainframe world, the relationship between mainframes and storage arrays works because a "one-to-many" relationship exists. The "one-to-many" term comes from relational database design, where it describes one record in a table that is related to many records in another table. By analogy, it can describe one object and its many attributes, or one operating system and the many operations it manages.
In the mainframe world, the "one" in the "one-to-many" is the operating system, and the "many" are the storage arrays.
In the mainframe environment, the one-to-many relationship works reasonably well because there is a single logical operating system on all of the mainframes managing the many storage arrays. The operating system is intelligent enough to manage the storage no matter where it resides in the mainframe "SAN."
However, in open systems SANs, multiple heterogeneous operating systems may have to connect to multiple heterogeneous storage devices. Multiple providers of switches/directors further compound the complexity, resulting in a management nightmare.
Now let's translate this scenario into database terms. This nightmare reflects a "many-to-many" relationship, or a situation where many operating systems have many types of storage.
From the view of a database administrator, this is an unmanageable scenario. However, there is a relational database technique called normalization that solves this problem.
Normalization in a relational database environment resolves a many-to-many relationship by creating another table that sits between the two original tables. Each original table then has a "one-to-many" relationship with the new table. This new table converts the previously unmanageable data into a manageable format.
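For readers who do not work with databases, the following is a small sketch of that technique using Python's built-in `sqlite3` module. The table names (`host`, `storage_array`, `host_array`), the vendor labels, and the sample rows are hypothetical and chosen only to mirror the SAN discussion.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

conn.executescript("""
CREATE TABLE host (
    id INTEGER PRIMARY KEY,
    os TEXT NOT NULL
);
CREATE TABLE storage_array (
    id     INTEGER PRIMARY KEY,
    vendor TEXT NOT NULL
);
-- The extra table: each row ties exactly one host to exactly one array,
-- so host -> host_array and storage_array -> host_array are both one-to-many.
CREATE TABLE host_array (
    host_id  INTEGER NOT NULL REFERENCES host(id),
    array_id INTEGER NOT NULL REFERENCES storage_array(id),
    PRIMARY KEY (host_id, array_id)
);
""")

conn.executemany("INSERT INTO host (id, os) VALUES (?, ?)",
                 [(1, "Solaris"), (2, "Windows"), (3, "Linux")])
conn.executemany("INSERT INTO storage_array (id, vendor) VALUES (?, ?)",
                 [(1, "VendorA"), (2, "VendorB")])
conn.executemany("INSERT INTO host_array (host_id, array_id) VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1), (3, 2)])


def arrays_for_host(host_id):
    """Return the vendors of all arrays linked to one host, ordered by array id."""
    rows = conn.execute(
        """
        SELECT a.vendor
        FROM host_array AS ha
        JOIN storage_array AS a ON a.id = ha.array_id
        WHERE ha.host_id = ?
        ORDER BY a.id
        """,
        (host_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Every question about which host uses which array is now answered through the single `host_array` table, rather than by tracking each host-and-array combination separately.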
This same technique can be applied to a SAN to simplify management. The result is not a new table but, rather, a new network layer in the SAN. This network layer is the network-based virtualization model.
Let's apply this to the open systems environment, where you may have many different servers and operating systems. Introduce a new device in the data path at the network layer so that all data traffic passes through it. Configure the device so it sees all of the servers, and all of the servers can discover it. This creates a one-to-many relationship on one half of the SAN.
On the storage end, you connect this device, or appliance, to the many types of storage devices. Again, configure the device so it sees all of the storage, and all of the storage can see it. This creates a one-to-many relationship on the other half of the SAN.
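One rough way to see why the extra layer helps is to count the server-to-storage combinations an administrator has to think about. The sketch below assumes the worst case, in which every server must be configured against every storage device. The function names and the example counts are illustrative and are not measurements from a real SAN.

```python
def direct_pairings(num_servers: int, num_arrays: int) -> int:
    """Many-to-many: every server may need to work with every array."""
    return num_servers * num_arrays


def pairings_with_appliance(num_servers: int, num_arrays: int) -> int:
    """Two one-to-many halves: servers to the device, plus the device to arrays."""
    return num_servers + num_arrays


# Hypothetical SAN with 10 servers and 4 storage arrays.
servers, arrays = 10, 4
without_layer = direct_pairings(servers, arrays)
with_layer = pairings_with_appliance(servers, arrays)
print(f"direct: {without_layer} pairings, via appliance: {with_layer} pairings")
```

Under these assumptions the count grows as a product without the intermediate device and only as a sum with it, which is the same effect the extra table has in a normalized database.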
These one-to-many relationships make it much easier to manage the SAN. Viewed through this proven relational database technique, the network-based virtualization strategy is usually the most logical choice for SAN management. The same reasoning helps to explain why major vendors are adopting this method.
In contrast, host- or array-based virtualization may be proprietary, expensive, and difficult to administer and scale. Network-based virtualization should ease SAN management, while also opening up new ways of thinking about storage and storage networking. But perhaps more important to administrators, it will allow them to more effectively use what they already have and will also save money in the long term.
Jerome M. Wendt is a senior SAN analyst at First Data Corp. (www.firstdata.com), an electronic commerce and payment services firm with headquarters in Denver, CO.