Virtualization techniques can be applied to non-disk (e.g., tape) environments and to NAS and SAN storage networks.
BY JOHN MAXWELL
Storage virtualization has many possible applications in both traditional architectures and storage networks. The first article in this series discussed virtualization in open systems, direct-attached disk storage environments (see the September 2001 issue, p. 34). This article, the second in a series of three, broadens the discussion to include non-disk storage as well as storage area network (SAN) and network-attached storage (NAS) environments.
Virtualization in non-disk environments opens interesting possibilities for tape sharing and for reducing disk-based storage through hierarchical storage management (HSM). Both of these approaches can be implemented easily with traditional storage architectures and offer immediate advantages in many environments.
Tapes are essential in most backup-and-recovery processes and in disaster-recovery applications. Tape offers a relatively inexpensive way to store multiple copies of large quantities of data and can be duplicated and stored off-site for protection against site-specific disasters.
Although tape media is inexpensive, managing and tracking tapes adds to the total cost of a tape environment. Libraries and other robotic devices automate the administrative activities related to tapes (e.g., mounting, dismounting, labeling, and bar coding), but they are significant investments. Storage virtualization technologies can help leverage these expensive assets while simplifying storage administration.
One problem companies face is simply the proliferation of tape devices. Backing up a critical system, for example, generally requires high-speed, high-capacity tape systems to speed the backup process. With many critical systems, putting a high-speed drive on each server can run up the storage hardware budget.
Databases pose their own challenges, as they typically require high-volume backups with small backup windows. To handle this environment, companies typically increase their investments in tape devices by
- Putting the highest-capacity and highest-performing (most expensive) tape drives on those servers;
- Using parallel backup streams or multiplexing to keep several tape drives running at capacity during a backup; and
- Using robotics and libraries to alleviate the manual work of changing and tracking tapes.
With multiple critical database servers, these strategies can quickly become very expensive. Using a backup server to manage backups centrally reduces the number of tape drives required but sends a considerable amount of data over the LAN to the server. Most organizations strike a balance, with critical and high-volume servers using dedicated devices, sometimes managed and scheduled by a centralized backup server.
Storage virtualization offers a solution by creating a common pool of tape drives that can be shared among multiple servers. These drives may be drives within a tape library, effectively allowing multiple servers to share the library for better utilization.
The tape virtualization software (typically part of a broader backup solution) tracks which server "owns" the tape drive at any point and can allocate tape drives dynamically, based on need.
[Diagram: Multiple servers share a single tape library over a storage area network.]
Tape sharing is an example of virtualization because the software presents a logical pool of tape drives to the servers, abstracting the details of exactly where the drives reside. Tape drives are allocated and de-allocated dynamically; you don't have to dedicate drives to specific servers or permanently "carve up" a library into multiple parts. If a server has a large backup, it can potentially allocate a large number of the drives and then release them when it is done.
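The allocate-and-release behavior described above can be sketched in a few lines of code. This is purely illustrative (the class and server names are invented, not a vendor API): a shared pool hands drives to whichever server needs them and reclaims them when the backup finishes.

```python
# Illustrative sketch: a shared pool of tape drives that servers
# allocate for a backup job and release back when finished.

class TapeDrivePool:
    def __init__(self, drive_names):
        self.free = set(drive_names)      # drives available to any server
        self.owned = {}                   # drive name -> owning server

    def allocate(self, server, count):
        """Give `server` up to `count` free drives for its backup."""
        granted = [self.free.pop() for _ in range(min(count, len(self.free)))]
        for drive in granted:
            self.owned[drive] = server
        return granted

    def release(self, server):
        """Return all of `server`'s drives to the shared pool."""
        done = [d for d, s in self.owned.items() if s == server]
        for drive in done:
            del self.owned[drive]
            self.free.add(drive)

pool = TapeDrivePool(["drive0", "drive1", "drive2", "drive3"])
big_job = pool.allocate("db-server", 3)     # a large backup grabs most drives
small_job = pool.allocate("web-server", 2)  # only one drive is left to grant
pool.release("db-server")                   # drives return to the pool
```

Note that the second server gets only as many drives as remain free; nothing is permanently dedicated to either server.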
The most common way to implement tape sharing is with a SAN and Fibre Channel switches. (Using a SCSI switch is also possible but limits the number of servers that can access the tape drives.) This architecture gives many servers access to high-end tape libraries (see diagram). It also potentially reduces traffic on the LAN as the backup travels over the SAN to the tape device.
Creating a storage pool
When system administrators are evaluating how much storage they manage, they typically think of disk storage. But if you also include storage on tape or optical disk, then the potential storage capacity of a system or a network is enormous. This can work to your advantage if you virtualize storage across disk, tape, and optical disk.
The overall idea is fairly simple: Move infrequently used data to high-capacity, lower-cost storage (tape or optical disk), keep pointers in the original disk locations, and make the process of retrieving the data transparent to users. To users and applications, all files appear to be stored locally; they do not need to know whether a file actually resides on disk, tape, or optical media. Retrieving a file from tape involves a longer access time but otherwise is transparent to the user or application requesting it.
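The stub-and-recall mechanism at the heart of HSM can be sketched as follows. The class and file names here are hypothetical, a minimal model of the idea rather than any product's implementation: migration leaves a small pointer on disk, and a read transparently recalls the data.

```python
# Illustrative sketch of the HSM idea: migrated files leave a stub
# (pointer) behind on disk; a read recalls the data transparently.

class HsmFile:
    def __init__(self, name, data):
        self.name = name
        self.data = data          # file contents, resident on disk
        self.stub = None          # set when the data moves to secondary storage

    def migrate(self, secondary):
        """Move the data to secondary storage, leaving only a pointer on disk."""
        secondary[self.name] = self.data
        self.stub = self.name     # pointer to the secondary copy
        self.data = None

    def read(self, secondary):
        """Transparent recall: the caller never knows where the data lived."""
        if self.data is None:     # data was migrated; fetch it back
            self.data = secondary[self.stub]
            self.stub = None
        return self.data

tape = {}                               # stands in for the tape/optical tier
f = HsmFile("report.doc", b"quarterly figures")
f.migrate(tape)                         # disk now holds only a stub
contents = f.read(tape)                 # slower than a disk read, but transparent
```

The caller's code path is identical whether the file was migrated or not; only the access time differs.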
Again, the concept is not new: HSM has been in use in the mainframe environment for a long time, to make better use of costly direct access storage devices (DASD).
How is this relevant in today's open systems storage environment, with disk hardware prices much lower than even just a few years ago? Although the purchase price per megabyte for disk is much less than it used to be, using HSM can save both time and money.
The true cost of online storage must reflect the administration of that storage. HSM can reduce the total amount of disk storage that you have to manage. For example, by integrating HSM with backups, you can reduce the total amount of data managed on each backup.
HSM can eliminate the dreaded "out-of-space" error condition (the fear of which causes many administrators to over-provision storage dramatically) and creates a virtually limitless pool of storage by intelligently sending infrequently used data to tape. An HSM solution should monitor available capacity and migrate data only when a specific threshold is reached.
In addition, the software must be smart about what data it migrates and allow fairly flexible policy definitions. Integration with backup utilities is essential to ensure adequate protection without retrieving migrated data for every backup.
The concept can be applied network-wide or for specific systems such as file servers. In a limited scope, for example, you could migrate old e-mail attachments from a Microsoft Exchange server to secondary media. The attachments remain available to users, but you free up room for new storage and reduce the backup volume. HSM offers another example of storage virtualization that can be applied today, without any change in system architecture.
Virtualization for NAS
A NAS appliance attaches to the LAN and offers file services to Windows and Unix clients anywhere on the network. The NAS appliance itself is a good example of virtualization; for either Windows or Unix clients, the files are presented as local files, while the storage itself is consolidated on the network.
All of the benefits of logical volume management, discussed in the first article in this series, apply to NAS file systems. These benefits include the ability to create highly available, high-performance data storage from combinations of physical storage and to manage that storage more effectively.
You can apply additional virtualization technologies to NAS appliances to further refine storage capabilities. For example, NAS appliances holding critical data can be replicated transparently over IP networks to distribute content and balance loads throughout the enterprise. In this sense, the logical file system, as presented to users and applications, is actually replicated physically throughout the enterprise. The means of the replication and the storage hardware used to replicate the data are transparent to users.
Virtualization in SANs
One of the major promises of the SAN architecture is the simplification of storage management. A SAN offers a many-to-many linkage of servers with storage. The concept is very simple: If a server needs more storage, it just accesses more from the SAN. If multiple servers need to access the same data, they can point to the same copy on the SAN.
In actuality, realizing the true benefits of SANs requires storage virtualization technologies that separate the physical location and specifics of the storage from the logical presentation of storage to the server. With virtualization, the SAN can represent a pool of spare storage available for servers that need it. Without virtualization, you still need to specifically allocate and track each server's storage.
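The pooling idea can be made concrete with a small sketch. The array names and allocation strategy here are hypothetical; the point is that the server asks the pool for capacity, and the virtualization layer decides which physical arrays supply it.

```python
# Conceptual sketch: a logical volume is carved out of a SAN-wide pool
# of physical capacity, spread across arrays as needed.

class SanPool:
    def __init__(self, arrays):
        self.free = dict(arrays)   # array name -> free gigabytes
        self.volumes = {}          # volume name -> list of (array, GB) extents

    def create_volume(self, name, size_gb):
        """Satisfy the request from whatever arrays have free space."""
        extents, needed = [], size_gb
        for array in sorted(self.free):
            take = min(needed, self.free[array])
            if take:
                self.free[array] -= take
                extents.append((array, take))
                needed -= take
            if needed == 0:
                break
        if needed:                 # not enough capacity in the whole pool
            raise RuntimeError("pool exhausted")
        self.volumes[name] = extents
        return extents

pool = SanPool({"arrayA": 100, "arrayB": 50})
# The server sees one 120 GB volume; the pool spreads it across two arrays.
extents = pool.create_volume("db-vol", 120)
```

Without the virtualization layer, an administrator would have to notice that no single array holds 120 GB and assemble the volume by hand.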
This is why the storage virtualization discussion becomes so critical in the SAN space. There are many approaches to virtualization in a SAN, some evolving from open systems virtualization methods and others designed as new approaches in the SAN hardware itself. Discussions and articles about SAN-based virtualization typically center on where and how virtualization takes place. Does it occur in distributed, server-based software? Does it occur in an appliance in the data path (in-band virtualization) or in a SAN hardware component, or even a storage device? What are the performance and management implications of each approach?
Before delving into these issues, let's look at what you should expect from SAN-based virtualization. Whether you're implementing a SAN now or planning one for the future, having a good understanding of the possibilities and your expectations can help you choose between the many alternatives and cut through the vendor noise in the market.
Centralized management: The SAN should simplify storage management, not make it more complex. A SAN virtualization solution should be able to discover what storage is available on the SAN automatically (discovery and inventory reporting) and simplify the process of mapping physical storage to logical storage.
Capacity reporting for logical devices: The virtualization solution should be able to monitor capacity of the logical devices throughout the SAN as well as actual storage usage for various applications and servers.
Policy-based management: The virtualization layer should simplify management through policy- and event-based management. Administrators define acceptable boundaries and the specific alerts or actions to take when performance or status exceeds those tolerance levels.
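A minimal sketch of policy-based management might look like the following. The policy fields, metric names, and actions are invented for illustration; the essential pattern is that administrators declare tolerances once, and the layer raises alerts when a monitored metric drifts outside them.

```python
# Sketch of policy- and event-based management: compare monitored
# metrics against administrator-defined tolerance bands.

def check_policies(metrics, policies):
    """metrics: dict name -> current value.
    policies: dict name -> (low, high, action). Returns out-of-tolerance alerts."""
    alerts = []
    for name, (low, high, action) in policies.items():
        value = metrics.get(name)
        if value is not None and not (low <= value <= high):
            alerts.append((name, value, action))   # out of tolerance: act
    return alerts

policies = {
    "capacity_used_pct": (0, 85, "page the storage administrator"),
    "io_latency_ms":     (0, 20, "rebalance the volume"),
}
metrics = {"capacity_used_pct": 92, "io_latency_ms": 12}
alerts = check_policies(metrics, policies)   # capacity is over its limit
```

The same loop handles both performance policies (latency) and status policies (capacity), which is what lets one management layer cover many event types.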
Provisioning: Virtualization should make it very easy to allocate storage to servers and applications, grouping storage resources and allocating storage quickly and easily.
Security: Virtualization cannot compromise security. The virtualization environment must be able to provide secure ownership of storage without another server corrupting data. LUN masking and data-path zoning are two approaches to ensuring secure access to arrays and tape devices.
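The LUN-masking approach mentioned above amounts to an access table keyed by host identity. The world-wide names (WWNs) and LUN numbers below are invented, and real masking is enforced in the array or fabric rather than in host software, but the logic reduces to a simple membership check.

```python
# Illustrative sketch of LUN masking: the array exposes a LUN only to
# hosts whose world-wide names (WWNs) appear in that LUN's mask.

masks = {
    0: {"wwn-db-server"},                    # LUN 0: DB server only
    1: {"wwn-db-server", "wwn-web-server"},  # LUN 1: shared by two hosts
}

def visible_luns(host_wwn, masks):
    """Which LUNs does this host see when it logs in to the array?"""
    return sorted(lun for lun, allowed in masks.items() if host_wwn in allowed)

web_view = visible_luns("wwn-web-server", masks)    # sees only LUN 1
rogue_view = visible_luns("wwn-rogue-host", masks)  # masked from everything
```

A host that is not in any mask simply never sees the storage, which is what prevents one server from mounting, and corrupting, another server's volumes.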
Multi-vendor support: If we have learned anything from the client/server experience, it is that standardizing on a single vendor, whether for software or hardware, is nearly impossible and probably undesirable. A SAN should support both servers and storage equipment from multiple vendors, leaving you free to swap or upgrade storage and devices as necessary.
Data sharing: If you need the ability to consolidate storage in the SAN, with multiple hosts accessing the same copy of the data, then you need data-sharing capabilities within the SAN. These can be at the file-system level (shared file systems) or at a volume level for database or other non-file-system data.
The main benefits (simplified management and access to storage) are similar in both SAN and traditional storage architectures. Delivering these benefits in a SAN environment, with its many-to-many connections between servers and storage devices, is a challenge. This is the topic for the next article in this series, which discusses more of the details of virtualization within a SAN.
John Maxwell is vice president of product marketing at Veritas Software (www.veritas.com) in Mountain View, CA.