The issue that inevitably creates confusion is how and where to implement virtualization within the storage area network.
BY LISA FORSYTHE AND MELISSA STEIN
Although commonly discussed only in the context of storage area networks (SANs), storage virtualization has a much broader potential application in the storage infrastructure. This article, the final one in a series of three, explores storage virtualization within the context of a SAN. Previous articles in this series focused on non-SAN implementations as well as other special-purpose applications such as tape sharing (see September 2001, p. 34, and October 2001, p. 40).
Again, our approach is not to get bogged down in the details of implementation but to discuss some of the possibilities for virtualization that can be implemented easily today.
To put the discussion in the context of the many other articles and materials you may have read on the topic, we have to talk about terminology. But our goal is not to declare one approach for SAN virtualization a "winner." Instead, our premise is that there are many valid options for storage virtualization on a SAN. Your objective, if you are looking at new storage solutions, is to choose the one or more approaches that meet your needs, while ensuring that the whole storage infrastructure remains flexible and manageable.
Too many options?
Standards for SANs are still emerging. With so many components in the SAN (e.g., servers, storage hardware, switches, and routers), and with each vendor pushing its own vision for SAN virtualization, the discussions on the different options can quickly become overwhelming. Many people turn to analysts to define the terms and give them guidance on which option to choose and which path is ultimately the "right" selection for a SAN.
The fact that you have so many options is both bad news and good news. It's bad news if it confuses the whole issue of SANs or makes the process of implementing a SAN seem overwhelming. At the same time, it's good news because having options is what SANs are all about. One reason we adopt SANs is to consolidate storage so we can make better use of the storage we already have. We want to be able to throw all kinds of resources together into a storage network to create a flexible platform for current and future storage needs.
Figure 1: With in-band virtualization, a virtualization appliance answers all requests for data. The data must travel through the appliance to the servers requesting it.
SANs can be used to achieve a number of different goals. To make an informed decision, you should evaluate the return on investment for the business need. Some of the common reasons for adopting a SAN include storage consolidation, re-deployment of existing storage devices, better storage device utilization, improved allocation and provisioning, and the ability to share data between multiple servers.
Different groups in an organization may want to implement SANs for different reasons. For example, a single department might need to share data between multiple workgroups and/or application servers, while a corporate-wide, IT infrastructure plan could call for centralized management and consolidation of storage. Obviously, these needs are supported by different budgets. The important thing in any SAN implementation is to identify what you hope to achieve with the SAN and to justify the investment from a workgroup and/or corporate perspective.
SAN virtualization implementations
Storage virtualization is one of the key benefits of a SAN. To consolidate data storage, you need logical storage pools of physical disks that are not constrained to the one-to-one allocation of storage device to server.
The issue that inevitably creates confusion is how and where to implement virtualization within the SAN. The logic (software) that manages storage virtualization can reside within the network fabric, within the firmware for networking devices, on a storage device within the virtualization pool, or on a server within the SAN.
If the virtualization software maps logical storage to physical storage while systems are running, it is considered dynamic (as opposed to static) virtualization. Dynamic virtualization is needed to keep the logical mapping flexible and to re-deploy or restructure the virtual pool online.
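The idea of a dynamic logical-to-physical mapping can be made concrete with a toy sketch. This is purely illustrative, not any vendor's implementation; the names (`Extent`, `VirtualVolume`, `remap`) are invented here. The point is that a logical volume is just a table of extents on physical devices, and dynamic virtualization means that table can change while the volume stays online.

```python
# Toy logical-to-physical block map for one virtual volume.
from dataclasses import dataclass

@dataclass
class Extent:
    device: str   # physical device holding this slice of the volume
    offset: int   # starting block on that device
    length: int   # number of blocks in this slice

class VirtualVolume:
    """A logical volume whose blocks may live on several physical devices."""

    def __init__(self, extents):
        self.extents = list(extents)

    def resolve(self, logical_block):
        """Map a logical block number to a (device, physical block) pair."""
        base = 0
        for ext in self.extents:
            if logical_block < base + ext.length:
                return ext.device, ext.offset + (logical_block - base)
            base += ext.length
        raise ValueError("block beyond end of volume")

    def remap(self, index, new_extent):
        """Dynamic virtualization: swap out an extent while the volume
        stays online, e.g. to migrate data off a retiring array."""
        self.extents[index] = new_extent

vol = VirtualVolume([Extent("arrayA", 0, 1000), Extent("arrayB", 500, 1000)])
assert vol.resolve(1200) == ("arrayB", 700)
vol.remap(1, Extent("arrayC", 0, 1000))   # re-deploy storage online
assert vol.resolve(1200) == ("arrayC", 200)
```

A static scheme would fix the extent table when the volume is created; here, `remap` changes where logical blocks live without the applications above ever noticing.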
The simplest type of virtualization is host-based virtualization (the topic of the first article in this series) in traditional storage environments. In this configuration, data from multiple arrays and multiple vendors is presented "virtually" to applications on a single server. We'll discuss a host-based case below in which this model can be extended across multiple, homogeneous servers in a SAN.
If the data and the control information both flow through the same path, this is referred to as "in-band" virtualization. In an in-band solution, a virtualization appliance answers all requests for data. The data must travel through this appliance to the application servers requesting it. In Figure 1, "01" represents a request for data. The virtualization appliance looks for that data, which may reside on different physical devices. Once it locates the data, it then passes it back to the application server to meet its request.
In an out-of-band solution (see Figure 2), the virtualization appliance sits outside of the data path. Application servers request control information or metadata from the appliance. Once the information on the data location is received, the application servers can then access the storage directly.
Figure 2: In out-of-band virtualization, the appliance sits outside the data path, leveraging the fast connection power of the SAN switch; data does not pass through the appliance.
Each approach has its own advantages and disadvantages. In-band solutions offer centralized administration from a single console and are storage vendor-independent. However, the appliance can become a performance bottleneck, since all data must pass through it; its port count limits the scalability of the total SAN solution; and because the appliance is a single point of failure, you must make it highly available.
Out-of-band solutions leverage the fast connection power of the SAN switch and do not require data to pass through the appliance. However, they require software on the application servers for virtual device communication, which may increase server CPU utilization and complicate procedures such as OS upgrades. Security is also a concern: a rogue server that joins the SAN without the virtualization driver could destroy data.
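The two request flows described above can be contrasted in a short sketch. Everything here is invented for illustration (the class names, the mapping dictionary); real appliances work at the block-protocol level, not via Python calls. The distinction to notice is what crosses the appliance: in-band, both the lookup and the data; out-of-band, only the metadata.

```python
# Toy contrast of in-band vs. out-of-band virtualization request flows.

class Storage:
    """Stands in for the pooled physical devices on the SAN."""
    def __init__(self):
        self.blocks = {}          # (device, block) -> data
    def read(self, device, block):
        return self.blocks.get((device, block), b"\0")

class InBandAppliance:
    """Sits in the data path: resolves the mapping AND moves the data."""
    def __init__(self, storage, mapping):
        self.storage, self.mapping = storage, mapping
    def read(self, logical_block):
        device, physical = self.mapping[logical_block]
        return self.storage.read(device, physical)   # data flows through us

class OutOfBandAppliance:
    """Sits outside the data path: hands out metadata only."""
    def __init__(self, mapping):
        self.mapping = mapping
    def locate(self, logical_block):
        return self.mapping[logical_block]

def out_of_band_read(appliance, storage, logical_block):
    # Host-side driver: one small metadata exchange, then direct access.
    device, physical = appliance.locate(logical_block)
    return storage.read(device, physical)

mapping = {0: ("arrayA", 42), 1: ("arrayB", 7)}
san = Storage()
san.blocks[("arrayB", 7)] = b"payroll"
assert InBandAppliance(san, mapping).read(1) == b"payroll"
assert out_of_band_read(OutOfBandAppliance(mapping), san, 1) == b"payroll"
```

Both paths return the same data; the trade-offs in the text follow directly from the shapes of these two functions: the in-band appliance touches every byte (bottleneck, single point of failure), while the out-of-band design needs `out_of_band_read` logic installed on every host.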
Luckily, you don't have to commit to one approach or the other; you should retain the ability to change SAN solutions and reconfigure hardware as needed. Many analysts advocate mixing in-band and out-of-band solutions based on application needs.
The key in implementing a SAN is to look carefully at what you expect to achieve with the SAN and then be sure that you solve those problems without boxing yourself in. In other words, leave your options open. The future should hold robust, cost-effective, multi-vendor SANs that support a wide variety of virtualization approaches and solutions.
The whole virtualization discussion becomes more interesting if we put aside the idea of the monolithic, all-or-nothing SAN and look at some SAN/virtualization approaches that meet specific business information needs. Let's examine a few of these approaches: consolidating storage in a SAN appliance, sharing data in a clustered file system, and implementing widely distributed SANs for sharing data across geographic locations.
The block-server appliance
For many, the main purpose of a SAN is to consolidate and pool storage for ease of management and better utilization, resulting in cost savings. Yet many organizations are not ready to build enterprise-wide SANs. What if you could simply "plug in" a SAN to your network, without a large investment and storage reconfiguration?
A block-server appliance for a SAN lets you do just that. This in-band virtualization solution consolidates block data and makes it available to servers on the network. To those servers, the data seems like direct-attached storage and is managed as such in the native application environment.
Because it's distributing data at a block level, the appliance can serve up any type of data, offering storage resources to Unix and Windows servers alike. (Note that these systems do not share the same data in this configuration. They merely share the storage resources. Once the storage is formatted for a specific OS, it cannot be used by an incompatible system without reformatting and re-allocation.)
To retain flexibility, the appliance should not be a proprietary hardware system, but should be assembled from easily available and/or existing components. Furthermore, you should be able to add storage to the appliance as needed to increase the storage pool, regardless of vendor type. You also need to make the appliance highly available with a fail-over cluster and redundant storage techniques.
The SAN appliance concept creates a SAN specifically for consolidating storage to reduce costs. Attaching more disks simply expands the virtual block server. You can re-deploy existing storage devices on the SAN appliance, making storage capacity available to a large number of servers for better storage utilization. This cost-effective approach would suit companies trying to accommodate rapidly growing storage requirements.
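The appliance's pooling and provisioning behavior can be sketched in a few lines. This is a hypothetical model, not a real product's interface; the names (`BlockPool`, `provision`, `reclaim`) are invented. It captures the two properties the text emphasizes: capacity from any source simply grows the pool, and servers share the pool but not the data, since each carved-out LUN belongs to exactly one server.

```python
# Hypothetical sketch of block-level pooling in a SAN appliance.

class BlockPool:
    def __init__(self):
        self.free_gb = 0
        self.luns = {}            # lun id -> (owner server, size in GB)
        self._next = 0

    def add_disks(self, size_gb):
        """Expanding the appliance: attaching disks just grows the pool."""
        self.free_gb += size_gb

    def provision(self, server, size_gb):
        """Carve a LUN from free capacity and assign it to one server."""
        if size_gb > self.free_gb:
            raise ValueError("pool exhausted; add disks")
        self.free_gb -= size_gb
        lun = self._next
        self._next += 1
        self.luns[lun] = (server, size_gb)
        return lun

    def reclaim(self, lun):
        """Re-deploy storage: return a LUN's capacity to the pool."""
        _, size_gb = self.luns.pop(lun)
        self.free_gb += size_gb

pool = BlockPool()
pool.add_disks(500)                      # existing array, any vendor
lun0 = pool.provision("unix-db", 200)    # looks like local disk to the host
lun1 = pool.provision("win-web", 100)
assert pool.free_gb == 200
pool.reclaim(lun1)                       # re-allocate capacity elsewhere
assert pool.free_gb == 300
```

Note that the Unix and Windows servers here draw from the same free capacity, but once a LUN is formatted for one OS, reassigning it (via `reclaim` and a fresh `provision`) implies reformatting, just as the text describes.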
Sharing data in a SAN
Clustering is one of the applications driving SAN adoption. Clusters are used for availability and for scalability. In either implementation, two or more interconnected servers run clustering software that monitors shared resources in the cluster and manages fail-over when necessary.
In a typical availability cluster, all servers share access to the same storage so that if one server is unavailable, another server can take over its applications. A simple, two-node cluster can be built quite easily with switched SCSI devices. As clusters grow in size, with possibly dozens of servers and cascading fail-over capabilities, SANs become an attractive way to manage the shared access to storage.
Clustering for scalability is more challenging, as it typically requires that multiple servers have access to the same data at the same time. A Web farm is an example of a scalability cluster: When performance slows, the first reaction is to add another Web server to the farm. Multiple Web servers may access the same site data, taking advantage of the processing power of all of the nodes in the cluster.
This kind of data-sharing cluster requires software that understands that multiple servers are accessing the data. This is a special case of host-based virtualization extending across multiple servers. However, this setup only works in homogeneous operating system environments. You need a high-level application that understands that the data is shared, managing contention and ensuring data integrity. A parallel, shared database environment is one example. If you don't have that logic within the application, it has to happen at the storage virtualization level, with an application such as a clustered file system that employs its own locking for integrity.
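The locking a clustered file system must perform can be illustrated with a minimal sketch. This is a deliberately simplified model with invented names; real cluster lock managers also handle lock leases, queuing, and recovery when a node fails. The essential idea is a cluster-wide table recording which node currently owns each piece of shared data.

```python
# Minimal sketch of cluster-wide locking for shared-data integrity.

class ClusterLockManager:
    def __init__(self):
        self.holders = {}         # path -> node currently holding the lock

    def acquire(self, node, path):
        """Grant the lock if free (or already ours); otherwise refuse."""
        holder = self.holders.get(path)
        if holder is None:
            self.holders[path] = node
            return True
        return holder == node

    def release(self, node, path):
        """Only the holder may release its own lock."""
        if self.holders.get(path) == node:
            del self.holders[path]

mgr = ClusterLockManager()
assert mgr.acquire("web1", "/site/index.html")       # web1 may write
assert not mgr.acquire("web2", "/site/index.html")   # web2 must wait
mgr.release("web1", "/site/index.html")
assert mgr.acquire("web2", "/site/index.html")       # now web2 proceeds
```

Without this arbitration, two Web servers updating the same file concurrently could interleave writes and corrupt it; with it, contention is serialized and integrity preserved, which is exactly the service a clustered file system layers on top of the shared storage.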
Again, this is a type of SAN implementation that has a more focused purpose in the IT infrastructure, as opposed to handling general storage requirements. The cluster may serve a single department or application such as a Web farm, clustered database, or development application.
Let's change our perspective to storage virtualization beyond an individual SAN and consider the problem of distance.
Wide area virtualization
A Fibre Channel SAN can support devices at much greater distances than SCSI-attached storage. This aspect of SAN technology makes it easier to manage distributed storage as a single, virtual entity. But for many companies, the need for data distribution extends beyond the physical limits of Fibre Channel. In this case, you're replicating data from one SAN to a remote network, which may be another SAN. Consider global companies with operations spread across continents. If a teller in Asia wants to reference a financial record in New York, he or she should be able to do so, simply and transparently. In other words, virtualization should extend to WANs.
Beyond the SAN itself, you need to build virtualization solutions that reach from SAN to SAN through WANs. The problem with long distances is latency: accessing remote data from a local site introduces significant delays. Global businesses can address this through replication, which allows local access to data. In other words, you can extend the concept of virtualization outside of the SAN itself and around the globe. Linking SANs with replication and global clustering/fail-over provides a good basis for comprehensive disaster-recovery and business-continuity planning.
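The replication pattern described above can be sketched as follows. The sites, the queue, and the method names are all invented for illustration; real SAN-to-SAN replication works at the block or volume level and must also handle ordering, conflicts, and link failures. The shape is what matters: writes land at the primary site, propagate asynchronously over the WAN, and each site then reads its own local copy instead of paying WAN latency on every access.

```python
# Toy sketch of asynchronous replication between two sites.
from collections import deque

class Site:
    def __init__(self, name):
        self.name = name
        self.data = {}            # local copy of the replicated records

class Replicator:
    def __init__(self, primary, replica):
        self.primary, self.replica = primary, replica
        self.pending = deque()    # writes not yet shipped over the WAN

    def write(self, key, value):
        """Writes complete locally; the WAN transfer happens later."""
        self.primary.data[key] = value
        self.pending.append((key, value))

    def ship(self):
        """Drain the queue, e.g. on a schedule or as bandwidth allows."""
        while self.pending:
            key, value = self.pending.popleft()
            self.replica.data[key] = value

ny, asia = Site("new-york"), Site("asia")
repl = Replicator(ny, asia)
repl.write("acct-1017", "balance: 240.00")
repl.ship()
assert asia.data["acct-1017"] == "balance: 240.00"   # local read in Asia
```

Once `ship` has run, the teller in Asia reads the record from the local replica at local speed; the WAN cost was paid once, in the background, rather than on every reference.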
In this three-part series of articles, we've discussed the applications of storage virtualization in traditional storage environments, in SANs, and beyond SANs in wide area networks. The objective has been to focus on the possibilities of virtualization, instead of the specifics of implementation. With a clear understanding of the potential applications for virtualization, you should be able to make better storage decisions.
Whatever your storage needs and infrastructure requirements, explore your options before taking action. You may want to enlist a SAN assessment service to help you architect a SAN solution that best meets your needs, leveraging the expertise of those who have already built a great number of SAN virtualization solutions. Be sure that your eventual solution fits your needs and budget, optimizes existing hardware investments, and retains the flexibility to adapt and change over time.
Lisa Forsythe is director, product marketing, and Melissa Stein is senior product marketing manager, at Veritas Software (www.veritas.com) in Mountain View, CA.