A look at in-band vs. out-of-band SAN virtualization, and the promise of storage virtualization for NAS servers.
BY MARK BUCZYNSKI
With a typical enterprise's data storage needs doubling every year, IT managers are constantly adding more storage servers and disks. Network-attached storage (NAS) and storage area networks (SANs) have evolved to meet this demand. However, scaling up capacity with these solutions almost inevitably leads to rising management complexity, user disruptions, and ever-increasing costs.
Storage virtualization is the latest approach to eliminating these issues and unifying storage management.
Although today's SAN-oriented virtualization solutions make it easier for storage administrators to manage individual storage pools, the ultimate promise of virtualization is to create one storage pool that can be seamlessly managed across platform types and dispersed geographic locations. With new NAS-based virtualization approaches, companies could further slash storage administration costs and complexity while eliminating user disruptions during scaling or storage management operations.
Designed thus far mainly for SAN environments, virtualization simplifies server administration by mapping all disks connected to a server and presenting them as a single resource to the storage administrator. Virtualization decouples physical disk drives from logical disk operations to create a consolidated "logical" storage picture that is easier and more cost-effective to manage. Rather than having to administer data by working with each disk in a disk farm, virtualization allows IT managers to deal with multi-disk storage as a single logical entity.
Today's SANs often comprise heterogeneous collections of storage servers and disk arrays, and current virtualization solutions simplify administration of the disks or arrays connected to individual storage servers. Although many SAN equipment vendors are contemplating the idea of incorporating virtualization features into their products, today's solutions are usually deployed on a separate hardware appliance. These appliances create a logical view of a server's disks that is distinct from their physical makeup, so all disk drives are presented as a single resource to the server and the storage administrator.
There are two basic types of virtualization appliances: "out-of-band" (asymmetrical) and "in-band" (symmetrical). Each approach has its advantages and disadvantages.
With out-of-band virtualization, the appliance sits outside the data path (see figure, left). The out-of-band appliance enables virtualization of multiple disk farms for multiple storage servers on a SAN. The appliance supplies volume metadata, and the server uses that metadata to translate logical I/O addresses and then access the appropriate disk directly, without routing data through the appliance.
Though virtualization can introduce latency into the storage access process, out-of-band virtualization alleviates most latency issues. By working outside of the data path, out-of-band virtualization appliances preserve near-native performance of the storage server.
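The out-of-band control flow described above can be sketched in a few lines of Python. This is a schematic illustration only; the class and method names are hypothetical, not any vendor's API:

```python
# Illustrative sketch of out-of-band (asymmetrical) virtualization:
# the appliance holds only volume metadata, and data I/O bypasses it.

class MetadataAppliance:
    """Maps logical volume blocks to (physical disk, physical block)."""
    def __init__(self, extent_map):
        self.extent_map = extent_map  # {(volume, logical_block): (disk, block)}

    def resolve(self, volume, logical_block):
        # Control-path call only: returns a mapping, moves no data.
        return self.extent_map[(volume, logical_block)]


class StorageServer:
    """Runs the vendor-supplied virtualization software -- the per-server
    'touch point' that must be installed and reconfigured as the SAN grows."""
    def __init__(self, appliance, disks):
        self.appliance = appliance
        self.disks = disks  # {disk_name: bytearray of blocks}

    def read(self, volume, logical_block):
        disk, block = self.appliance.resolve(volume, logical_block)
        # Data path goes straight from server to disk, preserving
        # near-native performance.
        return self.disks[disk][block]
```

Because only the small metadata lookup involves the appliance, the bulk data transfer runs at the storage server's native speed.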
To achieve out-of-band virtualization, however, IT administrators must install software on every server to be virtualized. Some out-of-band solutions require software on host bus adapters (HBAs) as well. Each time a server is added to the SAN, the administrator must add virtualization software and then reconfigure the virtualization appliance. The number of "touch points" increases as the SAN scales, creating an endless cycle of reconfiguration.
With in-band approaches, the virtualization appliance sits directly between the storage servers and the disk farm (see figure, right). In-band virtualization eliminates the need to have software on each server, but as a tradeoff, it can potentially present latency and availability issues.
In-band virtualization appliances are easier to deploy and manage because they are self-contained, enabling a lower total cost of ownership (TCO) than out-of-band solutions. IT managers can add new storage servers at any time without installing or reconfiguring server software. Because an in-band appliance sits directly in the data path between the server and its disk farm, however, it can affect storage performance, and it creates a single point of failure. Every request must traverse additional hardware between the server and its disks and wait while the virtualization software performs its address translations.
Still, in-band virtualization has gained some traction in the market because many applications aren't affected by the additional latency it creates. Applications that support many users often issue a continuous stream of I/O requests to a storage server, but different applications have varied traffic characteristics, many of which are insensitive to virtualization latency. For example, some applications issue large volumes of small reads or writes, while others issue smaller volumes of large reads or writes. The latency caused by in-band virtualization software is more likely to affect high-volume database applications that perform more writes than reads. Over time, the potential for virtualization-induced I/O delays will continue to decrease as HBA and processor speeds increase and new technologies such as InfiniBand come onboard.
By all accounts, the highest cost of enterprise storage lies in its management. By offering a lower TCO, in-band virtualization often emerges as the better solution, as long as latency and availability issues can be addressed.
Extending virtualization to NAS
Ideally, it should be possible to extend the benefits of virtualization to NAS servers. There are several reasons why virtualization could deliver equal or even greater benefits in NAS environments than it does in SANs.
First, there's the disruptiveness of scaling NAS storage. NAS servers support end users and applications with file-oriented storage. Given the explosion in storage demand, adding new NAS servers is a way of life for IT managers, which leads to inevitable user disruptions during scaling and administration. Although adding a NAS server to a network is a plug-and-play operation, managing the new space presents a challenge.
When a new NAS server is brought online, redistributing data onto the additional space is a highly disruptive operation. Because each NAS server has a separate file system, every application and user must be denied access while the new server is added and the file system data is redistributed to take advantage of the additional space. In addition, each user's shares or exported name spaces must be reconfigured and redirected to the new server.
With the quantity of data growing at such a fast pace, storage scaling is an ongoing process that forces regular outages. And even when individual storage servers can scale capacity, storage densities are always improving, and disk upgrades necessitate outages. To double the capacity of an existing 18GB drive, for example, several steps must be performed to ensure data integrity. Before the 18GB drive is removed and replaced with a 36GB drive, the existing data on the old drive must be mirrored to another device. After the upgrade, the mirrored data might have to be copied back to the new drive; in any case, user access must be reconfigured (either to the replacement drive or back to the new primary drive). These steps can take at least 30 minutes per disk.
A barrier to NAS virtualization is that historically, NAS has been viewed as lower-end storage. NAS servers have traditionally had relatively low capacities and performance. Most storage management vendors haven't considered virtualizing NAS because these servers haven't been able to support the large application I/O demands that would benefit most from virtualization.
While NAS servers are relatively easy to deploy, the growing number of NAS storage islands creates an endless spiral of rising management costs. By virtualizing NAS servers, the storage space could be presented and managed as a single pool as in SAN virtualization. This would greatly lower TCO while enabling companies to cost-effectively leverage such traditional NAS benefits as simple deployment, simple user access, and low-cost network connections.
Bigger isn't necessarily better
IT managers are always interested in mitigating the negative impact of scaling NAS storage. One approach to the scaling problem is to increase the capacity of NAS servers. New NAS servers offer 10 or more times as much storage capacity under a single file system, making it much easier to address a large amount of storage. With a single server head fronting vastly more storage, however, performance problems can arise. No matter how fast it is, a single server can be saturated by I/O requests. During heavy usage, the server's processor at some point may not keep up with I/O demand; in sustained heavy-use periods, overall performance can actually degrade. These systems are performance-bound by network throughput, processor speed, and disk-access latency long before they are bound by their disk storage capacity. These performance concerns make them unsuitable for many of the applications they were originally designed to address.
A NAS virtualization solution would address the scaling, performance, and management problems that companies face today. Ideally, it would also address the drawbacks of both out-of-band and in-band virtualization techniques.
There are several key requirements for such a solution. The first requirement for NAS virtualization would be to break the one-server, one-file system barrier. By using a distributed file system that allows multiple servers and their associated storage to be addressed as if they are part of the same storage space, companies could overcome the cost issues associated with scaling and could also address the drawbacks of in-band virtualization.
A single file system distributed across multiple servers would create a single logical storage pool. Whenever more space is needed, a new NAS server could simply be added and data could be re-allocated seamlessly. Individual disk upgrades would be just as invisible to users and applications.
In addition, using a distributed file system would allow companies to add NAS servers to improve application I/O performance along with storage capacity. With the space on all servers manageable as a single pool, each new NAS server would add new I/O channels to the storage pool. These new NAS servers would offer virtually limitless performance and capacity scaling.
A distributed file system could also minimize in-band virtualization concerns, because it would eliminate the need for a separate virtualization appliance. With virtualization occurring on each server, there would no longer be a need for additional translation via virtualization software on a separate appliance. I/O requests would be translated and mapped on the NAS server and sent directly to the disk farm. Furthermore, because a distributed file system would be natively present on every server rather than on a separate appliance, there would no longer be a single point of failure. Disk mirroring, replication, snapshots, and backups could be set up by using other parts of the storage pool that physically reside on different servers.
With a distributed file system, NAS servers could offer the benefits of virtualization without the compromises associated with SAN virtualization. NAS virtualization could offer low TCO while continuing to provide the ease of deployment and low-cost connectivity that has made NAS servers popular. Pooling NAS servers would enable enterprises to scale capacity far beyond that of even the largest servers today, and it would also allow companies to scale performance at the same time. By virtualizing the space on NAS servers, enterprises could greatly simplify storage administration and put an end to the spiraling storage management costs associated with today's solutions. A seamless NAS storage pool would give IT managers a robust and flexible solution to meet any of their future storage needs.
Mark Buczynski is senior marketing manager at Spinnaker Networks (www.spinnakernet.com) in Pittsburgh, PA.