Server virtualization: Storage-based performance 'gotchas'

By Marc Staimer

Server virtualization has become an irresistible force sweeping into the world's data centers. With compelling cost and management savings from server consolidation, server virtualization's future seems secure. Or is it?

It is not uncommon for system administrators to find stunning application performance degradation when moving from the physical world to the virtual world. Invariably, the application performance drop-off shows up after the pilot has moved to production, and efforts to fix it cause significant frustration. Many of the problems, and the answers, are within the SAN-based storage.

There are four bottlenecks that can and will degrade virtual server application performance if not managed correctly:
--Oversubscription within the virtualized server;
--Oversubscription within the disk drives and target storage systems;
--Oversubscription in the SAN fabric; and
--Oversubscription at the target storage ports.

Oversubscription means that the amount of potential bandwidth assigned to a given port or device is greater than the bandwidth available. Oversubscription takes advantage of statistical probability: It is highly unlikely that all of the users or applications using the bandwidth will do so at exactly the same time. This allows for much higher utilization of assets and significant cost savings from fewer idle assets. It also makes huge economic sense.

The downside of oversubscription is the risk that users and applications will concurrently attempt to use all of the assigned capacity, resulting in significantly reduced performance. The risks are generally low, if there is not too much oversubscription. And there’s the rub. The cumulative multiplying effect of each level of oversubscription dramatically increases the probability of that downside risk.
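To make the statistical argument concrete, the sketch below estimates how often concurrent demand exceeds what a shared resource can serve. It is an illustration only: it assumes each sharer is active independently with the same probability, and the function name and sample numbers are hypothetical, not measurements from any SAN.

```python
from math import comb


def overload_probability(sharers: int, p_active: float, capacity: int) -> float:
    """Chance that more than `capacity` of `sharers` are active at the same time.

    Assumes every sharer is active independently with probability `p_active`
    (a simplifying assumption; real workloads are often correlated).
    """
    return float(sum(
        comb(sharers, k) * p_active**k * (1 - p_active) ** (sharers - k)
        for k in range(capacity + 1, sharers + 1)
    ))


# Hypothetical example: a resource sized for 2 concurrent users.
modest = overload_probability(sharers=8, p_active=0.1, capacity=2)
# Stack a second layer of sharing on top (10 guests behind each of 8 initiators).
stacked = overload_probability(sharers=80, p_active=0.1, capacity=2)
print(f"8 sharers:  {modest:.3f}")
print(f"80 sharers: {stacked:.3f}")
```

With the same per-sharer activity level, multiplying the number of sharers sharply raises the chance of overload, which is the compounding effect described above.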

A deeper examination of each of these oversubscription bottlenecks shows how.

Oversubscription: virtual servers
Oversubscription at the server is how server virtualization works. Too much oversubscription occurs when there are too many guests and applications competing for the server resources. One factor that complicates deciding how many is too many is the resource intensity of each application.

A second factor is the hypervisor's storage virtualization layer. This is where the LUNs assigned to the physical server are carved up by the hypervisor into virtual LUNs. The assigned target LUN in a traditional SAN storage system is tied to a specific number of drives in a RAID group (usually no more than eight). Whereas the physical world has unique LUNs for each server, the virtual server world has multiple virtual machines accessing the same LUN (meaning the same disks) at the same time. This is compounded by oversubscription at the queues.


This article originally appeared on Virtual Strategy Magazine's site: www.virtual-strategy.com, an online publication dedicated to covering virtualization trends, technologies, and products. InfoStor has a content exchange agreement with VSM.

Oversubscription: drives and targets
Each disk drive has a limited queue depth that allows multiple commands to stack up before a busy signal is sent back to the storage system. The storage system itself also has a limited queue depth before it sends a busy signal back to the application. The queue depth for a Fibre Channel or SAS drive is 256 to 512. The queue depth per SATA drive is at most 32 and, more often than not, 0. (A depth of 32 requires command queuing in the disk controller, which is atypical in SATA drives.)

This means that LUNs drawn from SATA disk RAID groups are far more likely to have busy contention than LUNs drawn from RAID groups with SAS or Fibre Channel disks. Even then there can be disk contention if there is a high number of I/O-intensive or throughput-intensive guests on the hypervisor.
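A rough way to reason about queue contention is to compare the queue slots behind a LUN with the I/Os the guests may have outstanding. The sketch below is a simplification under stated assumptions: it treats the RAID group's capacity as the sum of its drives' queue depths, counts a drive without command queuing as a single slot, and uses hypothetical guest counts. The queue depths are the figures quoted above.

```python
def queue_headroom(drives: int, per_drive_depth: int,
                   guests: int, outstanding_per_guest: int) -> int:
    """Spare queue slots behind a LUN; a negative result suggests busy signals.

    Assumption: the RAID group's usable queue is the sum of its drives'
    queue depths, and a drive with depth 0 still services one command.
    """
    slots = drives * max(per_drive_depth, 1)
    demand = guests * outstanding_per_guest
    return slots - demand


# Hypothetical load: 10 guests, each with up to 32 outstanding I/Os,
# sharing one LUN built on an 8-drive RAID group.
fc_or_sas = queue_headroom(drives=8, per_drive_depth=256, guests=10, outstanding_per_guest=32)
sata_queued = queue_headroom(drives=8, per_drive_depth=32, guests=10, outstanding_per_guest=32)
sata_plain = queue_headroom(drives=8, per_drive_depth=0, guests=10, outstanding_per_guest=32)
print(fc_or_sas, sata_queued, sata_plain)
```

Under these assumed numbers only the Fibre Channel or SAS group has headroom; both SATA cases come out negative.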

Oversubscription: the fabric
SANs are oversubscribed by design. Best practices call for an average ratio of 8:1 between server initiators and storage target ports. Application servers that are more I/O-intensive or throughput-intensive require a lower oversubscription ratio. Application servers with lower I/O demands can tolerate a much higher oversubscription ratio.

If physical application servers are consolidated through server virtualization and the SAN is not re-architected to reflect virtual server oversubscription, there will be a much higher probability of application performance degradation. Poorly engineered SAN fabric oversubscription will lead to significant fabric blocking.
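As a back-of-the-envelope sketch of that re-architecture, the function below counts the storage target ports needed to hold a chosen ratio once each guest is counted as its own workload. Treating every guest as equivalent to one former physical server is an assumption, as are the function name and the host and guest counts. The 8:1 default is the average cited above.

```python
from math import ceil


def target_ports_needed(hosts: int, guests_per_host: int, max_ratio: int = 8) -> int:
    """Target ports required so workloads per port do not exceed `max_ratio`.

    Assumption: each guest generates storage traffic comparable to one
    physical server, so the fabric should be sized per guest, not per host.
    """
    workloads = hosts * guests_per_host
    return ceil(workloads / max_ratio)


# Hypothetical fabric: 16 physical hosts, before and after consolidation.
before = target_ports_needed(hosts=16, guests_per_host=1)
after = target_ports_needed(hosts=16, guests_per_host=10)
print(f"physical servers: {before} target ports")
print(f"10 guests per host: {after} target ports")
```

If the port count stays at the pre-virtualization figure, the effective ratio per target port grows by the number of guests per host.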

Oversubscription: target ports
Just as too much oversubscription within the SAN fabric can cause blocking that substantially reduces application-to-storage performance, so, too, can too much oversubscription to the target storage ports.

Oversubscription is not a bad thing, and in fact is very useful in increasing asset utilization and reducing costs. Unfortunately, too much oversubscription leads to bad consequences.

Marc Staimer is president of Dragon Slayer Consulting, in Beaverton, OR, a storage market analysis and consulting firm specializing in network storage and storage management for the end user and vendor communities. Most of his consulting is in the areas of strategic planning, as well as product and market development.


This article was originally published on September 26, 2008