An alternative to host/array-based volume management

Virtualization can provide better capacity utilization, while reducing management headaches and costs.

By Nik Simpson

Storage administrators continually face the challenges of volume management, which stem from a number of inherent limitations in disk technology:

  • Fixed size—disks are usually too small or too big for the proposed application;
  • Locked-in location—capacity is not necessarily where it's needed;
  • Slow performance—especially compared to multi-gigahertz servers;
  • Unreliability—crashes inevitably occur; and
  • Finite capacity—disks run out of space.

Several vendors offer host-based volume managers to solve these problems. However, even the most advanced tools require significant time to manage disks, which can interrupt applications and add to storage management costs.

Focusing on volume management may be part of the problem. To achieve a cost-effective approach to managing and growing capacity, storage administrators may need to take a closer look at the link between the file system and logical unit number (LUN) size.

Breaking that bond can free up unused capacity and reduce management costs and hassles. One approach is to move to a host- and array-agnostic allocation scheme, leveraging in-band storage virtualization.

What's wrong with host-based volume management?

Many companies rely on conventional volume management, and with IT budgets staying steady or shrinking, there may be little incentive for trying a new approach. What isn't always obvious is the cost of trying to scale as storage requirements grow to multiple terabytes shared between scores of servers and other devices.

The cost benefits of overcoming scalability limitations make a solid business case for re-examining volume management strategies.

Figure 1: Volume management is installed as a layer in the I/O driver stack, below basic operating system routines for block disk I/O and the file system.

Host-based volume management has a number of strengths and weaknesses.

Typically, volume management is installed as a layer in the I/O driver stack, below basic operating system routines for block disk I/O and the file system. The virtual LUN created from physical disks behaves like a single physical disk to the operating system (see Figure 1).
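As a rough illustration of how several physical disks can appear as one LUN, the sketch below maps a block address on the virtual LUN to a disk and an offset on that disk. It assumes simple concatenation (real volume managers may also stripe or mirror), and the class name `ConcatenatedVolume` and its methods are hypothetical, not taken from any product.

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatenatedVolume:
    """Hypothetical virtual LUN built by concatenating physical disks."""

    def __init__(self, disk_sizes):
        self.disk_sizes = list(disk_sizes)  # size of each disk, in blocks
        # Cumulative end addresses, e.g. sizes [100, 200] -> [100, 300]
        self.ends = list(accumulate(self.disk_sizes))

    @property
    def size(self):
        """Total blocks the operating system sees as one disk."""
        return self.ends[-1] if self.ends else 0

    def locate(self, lba):
        """Translate a virtual block address to (disk index, block on that disk)."""
        if not 0 <= lba < self.size:
            raise IndexError("block address outside the virtual LUN")
        disk = bisect_right(self.ends, lba)
        start = self.ends[disk - 1] if disk else 0
        return disk, lba - start


volume = ConcatenatedVolume([100, 200])
print(volume.size)         # 300
print(volume.locate(99))   # (0, 99)
print(volume.locate(100))  # (1, 0)
```

The operating system only ever sees `size` and issues I/O against virtual addresses; the translation step is what the volume-management layer adds to the driver stack.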

The volume manager solves the problem of making a single LUN from several physical disks, but it doesn't address:

  • Increasing the size of the LUN without requiring a reboot;
  • Stretching the file system across a newly resized LUN without requiring a backup/restore operation; and
  • Recognizing an out-of-space condition before applications fail.

Solving these problems requires digging deeper into core functions of the operating system, possibly replacing components such as file-system drivers and block I/O handlers, and adding application-specific agents to monitor the use of disk space.

Conventional volume management, with its significant host-managed component, is sufficient in environments with a relatively small number of hosts, but as organizations deploy more and more servers, scalability issues can pose problems.

Allocation vs. utilization

Even if host-based volume management techniques were fully scalable, the real challenge of capacity utilization would remain unresolved.

To minimize the number of times a volume is expanded, storage is added in large increments, which wastes large amounts of capacity (see Figure 2).

Figure 2: The yellow line represents required capacity, and the area in red represents wasted capacity.

This capacity allocation scenario illustrates why typical IT environments achieve, on average, only 40% storage utilization.
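The effect of increment size on utilization can be sketched with a small simulation. The demand figures and chunk sizes below are invented for illustration only and are not taken from Figure 2; the function names are likewise hypothetical.

```python
import math


def chunked_allocation(demand_gb, chunk_gb):
    """Capacity allocated in each period when storage is added in fixed chunks."""
    allocated = 0
    history = []
    for need in demand_gb:
        if need > allocated:
            # Round the allocation up to the next whole chunk that covers demand
            allocated = math.ceil(need / chunk_gb) * chunk_gb
        history.append(allocated)
    return history


def overall_utilization(demand_gb, allocated_gb):
    """Used capacity divided by allocated capacity, summed over all periods."""
    return sum(demand_gb) / sum(allocated_gb)


demand = [50, 100, 150, 200, 250, 300]       # hypothetical demand per period, GB
large_steps = chunked_allocation(demand, 500)  # few, large expansions
small_steps = chunked_allocation(demand, 50)   # frequent, small expansions

print(f"{overall_utilization(demand, large_steps):.0%}")  # 35%
print(f"{overall_utilization(demand, small_steps):.0%}")  # 100%
```

With these made-up numbers, one large up-front increment leaves most capacity idle, while small increments track demand closely but require an expansion in every period, which is exactly the trade-off host-based tools make painful.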

Wasted capacity has a significant impact on the real cost of storage for several reasons:

  • 40% utilization means managing 150% more storage than is necessary;
  • Additional administrators are needed to manage underutilized capacity; and
  • Companies pay for additional capacity sooner, and more often, than necessary.

These issues all contribute to a management problem that, according to research firms, can cost companies as much as $6 for every $1 of storage purchased.
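The 150% figure in the list above follows from simple arithmetic: at 40% utilization, every 1GB of data sits on 2.5GB of managed capacity, or 1.5GB more than is needed. A minimal sketch of that calculation (the function name is illustrative):

```python
def excess_storage_ratio(utilization):
    """Extra capacity managed, as a fraction of the capacity actually used."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be greater than 0 and at most 1")
    return 1.0 / utilization - 1.0


print(f"{excess_storage_ratio(0.40):.0%}")  # 150%
```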

Go to the root

File systems are the primary obstacle to achieving maximum capacity utilization. Unfortunately, file systems care about the size of the LUN. It is impossible to create a 1TB file system on a 100GB LUN, leaving administrators with two possible courses of action:

  • Allocate all the storage the host will ever use when the host is installed. Using this approach, the file system can be created at full size even though most of its capacity will be wasted for most of the life of the host.
  • Allocate some of the storage when the host is installed and then allocate additional chunks of storage (and periodically stretch the file system) when the host uses up the current allocation. Using this approach, there is potential for application disruption, data loss, and downtime every time new space is allocated.

The first option is too wasteful and expensive, so most administrators adopt the second approach, which leaves the question of storage utilization unanswered.

Fool the file system


Figure 3: The grid represents a virtual LUN, and each colored block is physical storage allocated to block addresses within the LUN. Empty blocks occupy no disk space.

One way to break the link between the file system and LUN size is to "lie" about LUN size. For a file system that will need 2TB over the next 18 months, for example, there's no need to allocate that much capacity when the file system is created. But there is no reason why the operating system can't be told that the LUN is 2TB, so that appropriate file-system structures are created. Since the file system is already sized for a 2TB LUN, adding more storage will not affect it. Storage assets can be added to the LUN in very small amounts at frequent demand-based intervals to provide just enough storage, just in time. This approach requires no host-based software and no changes to operating systems or applications.

In Figure 3, the grid represents a virtual LUN, and each colored block is physical storage allocated to block addresses within the LUN; empty blocks occupy no disk space. The figure represents the LUN through several phases:

  • 1. A new LUN with no data: At this point, no physical storage resources are assigned to the LUN.
  • 2. The LUN with a file system created: Each red block represents a small chunk of space allocated to hold partition or file-system data written to the LUN.
  • 3, 4. The LUN as more data is written: Most of the LUN remains unallocated and occupies no physical storage, but blocks of storage have been allocated to the LUN where required.

Many high-end disk arrays have proprietary implementations of this allocation scheme.
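The allocate-on-write behavior described above can be sketched in a few lines. This is a simplified model, not any vendor's implementation: the `ThinLun` class, the 512-byte block size, and the eight-block chunk size are all assumptions made for the example.

```python
BLOCK_SIZE = 512  # bytes per block (assumed)


class ThinLun:
    """Hypothetical virtual LUN that advertises a large size but allocates
    physical chunks only when a block inside them is first written."""

    def __init__(self, advertised_blocks, chunk_blocks=8):
        self.advertised_blocks = advertised_blocks  # size reported to the host
        self.chunk_blocks = chunk_blocks            # allocation granularity
        self.chunks = {}                            # chunk index -> bytearray

    def _check(self, lba):
        if not 0 <= lba < self.advertised_blocks:
            raise IndexError("block address outside advertised LUN size")

    def write_block(self, lba, data):
        self._check(lba)
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        index, offset = divmod(lba, self.chunk_blocks)
        if index not in self.chunks:
            # First write to this region: allocate physical space now
            self.chunks[index] = bytearray(self.chunk_blocks * BLOCK_SIZE)
        start = offset * BLOCK_SIZE
        self.chunks[index][start:start + BLOCK_SIZE] = data

    def read_block(self, lba):
        self._check(lba)
        index, offset = divmod(lba, self.chunk_blocks)
        chunk = self.chunks.get(index)
        if chunk is None:
            return bytes(BLOCK_SIZE)  # unallocated regions read back as zeros
        start = offset * BLOCK_SIZE
        return bytes(chunk[start:start + BLOCK_SIZE])

    @property
    def allocated_blocks(self):
        return len(self.chunks) * self.chunk_blocks


lun = ThinLun(advertised_blocks=1_000_000)
lun.write_block(0, b"\x01" * BLOCK_SIZE)        # e.g. partition table
lun.write_block(999_999, b"\x02" * BLOCK_SIZE)  # e.g. metadata near the end
print(lun.advertised_blocks, lun.allocated_blocks)  # 1000000 16
```

The host sees a million-block disk from the start, so file-system structures are laid out for the full size, yet only two chunks of physical storage are consumed.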

Figure 4: Diagram shows three servers with a total of five LUNs assigned. Each LUN draws its storage allocation from the array.

Figure 4 shows three servers with a total of five LUNs assigned. Each LUN draws its storage allocation from the array. This approach brings potential benefits, such as the following:

  • No host software is required. The operation is independent of the host operating system, simplifying management and ensuring equal access for all servers; and
  • Incremental storage allocation maximizes capacity utilization within the array.

But a purely array-based approach has limitations:

  • Physical capacity is limited by the array, so the total storage allocated from the array cannot exceed the array capacity because there is no way to handle overflow;
  • Implementing the processing power necessary for this function in an array is expensive; and
  • Each vendor's approach is proprietary, limiting the choices for future expansion to other vendors' arrays or presenting a challenge to managing multiple vendors' products within a storage area network (SAN).

The final piece of the puzzle is storage pooling based on storage virtualization. This allows hosts to share a storage pool consisting of multiple physical arrays, which removes the capacity limit (see Figure 5).

Figure 5: Storage pooling, enabled by virtualization, allows hosts to share a storage pool consisting of multiple physical arrays.

The combination of demand-based allocation and virtualized storage pooling addresses each of these limitations:

  • Physical capacity is no longer bounded by a single array, because new arrays can be added to the pool when needed, eliminating the overflow problem;
  • Processing power can be offloaded to inexpensive commodity servers; and
  • The virtualization of storage ensures allocation capability can be applied equally to any storage array, regardless of vendor.

As a result, moving allocation out of the array and off the host provides a scalable alternative to conventional volume management techniques. This approach can offer as much as 80% to 90% capacity utilization with minimal administrative overhead, which eliminates the guesswork of estimating the amount of storage a server will need.
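How a pool spanning several arrays removes the overflow problem can be sketched as follows. The `Array` and `StoragePool` classes and the first-fit placement policy are assumptions for illustration; an actual virtualization product would choose placement by its own rules.

```python
class Array:
    """Hypothetical physical array contributing chunks to a pool."""

    def __init__(self, name, capacity_chunks):
        self.name = name
        self.capacity_chunks = capacity_chunks
        self.used_chunks = 0

    @property
    def free_chunks(self):
        return self.capacity_chunks - self.used_chunks


class StoragePool:
    """Hypothetical pool that hands out chunks from any array with free space."""

    def __init__(self):
        self.arrays = []

    def add_array(self, array):
        self.arrays.append(array)

    def allocate_chunk(self):
        """Allocate one chunk and return the name of the array that supplied it."""
        for array in self.arrays:  # first-fit: an assumed, simple policy
            if array.free_chunks > 0:
                array.used_chunks += 1
                return array.name
        raise RuntimeError("pool exhausted: add another array")


pool = StoragePool()
pool.add_array(Array("array-A", capacity_chunks=2))
print(pool.allocate_chunk(), pool.allocate_chunk())  # array-A array-A

# array-A is now full; adding a second array lets allocation continue
pool.add_array(Array("array-B", capacity_chunks=4))
print(pool.allocate_chunk())  # array-B
```

Because hosts request chunks from the pool rather than from a specific array, the vendor and capacity of any one array stop being a constraint on a LUN's growth.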

In-band storage virtualization can enable the implementation of host- and array-agnostic, just-in-time allocation. Most of the major storage vendors have in-band storage virtualization products. With this in mind, you can begin investigating this technology to potentially double capacity utilization and simplify storage management.

Nik Simpson is a product marketing manager at DataCore Software (www.datacoresoftware.com) in Ft. Lauderdale, FL. He can be reached at nik.simpson@datacore.com.

This article was originally published on December 01, 2002