Storage virtualization becoming a reality

It's been a long time coming, but there are now plenty of viable virtualization products, and early adopters of the technology can attest to the benefits. But a big battle is brewing for control of the enterprise.

By Steve Norall

Storage virtualization became a much-ballyhooed term in 2001. Back then, it was the hot technology, the panacea for all storage management woes. Then it became the scapegoat. In fact, many vendors stopped using the V-word altogether to describe their products.

The technology adoption cycle for storage virtualization has proven to be particularly steep and challenging. However, this phenomenon is not uncommon in the storage market. Disruptive technologies take time to make their way from the early adopter stage to mainstream deployment. Unfortunately, along the way the marketing hype outstrips the early capabilities and benefits of the technology. Ultimately, this leads to end-user disillusionment and backlash. The adoption cycle for storage virtualization has proven no different.

State of the market
However, the storage virtualization market is now making a comeback. Over the past few years, the technology has been vetted and has proven that it can deliver demonstrable value to end users. With a spate of end-user deployments and product introductions, the wind is finally at virtualization's back again. Reflecting on how the market has evolved to date, we make the following five observations about what has, and still has not, changed:

End users understand the value now: The early days of the storage virtualization market were characterized by empty promises. Vendors over-hyped the capabilities of their products, and users were uncertain how to utilize the technology to deliver true business value. However, the mindset of the end-user community has matured. The early adopters of virtualization have a clear understanding of what the technology can do for them. They have used the technology for specific usage models that enable greater storage management efficiencies and have realized tangible business value.

Based on our conversations with end users, online, non-disruptive data migration is the number-one driver behind virtualization adoption. Users are implementing storage virtualization to migrate off-lease arrays, upgrade their storage systems, and gain more maintenance control over their storage infrastructure without downtime. In addition, users report huge improvements in storage provisioning, cutting provisioning times from days to hours. Interestingly, end users believe virtualization is the pathway for increasing storage utilization rates and implementing a tiered storage strategy. However, in general, they are not using these benefits as the initial business justification.

The early virtualization survivors are prospering: The storage virtualization market has experienced a Darwinian shakeout. Many of the early pioneers were forced to reposition their offerings, go out of business, be acquired, or diversify their revenue stream outside of their core virtualization offerings. The early pioneers, at least those that survived, now have mature offerings and are beginning to show significant market traction. For example, IBM recently announced its 2,000th virtualization customer, and its fourth-generation products are shipping in volume. FalconStor, DataCore, and StoreAge (which was acquired by LSI Logic) have also emerged as viable suppliers of virtualization technology.

EMC and Hitachi Data Systems (HDS) have embraced virtualization: In addition to IBM, market leaders EMC and HDS are endorsing storage virtualization technology and have viable products. HDS launched its Universal Storage Platform (or USP, which is part of HDS's TagmaStore line of disk arrays) in September 2004, and EMC followed with its Invista virtualization platform in 2005.

End users are no longer being bombarded with mixed messages about whether virtualization is worthwhile. Now that all the major vendors embrace the concept of virtualization, the debate has shifted to which vendor's approach is most compelling for a specific user environment. This is a huge step forward in market maturity: the fact that three of the four largest storage vendors (EMC, HDS, and IBM) now have virtualization offerings is a sign of a rapidly maturing marketplace.

Intelligent fabrics are still in their infancy: Many prognosticators have predicted that intelligent switches would become the primary conduit for delivering storage virtualization capabilities. Clearly, EMC is betting big on the adoption and success of intelligent switches, since Invista works exclusively with such devices. However, the number of intelligent switch ports shipped is still a tiny fraction of the total number of switch ports sold each year. Other than Cisco, the major switch vendors (Brocade and McData) have been late to market with intelligent switches. Moreover, the Brocade-McData merger promises to complicate things as the two companies rationalize their product lines and turn their energies inward instead of externally evangelizing intelligent switching capabilities. As a result, the intelligent switch market remains in an embryonic state, despite five years of marketing evangelism.

Hewlett-Packard is MIA: The only major storage player without a stated storage virtualization strategy and product offering is Hewlett-Packard. HP acquired StorageApps in 2001 and began offering the company's product as CASA. However, in 2004, EMC won a patent infringement lawsuit barring HP from selling CASA. As a result, HP, which had been one of the first major storage vendors to embrace virtualization, has been effectively shut out of the market. It remains to be seen what HP will do in the virtualization space. Currently, HP resells Hitachi's USP through an OEM agreement, but HP has not refreshed its storage virtualization strategy beyond that arrangement. Given HP's breadth of end-to-end technology in the IT stack, it would make sense for the company to explore a "portfolio-based" approach that goes beyond the storage controller-based approach it currently embraces via the HDS deal. Owning a network-resident virtualization technology would make good sense for HP if the company is interested in controlling its virtualization destiny on its own terms and providing its customers with flexibility in approaches.

Four architectural approaches
To understand how vendors' virtualization approaches differ, end users need to understand the underlying architectures and their strengths and weaknesses. A storage virtualization purchase decision is as much about investing in a particular vendor's philosophy and technology approach as it is about buying a particular product and feature set.

In a virtualized SAN fabric, there are four ways to deliver storage virtualization services: in-band appliances, out-of-band appliances, split path architecture for intelligent devices (SPAID), or controller-based virtualization. Before we delve into the architectural specifics, it is important to understand that a typical I/O path can be deconstructed into three separate paths or streams: the metadata, control, and data path. The metadata path controls mapping between virtual volumes and physical devices. The control path maintains the interface between the metadata path and the data path software. Lastly, the data path contains the actual information that needs to be transmitted between host and storage (see table below).

  • In-band appliance: A network-resident appliance processes the metadata, control, and data path information. In other words, all three are "in the path."
  • Out-of-band appliance: The metadata management and the control path processing are performed by a separate compute engine or appliance, distinct from the compute engine that processes the data path information. Software agents must be installed on each host to split the metadata and control data from the data path. The metadata and control information are forwarded to the out-of-band appliance for processing, while the host conducts high-performance direct transfer of data to and from storage.
  • SPAID: A SPAID system leverages the port-level processing capabilities of an intelligent device to split the metadata and control information from the data path. Unlike an out-of-band appliance, where the paths are split at the host, SPAID systems split the paths in the network at the intelligent device. SPAID systems forward the metadata and control path information to an out-of-band compute engine for processing and pass the data path information on to the storage device. Thus, SPAID systems eliminate the need for host-level agents. Typically, a SPAID-based virtualization product works in conjunction with an intelligent switch or is integrated with a purpose-built appliance (PBA).
  • Controller: Array controllers have been the most common layer where virtualization services have been deployed to date. However, controllers have typically virtualized only the physical disks inside the storage system, although this trend is changing. A twist on the old approach is to deploy the virtualization intelligence on a controller that can virtualize storage both internal and external to the storage system. Like the in-band appliance, the controller processes all three paths: data, control, and metadata.
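The division of labor described in the list above can be summarized as a small lookup table. The Python sketch below is purely illustrative: it encodes only what the four bullets state (which component processes each of the three paths, where the paths are split, and whether host agents are needed). The component labels and the names `ARCHITECTURES`, `components`, and `single_engine` are hypothetical and do not correspond to any vendor's API.

```python
from enum import Enum


class Path(Enum):
    METADATA = "metadata"  # mapping between virtual volumes and physical devices
    CONTROL = "control"    # interface between metadata path and data path software
    DATA = "data"          # the actual information moving between host and storage


# For each architecture: where (if anywhere) the paths are split, which
# component processes each path, and whether host-level agents are needed.
# Component names are descriptive labels for this sketch, not product names.
ARCHITECTURES = {
    "in-band appliance": {
        "split_at": None,  # all three paths stay "in the path" of one appliance
        "handlers": {p: "network-resident appliance" for p in Path},
        "host_agents": False,
    },
    "out-of-band appliance": {
        "split_at": "host agent",
        "handlers": {
            Path.METADATA: "out-of-band appliance",
            Path.CONTROL: "out-of-band appliance",
            Path.DATA: "host (direct transfer to storage)",
        },
        "host_agents": True,
    },
    "SPAID": {
        "split_at": "intelligent switch or PBA port",
        "handlers": {
            Path.METADATA: "out-of-band compute engine",
            Path.CONTROL: "out-of-band compute engine",
            Path.DATA: "intelligent device port (forwarded to storage)",
        },
        "host_agents": False,
    },
    "controller": {
        "split_at": None,  # like in-band: the controller processes all three paths
        "handlers": {p: "array controller" for p in Path},
        "host_agents": False,
    },
}


def components(arch: str) -> set:
    """Distinct components that process at least one of the three paths."""
    return set(ARCHITECTURES[arch]["handlers"].values())


def single_engine(arch: str) -> bool:
    """True when one component processes metadata, control, and data together."""
    return len(components(arch)) == 1


for name, spec in ARCHITECTURES.items():
    print(f"{name:22} split at: {spec['split_at'] or 'not split'}; "
          f"host agents: {spec['host_agents']}; single engine: {single_engine(name)}")
```

Laid out this way, the in-band and controller approaches collapse all three paths onto one engine, while the out-of-band and SPAID approaches differ mainly in where the split happens: at a host agent or at an intelligent device in the network.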

Vendor landscape
Through the boom and bust cycle of storage virtualization, the vendor landscape has winnowed into three separate architectural camps: in-band appliances, SPAID systems, and controller-based virtualization. Each of these approaches is championed by one of the major storage virtualization vendors. Out-of-band appliances, the fourth architectural approach described above, have not gained significant customer traction, and no major storage vendor has based its storage virtualization product on that architecture. Ultimately, we see the storage virtualization market shaping up to be a battle of architectures and market power primarily among EMC, HDS, and IBM.

In-band appliance virtualization was the first approach. Early vendors such as DataCore, FalconStor, and IBM delivered in-band appliances based on Intel hardware. The appliances process the control, metadata, and data paths together. In the early days, the scalability and performance of in-band appliances were questioned. However, those concerns have been put to rest, since the remaining in-band vendors can cite thousands of real-world, enterprise-class deployments. The vendors and products in this architectural camp are the most battle-tested, customer-proven, and feature-rich of the three architectural categories.

IBM has emerged as the leader of the in-band virtualization camp. IBM's SAN Volume Controller (SVC) began shipping in 2003. The company boasts a fourth-generation product and a feature-rich offering. IBM has demonstrated through its more than 2,000 customer deployments and its benchmark results that an in-band approach can scale and can support the largest enterprise SAN environments.

An interesting emerging in-band virtualization player is Reldata, which sells an in-band gateway that provides a rich set of storage virtualization services supporting both file and block transfers (NFS, CIFS, and iSCSI, with support for Fibre Channel on the way).

The SPAID virtualization camp is betting on the rapid adoption of intelligent switches and PBAs. This category has long been hyped as the future of storage virtualization. However, the market for intelligent devices has been slow to materialize. There are signs that this may be changing, but the adoption rate is still undetermined.

EMC's Invista is the poster child for SPAID. At present, EMC has chosen to bet exclusively on intelligent switches as the hosting platform for Invista. Released in 2005, Invista was the first SPAID product from a major vendor.

Invista is an out-of-band virtualization controller that processes the control and metadata paths, while relying on the port-level processing of the intelligent switch to transfer the data path without overhead. To date, EMC has delivered or announced its intention to support intelligent switch platforms from vendors such as Cisco, Brocade, and McData. However, EMC has been hamstrung by the lack of market penetration of intelligent switches.

In addition, StoreAge and Incipient have brought to market SPAID-based storage virtualization offerings that also work with intelligent switches. StoreAge, the first vendor to ship a SPAID-based solution, was recently acquired by LSI Logic. LSI Logic acquired StoreAge to provide rich storage applications across its own storage system product line.

Incipient is an "independent" SPAID vendor. Incipient's NSP began shipping with Cisco's intelligent switches in October 2006.

HDS is the leader in the third architectural camp—controller-based virtualization. Hitachi unveiled its TagmaStore Universal Storage Platform (USP) in September 2004. Rather than fielding a virtualization appliance or intelligent switch, HDS chose to embed virtualization intelligence as part of its disk controller and allow the TagmaStore controller to virtualize and manage heterogeneous pools of storage attached internally or externally. This was a novel approach since previous disk controllers had only managed their own storage devices.

TagmaStore delivers rich storage application functionality and 2.5 million IOPS of performance, qualifying it as an enterprise-class virtualization product. However, this approach could couple customers to HDS's storage and create the same type of vendor lock-in that they are trying to avoid, just at a different level of the architecture (i.e., the controller instead of the array).

Advice from early adopters
As part of our research into storage virtualization, we have interviewed many early adopters of the technology. Four main themes emerged when we asked them what advice they would give to other users about to embark on storage virtualization projects.

Virtualization is not just for heterogeneous storage environments: A common misperception in the end-user community is that storage virtualization is only appropriate for data centers with multiple types of storage from different vendors. However, early adopters report that homogeneous storage shops (e.g., all-IBM, all-EMC, all-HDS) can realize tremendous value in storage virtualization, too. Users reported many of the same benefits in terms of online data migration, decreased storage provisioning times, and greater storage management efficiencies from having a single point to manage all types and classes of storage, even if it is only managing different models of the same vendor's product line.

Justify purchase based on quick-payback usage models: A key discussion point in our conversations was how users thought about ROI and how they chose to justify storage virtualization projects to management. End users are strong believers in the tiered storage, information lifecycle management (ILM), and improved storage management visions painted by storage virtualization vendors. However, when it came to selling the project to their management, they advised basing the ROI case on online data migration and reduced storage provisioning times. End users found ROI for these usage models relatively easy to demonstrate to upper management, and it allowed them to invest in a critical piece of infrastructure that would improve the overall management of their storage. They believe storage virtualization will ultimately increase utilization rates and enable them to move to a tiered storage environment, but the payback for those value propositions is farther out. For users looking to justify the purchase, we recommend building demonstrable payback and ROI models based on concrete, well-understood usage models such as online data migration and improved storage provisioning.

Invest in tools to give increased visibility: Many users that we spoke with bemoaned the difficulty in troubleshooting performance bottlenecks and ultimately getting clear visibility into exactly where data resided on physical disks. Virtualization compounds the already difficult task of troubleshooting performance bottlenecks since a volume can be striped across many arrays, trays, and individual disks. Determining the source of the bottleneck can be very challenging in a virtualized environment with existing tools. Users recommended investing in additional storage management tools, but were still not completely satisfied with the currently available products. This is clearly an area where vendors can improve further.
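To see why striping complicates troubleshooting, consider a minimal sketch of the virtual-to-physical mapping that the metadata path maintains. It assumes a simple round-robin (RAID-0-style) stripe with a fixed stripe unit; real products keep their own, generally more elaborate, extent maps. The `Member` and `StripedVolume` classes, the stripe unit of 128 blocks, and the array and disk names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    array: str  # hypothetical array name
    disk: str   # hypothetical disk identifier within that array


@dataclass(frozen=True)
class StripedVolume:
    members: tuple   # ordered Member objects the volume is striped across
    stripe_unit: int  # blocks per stripe unit (assumed fixed)

    def locate(self, lba: int) -> tuple:
        """Map a virtual LBA to (member, LBA on that member), round-robin striping."""
        if lba < 0:
            raise ValueError("LBA must be non-negative")
        chunk, offset = divmod(lba, self.stripe_unit)
        member = self.members[chunk % len(self.members)]
        row = chunk // len(self.members)  # stripe row on that member
        return member, row * self.stripe_unit + offset

    def members_touched(self, start: int, length: int) -> set:
        """Physical members serving virtual blocks [start, start + length)."""
        return {self.locate(lba)[0] for lba in range(start, start + length)}


members = (Member("array-A", "d0"), Member("array-A", "d1"),
           Member("array-B", "d0"), Member("array-C", "d3"))
vol = StripedVolume(members, stripe_unit=128)

hot = vol.members_touched(100, 300)  # a "hot" run of 300 virtual blocks
print(sorted({m.array for m in hot}))  # arrays that share the load
```

Even in this toy layout, a single 300-block run of virtual blocks lands on four disks in three different arrays, so a latency problem seen on the virtual volume could originate in any of them. That indirection is what users said their existing tools made hard to see.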

Change control is key: The switchover to a virtualized environment can be an arduous process. End users advise a gradual rollout of the technology. Users should plan to virtualize an array or set of non-critical volumes first, gain confidence with the technology, and then gradually migrate the entire environment in a series of piecemeal steps. Many users wished that they had implemented stricter change control processes at the host, network, and storage device levels, because they encountered problems cutting over to a virtualized storage environment. Careful planning and a measured approach are a must in the transition to a virtualized SAN fabric.

Evaluating storage virtualization
Judging from the new developments in the storage virtualization space, the battle for the enterprise is on. Three of the major vendors—EMC, HDS, and IBM—have developed distinctive architectures for delivering virtualization services and are prepared to do battle. Our advice to end users evaluating these competing solutions is to assess the vendors on a number of dimensions.

Breadth of heterogeneous support: Does the vendor support all hosts and storage devices in your environment? Does the vendor favor its own storage from a support and service perspective? Does the proposed architecture effectively lock the end user into sourcing storage exclusively from the vendor?

Real-world customer deployments: Are all the components in the solution proven? If the components (e.g., switch and software) are coming from different vendors, what is the support interlock and problem resolution? Can the vendor provide reference customers that will testify to reliability, scalability, ROI savings, etc.?

Feature/function completeness: What support does the vendor provide for MAN and WAN disaster-recovery scenarios? What level of data-protection services does the vendor provide? What is the vendor's product road map, and what is its track record in delivering against that road map?

Support for phased deployment approach: Does the virtualization solution require an "all-or-nothing" approach to deployment? Does the solution allow users to virtualize a storage device, LUN by LUN, if necessary?

Demonstrated scalability: What is the scalability path for the solution as the number of hosts and targets grows? What is the scaling unit, and is it cost-effective for your environment? If the solution is in-band, pay particular attention to the scalability claims of the vendor.

Storage virtualization will be at the nexus of next-generation data centers. As such, it will be a highly contested and strategic control point. At this juncture, we see an epic battle brewing over who will control and lead this market. Each vendor is taking a different architectural approach to market, and the next 24 months should be a telling time in the development of the storage virtualization marketplace. It has been a long time coming, but the move toward intelligent SAN fabrics is upon us.

Steve Norall is a senior analyst and consultant at The Taneja Group consulting firm (www.tanejagroup.com).

This article was originally published on December 11, 2006