By Heidi Biggar
If you're wondering what IBM is up to with virtualization and "Storage Tank," you're not alone. Until recently, Big Blue has shared few details about its software strategy, leaving many to speculate about its plans, the nature of its virtualization product, and the fate of Storage Tank.
I recently spoke with Brian Truskowski, chief technology officer for IBM's newly formed storage systems group. In the interview, Truskowski set the record straight about the company's new Virtualization Engine and Storage Tank, its relationship with Hitachi Data Systems and DataCore Software, and more.
Q: How does IBM define virtualization?
Our definition of virtualization centers on block-level I/O, meaning better management of the physical storage that sits behind virtualized pools. Our Virtualization Engine takes disparate storage back-ends and pools them to create virtual pools of storage that are relatively independent of the physical storage behind them.
This reduces complexity. You don't have to reconfigure the application servers every time you want to change some storage. The application servers aren't tied directly to the physical storage on the back-end; they connect through the intermediate, or virtualization, layer.
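To make the idea of an intermediate layer concrete, here is a minimal sketch of block-level mapping. It is an illustration only, not IBM's implementation: the names `Extent`, `VirtualVolume`, `resolve`, and `migrate`, and the back-end names, are all hypothetical. The application server addresses a virtual volume by block number, and the layer translates that address to whichever back-end currently holds the data.

```python
from dataclasses import dataclass, field


@dataclass
class Extent:
    """A contiguous run of blocks on one physical back-end (hypothetical model)."""
    backend: str  # illustrative back-end array name
    start: int    # first physical block of the run
    length: int   # number of blocks in the run


@dataclass
class VirtualVolume:
    """What the application server sees: one volume built from pooled extents."""
    name: str
    extents: list = field(default_factory=list)

    def resolve(self, virtual_block):
        """Translate a virtual block number to (backend, physical block)."""
        if virtual_block < 0:
            raise IndexError("negative block number")
        offset = virtual_block
        for ext in self.extents:
            if offset < ext.length:
                return ext.backend, ext.start + offset
            offset -= ext.length
        raise IndexError("block beyond end of virtual volume")

    def migrate(self, index, new_extent):
        """Swap the physical extent behind one part of the volume."""
        if new_extent.length != self.extents[index].length:
            raise ValueError("replacement extent must be the same size")
        self.extents[index] = new_extent


vol = VirtualVolume("app_data", [Extent("array_a", 1000, 100),
                                 Extent("array_b", 0, 50)])
before = vol.resolve(120)                   # served from array_b
vol.migrate(1, Extent("array_c", 500, 50))  # change the storage behind the volume
after = vol.resolve(120)                    # same virtual block, new back-end
```

The application server keeps asking for block 120 of `app_data` before and after the migration. Only the mapping inside the layer changes, which is the sense in which servers need not be reconfigured when storage changes.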
Q: Today, your virtualization capability centers on DataCore Software's SANsymphony?
What we've announced with DataCore is a very specific set of solutions tied to our Shark product. It's a tactical solution... to [add functionality] to Shark. It allows us to do distance copy and to create larger LUNs. Strategically, the virtualization engine that we recently announced is what we are investing in internally and is what we consider to be our enterprise-wide virtualization solution.
Q: Why develop your own virtualization product?
There are a lot of point products out there that address bits and pieces of the problem, but we aren't convinced they are ready for the enterprise environment. We don't think existing products have the same availability [or performance] characteristics that we're proposing.
Q: You're taking an "in-band" approach?
Yes, it's an in-band solution based on IBM eServer xSeries and Linux [using a fault-tolerant clustered architecture]. Each node has 4GB of cache [up to eight nodes in pairs], so we have read-write cache in the network, which gives us a lot of capability for writing unique functions in the network layer where the virtualization lives (see figure).
We're convinced that in-band makes sense for virtualization. There is so much more we can do function-wise by having it in-band versus out-of-band. Also, there are ease-of-use advantages to an in-band approach. All you have to do is drop virtualization into your network. You don't have to make any changes to your application server, other than point it to your virtualization layer.
Q: Does the virtualization support heterogeneous storage?
Yes, over time. We're working with a number of vendors to get the right device drivers. We'll probably start out with a smaller set and extend that over time.
Q: Does Hitachi have any role in the development of this specific virtualization engine?
Right now, their role is as consumer. They've concluded that our implementation makes a lot of sense. At this point, it looks like they will use our technology.
Q: Are you co-developing the product?
It's not really a joint development, but they will use our technology to create their own implementation. I think they're still determining how best to integrate the software. There are a number of different ways we can go with them, and the details are still being worked out.
Q: What is "Storage Tank"?
Storage Tank is a SAN-wide file system for storage networks that is common across all application servers (see figure). Today, every operating system has its own file system. Users can manage the physical assets of each of these, but these assets are still grouped into "containers," or virtual pools, virtual LUNs, virtual volumes, etc.
All the data in these containers is unique to the application server. This makes management very difficult because there isn't a single namespace across all platforms. Every application server sees its piece of the storage, and only its piece of the storage, or its part of the file tree. As a result, there is no common point of management. Because every one of these environments is different, you have to have separate policies [e.g., backup and recovery] for every file system out there.
[But] what if you had one file system? You still have the native file systems on all the application servers, but instead of using that, what if all the data was written in a common way to storage and there was one place in the storage environment that understood all the data and wrote it out in a common fashion? That's what Storage Tank does. It's a file system for storage.
[In terms of hardware], it's a metadata server cluster composed of eServer xSeries running Linux, so it's very similar to the Virtualization Engine in terms of the clustered hardware. The cluster of servers sits off to the side of the storage network. And then there is a protocol and pieces of software that sit on each of the application servers.
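The arrangement described above, a metadata server beside the storage network plus a piece of software on each application server, can be sketched roughly as follows. This is a toy model under stated assumptions, not the Storage Tank protocol: the class names `MetadataServer` and `ClientAgent`, the `shared_pool` device name, and the choice to have clients move data directly to shared storage after a metadata lookup are all assumptions made for illustration.

```python
class MetadataServer:
    """Holds the single namespace: path -> location on shared storage."""

    def __init__(self):
        self._locations = {}
        self._next_block = 0

    def allocate(self, path):
        """Return the location for a path, assigning one on first use."""
        if path not in self._locations:
            self._locations[path] = ("shared_pool", self._next_block)
            self._next_block += 1
        return self._locations[path]

    def lookup(self, path):
        """Return the location of an existing path (KeyError if unknown)."""
        return self._locations[path]

    def namespace(self):
        """Every path, regardless of which application server wrote it."""
        return sorted(self._locations)


class ClientAgent:
    """Hypothetical software piece installed on one application server."""

    def __init__(self, host_os, metadata_server, san):
        self.host_os = host_os
        self.mds = metadata_server
        self.san = san  # a dict standing in for shared storage

    def write(self, path, data):
        # Ask the metadata server where the file lives, then store the data.
        self.san[self.mds.allocate(path)] = data

    def read(self, path):
        return self.san[self.mds.lookup(path)]


san = {}
mds = MetadataServer()
aix = ClientAgent("AIX", mds, san)
win = ClientAgent("Windows 2000", mds, san)

aix.write("/finance/q3.dat", b"ledger")
shared = win.read("/finance/q3.dat")  # a different OS reads the same file
names = mds.namespace()               # one namespace for all servers
```

Because both agents consult the same metadata server, a file written from one platform is visible from the other, and a policy such as backup could be applied once against `mds.namespace()` rather than once per native file system.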
Q: How does Storage Tank differ from the Virtualization Engine?
It is complementary to our virtualization strategy, whether it is our Virtualization Engine or another vendor's. Storage Tank will work with those virtualization products, but it doesn't have to. However, we think virtualization will bring the same value to Storage Tank as it does to environments without Storage Tank.
Q: Does Storage Tank apply to NAS (network-attached storage) environments?
Yes. If you think about it, what is NAS data but a file system that's outboard of the application server? In my view, there are three ways of thinking about file systems: as local file systems, as NAS filers, or both. Storage Tank can be both.
It's a way to converge NAS and local file systems into one thing, called the Storage Tank. It provides a single file-system view, so you can see the entire file system across all application servers at once; you can see the whole SAN domain.
At a glance

Virtualization Engine
Description: Enterprise-class block-level virtualization (in-band approach).
Hardware: IBM eServer xSeries running Linux in clustered configurations; 2-node minimum, scalable to eight nodes (initially).
Support: Broad OS support, limited storage support (initially).
Benefits: Improved storage administrator productivity, common platform for advanced functions (e.g., disaster recovery, point-in-time and peer-to-peer copy, data migration), and improved capacity utilization.
Availability: General availability slated for 2003.

Storage Tank
Description: SAN-wide file system (file aggregation).
Hardware: eServer xSeries running Linux.
Support: AIX, Solaris, HP-UX, Linux, and Windows 2000/XP.
Benefits: Heterogeneous file sharing, centralized management, and improved storage utilization. Also being designed to provide policy-based capabilities (e.g., provisioning and non-disruptive data migration).
Availability: In alpha testing; general availability slated for 2003.