The case for Service Oriented Storage (SOS)

In the context of storage virtualization, end users should focus on the value of the solution, not on the underlying technology.

By John Webster

To date, storage virtualization has been a “promised land.” A number of vendors offer storage virtualization engines packaged in different wrappers. But, for various reasons, none has really caught fire in terms of end-user adoption. Absent compelling reasons to buy, we believe that storage virtualization will continue its steady progress forward, but in slow-motion mode.

Deploying storage virtualization in a functioning production computing environment is not a trivial exercise. A deep understanding of the impact of such a change is an absolute prerequisite. The cost of the staff time required to fully understand the impact of virtualization and its presumed benefits is far greater than the cost of the actual hardware and software under consideration. Vendors are also often better at explaining storage virtualization’s underlying technology than they are at conveying its tangible benefits to end users.

What’s missing is a generalized understanding of the potential benefits of storage virtualization presented in a way that does not depend on a prior discussion of the underlying technology. People don’t buy cars because they have internal combustion engines, or because automobile makers have convinced them of the virtues of specific engine designs. They buy cars for the engine plus what is built around the engine. That is, they buy the whole car.

Too much of the storage virtualization hype generated by vendors focuses on the engine only. Does it work in-band, out-of-band, or something-in-between-band? Is it in the fabric, an array controller, or somewhere else? Is it host-based, SAN-based, NAS-based, or delivered via an appliance? Granted, there are different types of engines that work in different ways, and understanding the basic differences matters, but these are secondary issues. The primary concern for buyers can be summarized in this way: What value does the total solution (the engine plus the rest of the car) offer to both IT operations staff and business application users?

Services + the engine

We believe that what will matter most to IT buyers is the set of business and IT operational services built around the storage virtualization engine: services that can be deployed between heterogeneous server and OS environments above, and heterogeneous storage capacity below. Services in this context are defined as outcomes, with the real work performed by the virtualization engine. These services do in fact exist. They just haven’t been presented in the context of a unified whole.

The emerging Fabric Application Interface Standard (FAIS) offers a graphic example of what we mean by services built around a storage virtualization engine. FAIS defines a library of standard functions (services) that can be invoked, serially or in parallel, by an application or combination of applications. For example, a backup application that includes support for disk-to-disk backup can issue a “call” to a specific FAIS service that replicates data from one disk to another. FAIS then carries out the data movement under the control of the backup application. The FAIS replication service performs the operation (in this case, a disk-to-disk data copy), then sends a response back to the library. The library, in turn, signals the backup application that the request has been processed. Applications that require volume cloning, snapshots, data migration, and backup/restore can invoke the replication service to copy data from one storage device object to another. Storage device objects can be physical, such as a disk or tape drive, or logical, such as a volume.
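The call-and-response flow described above can be sketched in a few lines of Python. This is an illustration only, not the FAIS interface: the names ServiceLibrary, StorageObject, and replication_service are hypothetical, and the "devices" are in-memory dictionaries standing in for real storage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class StorageObject:
    """A physical (disk, tape) or logical (volume) storage device object."""
    name: str
    kind: str  # "physical" or "logical"
    blocks: Dict[int, bytes] = field(default_factory=dict)


def replication_service(source: StorageObject, target: StorageObject) -> int:
    """Copy every block from source to target; return the number copied."""
    for address, data in source.blocks.items():
        target.blocks[address] = data
    return len(source.blocks)


class ServiceLibrary:
    """Hypothetical stand-in for a library of standard, callable services."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[..., int]] = {}

    def register(self, name: str, service: Callable[..., int]) -> None:
        self._services[name] = service

    def call(self, name: str, *args) -> dict:
        # The service does the work; the library relays the response
        # back to the calling application.
        result = self._services[name](*args)
        return {"service": name, "status": "complete", "blocks_copied": result}


library = ServiceLibrary()
library.register("replicate", replication_service)

# A backup application requesting a disk-to-disk copy.
disk_a = StorageObject("disk_a", "physical", {0: b"alpha", 1: b"beta"})
disk_b = StorageObject("disk_b", "physical")
response = library.call("replicate", disk_a, disk_b)
print(response)
```

The point of the sketch is that the backup application never touches the copy mechanics. It names a service, hands over two storage objects, and waits for the library to report completion.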

FAIS also includes a volume management service that treats volumes as logical storage objects. Each volume has a volume address map or lookup table assigned to it, defining the logical-to-physical storage associations within the volume. This enables granular provisioning and utilization of capacity across heterogeneous device types (e.g., disk and tape). When the volume management service is used in conjunction with the replication service, an application can either copy or migrate data from one vendor’s storage device to another without having to use a proprietary, device-specific copy function.
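A volume address map of this kind can be pictured as a simple lookup table. The following Python sketch is an assumption-laden illustration rather than anything defined by FAIS: the Volume class, the migrate_block function, and the two device names are all hypothetical, and each device is modeled as a list of blocks in memory.

```python
from typing import Dict, List, Tuple

# Hypothetical devices from two vendors, each modeled as a list of blocks.
devices: Dict[str, List[bytes]] = {
    "vendor_x_array": [b""] * 4,
    "vendor_y_array": [b""] * 4,
}


class Volume:
    """A logical volume whose address map points at physical blocks."""

    def __init__(self, name: str, address_map: List[Tuple[str, int]]) -> None:
        self.name = name
        # Index = logical block; value = (device name, physical block).
        self.address_map = list(address_map)

    def write(self, logical_block: int, data: bytes) -> None:
        device, physical_block = self.address_map[logical_block]
        devices[device][physical_block] = data

    def read(self, logical_block: int) -> bytes:
        device, physical_block = self.address_map[logical_block]
        return devices[device][physical_block]


def migrate_block(volume: Volume, logical_block: int,
                  new_device: str, new_physical_block: int) -> None:
    """Copy one block to a new device, then repoint the address map."""
    data = volume.read(logical_block)
    devices[new_device][new_physical_block] = data
    volume.address_map[logical_block] = (new_device, new_physical_block)


volume = Volume("vol0", [("vendor_x_array", 0), ("vendor_x_array", 1)])
volume.write(0, b"orders")
volume.write(1, b"invoices")

# Move logical block 1 to the other vendor's device.
migrate_block(volume, 1, "vendor_y_array", 2)
print(volume.read(1), volume.address_map[1])
```

Because the application reads and writes logical block numbers only, the data can move between vendors' devices without the application changing how it addresses the volume.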

SOA for storage

Service Oriented Architecture (SOA) is another IT services model that offers insight into what the whole virtualized storage package can offer. The basic objective of SOA is to create reusable or “componentized” services that can be invoked by any of the applications that need them.

SOA promises the following benefits that can be translated and applied to the storage domain:

Cost savings: Using centralized services reduces redundancy and the additional costs related to that redundancy.

Reduced time to deployment for new applications and automated processes: Business agility is enhanced when new applications can be brought online quickly by building them around reusable services modules.

Controlled, incremental advancement: Common services modules are brought online first and tested in a controlled fashion, with additional and possibly more-advanced services added later.

Standardization: SOA encourages IT departments to create standard implementations. This makes IT environments more predictable and processes easier to reproduce as needed.

Change management: SOA can potentially lessen the impact of required changes and upgrades to the online production applications environment since modularization tends to isolate disruption to specific modules rather than whole systems.

Lifecycle management: Evolution of systems becomes more manageable, and obsolete applications become easier to discard.

While no “philosophy of operation” is a silver bullet (SOA included), SOA’s core premise of defining services rather than implementations, and doing so in a modular way, is a common-sense approach.

What is compelling about applying the SOA model to storage virtualization is that each of the above-mentioned benefits can also be mapped to storage environments. However, to more fully appreciate the power of SOA for storage, an outline of those storage-related services must be drawn.

Heterogeneous storage services

We believe that you can define two levels of storage services—basic and advanced—with both sets powered by storage virtualization. Furthermore, the more-advanced services leverage the underlying common services.

The FAIS standard offers an example of what common services look like for block-oriented storage when the virtualization engine resides within the storage networking (SAN) fabric: disk management, volume management, and data copy services. Others may be added over time. However, FAIS is but one example. Other storage virtualization methodologies (file system virtualization for file-oriented storage, or NAS, for example) can offer those plus additional common services. Regardless of the nature of the virtualization engine, a (not exhaustive) accounting of what we believe to be basic, heterogeneous storage services includes:

  • Data copy (synchronous and asynchronous) from disk to disk and from disk to tape;
  • Non-disruptive data migration;
  • Volume management across heterogeneous storage devices;
  • Data encryption;
  • Data de-duplication (also referred to as data coalescence);
  • I/O load-balancing;
  • Data classification; and
  • File indexing.

Advanced storage services modules use the common services as building blocks, and it is these more-advanced modules that offer the most compelling outcomes of storage virtualization. Here are three examples of advanced storage services:

Tiered storage for ILM: Disk (primary) and tape (secondary) storage tiers are ubiquitous within corporate data centers. The emergence of Serial ATA (SATA) disk arrays allows storage administrators to create a new class of high-capacity but lower-performance secondary storage that is composed of disk, moving tape to a tertiary tier. A tiered storage management service based on SAN or block-based virtualization methods could operate across those tiers to perform a number of functions, while using some of the underlying common services. For example, selected data volumes or files can be migrated from primary to secondary storage, and subsequently to tertiary storage, using the migration service. The volumes or files to be moved could be selected by the classification and indexing services. Also, migrated files could be significantly reduced in size by the de-duplication service as they are moved, maximizing the space and power efficiency of the secondary and tertiary storage tiers.
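To show how an advanced tiering service might be assembled from common services, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the idle-days rule used for classification, the whole-file SHA-256 hashing used for de-duplication, and the names FileRecord, DedupTier, classify, and migrate.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FileRecord:
    path: str
    content: bytes
    days_since_access: int


def classify(files: List[FileRecord], max_idle_days: int) -> List[FileRecord]:
    """Classification service: select files idle longer than the threshold."""
    return [f for f in files if f.days_since_access > max_idle_days]


class DedupTier:
    """Secondary tier that stores each distinct content only once."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}   # content hash -> data
        self.catalog: Dict[str, str] = {}    # file path -> content hash

    def store(self, record: FileRecord) -> None:
        digest = hashlib.sha256(record.content).hexdigest()
        self.chunks.setdefault(digest, record.content)
        self.catalog[record.path] = digest


def migrate(primary: List[FileRecord], secondary: DedupTier,
            max_idle_days: int) -> int:
    """Tiering service: classify, move, and de-duplicate in one pass."""
    selected = classify(primary, max_idle_days)
    for record in selected:
        secondary.store(record)
        primary.remove(record)
    return len(selected)


primary = [
    FileRecord("db/active.dat", b"live data", 1),
    FileRecord("reports/q1.pdf", b"same report", 200),
    FileRecord("reports/q1_copy.pdf", b"same report", 210),
]
secondary = DedupTier()
moved = migrate(primary, secondary, max_idle_days=90)
print(moved, len(secondary.chunks), [f.path for f in primary])
```

In this run, two idle files leave primary storage but occupy the space of one on the secondary tier, because their contents are identical.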

Data protection and business continuance: Backup and continuous data protection (CDP) services can leverage underlying copy and volume management services. Critical data sets that have stringent recovery-time objectives (identified via classification) can be copied from primary to secondary disk, and later to tape as needed. Other less-critical data sets can simply be backed up (copied) to tape. Data could also be encrypted (via an encryption service) as it is written to tape.
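The policy logic in this example can be expressed as a small planning function. The Python sketch below is hypothetical throughout: the four-hour threshold for "stringent" recovery-time objectives, the DataSet type, and the protection_plan function are assumptions chosen only to make the decision path concrete.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataSet:
    name: str
    rto_hours: float  # recovery-time objective, as set by classification


def protection_plan(data_set: DataSet,
                    critical_rto_hours: float = 4.0,
                    encrypt_tape: bool = True) -> List[str]:
    """Return the ordered copy steps for one data set.

    Critical data goes to secondary disk first and to tape later;
    everything else is backed up straight to tape.
    """
    tape_prefix = "encrypt and copy" if encrypt_tape else "copy"
    if data_set.rto_hours <= critical_rto_hours:
        return [
            "copy primary disk -> secondary disk",
            f"{tape_prefix} secondary disk -> tape",
        ]
    return [f"{tape_prefix} primary disk -> tape"]


for data_set in (DataSet("orders_db", 1.0), DataSet("old_logs", 72.0)):
    print(data_set.name, protection_plan(data_set))
```

Each step in the returned plan corresponds to a call on a common service (copy or encryption), which is what lets the protection service stay independent of any one vendor's device.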

Federated search: Driven by the demands to identify certain e-mail and other sensitive electronic corporate records for regulatory compliance and litigation defense, storage administrators are now asking for the ability to perform wide-ranging searches on data stores. A search service operating across all storage tiers could be used to identify those files/records that have been classified and indexed. Once identified, they could be migrated to secure storage devices (e.g., content-addressed storage, or CAS).
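A search service of this kind depends on an index that spans every tier. The Python sketch below shows one simple way that could work, using a word-level inverted index. The tier names, file paths, and the functions build_index, search, and move_to_secure are all hypothetical, and a plain dictionary stands in for a secure device such as CAS.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical tiers, each mapping a file path to its text content.
tiers: Dict[str, Dict[str, str]] = {
    "primary": {"mail/001.eml": "quarterly audit results attached"},
    "secondary": {"mail/412.eml": "lunch on friday"},
    "tape": {"mail/777.eml": "audit committee meeting notes"},
}


def build_index(all_tiers: Dict[str, Dict[str, str]]
                ) -> Dict[str, Set[Tuple[str, str]]]:
    """Indexing service: map each word to the (tier, path) pairs holding it."""
    index: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for tier, files in all_tiers.items():
        for path, text in files.items():
            for word in text.lower().split():
                index[word].add((tier, path))
    return index


def search(index: Dict[str, Set[Tuple[str, str]]],
           term: str) -> List[Tuple[str, str]]:
    """Search service: find every file, on any tier, containing the term."""
    return sorted(index.get(term.lower(), set()))


def move_to_secure(all_tiers: Dict[str, Dict[str, str]],
                   hits: List[Tuple[str, str]],
                   secure_store: Dict[str, str]) -> None:
    """Migration service: relocate matching files to secure storage."""
    for tier, path in hits:
        secure_store[path] = all_tiers[tier].pop(path)


index = build_index(tiers)
hits = search(index, "audit")
secure_store: Dict[str, str] = {}
move_to_secure(tiers, hits, secure_store)
print(hits, sorted(secure_store))
```

The search itself never asks which tier a file lives on. That detail only matters at the final step, when the migration service moves the matching records.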

Service Oriented Storage (SOS)

With an understanding of what basic and advanced storage services can offer, powered by a storage virtualization engine, you can acquire a better understanding of the value of the total package. Heterogeneous copy and migration services have the potential to dramatically reduce the current cost of redundant, proprietary data copy and migration functions tied to specific storage devices. This is true not only because more options and more competition drive prices down, but also because more common facilities reduce the need for rare, super-specialist skills.

IT operational agility and predictability are enhanced when standardized storage and data management services can be applied quickly to new applications. A modular, services-based approach applied to storage architectural planning allows IT architects to test services modules first, and then bring them online in a controlled fashion. Finally, the implementation of virtualized storage services (e.g., non-disruptive data migration) can lessen the impact of required changes and upgrades to the online production applications environment. Modularization tends to isolate disruption to specific modules rather than whole systems.

No doubt, a change to an SOS environment will impact more than just the storage environment within the IT infrastructure domain and more than just storage administrators within the IT staff domain. Issues regarding data ownership and governance, management policy, and chargeback for resources consumed and services used will have to be addressed to varying degrees.

But taking a services approach to storage virtualization projects will make ROI-based justifications for these projects easier to construct and more compelling to business and senior management when approval is required.

SOS may not be a radical architectural proposition to the many who have already embraced SOA or service-oriented infrastructure (SOI). Nor is it a particularly new thought to the developers who have been actively noodling over all the crazy and wonderful things they can build once the basic virtualization and shared management infrastructure is in place. But neither is it the common way that things are done, or that infrastructure is conceived and constructed, in storage today.

As storage virtualization grows in acceptance, and as virtualization in the server realm grows by leaps and bounds, initial deployments aimed at solving particular problems (e.g., increasing utilization or easing volume migrations) evolve into more-systematic shared services. That is what we are advocating for storage: a common-sense evolution into storage delivered not as hunks of sheet metal or numerous vendor APIs, but as flexible, architected services.

John Webster is a principal IT advisor at the Illuminata research and consulting firm (www.illuminata.com).

This article was originally published on February 01, 2007