Take a top-down approach to storage management

Posted on May 01, 2004


Storage management should take its cues from asset management.

By Joseph Martins

Today's storage management solutions offer an array of advanced features intended to give users unprecedented control over their storage environments. But that control comes with a price: Without an accurate business-eye view of information, even the most experienced IT managers can inadvertently wreak havoc on corporate information assets.

At issue is storage management's rapidly expanding role in the enterprise—an expansion that pits storage management against other enterprise applications (especially content management systems) in a battle for control over corporate information.

Many storage management products permit IT users to operate below radar and beyond the visibility of modern content management systems.

As a result, users can move, replicate, back up, restore, re-organize, and archive bits, bytes, files, and blocks of data without necessarily understanding or addressing the business-level impact on information. The unintended and potentially costly consequences of their actions can go virtually undetected for days, months, or years—resurfacing at the most inopportune moments.

To minimize the risk, IT managers must understand the nature of information management and its relationship with storage management.

Also, they must understand the limitations of modern storage management practices to make the best possible use of existing products while working with vendors to champion a new generation of tools that overcome these limitations.

The communication gap

The intelligence necessary for proper data and information management exists outside the boundary of traditional storage purview, in the realm of higher-level enterprise applications such as content management systems.

Long ago, content management practitioners recognized the need to gather and manage extensive intelligence, in the form of metadata, about the information assets in their repositories. The metadata was then used to identify, interrelate, and manage those assets in ways previously unimagined.

The metadata used by modern content management systems far exceeds the quality and quantity of metadata familiar to storage management applications. As a result, many storage management products must rely on a more simplistic "discovery process" to glean basic intelligence such as a file's name, creation date, last edit date, last access date, access patterns, author name, and application. But proper, nondestructive data and information management cannot be successfully accomplished in today's enterprise environment using that intelligence alone.

Information can be created and managed in ways that transcend the capabilities of today's storage management products. Out of hundreds of products, only a handful begin to address the nature and complexity of information.

Many storage management vendors have adopted a "hands-off" attitude about information. Most seem to think that the management of bits, bytes, files, and blocks of data can be done in near isolation with no harm to information assets. If storage management were an island, perhaps that would be true, but it isn't.

Low-level data management activities performed in a vacuum can have unintended consequences on higher-level information assets.

The following are aspects of information that can throw a wrench into most data-management processes (see "Things to consider about your data"):

Classification

Many storage management products rely heavily on the manual classification of unstructured and semi-structured information assets into broad categories (e.g., by application, department or group, user, and general purpose). This level of classification is redundant. In many cases, a more granular, intelligent classification scheme is already available just by querying upper-layer asset management systems. If the goal is storage optimization based on accurate data classification (i.e., "the right data at the right place at the right time"), then the metadata gleaned from today's file systems is simply no match for the metadata available from an asset management system.
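As a rough illustration of the point, the sketch below prefers a classification supplied by an asset management system and falls back to a file-system heuristic only when no record exists. The catalog contents, the field names, and the 180-day threshold are all hypothetical; no particular product exposes exactly this interface.

```python
from dataclasses import dataclass


@dataclass
class FileStat:
    """Metadata a storage tool can discover from the file system alone."""
    path: str
    days_since_access: int


# Hypothetical export from an asset management system, keyed by file path.
ASSET_CATALOG = {
    "/projects/manual_en.doc": {"category": "regulated-record", "tier": "primary"},
}


def choose_tier(stat, catalog=ASSET_CATALOG):
    """Prefer the asset system's classification; fall back to access age."""
    record = catalog.get(stat.path)
    if record is not None:
        return record["tier"]
    # Fallback: the coarse heuristic available from file-system metadata alone.
    return "archive" if stat.days_since_access > 180 else "primary"
```

Here a file that has not been opened in 400 days still stays on primary storage because the asset catalog says so, while an uncataloged file of the same age is sent to the archive.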

Ownership and digital rights

In the absence of communication between asset management and storage management systems, there is an enormous risk of inadvertent (or intentional) unauthorized duplication, a potential copyright violation that carries with it penalties of up to five years in prison and $500,000 in fines. An asset can be inadvertently replicated numerous times in the storage layer without higher-level applications even being aware of the unauthorized copies. In networks where administrators enjoy superuser privileges, this is particularly troublesome because there is no way to monitor and control subsequent access to, and use of, these copies.
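One way to picture the problem is a scan that compares what exists in the storage layer with what the asset management system knows about. The sketch below is only an illustration: it assumes file contents are available as bytes and that the asset system can supply a set of registered paths, and the function name is hypothetical.

```python
import hashlib
from collections import defaultdict


def find_unregistered_copies(files, registered):
    """Return paths whose content duplicates a registered asset
    but which the asset system does not know about.

    files: dict mapping path -> bytes content found in the storage layer
    registered: set of paths known to the asset management system (assumed)
    """
    by_digest = defaultdict(list)
    for path, content in files.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(path)

    suspects = []
    for paths in by_digest.values():
        # Only flag duplicates of content that is actually a managed asset.
        if any(p in registered for p in paths):
            suspects.extend(p for p in paths if p not in registered)
    return sorted(suspects)
```

A replica created by a storage tool would show up in the result, whereas unrelated files with different content would not.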

Chronology

In the world of asset management, "age" is relative. Asset management systems can track the life cycle of thousands of interrelated information assets; however, there are times when certain existing dependencies, policies, or regulations preclude the deletion or migration of an asset. What might appear (to a storage management system) to be an "old" or infrequently accessed file, one that could be safely deleted or moved to an archive, may in fact have dozens of dependencies of which the system is unaware.
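The difference between an age-only rule and an informed one can be sketched as follows. The `Asset` fields (`legal_hold`, `depended_on_by`) and the one-year threshold are hypothetical stand-ins for whatever an asset management system actually records.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    asset_id: str
    days_since_access: int
    legal_hold: bool = False                            # policy or regulation applies
    depended_on_by: set = field(default_factory=set)    # IDs of assets that reference this one


def safe_to_archive(asset, max_idle_days=365):
    """Return (decision, reason). Age alone is necessary but not sufficient."""
    if asset.days_since_access <= max_idle_days:
        return False, "recently accessed"
    if asset.legal_hold:
        return False, "retention policy or regulation forbids it"
    if asset.depended_on_by:
        return False, f"still referenced by {len(asset.depended_on_by)} asset(s)"
    return True, "no known constraints"
```

A storage tool that sees only `days_since_access` would stop after the first check; the remaining two checks require information held in the asset management layer.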

Versions, renditions, translations

Conventional file systems and unstructured and semi-structured information assets lack the metadata necessary to enable storage applications to connect the dots between an asset and its many versions, renditions, and translations. Today's content management systems (and other asset management solutions) track this information but share none of it with the storage application layer.

  • Versions—multiple versions of a single asset can be created throughout and beyond its production cycle. In some cases, the older versions are overwritten or discarded. In other cases, older versions are retained for future reference. Some content management systems enable users to create dozens of versions and sub-versions of a single asset. And outside of these systems, version management can differ from one department or user to the next.
  • Renditions/translations—a single asset can be converted into any number of languages, presentation layouts, and file formats. For example, a technical manual written in English can be translated into 20 languages, styled according to regional and cultural guidelines, and rendered in multiple file formats for multiple distribution channels. The resulting content can be distributed across the enterprise. Without adequate intelligence, storage management products cannot identify, relate, and manage these assets.
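To make the idea of "connecting the dots" concrete, the sketch below groups files by the source asset they derive from, using relationship records of the kind a content management system might export. The record layout and the sample paths are invented for illustration.

```python
from collections import defaultdict

# Hypothetical catalog rows: (path, source_asset_id, relation)
RECORDS = [
    ("/docs/manual_v1.doc", "manual", "version"),
    ("/docs/manual_v2.doc", "manual", "version"),
    ("/web/manual_fr.pdf", "manual", "translation"),
    ("/web/manual_en.html", "manual", "rendition"),
    ("/docs/pricing.xls", "pricing", "version"),
]


def group_by_source(records):
    """Map each source asset ID to the (relation, path) pairs derived from it."""
    families = defaultdict(list)
    for path, source_id, relation in records:
        families[source_id].append((relation, path))
    return dict(families)
```

Nothing in the file names or locations reliably links the French PDF to the English Word document; only the shared `source_asset_id`, which lives in the asset management layer, does.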

Construction and relations

As if managing individual information assets were not complicated enough, in today's enterprise there is virtually no such thing as a stand-alone asset. An asset can be constructed in a number of different ways.

One example of a complex asset is a Microsoft Word master document and its related, yet completely separate, subdocuments, each representing an individual file. Other examples are an Adobe Premiere image sequence consisting of hundreds of individual images, an Adobe InDesign or QuarkXPress page layout containing links to multiple source files, and an Adobe After Effects video composition comprising several separate footage items (e.g., graphics, audio clips, and video clips).

Here again, many asset management systems can track and manage the master asset as well as its components (i.e., other assets). In fact, subcomponents can be shared across several master assets at any given time. For example, three different unrelated After Effects projects and a Macromedia Flash project could share the same instance of an audio clip.

Modern enterprise is rife with inter-asset relationships: technical product development, drug development, legal proceedings, newspaper publishing, magazine publishing, studio animation, and Web publishing involve dozens (if not hundreds or thousands) of interdependencies between information assets.

In fact, in the 1990s, Websites introduced an unprecedented level of inter-asset dependency. While most users are familiar with Web pages and multimedia clips, few realize that today's Websites also employ style sheets, document-type definitions, specific database releases, database schemas, scripting engines, client-side code, and server-side code. Moving, replicating, or archiving Websites is not a simple matter of scraping the files from a file server. Web assets may be spread out across dozens of servers and across multiple environments, and change daily, weekly, or monthly.

Without adequate information intelligence, storage management applications are unlikely to understand what is required to back up, replicate, and archive the bits, bytes, files, and blocks that contain or comprise complex information assets.
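A minimal sketch of what that understanding involves: given a component graph (here a hypothetical, hand-written one), a backup of a master asset has to include everything reachable from it, and a shared component cannot be moved without checking every master that uses it.

```python
# Hypothetical component graph: each asset maps to the assets it links to.
COMPONENTS = {
    "project_a.aep": ["intro.mov", "theme.wav"],
    "project_b.aep": ["theme.wav", "logo.psd"],
    "intro.mov": ["logo.psd"],
}


def backup_set(master, components=COMPONENTS):
    """Return the master plus every asset reachable from it."""
    seen = set()
    stack = [master]
    while stack:
        item = stack.pop()
        if item in seen:
            continue  # already collected; also guards against cycles
        seen.add(item)
        stack.extend(components.get(item, []))
    return seen


def assets_using(component, components=COMPONENTS):
    """Return every asset that depends, directly or indirectly, on a component."""
    return {
        asset for asset in components
        if asset != component and component in backup_set(asset, components)
    }
```

In this example, archiving `theme.wav` on its own would affect both After Effects projects, which is exactly the kind of relationship a file-level view cannot see.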

Shared policy is the only policy

Data retention and "life-cycle management" solutions all boast the ability to ingest, manage, move, and remove information assets in accordance with user-defined policies. The question is, "Which policies rule: those of the asset management environment, those of the storage management environment, or both?" For many storage management products, the answer is unpleasant, to say the least.

For those of us already using business-level policy engines of the type bundled with asset management systems, it means the administration of two (or more) totally separate, competing policy engines—one up in the information layer with an excellent business-eye view of corporate information assets and another flying virtually blind down in the storage layer. Without communication between the different policy engines, the management of even a few thousand assets can be risk-laden, time-consuming, and costly. Just imagine the impact on an environment of tens or hundreds of thousands of assets.
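One simple way two policy engines could cooperate is to let the more conservative decision win. The sketch below assumes each engine can report its decision as one of three actions; the `Action` names and the tie-breaking rule are illustrative assumptions, not a description of any product.

```python
from enum import IntEnum


class Action(IntEnum):
    # Ordered from most to least conservative.
    RETAIN = 0
    MIGRATE = 1
    DELETE = 2


def reconcile(storage_decision, asset_decision=None):
    """Return the more conservative of the two decisions.

    If the asset management layer has no opinion (None), the storage
    layer's decision stands.
    """
    if asset_decision is None:
        return storage_decision
    return min(storage_decision, asset_decision)
```

With this rule, a storage-layer decision to delete is overridden whenever the information layer says the asset must be retained, while assets unknown to the information layer are handled by the storage policy alone.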

Data and information management should ease, not increase, the burden on already-taxed administrators. Storage management vendors should quit reinventing the wheel. The metadata their products generate is a tiny subset of the metadata available to modern asset management systems. There's plenty of metadata available if they know where to look for it.

Looking ahead

People, processes, and policies should drive information and data management from the top down, not the bottom up. While today's storage management systems include comprehensive storage-layer capabilities well beyond the basic utilities found in most asset management systems, they lack visibility up into the asset management layer, which makes it virtually impossible for the applications to properly manage data and information within the storage layer.

Over the next two to three years, we believe we'll see an increase in cooperation and metadata sharing between asset management vendors and storage vendors. This may lead to a number of acquisitions as the two environments are drawn closer together. Ultimately, we believe storage vendors must take it one step further and abstract the storage environment to create a common, vendor-agnostic content repository and infrastructure.

The bottom line is that before you unleash a storage management product in your network, you need to invest the time to understand the limitations of the product with respect to data and information assets and how the product will fit in your environment (see "Questions you should ask"). Fortunately, a handful of vendors currently offer storage management systems designed to complement, not compete with, asset management investments.


Joseph Martins is the client services director at the Data Mobility Group consulting firm (www.datamobilitygroup.com) in Nashua, NH.


Questions you should ask

Below is a partial list of questions to include in requests for information when exploring storage management options:

  • Which metadata do you use to drive your policy engine and how is it obtained?
  • Have you integrated your product with any enterprise asset management environments? If so, which ones, and to what extent? Does the integration include an interface to communicate and collaborate between the policy engines?
  • What sort of classification/categorization scheme does your product provide? Can it adopt the classification system used by our asset management environment?
  • Have you integrated with any third-party categorization engines? If so, which ones and to what extent?
  • How does your product locate, identify, and relate asset versions, renditions, and translations?
  • How does it locate, identify, and relate asset copies?
  • How does it prevent unauthorized replication or retention (especially in the presence of an asset management or digital rights environment)?
  • How does it handle complex assets to ensure that components are never lost or mismanaged?
  • How does it handle interdependent assets to ensure that important relationships are never lost?


Things to consider about your data

Classification—What is it?
Ownership and rights—Who owns it? Who is allowed to access/use it? And for how long?
Chronology—When was it created? Where is it in its life cycle?
Version—What state is it in? For what purpose?
Rendition—In what form?
Translation—In which language?
Construction and relation—Is it a master document, a sub document, a resource, or all three? In what ways is it related to other assets?
