ILM: Separating hype from reality

Information life-cycle management has suffered from vendor hype, but by understanding ILM fundamentals you can realize operational benefits.

By Neil Murvin

Does information life-cycle management (ILM) solve a real problem? What does it encompass? How can companies benefit from ILM? And how do organizations cut through exaggerated vendor claims?

Everybody is talking about ILM, and seemingly every vendor offers ILM solutions. Users have to focus on their companies' problems and particular requirements. What situations do you face in storage, availability, and backup that need to be addressed? What inefficiencies exist that, if remedied, would make a major difference to the business? Unfortunately, unless you have a genuine problem to address, no amount of ILM software will offer any value. Fortunately, ILM is based on relatively sound logic and does solve some pressing problems in storage.

"Data is growing at 125% a year, yet up to 80% of this data remains inactive in production systems where it cripples performance," says Charlie Garry, senior program director at the META Group IT consulting firm.

"To compound this problem, many enterprises are in the midst of compliance initiatives that require the retention of more data for longer periods of time, as well as consolidation projects that result in significant data growth."

Companies are struggling to meet the storage compliance mandates of regulations such as HIPAA, Sarbanes-Oxley, SEC regulations, and various state requirements. These diverse laws require strict record keeping and auditable records that must be stored for specific time periods and under specific conditions. The demands of storing corporate e-mail alone, for example, could cause a tenfold expansion in storage needs over the next decade. Legacy architectures simply can't be expected to cope with the projected demands—hence, ILM.

What is ILM?

Though its description varies significantly from vendor to vendor, ILM is basically a strategy for policy-based management of information that provides a single view into all information assets. It spans all types of platforms and aligns storage resources with the value of data to the business at any point in time.

"ILM encompasses cradle-to-grave management whereby you get the right information to the right device or media at the right time," says Steve Duplessie, an analyst with the Enterprise Storage Group (ESG).

Does this sound a bit like hierarchical storage management (HSM)? Yes, but HSM was historically one-dimensional. It typically involved one large server or mainframe and focused on objective measures like access frequency: If certain data hadn't been accessed in a specific amount of time, it was automatically moved to another type of media. ILM goes further, extending this concept across the network to cover the entire infrastructure and adding both subjective and objective criteria.


New regulations, for instance, largely negate the historic HSM time-stamp criterion. The philosophy was that if no one ever accessed a file, it had indeterminate value and could be archived or deleted. Such information often ended up off-site in a tape vault; if you needed to retrieve it, an administrator had to physically rummage through the tapes in the hope of finding that one file in the haystack.

Today, however, that methodology is obsolete. Files that have not been accessed for years may now represent high value due to potential penalties that could be invoked if they have not been retained. Thus time-stamp, subjective, and legislation-based data categorizations are all incorporated into ILM. And regardless of where the data resides or the type of media employed, it can be managed from one console. Even with hundreds of applications, dozens of servers, terabytes of online and nearline data, and virtually unlimited offline archives, ILM is robust enough to cut through the complexity and manage an organization's information effectively.

The concept of ILM has nonetheless suffered from vendor hype. Excesses of the past are firmly ingrained in users' minds, so it's not surprising that IT professionals distrust vendors' ILM claims. Understanding the fundamentals of ILM and how they relate to one's business, however, provides an opportunity to assess whether ILM can truly add operational value.

ILM fundamentals

Differentiate production data from reference data.—Production data includes files that are actively used in day-to-day operations within the organization. Reference data, on the other hand, includes files that are not frequently accessed but still need to be maintained and available, whether in support of internal organizational requirements or due to external legal obligations.

The likelihood of data re-use is directly related to age.—Access activity declines sharply over the first week following creation of a file; after one month, the information is rarely accessed (see figure).

Provide structured control over file retention and deletion decisions.—Let's say the example in the figure is a bank. Transactional data is kept online only during its highly active period—the first seven days—and is accessible in milliseconds. Between seven days and two months, the bank moves the data to nearline storage (i.e., the information is kept on lower-cost disk arrays, retaining the benefits of transactional speed and high throughput). After two months, the bank then archives the data to offline media or deletes the data as mandated under its policies and industry regulations.
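The bank's age-based policy can be sketched in a few lines of code. This is a minimal illustration of the idea, not any vendor's implementation; the tier names and thresholds simply mirror the example above.

```python
from datetime import datetime, timedelta

# Thresholds from the bank example: 7 days online, 2 months nearline.
# Tier names are illustrative assumptions for this sketch.
ONLINE_WINDOW = timedelta(days=7)     # fast disk, millisecond access
NEARLINE_WINDOW = timedelta(days=60)  # lower-cost disk arrays

def assign_tier(created: datetime, now: datetime) -> str:
    """Map a file's age to a storage tier under the example policy."""
    age = now - created
    if age <= ONLINE_WINDOW:
        return "online"
    if age <= NEARLINE_WINDOW:
        return "nearline"
    return "archive-or-delete"  # per retention policy and regulation

now = datetime(2004, 6, 1)
print(assign_tier(datetime(2004, 5, 30), now))  # online
print(assign_tier(datetime(2004, 4, 20), now))  # nearline
print(assign_tier(datetime(2003, 12, 1), now))  # archive-or-delete
```

A real ILM policy engine would evaluate rules like this continuously, moving data between tiers as files age past each threshold.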

Maintain compliance with mandated government regulations.—In the healthcare field, for example, organizations cannot just automatically migrate files or delete them based on a time-stamp. Current legislation calls for information to be retained and available for the duration of a patient's life and for several years afterward. Merely pushing data to an off-site tape archive is insufficient. This calls for a flexible system that encompasses broad criteria other than "creation date," such as "file type," "includes/excludes," "last access date," "modified date," etc. Ideally, such a system will provide "hooks" for real-time, continuous measuring of data access patterns so that policy-based migration can be adapted to "on-the-fly" administrator-supplied criteria.
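A policy built on criteria beyond "creation date" might look like the following sketch. The field names and the seven-year figure are illustrative assumptions, not a specific regulation's terms or a product's schema; the point is that retention turns on include/exclude patterns and a lifetime-based rule rather than a simple time-stamp.

```python
import fnmatch
from datetime import date, timedelta
from typing import Optional

# Illustrative healthcare-style retention policy (field names assumed).
policy = {
    "include": ["*.dcm", "patient-*"],   # e.g., medical imaging files
    "exclude": ["*.tmp"],
    "retain_years_after_death": 7,       # example figure, not a statute
}

def must_retain(name: str, patient_deceased: Optional[date],
                today: date) -> bool:
    """Decide retention from name patterns and a lifetime-based rule."""
    if any(fnmatch.fnmatch(name, p) for p in policy["exclude"]):
        return False
    if not any(fnmatch.fnmatch(name, p) for p in policy["include"]):
        return False
    if patient_deceased is None:         # patient living: always retain
        return True
    # Approximate "N years after death" with 365-day years.
    years = policy["retain_years_after_death"]
    return today < patient_deceased + timedelta(days=365 * years)

print(must_retain("patient-1042.dcm", None, date(2004, 6, 1)))  # True
print(must_retain("scratch.tmp", None, date(2004, 6, 1)))       # False
```

The "hooks" the article mentions would feed live access statistics into rules like these, so administrators can adjust criteria on the fly.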

Maximize information availability and data protection.—Servers, storage area networks (SANs), network-attached storage (NAS) devices, routers, switches, and other critical components that provide the conduit between users and data must employ sufficient redundancy to automatically fail over and recover from predictable faults to maintain information availability.

Traditional backup alone—frequently performed once a day, overnight, and often causing information to become unavailable to users—is no longer sufficient to preserve business continuity. Service-level objective adaptation, performance tuning, and retention control mean nothing if a disaster obliterates unprotected information assets. We need to change the way we look at backup and redeployment, using snapshot and replication technologies (local and remote) to provide greater protection while maintaining continuous information availability to users and applications.

Maintain application transparency of data for users.—Organization executives don't have time to be concerned about the location of data. They simply want to query the system and obtain a fast response. Therefore, it is mandatory to provide a logical view of data that allows users to "see" files as they were created and in directories where they originally resided, regardless of their current location on different tiers of storage.
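One common way to achieve this transparency is a stub left at the file's original path that points to its current tier. The sketch below illustrates the mechanism only; the JSON stub format and function names are assumptions, not how any particular ILM product works.

```python
import json
from pathlib import Path

# Illustrative stub mechanism: when a file is migrated off primary
# storage, a small stub with the same name records where the data
# now lives, so users still "see" it in the original directory.

def migrate(path: Path, target: Path) -> None:
    """Move a file's contents to another tier, leaving a stub behind."""
    target.write_bytes(path.read_bytes())
    path.write_text(json.dumps({"ilm_stub": True, "location": str(target)}))

def read_transparently(path: Path) -> bytes:
    """Open a file by its original path, following a stub if present."""
    data = path.read_bytes()
    try:
        stub = json.loads(data)
        if isinstance(stub, dict) and stub.get("ilm_stub"):
            return Path(stub["location"]).read_bytes()
    except (ValueError, UnicodeDecodeError):
        pass  # ordinary file contents, not a stub
    return data
```

Production systems typically do this resolution inside a file-system filter or driver rather than in application code, so the recall is invisible to the user.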

Reclaim space on costly storage resources such as a SAN.—It is not uncommon for a business to spend over a half-million dollars on a SAN, so applications that minimize the initial need as well as future expansion are well worth considering. The objective is to maximize return on investment, which is accomplished by continuously monitoring and controlling information movement so that expensive SAN resources are occupied only by data that represent the highest possible value to the organization.

Neil Murvin is chief technology officer at CaminoSoft (www.caminosoft.com) in Westlake Village, CA.

This article was originally published on June 01, 2004