How to plan for information lifecycle management

Is ILM simply a repackaging of old methods with a glossy veneer of vaporware, or can you actually improve your operational costs and efficiencies?

By Dick Benton

Despite hardware and software vendors’ tendency to redefine information lifecycle management (ILM) in terms of their own products, the reality is that ILM is simply a philosophy of ensuring the storage infrastructure is periodically aligned with, and thus relative to, the actual business value of the data. Some vendors and analysts imply that this alignment is expected to be automatically initiated and maintained, driven by policy-based workflow supported by the ability to ascertain change in data value.

The ILM philosophy of using data value to determine the most cost-effective storage resource can be achieved in a reference architecture that provides tiers of service of differing capabilities (and differing costs) delivered under service level agreements: the internal service provider model. In fact, this is how the Storage Networking Industry Association (SNIA) currently defines ILM.

In the service provider model, each tier of service is supported by a technology architecture characterized by specific service attributes. The variance in delivery capabilities provided by those attributes drives differing costs for each tier of storage.

In considering the costs and benefits of ILM, you must first understand the business drivers for ILM. Are these drivers truly business unit initiated, or are they actually driven by a need for further internal IT efficiencies? Are ILM policies simply a means of achieving compliance, or are they really part of a larger data governance strategy? Unless there is a clear understanding of the rationale for ILM, there will not be a basis for the assessment and selection of a suitable reference architecture (hardware and software). There is a danger that the vendors may drive the solution from the hardware end.

In preparing for ILM, several questions must be answered through cross-functional discussions. How will data movement from tier to tier be triggered? What is the difference between data lifecycle management (DLM) and ILM? How is the value of data calculated? When is the delta in value sufficient to trigger a migration? Are different strategies needed for structured data and unstructured data? Is ILM application-specific? What is the impact of ILM-driven data migration on recovery capabilities? Will the cost of administering and controlling the ILM environment chew up the projected savings? What policies and treatment will be applied to the final phase of a lifecycle: the purging of data?

The answers to many of these questions are interrelated, and CIOs will need clarification in each of these areas before selecting an appropriate ILM strategy. Let’s briefly touch on some of the more contentious issues.

Are you managing data or information?

What is the difference between data and information, and how does it impact lifecycle management? Data generally needs rendering by an application before the result can be used as information. In other words, information is data that an application has rendered into a human-actionable form.

Today, it is probable that most lifecycle management implementations are actually DLM rather than ILM. DLM entails managing the data files that applications use to provide information. There are circumstances where a phase of the lifecycle of data may come under various legislative requirements that dictate that the data cannot be rendered by an application. This means it may need to be kept as information, not data. This requirement may trigger a need for XML and perhaps PDF formats. Even if compliance does not dictate the issue, retrieval of archived data may not be feasible in the longer term unless it is archived as information. That is a challenging requirement if retrieval is needed, say, 5 to 10 years after the data was originally written. The unrealistic alternative is to retain the multiple technologies required to render the data into information. These can include tape and disk hardware, servers, operating system versions, device driver versions, and application sources and objects.

Determine the value of data

Is it possible to take the broadest definition of value and use subjective measures such as “importance”? If movement of data through its lifecycle is to be automated and based on change in value, then value needs to have some empirical base. This can include business impact analysis, corporate profit projections, three-year business line projections, etc. All of these can be used to determine a static value of data, perhaps more accurately reflected as a value of the application.

To meet the fullest definition of ILM, the empirical base for data valuation needs to be capable of detecting changes in current “value” to trigger migration to the appropriate storage tier. But how can you have this static value of data respond to and reflect change in underlying assumptions? It seems that the value needs to be directly related to some dynamically changing entity, perhaps the number of transactions or average value of transactions. How realistic is it to set up metrics and monitoring to determine such calculations for each application or for each business unit’s data? The ability to develop a data value mechanism that can dynamically respond to business change in order to trigger policy-based migration is, to say the least, challenging.

Certainly you can strike a value of data at a point in time and have this value reasonably reflect reality. In addition, subjective issues such as customer satisfaction can be included, perhaps by allocating a percentage impact. But is it realistic or cost-effective to set up this procedure for every business unit and its applications to meet the ultimate ILM goals of policy-driven data migration based on value of data?
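As a rough illustration of striking a value at a point in time, the sketch below combines an empirical base with a subjective factor expressed as a percentage impact. The function name, the revenue figure, and the 10% customer-satisfaction allocation are all assumptions invented for the example; they are not drawn from any standard or product.

```python
def point_in_time_value(empirical_base, subjective_impacts):
    """Return a static data value for one application.

    empirical_base: a figure from, e.g., a business impact analysis.
    subjective_impacts: mapping of factor name -> percentage impact
                        (0.10 means the factor adds 10% to the base).
    """
    uplift = sum(subjective_impacts.values())
    return empirical_base * (1 + uplift)


# Hypothetical application: 2,000,000 of revenue depends on its data, and the
# business unit allocates a 10% impact for customer satisfaction.
value = point_in_time_value(2_000_000, {"customer_satisfaction": 0.10})
print(f"{value:,.0f}")
```

Even a calculation this simple needs agreed inputs from each business unit, which is exactly the effort the question above asks you to weigh.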

What should trigger migration?

It is important to understand that data may just as easily increase in value and migrate up storage tiers as it may decrease in value and migrate down through the tiers. Data warehousing, for example, can over time build to a critical mass that continuously increases the value of data in business analytics. Similarly, research data on medicine becomes more important the closer the commercialization phase becomes.

Let’s assume you have been able to calculate and subsequently maintain the value of data over time. Now you need to ask what delta in change would be sufficiently significant to trigger migration to a more appropriate tier of storage. This is perhaps one of the few easy ILM questions to answer. We have already learned how to do this when we set up the service provider model.

A service provider model groups business needs into appropriate tiers of service, each with its own cost model. The big question is, “How often do you expect a data value change of that magnitude to occur? Will it happen daily, weekly, monthly, quarterly, or less frequently?” One might conclude that frequent significant variations in value (i.e., changes occurring on a daily or weekly basis) might have a dramatic impact on availability, because data would then need to migrate at that frequency. Perhaps there are ways to minimize production impact, but at a cost and complexity that may be unacceptable. If the policy delta occurs less frequently, then what is the justification for investment in automation, its attendant impact on complexity, and the resulting impact on administrative costs?

It is difficult to value data, even more difficult to develop a means of continuous re-valuation, and even more difficult to use valuation changes in an automated migration process. It may be more efficient (and more reflective of reality) to implement a calendar program for data re-valuation at defined periods, perhaps quarterly or as identified by the business units.
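A calendar-driven re-valuation of this kind can be sketched in a few lines. The tier names, the value floors, and the 20% delta threshold below are hypothetical placeholders; in practice they would come from the tier definitions agreed on with the business units.

```python
# Hypothetical tier floors: minimum data value that justifies each tier,
# listed from the most capable (and most costly) tier downward.
TIER_FLOORS = [("tier1", 1_000_000), ("tier2", 250_000), ("tier3", 0)]


def tier_for_value(value):
    """Return the most capable tier whose floor the value meets."""
    for name, floor in TIER_FLOORS:
        if value >= floor:
            return name
    return TIER_FLOORS[-1][0]


def quarterly_review(current_tier, old_value, new_value, min_delta=0.20):
    """Recommend a tier at a scheduled re-valuation.

    A move is recommended only when the relative change in value reaches
    min_delta, so minor fluctuations do not cause migrations.
    """
    if old_value <= 0:
        return tier_for_value(new_value)
    delta = abs(new_value - old_value) / old_value
    if delta < min_delta:
        return current_tier
    return tier_for_value(new_value)


print(quarterly_review("tier1", 1_200_000, 1_100_000))  # small change: stays put
print(quarterly_review("tier1", 1_200_000, 300_000))    # large drop: moves down
print(quarterly_review("tier3", 100_000, 400_000))      # large rise: moves up
```

Because the review runs on a schedule rather than continuously, the migration frequency is bounded by the calendar, which keeps the production impact predictable.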

The concept of an automated process detecting value change and triggering migration does not seem feasible at this time. It is feasible to monitor other attributes and trigger change on transaction entities such as date or date last accessed, but these attributes are not what we might typically think of when the word “value” is used.
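Triggering on an attribute such as date last accessed is straightforward by comparison. The following minimal sketch lists files that have not been read for a given number of days; the function name and the idle threshold are assumptions for illustration, and the result is only a candidate list, not a migration.

```python
import os
import time


def files_not_accessed_since(root, max_idle_days):
    """Yield paths under root whose last access is older than max_idle_days."""
    cutoff = time.time() - max_idle_days * 86400  # 86400 seconds per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_atime < cutoff:
                    yield path
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
```

A caller might loop over `files_not_accessed_since(some_directory, 180)` and hand each path to whatever archiving mechanism is in place. Note that some file systems are configured not to update access times on every read, in which case this attribute is a weak signal.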

Content-based lifecycle management

Although data value cannot be dynamically maintained to trigger automated migration, it is certainly possible for ILM migration to be triggered by data content criteria such as transaction date, as well as by standardized metadata content commonly held in, say, Microsoft Office file structures, perhaps by simply using classic hierarchical storage management (HSM).

If these functions are to be included under the umbrella of ILM strategies, we need to be cognizant of the attributes that will trigger migration as well as the type of data in which these attributes might be found, whether it be structured (database), unstructured (non-database), or content-only (information, not data).

This article does not delve into classic HSM strategies except to say that it may be prudent to look at the ancestry of any vendor’s ILM solution to see if it is not simply a data attribute-triggered archiving solution that has been around for some time. More recently we are seeing a new class of middleware that provides the ability to unload transactions and tables from a database and still retain access to the contents in a manner transparent to the application. This approach, which enables archive-aware applications, is also commonly seen in the various solutions provided to reduce the production storage for Microsoft Exchange. As database sizes impact performance, such ILM strategies are expected to become increasingly popular. ILM products that focus on MS Office files also use Microsoft metadata to drive intelligent archiving strategies.

So what is actually doable today? A pragmatic approach to ILM includes agreed-upon definitions, policies to define what should be moved or copied, an appropriate tiered service model, and an agreement on what attribute changes will trigger migration up or down the storage tiers.

Today, for data lacking a metadata component, the only realistic attributes that can be used to drive automated migration of data from one tier to another are a business unit decision, the use of an application field such as date, or an application calculation such as age.

Where metadata exists, it is possible to be as sophisticated and granular as the metadata permits. In classically structured data you are left with few options beyond choosing one or more specific data fields and analyzing their content.

Dick Benton is a principal consultant with GlassHouse Technologies (www.glasshouse.com) in Framingham, MA.

Key steps in implementing an ILM strategy

1) Use organizational experience to identify three to five significant groups of data that, if subject to ILM, might be expected to provide significant savings or efficiencies.

2) Determine a cost model that can show savings that will be delivered by moving data from point A to point B on condition X. This allows your infrastructure investment to be developed based on a rational cost-saving basis.

3) Determine your strategic ILM policy. What types of data will move, what will trigger the move, and where will the data move to, based on the attributes?

4) Model the frequency of value change meeting the delta and the data migration time to the target tier. Include any impact on the production environment of the migration. This will provide a basis for the degree of automation possible and even a justification for investment in automation.

5) Develop the definition of attributes that would support three to five tiers of service. These attributes will include not only production and protection attributes, but also the value deltas and targets for ILM movement.

6) Construct a cost model for each tier of service, including hardware, software, and administration costs for the life of the data.

7) Develop standard operating procedures with associated compliance, completion, and quality metrics to discipline the operation and place it on a highly repeatable basis, whether it be automated or manually triggered.

8) Negotiate with the business units so that they can select the appropriate tier of service for their business unit’s needs.
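The cost models called for in steps 2 and 6 can start out very simply. The sketch below assumes a single all-in cost per gigabyte per month for each tier (hardware, software, and administration rolled together) plus a one-time cost for the move itself; every figure and name in it is illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    cost_per_gb_month: float  # assumed all-in cost for this tier


def migration_savings(size_gb, source, target, months, one_time_move_cost=0.0):
    """Net saving from holding size_gb on target instead of source for months."""
    monthly_delta = (source.cost_per_gb_month - target.cost_per_gb_month) * size_gb
    return monthly_delta * months - one_time_move_cost


tier1 = Tier("tier1", 1.00)  # illustrative figures only
tier3 = Tier("tier3", 0.25)

# 4,000 GB moved from tier 1 to tier 3 and held there for 36 months,
# with a hypothetical one-time cost of 5,000 for the migration itself.
saving = migration_savings(4_000, tier1, tier3, 36, one_time_move_cost=5_000)
print(saving)
```

A negative result means the move costs more than it saves over the period, which is a useful check before investing in automation.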

But above all else, do not let your vendors define ILM on your behalf. Know what you want, why you want it, what it will cost, and what it will save you.

This article was originally published on February 01, 2006