A successful ILM strategy encompasses four segments: data classification, data policy, data management, and a tiered storage infrastructure.
By VS Joshi
Every few years, IT vendors inundate CIOs and top management with new initiatives that they believe businesses must undertake as soon as possible for their survival. Consolidation, Y2K, ERP, CRM, grid computing, e-business, on-demand computing, virtualization, etc.: You get the picture. The latest buzzphrase is information life-cycle management (ILM) or its cousin, data life-cycle management (DLM). Most storage vendors have jumped on this bandwagon and are touting themselves as ILM companies. Some start-ups, after realizing that ILM is not a product, have changed their positioning statements and are now calling themselves “ILM-enabling companies.”
What is ILM?
Vendors use various definitions for ILM depending on their product sets. However, the storage industry, assisted by the Storage Networking Industry Association’s Data Management Forum (SNIA DMF), is moving toward a general consensus on the definition of ILM (see “SNIA road map sets course for ILM,” InfoStor, November 2004, p. 38). Here are some of the key takeaways:
ILM is a strategy, not a specific product. It is a new approach to managing information that comprises policies, processes, practices, and tools used to align the business value of information with the most appropriate and cost-effective infrastructure from the time information is created through its final disposition.
In short, ILM is a strategy by which storage resources are allocated depending on the business “value” of the data stored on them. This value changes throughout the life cycle of the data, thereby affecting the way in which resources get allocated.
Go to the library
To understand ILM, consider an analogy: the information management system in a public library. A library receives “data” in many different forms: newspapers, magazines, books, etc. Consider the life cycle of one particular form of data, a local newspaper.
Every morning, the librarian places one copy of the day’s newspaper on an easily accessible newspaper stand (storage tier 1 in our analogy) and sends another copy to the company that converts it into microfilm. After a day, the copy on the front stand is moved to a weekly rack (storage tier 2) reserved for newspapers of the preceding week. After a week in storage tier 2, the paper moves to a monthly container (storage tier 3). After a month, the newspaper goes to a yearly container (storage tier 4). If you need a newspaper that’s older than a year, you may get it only on microfilm (storage tier 5) as the library may have a policy of discarding older newspapers to recover precious real (or rack) estate. Thus, the newspaper (data) location changes repeatedly based on its changing value.
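The newspaper workflow above amounts to an age-based placement rule. A minimal sketch follows, assuming cutoffs of 1, 7, 30, and 365 days (the text gives the order of the tiers, not exact boundaries) and a hypothetical function name, `newspaper_tier`:

```python
def newspaper_tier(age_days: int) -> int:
    """Return the library storage tier (1-5) for a newspaper of a given age.

    The cutoffs are assumptions for illustration; the article only gives
    the order of the tiers (daily stand, weekly rack, monthly container,
    yearly container, microfilm).
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days < 1:
        return 1  # front newspaper stand
    if age_days <= 7:
        return 2  # weekly rack
    if age_days <= 30:
        return 3  # monthly container
    if age_days <= 365:
        return 4  # yearly container
    return 5      # microfilm only; the hard copy has been discarded


for age in (0, 3, 20, 200, 400):
    print(age, newspaper_tier(age))
```

The same item lands in a different tier each time the function is called with a larger age, which is the point of the analogy: location follows value, and value follows age.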
For a different form of data, say a magazine, the storage racks and workflow are different. Thus, public libraries treat various classes of data differently and allocate different resources for different classes of data. Within a particular class of data (e.g., newspaper) the allocated resources change according to the changing value of the data.
The various information management stages can be categorized into data classification, data policy, data management, and tiered infrastructure.
Data classification: In this segment, incoming data is classified merely by observation. A person determines the class of the data (newspaper, magazine, etc.). Data is further classified by its value and relevancy. The value of the data changes regularly based on its age. Local newspapers are higher in relevancy than national newspapers, so classification also occurs on the basis of data relevancy.
Data policy: In this segment, the changing value of data is taken into consideration and a set of policies is applied. In the library example, these include policies specifying the
- Time a given newspaper (data) remains in a particular location;
- Flow of the newspaper within the library;
- Number of copies of newspapers the library must acquire; and
- Amount of time hardcopies are to be retained before they are destroyed to recover shelf space.
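The policy items listed above can be written down as a small record per class of data. This is only a sketch: the class name `LibraryPolicy`, its field names, and the dwell times are hypothetical, chosen to loosely follow the newspaper example.

```python
from dataclasses import dataclass


@dataclass
class LibraryPolicy:
    """Hypothetical policy record for one class of library data."""
    data_class: str
    flow: tuple[str, ...]              # order in which locations are visited
    days_per_location: dict[str, int]  # how long an item stays in each location
    copies_to_acquire: int             # copies obtained per issue
    hardcopy_retention_days: int       # age at which hard copies are destroyed

    def is_consistent(self) -> bool:
        """Every location in the flow must have a dwell time, and vice versa."""
        return set(self.flow) == set(self.days_per_location)


# Illustrative values only; the article does not give exact durations.
newspaper_policy = LibraryPolicy(
    data_class="local newspaper",
    flow=("stand", "weekly rack", "monthly container", "yearly container"),
    days_per_location={
        "stand": 1,
        "weekly rack": 7,
        "monthly container": 30,
        "yearly container": 327,
    },
    copies_to_acquire=2,  # one for the stand, one sent for microfilming
    hardcopy_retention_days=365,
)
print(newspaper_policy.is_consistent())
```

A magazine would get its own record with a different flow and different durations, which mirrors how the library treats each class of data differently.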
Data management: These are the physical processes for moving the newspaper from one storage tier to another, discarding hard copies after a certain period, retaining certain copies or items of historical significance, sending copies to microfilm, organizing microfilm, etc.
Tiered infrastructure: Any physical medium that is used for data storage and retrieval is part of a tiered infrastructure (e.g., newspaper stands, racks, microfilm readers, etc.).
These basic information management principles can be applied to a business enterprise. For an IT organization to fully implement ILM, its ILM activities should span the same four segments as in the library example (see the table “Four segments of ILM”).
Data classification: Usually, enterprise data falls into three broad categories: structured (e.g., databases), semi-structured (e-mail), and unstructured (Word, Excel, MPEG, and other files). Within these broad categories, not all data is created equal. In an enterprise, certain applications can be considered mission-critical, or gold-level, applications, whereas others are business-critical, or silver-level, applications, and some are operation-critical, or bronze-level, applications.
Data within each level can be further classified based on its importance. For example, an order processing application can be considered a gold-level application. Within the order processing application, data pertaining to orders of the current quarter is more important than data pertaining to the orders of the previous quarter or year. Data can also be classified based on its access frequency. The data classification scheme should be dynamic and take into consideration the changing value of data.
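One way to picture such a dynamic scheme is a function that combines the application level with the age of the data. The sketch below is illustrative only: the enum, the quarter-based age rule, and the three importance labels are assumptions chosen to mirror the order-processing example, not a standard scheme.

```python
from enum import Enum


class AppLevel(Enum):
    GOLD = "mission-critical"
    SILVER = "business-critical"
    BRONZE = "operation-critical"


def importance(level: AppLevel, age_in_quarters: int) -> str:
    """Classify data within an application level by its age.

    Assumed rule: current-quarter data is "very important", data from the
    previous quarter is "important", and anything older is
    "not-so-important". Re-evaluating as data ages changes the result,
    which is what makes the scheme dynamic.
    """
    if age_in_quarters < 0:
        raise ValueError("age_in_quarters must be non-negative")
    if age_in_quarters == 0:
        label = "very important"
    elif age_in_quarters == 1:
        label = "important"
    else:
        label = "not-so-important"
    return f"{level.name.lower()} / {label}"


print(importance(AppLevel.GOLD, 0))
print(importance(AppLevel.GOLD, 4))
```

A scheme based on access frequency could replace the age argument with an access count over some window; the structure of the function would stay the same.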
Data policy: Data policies should incorporate rules that define the underlying business processes. Architects of these policies should understand the challenges, priorities, goals, and mission of the business. They should also have a good understanding of business processes, business workflows, information flow, and data-retention requirements (e.g., regulatory compliance).
People to be included in the policy architect group can span a broad cross section, including corporate legal counsel, chief financial officer, chief information officer, human resources officers, compliance officers, etc.
The policies this group defines must be business-related and not technology-related, reflecting the nature and character of the business. Some rudimentary policies may be based upon data type, age, ownership, access history, etc.
Data management: This segment incorporates the technologies that enable the actual placement, migration, replication, backup, archiving, and deletion of data based upon the policies defined in the data policy segment. Eventually, ILM will automate all of these actions based on policies across the heterogeneous environment. This encompasses automated policy-based data movement between various tiers of storage in an enterprise, and automated policy-based protection and archiving of data. Among many other things, this segment includes automation tools that protect data according to business requirements and that automatically delete data when it is no longer required.
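The decision step behind policy-based automation can be sketched in a few lines. The `Rule` record, its age thresholds, and the `plan_action` helper are hypothetical; real ILM tooling would also carry out the action on the storage systems, whereas this sketch only decides which action a policy calls for.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """Hypothetical policy for one data class; all thresholds are in days."""
    migrate_after: int           # move to a lower storage tier once this old
    archive_after: int           # move to the archive once this old
    delete_after: Optional[int]  # None means retain indefinitely


def plan_action(age_days: int, rule: Rule) -> str:
    """Pick the single action the policy calls for at a given age.

    Checks run from the most to the least drastic action, so the highest
    threshold that has been reached wins.
    """
    if rule.delete_after is not None and age_days >= rule.delete_after:
        return "delete"
    if age_days >= rule.archive_after:
        return "archive"
    if age_days >= rule.migrate_after:
        return "migrate"
    return "keep"


# Example values are illustrative only, not recommended retention periods.
orders = Rule(migrate_after=90, archive_after=365, delete_after=2555)
for age in (10, 120, 400, 3000):
    print(age, plan_action(age, orders))
```

Keeping the thresholds in a record separate from the decision logic reflects the split described above: the business side owns the policy values, and the storage side owns the mechanism that applies them.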
Tiered infrastructure: A tiered storage infrastructure can include high-end disk arrays, modular arrays, Serial ATA (SATA) subsystems, tape libraries, content-addressable storage (CAS) systems, archiving platforms, and WORM tapes. As storage is often one of the largest components of the overall IT infrastructure, its effective utilization is critical to reducing total cost of ownership (TCO). That means enterprises should store their data in storage tiers based on the data’s relative importance, reserving high-end arrays only for the most important data. Structuring data in storage tiers reduces TCO as well as the complexity of the environment. When data is organized by importance and usage, it also makes for a better disaster-recovery plan.
Some compliance regulations require the use of a specific technology, such as WORM. And in many cases archiving data is a requirement. Not doing so exposes enterprises to steep fines, bad publicity, and litigation.
Of the four ILM segments, data classification and data policy have little to do with storage technology and more with business processes, workflows, and cost and profitability concerns. These segments also provide policy engines and rules to ensure compliance in relation to the availability, accessibility, and recoverability of information. The data management and tiered infrastructure segments, on the other hand, are clearly the domain of storage professionals. This means that the real power of ILM can be harnessed only when people from both the business and the technology side work together.
ILM is incomplete without the adoption of all four segments. Hence, a complete ILM implementation requires IT to work in line with business objectives and priorities, which in turn reduces cost and complexity and increases the availability and responsiveness of IT environments.
Having different tiers of storage can be a good start, but to do it without connection to the other three segments leads to an incomplete ILM strategy.
Vendors that provide software for backup, replication, data migration, etc., and/or have hardware products for different storage tiers have all been calling themselves ILM vendors since last year. This should come as no surprise because customer adoption of ILM helps these vendors sell more storage software and hardware. These vendors do have the components needed to implement ILM, but the paradigm shift that ILM demands of IT does not happen at the component level.
The fruits of ILM will be seen only when enterprises can classify the data, apply policies on the data, and automatically move data from one infrastructure tier to another based on policies defined by financial, operational, business, legal, and compliance groups. These requirements/policies will be different for different vertical industries.
Some vendors may even sell vertical industry modules for the combined data classification and data policy segments. In an automated ILM scenario these modules should integrate seamlessly with the data management and tiered infrastructure segments.
No single vendor offers a complete end-to-end solution that covers all four ILM segments. In the next article in this series we will cover various vendors and how their products map to the ILM segments.
VS Joshi is an independent storage analyst. He can be contacted at firstname.lastname@example.org.
Four segments of ILM
Data classification
- Structured, unstructured, semi-structured
- Gold, silver, and bronze applications
- Very important, important, and not-so-important data
Data policy
- Business processes, business workflows, business costs and profits
- Industry legislation and compliance regulations
Data management
- Automated placement, migration, retention, backup, deletion, and archiving of data
Tiered infrastructure
- High-end storage arrays, modular arrays, SATA, tape, WORM
- Content-addressable storage, tape libraries