ILM requires data classification

A complete information life-cycle management strategy should include integrated automation, policy creation, discovery, and data classification.

By Heidi Biggar

Many ILM-labeled products are on the market today, but most of them lack a key ingredient: data classification, the ability to categorize data according to subjective or objective criteria rather than just the age or type of a file.

Data classification allows users to set up different groups of data, to which appropriate policies can then be applied. Doing so has potentially significant benefits: If you think your existing storage management tools (e.g., hierarchical storage management, or HSM, and storage resource management, or SRM) have helped you trim resources, just wait and see what classification can do to your bottom line. It can also help with regulatory and security requirements.

End users are being pounded with ILM messages from virtually all storage vendors, hardware and software alike. However, many users have implemented ILM “strategies” that amount to little more than HSM (moving data to lower-cost storage tiers) or SRM.

Although these types of implementations do provide value, the potential benefits of a complete ILM strategy are more far-reaching. In particular, ILM can help organizations make better use of storage resources (e.g., by improving utilization and provisioning); reduce storage-related costs; improve backup efficiency; minimize application downtime; consolidate storage resources; better meet regulatory compliance, corporate governance, and security requirements through better management of data; and lower overall IT costs, including management.

The value of an ILM infrastructure lies in its ability to treat data, or information, according to its changing business value. Data in an ILM environment is not treated equally. It is not arbitrarily moved from storage resource to storage resource, nor is it necessarily moved in “bulk” (i.e., a single policy isn’t applied to all data). Data that is deemed mission-critical (high business value) is treated differently from data that is deemed less critical.

Ultimately, an ILM infrastructure will continually assess data value and transparently re-assign resources in a tiered fashion as dictated by adaptive policies.

The number of storage tiers companies implement depends on the specific business demands of their organizations and on available IT and corporate resources. Storage tiers can include primary disk arrays, secondary disk storage, virtual tape libraries (VTLs), online disk archives (e.g., content-addressed storage), and tape.

Enterprise Strategy Group (ESG) research shows an increasing trend among organizations of all sizes to implement disk-based data-protection tiers to improve backup-and-recovery efficiency and overall disaster-recovery preparedness. At the other end of the spectrum, users cite the high costs of primary storage as a strong impetus for implementing SATA-based secondary storage tiers.

As one end user says, “A growing problem with our snapshot solution is that it’s just too expensive to keep the snapshots on our high-end storage. We’d like to move those volumes to a midrange product or cheap ATA disk.” Another end user points to data-retention issues that were affecting backup-and-recovery strategies. “Going forward, we really only want to use tape for disaster-recovery purposes. We’ll address data retention with cheaper, more readily accessible disk technologies.”

But ILM is about more than just the movement of data among storage tiers. It’s about being able to discover and extract the business value of data; categorize or classify data types; and set policies that transparently move data among available resources in a way that makes optimum business sense. In other words, it’s about being able to discover, classify, and migrate.

While many vendors today tackle one aspect of the ILM process (e.g., discovery via SRM or data movement via HSM), few offer integrated product suites that tackle all three. (One exception is Arkivio’s ILM suite.) In particular, few products today are able to categorize or classify data in a way that allows end users to establish flexible, granular data groups.

Data classification can help organizations make the best use of their IT resources and extract maximum business value from their data.

Rather than dumping all data into a large funnel and applying generic global policies to a single data pool, classification software sorts data at a more granular level and then applies policies to the data based on the specific needs of a particular group or department.

ILM suites with data classification not only let administrators create data groups that span multiple volumes on heterogeneous servers and storage devices, but also allow them to differentiate within these groups by establishing data classes based on the age/type/size of file, owner, or path of the data. Data is directed to the appropriate class, or tier, of back-end storage based on this information.
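To make the idea concrete, here is a minimal Python sketch of attribute-based classification. It is an illustration only, not any vendor's implementation: the names `FileInfo`, `DataClass`, and `classify`, the tier labels, and the example rules are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Attributes a discovery scan might report for one file (hypothetical)."""
    path: str
    owner: str
    size_bytes: int
    age_days: int


@dataclass(frozen=True)
class DataClass:
    """A named class of data plus the storage tier it is directed to."""
    name: str
    tier: str
    path_prefix: str = ""      # empty prefix matches every path
    extensions: tuple = ()     # empty tuple means any file type
    owners: tuple = ()         # empty tuple means any owner
    min_age_days: int = 0
    min_size_bytes: int = 0

    def matches(self, f: FileInfo) -> bool:
        if not f.path.startswith(self.path_prefix):
            return False
        if self.extensions and not f.path.lower().endswith(self.extensions):
            return False
        if self.owners and f.owner not in self.owners:
            return False
        return (f.age_days >= self.min_age_days
                and f.size_bytes >= self.min_size_bytes)


def classify(f, classes, default_tier="primary"):
    """Return (class name, tier) for the first class whose rules match."""
    for c in classes:
        if c.matches(f):
            return c.name, c.tier
    return "unclassified", default_tier


# Example rules (assumptions for illustration only).
CLASSES = [
    DataClass("engineering-archive", "secondary",
              path_prefix="/eng/", min_age_days=180),
    DataClass("large-media", "archive",
              extensions=(".mov", ".iso"), min_size_bytes=1_000_000_000),
]
```

In this sketch the first matching class wins, so the order of the rules expresses priority. A real product would presumably evaluate such rules across many volumes and servers rather than one file at a time.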

Like the data groups, the storage classes also need to span heterogeneous storage devices (e.g., primary and secondary storage tiers), and the process should be automatic. IT departments should be able to implement the most cost-effective storage platforms without having to create new data movement policies.

For example, if SATA has been designated as a secondary storage tier, the end user should be able to swap out technology (regardless of the manufacturer or type of storage) without having to create new data movement policies. The classification system should be able to adapt to the new technologies and move data appropriately among data groups.
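One way to picture this decoupling is a layer of indirection between policies and hardware. The Python sketch below is a simplified assumption of how that might look, not a description of any product: the device names, the policy fields, and the `resolve_targets` function are hypothetical.

```python
# Hypothetical mapping of logical tiers to physical devices.
tier_devices = {
    "primary": "fc-array-01",
    "secondary": "sata-array-01",
}

# Policies name a logical tier, never a specific device.
policies = [
    {"data_class": "engineering-archive", "target_tier": "secondary"},
]


def resolve_targets(policies, tier_devices):
    """Return (data_class, device) pairs by looking up each policy's tier."""
    return [(p["data_class"], tier_devices[p["target_tier"]])
            for p in policies]


before = resolve_targets(policies, tier_devices)

# Swap the hardware behind the secondary tier: only the mapping changes,
# and the policy list is left untouched.
tier_devices["secondary"] = "sata-array-02"
after = resolve_targets(policies, tier_devices)
```

Because the policies refer only to the tier name, replacing the device behind "secondary" redirects data without any policy being rewritten.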

As for regulatory or corporate compliance, organizations can use ILM with data classification tools to establish multiple data groups and then apply corporate or regulatory policies to all or some of them. Similarly, they can define which data groups need to be encrypted for security purposes and which don’t. No more blanket encrypting. Policy management is fluid, allowing users to start with simple actions but scale them over time. For example, users can write specific policies around financial data that can exclude certain types of data (e.g., quarterly financials) from moving to secondary storage tiers regardless of the age of the data or its access frequency. This differs from traditional HSM software, which moves data among tiers based on the age of the data.
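The difference between an age-only rule and a classification-aware policy can be sketched in a few lines of Python. This is a hedged illustration under assumed names: `Record`, `hsm_should_migrate`, `ilm_should_migrate`, the 90-day threshold, and the `quarterly-financials` tag are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One item of data with its group, classification tags, and age."""
    name: str
    group: str
    tags: frozenset
    age_days: int


def hsm_should_migrate(r, max_age_days=90):
    """Age-only rule: anything older than the threshold moves."""
    return r.age_days > max_age_days


def ilm_should_migrate(r, max_age_days=90,
                       pinned_tags=frozenset({"quarterly-financials"})):
    """Classification-aware rule: pinned data stays put regardless of age."""
    if r.tags & pinned_tags:
        return False
    return r.age_days > max_age_days


quarterly = Record("q3-results.xls", "finance",
                   frozenset({"quarterly-financials"}), 400)
expenses = Record("old-expenses.xls", "finance", frozenset(), 400)
```

Here both records are the same age, yet only the untagged one would move under the classification-aware rule; the age-only rule would move both.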

As a result of their ILM implementations, early adopters report significant improvements in application performance, recovery times, and resource utilization.

Some ILM suites can be used alone or in combination with e-mail archiving, content management, or other applications that lack data classification capabilities to help these applications run more efficiently. In these situations, ILM would classify and sort the application data according to pre-defined policies and move the data to appropriate storage classes, while the e-mail archiving or content management software would deal directly with the primary application.

ILM should cover the full spectrum of discovery, classification, automation, and policy creation. ESG research has shown that users are interested in purchasing storage software as bundled solutions. Users also indicate growing interest in purchasing integrated product suites that share a common interface, database, and policy engine (see figure).

ILM in its truest form provides many benefits for companies of all sizes. But being able to realize these benefits will require users to implement storage software products that do more than just move data from point A to point B.

Users need to implement a data classification product that will use more than the age of the data to help determine its value to the organization.

Heidi Biggar is an analyst at the Enterprise Strategy Group (www.enterprisestrategygroup.com).

This article was originally published on November 01, 2005