Q: ILM and HSM seem like the same thing. What’s the difference between them?
Your confusion is warranted. Information life-cycle management (ILM) has become a catch phrase for everything you can spend money on after you are tired of spending money on boring old storage. I guess SAN and NAS have stopped being interesting to talk about, so the storage industry has had to swim upstream and make the simple buckets of storage that we sell sound more exciting. Meanwhile, there is plenty of substance to many ILM solutions, and as the buzz mounts, vendors rush to re-label their products as ILM. In the stampede, the lines between product definitions become blurry, and the burden falls back on end users to define business needs and match them to technology.
So, yes, many so-called ILM products are just hierarchical storage management (HSM) products in disguise, but despite the similarities in the verbiage used to describe these technologies, there are conceptual differences worth noting. Let’s start with some simple definitions.
The main idea behind HSM is that you have two or more classes of storage devices, and as your files age (or for other reasons become less relevant on a daily basis) they are moved from the fastest and most-expensive storage devices down to the least-expensive storage devices. All of this is transparent to users, except perhaps for some latency in retrieving files that might be stored on removable media. The theory is that high-performance storage space is expensive, so why clutter it up with files that you are not actively using? Similarly, why overwhelm your backup system by making it back up the same files over and over again, even though they have not been touched in years?
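To make the HSM idea concrete, here is a minimal sketch in Python of an age-based tiering policy. The tier names, the 30-day and 365-day thresholds, and the functions `choose_tier` and `plan_migrations` are all assumptions made up for illustration; real HSM products define their own policies and do the actual data movement for you.

```python
# Hypothetical storage tiers, fastest and most expensive first.
# Each entry is (tier name, maximum days since last access).
# The thresholds are illustrative, not recommendations.
TIERS = [
    ("fast_disk", 30),
    ("cheap_disk", 365),
]
ARCHIVE_TIER = "tape"  # anything older than every threshold lands here


def choose_tier(days_since_access):
    """Return the tier a file belongs on, judged only by its age."""
    for name, max_age in TIERS:
        if days_since_access <= max_age:
            return name
    return ARCHIVE_TIER


def plan_migrations(files):
    """Given {path: (current_tier, days_since_access)}, list the moves needed.

    Returns (path, from_tier, to_tier) tuples, sorted by path.
    """
    moves = []
    for path, (current_tier, age) in sorted(files.items()):
        target = choose_tier(age)
        if target != current_tier:
            moves.append((path, current_tier, target))
    return moves


if __name__ == "__main__":
    inventory = {
        "/projects/draft.doc": ("fast_disk", 3),
        "/projects/old_report.doc": ("fast_disk", 400),
    }
    for move in plan_migrations(inventory):
        print(move)
```

Notice that the policy looks only at the file's age and never at what the file contains. A file is a file.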
The theory behind ILM is that you should have storage policies that relate to the life cycle of your information. As your information ages it has different storage requirements.
So, what’s the difference? The main difference is that HSM is about applying policies to the management of storage devices, whereas ILM is about applying policies to the management of the data as it pertains to storage. To a die-hard storage professional this is just semantics, so HSM and ILM sound like the same thing. To understand the difference you need to understand the conceptual difference between data and information.
When you think of storage from a data perspective, you don’t concern yourself with the actual content of the files or database records. A file is a file. A database record is a database record. Information might consist of a bunch of files or database records, but it is a more sophisticated way to think about your data.
Take a contract, for example. During the process of drafting and negotiating the contract there could be multiple files that need to be accessible to the team that is negotiating the contract. These files might consist of versions of the contracts or correspondence related to the contract negotiations. Individually, each file is a file. Taken together, they comprise something more sophisticated; they are information.
Once the contract is finalized, the final version needs to be locked down as a permanent record of what was agreed upon. The signed copies might need to be scanned and stored on write-once media. All of the old versions of the contract and correspondence are no longer needed on a daily basis, but you don’t want to delete them because you might need to refer to them later or be required by law to retain them for a certain period of time. You could copy them to CDs and put them in a vault, but if a dispute over the contract arises and you need to refer back to those documents, it would be a pain to go rifling through the vault looking for a particular CD. In other words, you don’t want these documents cluttering up your workspace, but you don’t want them too far away either, at least not until the contract terms have long since expired and/or you have satisfied the regulatory requirement for retaining the documents. At that point you can delete this information or stash it offline.
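The contract example can be sketched the same way. The following Python is a rough illustration only: the `Contract` class, the stage names, and the `POLICY` table are hypothetical, and a real document management system would track far more than this. The point is that the policy follows the contract as a whole, not the age of any individual file.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Contract:
    """A unit of information: several files that belong together."""
    name: str
    files: List[str] = field(default_factory=list)
    finalized_on: Optional[date] = None   # None while still being negotiated
    retain_until: Optional[date] = None   # end of the retention obligation


# Hypothetical handling for each life-cycle stage.
POLICY = {
    "drafting": "primary storage, read-write for the negotiating team",
    "retained": "near-line storage, final version locked on write-once media",
    "disposable": "delete or move offline",
}


def lifecycle_stage(contract, today):
    """Decide the stage from what we know about the contract itself."""
    if contract.finalized_on is None or today < contract.finalized_on:
        return "drafting"
    if contract.retain_until is None or today < contract.retain_until:
        return "retained"
    return "disposable"


def storage_plan(contract, today):
    """Apply one policy to every file that makes up the contract."""
    action = POLICY[lifecycle_stage(contract, today)]
    return {path: action for path in contract.files}


if __name__ == "__main__":
    deal = Contract(
        name="supply agreement",
        files=["draft_v1.doc", "draft_v2.doc", "signed_scan.tif"],
        finalized_on=date(2004, 3, 1),
        retain_until=date(2011, 3, 1),
    )
    print(lifecycle_stage(deal, date(2005, 1, 1)))
```

Here every file tied to the contract changes treatment together when the contract is finalized or when the retention period ends, regardless of when each file was last touched.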
So, how does a storage device know what’s contained in the documents or database records? The answer is that it doesn’t. Information is organized at the application level. In other words, ILM requires integration with your application software. As it turns out, there is a whole industry that does this: the document management and content management industry.
Traditionally, information life-cycle management has belonged to the document and content management industry. These vendors write application software that manages the process of creating documents, preserving versions, locking down fixed content, and deleting information that comes to the end of its life cycle. (By the way, I’ve been using a document management system for years and it is the most vital technology in my office!)
The lines between document/content management and storage have become blurry lately. First, you have EMC (a storage vendor) buying Documentum (a content management vendor). Then you have applications such as e-mail archiving being trumpeted by storage companies such as EMC, Veritas, CommVault, etc. Finally, you have storage technologies like write-once media and HSM, which could be back-end components in an ILM system.
The storage industry loves to take complex ideas and reduce them to three-letter acronyms and then ride the coattails of whatever trends the magazines write about. As always, the best course of action is to define your needs in business terms and try to reconcile them to technology, rather than being lured into a technology concept that creates a business need. HSM has its place. Just don’t confuse it with the business benefits of ILM.
Jacob Farmer is chief technology officer at Cambridge Computer (www.cambridgecomputer.com) in Waltham, MA. He can be contacted at firstname.lastname@example.org.