Persistent data, part 2

By Heidi Biggar

I appear to have struck a chord with one of my previous columns (see InfoStor, "IT's new 'dirty little secret,'" October 2007, p. 22). A number of InfoStor readers reached out to me and shared their thoughts and anecdotes. I encourage you to continue to do so.

After digesting your input, a few things stand out:

The scale of the persistent data problem is grossly underestimated and often not considered a high priority by IT departments. I attribute this to several factors: 1) the lack of available tools to help organizations distinguish between persistent data and “regular” primary data; 2) hesitancy among IT departments, as one reader put it, “to have someone poking around in their data centers;” and 3) fallout from hierarchical storage management (HSM) and information lifecycle management (ILM).

I believe the first two points will resolve themselves in a short time as organizations become more educated about tiered storage and its benefits, and as products become more widely available to help organizations identify persistent data and move it among available storage resources transparently.

As for the association with HSM and ILM, that, too, is surmountable. HSM and ILM failed to catch on in any significant fashion for two reasons: 1) the market wasn’t ready for them, and 2) there were underlying problems with enabling technologies for both. HSM products were difficult to use, and the ILM products didn’t work well with each other or with the storage devices themselves.

So, yes, the concept of tiered storage isn't new. Like HSM and ILM, tiered storage is all about improving data management. It's about ensuring that data is accessible from the right storage devices, at the right time, at the right cost, and with the least drain on people, storage, and power resources.

But today's market conditions are very different. Users are more educated about tiered storage, and technologies now exist that enable organizations to identify and classify different data types, move data from tier to tier, and search within those tiers. And, in general, users are less gun-shy about trying out "new technologies," due in part to the success of other emerging technologies, notably disk backup and data de-duplication.

In my previous column, I wrote about the potential capital and operational savings that can result from moving persistent data off primary storage and onto secondary storage tiers, including significantly lower physical disk and management costs. However, I didn’t highlight the potential benefits from a pure backup perspective.

When you move persistent data off primary disk, you’re not only freeing up primary disk capacity, but you’re also freeing up secondary backup capacity. In fact, depending on how much persistent data you move off your primary disk and the frequency and type of backup you do, the impact could be huge because you’re not backing up the same data over and over again.

The hard cost savings of eliminating these backups include the capital costs of the backup media (disk or tape), WAN bandwidth costs if data is replicated, management costs, and facility costs for housing the backup data. "Soft" cost savings, while more difficult to calculate, can also add up; they include less-tangible gains such as the operational and recovery benefits realized as a result of the change.

Even greater cost savings can be realized if persistent data is moved off primary disk and onto de-duplicated Tier-2 disk, and/or the remaining primary data is de-duplicated during the backup process or after the data has been written to disk.
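To make the arithmetic concrete, the sketch below runs the numbers on an entirely hypothetical scenario (a 100TB primary tier, 60% of it persistent, weekly full backups, and an assumed 10:1 de-duplication ratio); none of these figures come from the column itself:

```python
# Back-of-envelope estimate of backup capacity reclaimed by moving
# persistent data off primary storage. All figures are hypothetical
# illustrations, not numbers from the column.

def annual_backup_tb(primary_tb: float, fulls_per_year: int) -> float:
    """Capacity written per year by full backups of a given tier."""
    return primary_tb * fulls_per_year

PRIMARY_TB = 100.0          # size of the primary tier
PERSISTENT_FRACTION = 0.6   # share of primary data that never changes
WEEKLY_FULLS = 52           # one full backup per week
DEDUP_RATIO = 10.0          # assumed 10:1 reduction on a de-dup target

# Today: every weekly full re-copies the persistent data.
before = annual_backup_tb(PRIMARY_TB, WEEKLY_FULLS)

# After migration: only active data is swept up weekly; the persistent
# data, which no longer changes, needs to be protected just once.
active_tb = PRIMARY_TB * (1 - PERSISTENT_FRACTION)
persistent_tb = PRIMARY_TB * PERSISTENT_FRACTION
after = annual_backup_tb(active_tb, WEEKLY_FULLS) + persistent_tb

# Further savings if the remaining weekly fulls land on a de-dup target.
after_dedup = annual_backup_tb(active_tb, WEEKLY_FULLS) / DEDUP_RATIO + persistent_tb

print(f"before: {before} TB/yr, after: {after} TB/yr, with de-dup: {after_dedup} TB/yr")
```

Even in this toy model the backup footprint drops by more than half, and the savings scale with backup frequency: the more often full backups run, the more the unchanging persistent data is needlessly re-copied.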

I live and breathe open-systems storage. This isn't necessarily a bad thing, but from time to time I am reminded of the need for, and benefit of, thinking in broader IT terms. Although I wrote the previous column from my vantage point in the open-systems market, the message could just as easily have been directed at mainframe shops. Mainframe administrators share the same dirty secret: A significant percentage of the data sitting on their costly mainframe storage is persistent.

The benefits of getting persistent data onto more appropriate storage tiers are just as great to mainframe shops as they are to open-systems shops. However, mainframe users also tend to be more resistant to change than their open-systems counterparts, despite the fact that many of the technologies we find today in the open-systems world have mainframe roots.

Because of the mainframe culture, "fixes" to the persistent data problem need to be applied as unobtrusively as possible, with minimal changes to the mainframe environment itself. Such technologies do exist, but they are fewer in number and tend to be less advanced (or feature-rich) than their counterparts in the open-systems world.

Nonetheless, the comparison is interesting to draw. While the number of mainframe installations may pale against the number and scale of open-systems deployments, mainframes are a foundational element of many data centers and cannot be ignored.

Also, there is the potential for some technology “cross-pollination” here:

  • Data de-duplication technology from the open-systems world could be applied to the persistent data that is migrated off mainframes; and
  • Persistent data from both worlds could be pooled in one "uber" repository for maximum efficiency.

Food for thought.




Heidi Biggar is an analyst with the Enterprise Strategy Group (www.enterprisestrategygroup.com).

This article was originally published on January 01, 2008