Musings on the future of data dedupe

December 10, 2010 – I recently chatted with a few vendors in the data deduplication space. As conversations often do at this time of year, the talk turned toward the future of data deduplication. Here are a few snippets.

“Tier 1 storage vendors will move past point solutions for deduplication next year,” says Tom Cook, CEO at Permabit. “They’re working toward end-to-end deduplication, across SAN, NAS, unified [block and file], nearline and backup.”

“When that happens, once their customers ingest data and get it into a deduplicated state they’ll never have to re-hydrate that data throughout its lifecycle. The data will stay deduplicated through processes such as replication and backup. That’s a huge savings in workflow, footprint and bandwidth,” says Cook.

“Today, the big vendors use a variety of point solutions, but they’d like to use a single data optimization product across all their platforms, whether it’s block or file, primary or secondary. End-to-end deduplication will creep into the market in 2011 and 2012,” Cook predicts. (Permabit sells deduplication software – dubbed Albireo – to OEMs.)
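For readers newer to the technology, the core idea behind deduplication is worth making concrete. Here is a toy sketch of block-level dedupe (the fixed 4 KB blocks and SHA-256 hashing are illustrative assumptions on my part; shipping products such as Albireo typically use variable-length chunking and far more sophisticated indexing):

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size; real products often chunk variably

class DedupeStore:
    """Toy block-level deduplication: store each unique block once, keyed by its hash."""

    def __init__(self):
        self.blocks = {}  # SHA-256 digest -> block bytes

    def ingest(self, data: bytes) -> list:
        """Split data into fixed blocks, store unique ones, return a recipe of digests."""
        recipe = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # duplicate blocks stored only once
            recipe.append(digest)
        return recipe

    def rehydrate(self, recipe: list) -> bytes:
        """Reassemble the original data from its recipe."""
        return b"".join(self.blocks[d] for d in recipe)

store = DedupeStore()
data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096  # 16 KB with repeated content
recipe = store.ingest(data)
assert store.rehydrate(recipe) == data
print(len(recipe), "blocks referenced,", len(store.blocks), "unique blocks stored")
# -> 4 blocks referenced, 2 unique blocks stored
```

The point Cook is making is that once data lives in this hashed, single-instance form, downstream processes such as replication can ship recipes and unique blocks rather than re-hydrating the full data stream.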

Personally, I don’t think that single-solution, end-to-end deduplication will happen that quickly, in part because of the huge investments that the Tier 1 vendors have made in their “point solutions,” but we’ll see.

Dennis Rolland, director of advanced technology at Sepaton, has some predictions that are similar to Cook’s, as well as some differing opinions regarding trends in the data deduplication market.

“Dedupe will be required in more places going forward, including primary storage in addition to nearline storage, and end users will have to cut down on how many dedupe solutions they have because of the complexity in managing many disparate solutions,” says Rolland, “but we’ll probably still have distinct solutions for primary and nearline storage deduplication.”

Rolland thinks that the emphasis on deduplication benefits such as capacity, footprint and cost savings is shifting. “Dedupe enables low-bandwidth replication, which in turn enables companies to economically deploy DR [disaster recovery] sites,” he says.

Rolland also links two technologies that will no doubt make my list of The Hottest Storage Technologies for 2011 (assuming I get around to making such a list): data deduplication and cloud storage.

“Dedupe is an enabler for cloud storage,” says Rolland. “It makes it practical to deploy cloud storage because you’re sending, say, 10x less data over the WAN. That has significant implications for deploying cloud-based DR.”
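Rolland’s 10x figure translates directly into replication windows. A back-of-the-envelope calculation (the nightly change-set size and link speed below are my own illustrative assumptions, not his):

```python
# WAN savings implied by a 10:1 dedupe ratio on replication traffic.
dedupe_ratio = 10                 # 10:1, per Rolland's quote
changed_bytes = 500 * 1024**3     # assume 500 GB of changed data per night
wan_mbps = 100                    # assume a 100 Mbit/s replication link

def transfer_hours(num_bytes, mbps):
    """Hours to push num_bytes over a link of the given megabits per second."""
    return num_bytes * 8 / (mbps * 1_000_000) / 3600

raw = transfer_hours(changed_bytes, wan_mbps)
deduped = transfer_hours(changed_bytes / dedupe_ratio, wan_mbps)
print(f"raw: {raw:.1f} h, deduplicated: {deduped:.1f} h")
# -> raw: 11.9 h, deduplicated: 1.2 h
```

At those assumed numbers, an overnight replication job that would otherwise blow past its window comfortably fits, which is why dedupe keeps coming up in cloud-based DR conversations.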

(Sepaton bundles data deduplication software with its virtual tape libraries, or VTLs.)

Meanwhile, Quantum released the results of an end-user survey this week that suggests U.S. companies could save $6 billion annually in file restore costs by adopting deduplication.

According to the survey of 300 IT professionals, respondents spend an average of 131 hours annually on file restore activities, with 65% restoring files at least once a week. Based on the average wage for IT professionals in the US ($31.55 per hour according to PayScale.com), and extrapolated across the US IT workforce, that equates to roughly $9.5 billion annually. However, Quantum’s survey also found that the companies that are most efficient at file restoration predominantly use deduplication and can complete restores in approximately one-third the average time of all respondents. So, according to Quantum’s press release: “If the broader US market was to achieve similar data restore efficiencies, the potential annual savings for US businesses would be approximately $6 billion.”
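Quantum’s press release doesn’t spell out the arithmetic, but the figures roughly check out if you assume a US IT workforce of about 2.3 million (a hypothetical number I’m backing into from the $9.5 billion figure, not one stated in the survey):

```python
hours_per_year = 131          # average time spent on file restores, per the survey
wage = 31.55                  # average US IT wage per hour (PayScale.com)
us_it_workers = 2.3e6         # assumed workforce size, backed into from the $9.5B figure

total_cost = hours_per_year * wage * us_it_workers
# Efficient (dedupe-using) shops restore in about one-third the average time,
# so the potential savings is roughly two-thirds of the total.
savings = total_cost * (2 / 3)
print(f"total: ${total_cost/1e9:.1f}B, potential savings: ${savings/1e9:.1f}B")
# -> total: $9.5B, potential savings: $6.3B
```

Two-thirds of $9.5 billion lands at about $6.3 billion, consistent with the “approximately $6 billion” in the press release.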

This survey seems a bit misleading to me because it’s not really focused on the advantages of data deduplication per se in a file restore context but, rather, on the advantages of disk-based backup/recovery.

Steve Whitner, Quantum’s product marketing manager for DXi, explains: “If you back up to regular [non-deduplicated] disk and you have a need for DR, you have to get that data to another site and you can’t keep data on conventional disk for very long – maybe a few days or a week. So the real issue is not the speed of restore; it’s the fact that companies can now store a month or two of deduplicated backup data on disk.”

You be the judge. Here’s Quantum’s press release and here are some supporting slides from the survey results.

One thing is clear: In 2011, the focus will shift from deduplication for nearline/secondary storage to deduplication for primary storage. Witness two of this year’s biggest storage acquisitions: Dell buying Ocarina Networks and IBM acquiring Storwize. (Storwize’s technology is now in the IBM Real-time Compression business unit.)

Related blog posts:

What is progressive deduplication?

Data deduplication: Permabit finds success with OEM model


posted by: Dave Simpson


Dave Simpson has been the Editor-in-Chief of InfoStor since its inception in 1997. He previously held editorial positions at publications such as Datamation, Systems Integration, and Digital News and Review. He can be contacted at dsimpson@quinstreet.com
