What is progressive deduplication?

October 19, 2010 -- Arkeia Software is putting the final touches on the 9.0 release of its Arkeia Network Backup software, which the company plans to deliver in the first quarter of next year, possibly by mid-January. We'll cover that release in more detail as the company gets closer to shipping, but for now . . .

The key addition to the 9.0 release is a technology that Arkeia now refers to as "progressive deduplication," which was previously called "sliding window with progressive matching deduplication."

You'd have to be a math whiz to understand this technology from an algorithmic perspective, but here are the basics, culled from a recent conversation with Arkeia CEO Bill Evans:

Progressive deduplication is an alternative to the older fixed-block deduplication and the newer, more common variable-block deduplication.
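
For readers who want to see the difference between the two older approaches, here is a minimal Python sketch. It is not Arkeia's algorithm. The function names, the 64-byte block size, and the toy checksum used to pick variable-block boundaries are all assumptions made for illustration, and real products use stronger rolling hashes plus minimum and maximum block sizes.

```python
import hashlib
import random

def fixed_blocks(data: bytes, size: int = 64) -> list:
    """Fixed-block: cut at absolute offsets 0, size, 2*size, ..."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_blocks(data: bytes, window: int = 4, mask: int = 0x0F) -> list:
    """Variable-block: cut wherever a toy checksum of the last `window`
    bytes has its low bits clear, so boundaries follow the content."""
    blocks, start = [], 0
    for end in range(window, len(data) + 1):
        if sum(data[end - window:end]) & mask == 0:
            blocks.append(data[start:end])
            start = end
    if start < len(data):
        blocks.append(data[start:])  # trailing partial block
    return blocks

def shared(old_blocks, new_blocks) -> int:
    """Count new blocks whose hash is already known from the old backup."""
    known = {hashlib.sha256(b).digest() for b in old_blocks}
    return sum(hashlib.sha256(b).digest() in known for b in new_blocks)

# Illustrative data: 4 KB of pseudo-random bytes, then the same bytes
# with a single byte inserted at the front.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4096))
edited = b"X" + original

print("fixed-block reuse:   ",
      shared(fixed_blocks(original), fixed_blocks(edited)))
print("variable-block reuse:",
      shared(variable_blocks(original), variable_blocks(edited)),
      "of", len(variable_blocks(edited)))
```

The one-byte insert shifts every fixed-block boundary, so the second backup finds little or nothing to reuse. The content-defined boundaries fall in the same places relative to the data, so nearly every block after the first is recognized.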

Arkeia's data deduplication implementation is global (vs. local), byte-level (vs. file-level), source-side (vs. target-side, although it supports both approaches as well as a mix), in-line (vs. post-processing), and content-aware. But the real differentiator is in how the software handles block sizes.

As with variable-block deduplication, the block size can be adjusted for optimal deduplication ratios, but Arkeia claims a "better" implementation that is more content-aware (or application-aware) than existing approaches. Arkeia Network Backup 9.0 software automatically adjusts block sizes based on file type in order to maximize dedupe ratios.
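
As a rough picture of what "block size based on file type" could look like, here is a small Python sketch. The extensions, the block sizes, and the names `BLOCK_SIZE_BY_EXTENSION`, `DEFAULT_BLOCK_SIZE`, and `block_size_for` are hypothetical; Arkeia has not disclosed its actual table or how it identifies file types.

```python
import os

# Hypothetical values for illustration only; not Arkeia's settings.
BLOCK_SIZE_BY_EXTENSION = {
    ".vmdk": 4096,   # virtual disk images
    ".docx": 1024,   # office documents
    ".log": 8192,    # append-mostly text logs
}
DEFAULT_BLOCK_SIZE = 4096

def block_size_for(path: str) -> int:
    """Pick a dedupe block size from the file extension (case-insensitive)."""
    ext = os.path.splitext(path)[1].lower()
    return BLOCK_SIZE_BY_EXTENSION.get(ext, DEFAULT_BLOCK_SIZE)

print(block_size_for("server01.VMDK"))
print(block_size_for("notes.unknown"))
```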

Arkeia acquired the data dedupe technology when it bought Kadena Systems about a year ago.

Arkeia claims two key advantages of progressive dedupe vs. traditional variable-block dedupe: It's faster (which reduces the size of backup windows) and it delivers higher deduplication ratios (which reduces storage capacity and network bandwidth requirements).

The company isn't ready to make specific performance or dedupe ratio claims, but CEO Evans reports that in internal tests using VMDK files the company achieved a 38% improvement in dedupe ratios compared to "one of the leading deduplication vendors" (which I assume to be either Data Domain or Quantum).
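
To put that figure in perspective, here is a quick Python calculation. It assumes "a 38% improvement in dedupe ratios" means the ratio itself is 38% higher, which Arkeia did not spell out, and the 10:1 baseline is a made-up number used only to make the arithmetic concrete.

```python
def stored_fraction(ratio: float) -> float:
    """Fraction of the original data kept on disk at a given dedupe ratio."""
    return 1.0 / ratio

baseline_ratio = 10.0                    # hypothetical competitor ratio (10:1)
improved_ratio = baseline_ratio * 1.38   # assumed meaning of "38% improvement"

# Relative reduction in stored data; this does not depend on the baseline.
extra_saving = 1.0 - stored_fraction(improved_ratio) / stored_fraction(baseline_ratio)

print(f"improved ratio: {improved_ratio:.1f}:1")
print(f"stored data shrinks by a further {extra_saving:.1%}")
```

Under that reading, a 38% higher ratio means roughly 27.5% less data stored and sent over the network than with the competing product, whatever the starting ratio.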


"We think we'll have better dedupe ratios than any other vendor," says Evans.

The proof will be in the pudding. Until we get some independent benchmark results, we'll have to take these claims with a grain of salt, but progressive deduplication appears to be an interesting technology that could take deduplication to a new level.

The Arkeia Deduplication Option will be priced at $2,000 per media server, but will be free for companies that license Arkeia's software or appliances (physical or virtual) by December 31.

To participate in the beta program for Arkeia Network Backup 9.0 and the progressive deduplication technology, click here.


Next month, Arkeia will release a deduplication profiling tool that will enable users to measure actual deduplication ratios at various block sizes to determine the optimal block size for each file type.
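
The profiling tool isn't out yet, but the idea is easy to sketch in Python. This is a guess at the general approach using simple fixed-size blocks, not a description of Arkeia's tool; `dedupe_ratio` and `best_block_size` are hypothetical names, and the sketch ignores the index overhead that makes very small blocks costly in practice.

```python
import hashlib

def dedupe_ratio(data: bytes, block_size: int) -> float:
    """Original size divided by the size of the unique blocks."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    unique = {hashlib.sha256(b).digest(): len(b) for b in blocks}
    stored = sum(unique.values())
    return len(data) / stored if stored else 1.0

def best_block_size(data: bytes, candidates) -> int:
    """Return the candidate block size with the highest measured ratio."""
    return max(candidates, key=lambda size: dedupe_ratio(data, size))

# Illustrative sample: two 4-byte patterns, each repeated 256 times.
sample = b"ABCD" * 256 + b"WXYZ" * 256

for size in (4, 1024, 2048):
    print(size, dedupe_ratio(sample, size))
print("best:", best_block_size(sample, [4, 1024, 2048]))
```

Running the same measurement over a sample of each file type would give a per-type table of block sizes.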


Related articles:
Arkeia integrates backup with VMware vStorage
Arkeia acquires Kadena for data dedupe

posted by: Dave Simpson

Dave Simpson has been the Editor-in-Chief of InfoStor since its inception in 1997. He previously held editorial positions at publications such as Datamation, Systems Integration, and Digital News and Review. He can be contacted at dsimpson@quinstreet.com
