De-dupe inches into VTL backup

Part one of a two-part series

By Kevin Komiega

The exodus from tape- to disk-based backup is well underway as IT managers attempt to find a way, any way, to escape the money pit that is tape management. What might surprise some is that, although disk-based backup has become an incredibly popular alternative to tape-based infrastructures, many end users still consider data de-duplication unproven despite the technology’s obvious capacity-saving benefits.

According to the latest end-user survey from research firm TheInfoPro (TIP), approximately 74% of the Fortune 1000 firms surveyed have a disk-to-disk (D2D) backup infrastructure in place (see figure). Of those D2D users, however, only 12% are using data de-duplication, and many of them are running it in test and development environments rather than on production storage systems.

“Most of the people doing data de-duplication are just getting started,” says Robert Stevenson, TIP’s managing director of storage research. “Most firms with disk-to-disk backup technologies in use today are slowly retrofitting their systems to incorporate de-duplication.”

Stevenson also says managing backups is the albatross around the neck of many storage administrators. In fact, he says, the average Fortune 1000 company has up to 12 people dedicating 40% of their time to tape management.

“Disk prices keep dropping, but backup costs are the same. Customers are interested in anything that can in some way minimize the resources dedicated to tape, and data de-duplication consistently tops that list,” says Stevenson.

On the bleeding edge

Bob Rader, storage and backup manager for the University of New Hampshire (UNH), plans to switch on the data de-duplication feature of his Sepaton S2100-ES2 virtual tape library (VTL) later this fall. He believes he is ahead of the IT curve.

De-duplication was low on Rader’s list of buying criteria for a VTL. Time-to-restore, scalability, and speed were his top concerns.

UNH’s performance requirements were set at a sustained backup throughput rate of at least 350MBps, with the ability to scale to 600MBps. The university also set minimum performance requirements for protecting the large Oracle databases that store its administrative data. The minimum required throughput for any individual restore request is about 60MBps, with a maximum of 100MBps when needed.

Rader and his IT team purchased Sepaton’s S2100-ES2 VTL with one SRE node containing four disk shelves with 30TB of usable space, along with a 20TB DeltaStor de-duplication software configuration. The S2100-ES2 meets UNH’s performance and capacity criteria, but de-duplication seemed a dicey proposition.

“We had heard rumors that de-duplication was great during write operations, but it was hit or miss with restores,” says Rader. “Also, finding customer references was difficult. We knew plenty of people using virtual tape libraries, but a year ago it seemed that no one was running de-duplication in production environments. I feel we’re on the leading edge with this technology.”

UNH plans a phased approach to deployment, and Rader’s expectations for data de-duplication are relatively modest. “Our de-dupe ratio requirements are conservative because our primary goal is simply to beat the cost of tape,” he says. “It’s ridiculous when you hear ratios of 20:1 or 40:1. If we hit 6:1 it will be better than sticking with an equivalent capacity of tape. Anything beyond that is gravy.”

Not enough time in the day

Virtual Iron Software serves up server virtualization and consolidation products and has a team of technicians providing support and services for small and medium-sized businesses (SMBs). When it comes to the company’s internal systems, however, IT is a one-man operation.

“Our backup window got to be unmanageable,” says Virtual Iron’s IT manager and self-proclaimed one-man IT department, Eric Bechtol. And when he says “unmanageable,” he means it. “It was somewhere around 25 hours. If you do the math, something was getting skipped somewhere.”

Bechtol came to a crossroads late last year when he was faced with the decision of buying more tape libraries versus implementing a disk-to-disk backup system. He opted for the latter by implementing a 2TB DXi3500 disk-based backup appliance from Quantum. The DXi3500 uses Quantum’s data de-duplication technology to increase the amount of backup data users can retain on disk.

Quantum claims the data de-duplication feature of the DXi3500 can cut capacity requirements by a factor of 10 to 50, but de-duplication was not the selling point for Bechtol. “Tape management is the most painful job in IT next to fixing printer jams. The [DXi3500] cut my backup window by about half and restores come back quick, freeing up several hours a week that I used to spend monkeying around with tapes,” says Bechtol. “So when I saw that the DXi had de-duplication, I said ‘why not?’ ”

Bechtol was told he would see real-world de-duplication ratios of about 10:1. He expected only about 6:1 for his file server, Microsoft Exchange, and SQL Server data, and he was pleasantly surprised.

Virtual Iron is now seeing data reduction ratios of about 13:1 and is storing about 19.5TB of data on a 2TB system. “It saved a lot more space than I thought it would. I would have been happy with 5:1 reduction. It would have hurt if I had to buy another 20TB of storage,” says Bechtol. “Free disk is a beautiful thing.”
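The arithmetic behind those figures can be sketched in a few lines of Python. This is an illustrative calculation only: the function name is hypothetical, and it assumes a de-duplication ratio is simply the logical backup data divided by the physical disk capacity it consumes.

```python
def physical_tb(logical_tb: float, ratio: float) -> float:
    """Physical disk consumed (TB) for a logical data size at a ratio:1 reduction."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return logical_tb / ratio

# Figures quoted in the article: about 19.5TB of backup data at about 13:1.
used = physical_tb(19.5, 13)

# The same volume of data at the 5:1 Bechtol says he would have been happy with.
used_at_5 = physical_tb(19.5, 5)

print(f"13:1 -> {used:.1f}TB on disk")
print(f" 5:1 -> {used_at_5:.1f}TB on disk")
```

Under that assumption, 19.5TB at 13:1 occupies roughly 1.5TB and fits on the 2TB appliance, while the same volume of data at 5:1 would need about 3.9TB.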

From skeptic to true believer

Replacing and maintaining an aging tape infrastructure has historically been a major thorn in the side of Chris Watkis, IT director for Grey Healthcare Group (GHC), a marketing and advertising firm for the medical and pharmaceutical industries.

“One of our main problems was end of life. The tape equipment we had was very old, and we were spending quite a bit of money maintaining support contracts,” explains Watkis.

“Also, the staff was spending an inordinate amount of time managing backups and the time to recovery was so long it was scary,” he adds.

GHC develops marketing and advertising campaigns comprising video, photography, and online services. Many of the firm’s digital media files exceed 2GB in size, so as the business grows, so does its data.

“Our tape library was at its maximum capacity, and even with 8TB on our SAN we wanted more storage to protect our work,” says Watkis.

If a system failure were to occur, it would take several days to restore data from GHC’s LTO-based tape library. The organization’s backup procedures required an administrator dedicated to managing them.

GHC worked with a company called VirtuIT to implement a FalconStor VTL Storage Appliance with embedded data de-duplication functionality. Now backups are done directly to disk using a backup server connected to the appliance.

Overall, the VTL solved GHC’s backup problems. “The VTL has already worked out for us financially. It costs half of the amount we spent on tape media and the money we spent on maintaining tape,” says Watkis. “We’ve also been able to decrease the time spent managing backups.”

However, the newness of data de-duplication gave Watkis pause. “There weren’t many [data de-duplication] users a year ago. I was one of those people with a short history of using the technology and I had one of the most extensive deployments,” he says. “I was skeptical and very realistic with my expectations and didn’t rely too much on de-duplication because the technology was very new.”

Once de-duplication was introduced into the backup stream, though, the results were astounding. FalconStor’s de-duplication reduced GHC’s data sets for backup from 175TB to 2TB, for a ratio of more than 72:1. “This was quite eye-opening,” Watkis says. “It made us aware of just how many duplicate files were being backed up and re-backed up by the previous system.”

Watkis may be the exception to the rule, as the de-duplication ratios reached by GHC are not the norm. Companies tend to achieve very high de-duplication ratios in full backups, averaging between 15:1 and 20:1 reduction, according to TheInfoPro’s research. The ratios dip when de-duplication is applied to file content, with typical ratios between 5:1 and 9:1.

TheInfoPro’s Stevenson believes end users are happy with some level of compression, whatever that may be. “Any savings is a savings. If users can compress at a ratio of 2:1 it means that they will potentially have fewer tapes to deal with,” says Stevenson.
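The ratios quoted above can also be read as a percentage of capacity saved, which shows why even modest ratios matter. The Python sketch below is an illustration rather than anything from TheInfoPro’s research: the function name is hypothetical, and it assumes a ratio of N:1 means the stored data occupies 1/N of its original size.

```python
def space_saved(ratio: float) -> float:
    """Fraction of capacity saved at a ratio:1 reduction (assumes stored size = 1/ratio)."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return 1 - 1 / ratio

# Ratios mentioned in the article, from simple compression up to full backups.
for r in (2, 5, 9, 15, 20):
    print(f"{r}:1 -> {space_saved(r):.1%} saved")
```

On this reading, 2:1 already halves the capacity needed, 5:1 saves about 80%, and moving from 15:1 to 20:1 adds less than two percentage points.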

This article was originally published on September 01, 2008