Users combine VTLs and data de-dupe

By Kevin Komiega

Virtual tape libraries (VTLs) have been thoroughly vetted by the end-user community and have proved to be a reliable, cost-effective option for speeding the backup-and-recovery process. In fact, according to a recent InfoStor QuickVote poll of its readers, 26% have already deployed VTLs to some degree, and another 38% plan to do so later this year (see figure). Now there’s a new generation of data de-duplication technologies that are making VTLs even more attractive to users bogged down by the arduous management tasks associated with traditional tape backup.

Data de-duplication is emerging as the next big thing in the data-protection market. According to recent research from the Taneja Group, it is inevitable that virtually all VTL vendors will incorporate some form of data de-duplication technology into their solutions over the next few years. Taneja Group analyst Steve Norall predicts that, by 2010, the VTL market will grow to more than $1 billion, and that VTLs with data de-duplication will become the top competitor to tape automation systems in the backup market.

The benefits of VTLs with data de-duplication are obvious. Users can store more data in the same space for less than the cost of physical tape, while benefiting from the online access and speedy restore times of disk.

Carl Follstad, manager of the University Data Management Services (UDMS) group at the University of Minnesota’s Office of Information Technology, is intrigued by the prospect of all VTLs eventually having data de-duplication capabilities.

“It appeals to me. I’m in the business of trying to protect as much data as I can. The economies of scale of the de-dupe paradigm get better as I go along and get more data from more departments,” says Follstad. “The more data I protect, the better numbers I get.”

Follstad and the university have already begun integrating de-duplication into their environment to augment a legacy tape infrastructure. “We used a hybrid approach of backup to disk and tape silos. The problem was, since we are essentially a service provider to the university, the commitment to our customers was growing faster than we could manage,” he says.

The UDMS group provides managed storage services for electronic data to the entire university community. Follstad is responsible for designing and managing the school’s SAN, NAS, and data-protection infrastructure in support of a range of applications, including e-mail, for about 60,000 students and 23,000 faculty and staff members.

Follstad and his team ran the legacy backup environment in parallel with a newly installed trio of Data Domain VTL appliances with data de-duplication software for 30 days to stress-test the new systems. Prior to evaluating VTL products with de-dupe technology, Follstad performed an in-depth return-on-investment (ROI) study and calculated that achieving a de-dupe ratio of 10:1 would put the cost of the VTL implementation on par with that of a new tape silo.
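The break-even reasoning behind an ROI study of this kind can be sketched in a few lines of Python. This is only an illustration, not Follstad's actual model: the function names and both prices are hypothetical, and the prices were chosen solely so that the break-even point lands at the 10:1 ratio cited above.

```python
def effective_cost_per_tb(raw_cost_per_tb: float, dedupe_ratio: float) -> float:
    """Cost per TB of protected data when each raw TB holds dedupe_ratio TB of backups."""
    if dedupe_ratio <= 0:
        raise ValueError("dedupe_ratio must be positive")
    return raw_cost_per_tb / dedupe_ratio


def break_even_ratio(disk_cost_per_raw_tb: float, tape_cost_per_tb: float) -> float:
    """De-dupe ratio at which disk costs the same per protected TB as tape."""
    if tape_cost_per_tb <= 0:
        raise ValueError("tape_cost_per_tb must be positive")
    return disk_cost_per_raw_tb / tape_cost_per_tb


# Hypothetical prices (not from the article), picked so break-even is 10:1.
DISK_COST_PER_RAW_TB = 10_000.0
TAPE_COST_PER_TB = 1_000.0

ratio = break_even_ratio(DISK_COST_PER_RAW_TB, TAPE_COST_PER_TB)
print(f"Break-even de-dupe ratio: {ratio:.0f}:1")

# Any ratio above break-even makes disk cheaper per protected TB than tape.
at_20 = effective_cost_per_tb(DISK_COST_PER_RAW_TB, 20)
print(f"Disk cost per protected TB at 20:1: ${at_20:,.0f}")
```

Under these assumed prices, every improvement in the de-dupe ratio beyond the break-even point lowers the cost per protected terabyte below that of tape, which is why the ratio was the key figure in the evaluation.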

After 30 days, Follstad switched over entirely to the Data Domain appliances for his primary backups and within 90 days was seeing de-duplication ratios of up to 20:1. Six months later, the university is enjoying even better results.

Data Domain’s VTL software integrates with the existing backup infrastructure and lets Data Domain arrays emulate multiple tape libraries over a Fibre Channel interface, with up to 47 virtual LTO tape drives and 10,000 virtual slots across up to 100,000 virtual cartridges. Data Domain’s Global Compression software adds de-duplication with local compression: it identifies redundant patterns within a file, across files, and within blocks, and stores only unique data segments.
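The general idea of storing only unique data segments can be illustrated with a toy Python sketch. It is not Data Domain's algorithm: the class name, the fixed-size segmentation, and the use of SHA-256 fingerprints are all assumptions made for illustration, and commercial products use their own segmenting and indexing schemes.

```python
import hashlib
from typing import Dict, List


class DedupeStore:
    """Toy segment store that keeps one copy of each unique fixed-size segment."""

    def __init__(self, segment_size: int = 4096) -> None:
        self.segment_size = segment_size
        self.segments: Dict[str, bytes] = {}  # fingerprint -> segment bytes
        self.logical_bytes = 0                # total bytes written by backups

    def write(self, data: bytes) -> List[str]:
        """Store data and return the fingerprints needed to rebuild it."""
        recipe = []
        for start in range(0, len(data), self.segment_size):
            segment = data[start:start + self.segment_size]
            fingerprint = hashlib.sha256(segment).hexdigest()
            # A segment seen before is not stored again.
            self.segments.setdefault(fingerprint, segment)
            recipe.append(fingerprint)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe: List[str]) -> bytes:
        """Rebuild the original data from its list of fingerprints."""
        return b"".join(self.segments[fp] for fp in recipe)

    def ratio(self) -> float:
        """Logical bytes written divided by bytes physically stored."""
        stored = sum(len(s) for s in self.segments.values())
        return self.logical_bytes / stored if stored else 0.0


store = DedupeStore(segment_size=4)
night1 = store.write(b"AAAABBBBCCCC")  # first backup: three new segments
night2 = store.write(b"AAAABBBBDDDD")  # second backup: only DDDD is new
print(f"De-dupe ratio after two backups: {store.ratio():.1f}:1")
```

Because repeated backups of largely unchanged data keep presenting the same segments, the ratio in a sketch like this climbs as more backups are written, which is consistent with the ratios Follstad reports improving over time.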

There are scenarios, however, in which applying de-duplication technology is not necessarily a panacea.

“Some applications de-dupe better than others,” Follstad explains. “Depending on which vendor you talk to, they’ll tell you databases will de-dupe better than file server data, or vice versa. I think de-dupe is probably a slam dunk for conventional Windows file server data that’s not encrypted. Databases, who knows?”

On the other hand . . .

There is a contingent of naysayers in the industry, mostly competing vendors, who see VTLs as nothing more than a temporary fix. Some believe that although the combination of VTLs with data de-duplication relieves some of the symptoms of the backup problem, it does little to cure the illness.

“There might be value in a VTL solution with de-dupe for a company that has made a substantial investment in tape already and has an opportunity to leverage that investment with a VTL. But I see it as an interim step. It’s a stopgap on the road toward true disk-to-disk backup,” says Richard Heitmann, vice president of product marketing at EVault (which was recently acquired by Seagate).

Heitmann says some approaches to de-duplication currently being employed by VTL vendors do nothing to decrease backup windows because all data needs to be backed up, and duplicates are only eliminated after the backup occurs. He also challenges the idea that a VTL with data de-duplication will save end users money.

“Deploying expensive appliances to save a few extra gigabytes of storage will increase net costs, especially if the processing overhead and reduction in throughput causes your backups to exceed the available window and impact your normal business operations,” says Heitmann.

EVault offers online disk-to-disk backup and restore via its EVault Protect service. The EVault Protect service stores customer data on disk in a remote location for a fee.

The pros and cons of a disk-based managed service versus a VTL with data de-duplication are debatable. But it’s a fact that organizations have a lot of money invested in tape, and the non-disruptive, work-with-anything nature of VTLs presents a compelling argument.

CitiStreet (a State Street and Citigroup company) is a global benefits provider and one of the nation’s largest retirement plan record-keepers with more than $200 billion under administration. CitiStreet manages its data from two data centers with more than 100TB of stored data, which includes participant information along with administrative data such as e-mail, file servers, home directories, and more.

Jeff Machols, CitiStreet’s vice president of global infrastructure and systems integration manager, debated a wholesale replacement of the company’s existing tape libraries, but concluded that a rip-and-replace of the entire tape infrastructure would not satisfy long-term requirements. Machols brought in VTLs and, eventually, de-duplication appliances, to slowly phase tape out of his environment.

“We had some DLT7000 and LTO tape libraries. Everything was working, but it wasn’t blazing fast. You had to do a lot of work to keep tapes refreshed. The hardware started to age a bit, and we needed to do a technology update. It was around that time that VTLs became production-ready,” says Machols.

CitiStreet’s internal policies and federal regulations require that data be replicated on an ongoing basis to geographically dispersed locations, while also not allowing any non-encrypted tapes to go off-site. “Our storage requirements continue to grow and our backup windows continue to shrink,” says Machols.

CitiStreet evaluated a number of VTL vendors and eventually went with Sepaton’s S2100-ES2 VTL system to meet its needs. One 35TB S2100-ES2 unit has been deployed in an HP-UX environment in CitiStreet’s Jacksonville, FL, data center, replacing two older physical tape libraries with four DLT7000 tape drives each. In addition, a 4TB VTL was recently deployed in CitiStreet’s Quincy, MA, headquarters. Each data center serves as the disaster-recovery center for the other.

But even with the Sepaton appliances in place, CitiStreet’s data growth is still projected to continue unabated. That’s why Machols began evaluating de-duplication to get a jump on controlling future capacity requirements.

Machols says that the addition of de-duplication appliances has given him a cushion when it comes to buying more capacity. “Based on a very conservative compression ratio we have about 36 months until we’ll need more capacity.”
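A capacity-runway estimate of this kind can be sketched as follows. The linear growth model, the function name, and every input figure are hypothetical; they were chosen only so that the result matches the 36 months Machols cites, and CitiStreet's actual numbers are not given in the article.

```python
def months_until_full(usable_tb: float, used_tb: float,
                      monthly_backup_tb: float, reduction_ratio: float) -> float:
    """Months of headroom if each month's backups shrink by reduction_ratio.

    Assumes steady monthly backup volume (a simplification; real growth
    usually compounds).
    """
    if monthly_backup_tb <= 0 or reduction_ratio <= 0:
        raise ValueError("monthly_backup_tb and reduction_ratio must be positive")
    free_tb = usable_tb - used_tb
    physical_growth_per_month = monthly_backup_tb / reduction_ratio
    return max(free_tb, 0.0) / physical_growth_per_month


# Hypothetical inputs: 18TB free, 5TB of new backups per month, 10:1 reduction.
runway = months_until_full(usable_tb=35.0, used_tb=17.0,
                           monthly_backup_tb=5.0, reduction_ratio=10.0)
print(f"Estimated runway: {runway:.0f} months")
```

Planning against a conservative ratio, as Machols describes, means any better ratio achieved in practice only lengthens the runway.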

VTL vendors with data de-duplication technology either on the market or currently in development include Data Domain, Diligent, FalconStor Software, Quantum, and Sepaton. Industry analysts believe Network Appliance (which acquired Alacritus), Overland Storage, and Sun Microsystems are also likely to offer VTL solutions with de-duplication in the future.

Sepaton ships VTL de-dupe appliances

By Kevin Komiega

Disk-based backup vendor Sepaton has upgraded the speed and capacities of its line of virtual tape library (VTL) and data de-duplication appliances and added a new offering for small and medium-sized businesses (SMBs) looking to kick the tires on data de-dupe technology.

Sepaton’s S2100-DS2 DeltaStor appliance is available with 7TB of capacity and can potentially protect up to 200TB of data, in a 3U form factor. In addition to SMBs, the S2100-DS2 is targeted at departments in larger organizations.

“We’re seeing a demand [for VTLs] in enterprises that want to install systems at remote locations, and we’re also seeing the SMB space quickly picking up,” according to Linda Mentzer, Sepaton’s vice president of marketing.

Like its bigger brother, the S2100-ES2, the S2100-DS2 uses Sepaton’s Content-Aware architecture to deliver data de-duplication without impacting the backup window. The DeltaStor appliances are used in tandem with Sepaton’s S2100-ES2 Series 500 and S2100-DS2 VTLs to maintain backup-and-restore performance from 1TB per hour up to 17.2TB per hour by de-duplicating data outside of the primary data path.

The result, according to the company, is a de-duplication ratio ranging from 25:1 up to 50:1 for a typical mix of business application data such as e-mail, database, and files. Meanwhile, Sepaton’s other appliances got a boost from new hardware components.

The high-end S2100-ES2 Series 500 VTL now features 4Gbps Fibre Channel connections and scales from 7TB to more than 1PB. Integrated hardware compression can double those capacities.

With the DeltaStor de-duplication appliance option, the system could scale up to 50PB.

The S2100-DS2 VTL, for SMBs and departmental environments, is available in 3.5TB and 7TB configurations and incorporates software compression to potentially double capacity up to 14TB.
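The capacity figures quoted above follow from simple multiplication of raw capacity by a reduction ratio. The Python sketch below shows one reading of that arithmetic; the function names are hypothetical, 1PB is taken as 1,000TB, and pairing the 1PB configuration with the 50:1 upper ratio to arrive at 50PB is an interpretation of the vendor's numbers, not something Sepaton states.

```python
TB_PER_PB = 1000  # assumption: decimal units


def effective_capacity_tb(raw_tb: float, compression_ratio: float = 1.0,
                          dedupe_ratio: float = 1.0) -> float:
    """Backup data a system can hold after compression and de-duplication."""
    return raw_tb * compression_ratio * dedupe_ratio


def implied_ratio(protected_tb: float, raw_tb: float) -> float:
    """Reduction ratio implied by a 'protects X TB on Y TB' claim."""
    return protected_tb / raw_tb


# S2100-DS2 VTL: 7TB raw, software compression doubling capacity.
print(effective_capacity_tb(7, compression_ratio=2), "TB")

# S2100-ES2 at 1PB raw with the upper 50:1 de-dupe ratio.
print(effective_capacity_tb(1 * TB_PER_PB, dedupe_ratio=50) / TB_PER_PB, "PB")

# S2100-DS2 DeltaStor: 200TB protected on 7TB of capacity.
print(f"{implied_ratio(200, 7):.1f}:1")
```

On this reading, the ratio implied by the 200TB claim for the 7TB DeltaStor appliance falls inside the 25:1 to 50:1 range the company quotes.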

The software remains the same, says Mentzer, but the underlying hardware platform has been completely revamped. “We are now using Intel servers in the VTL nodes instead of Dell servers, as well as 500GB SATA drives,” she says. The appliances are compliant with the Restriction of Hazardous Substances (RoHS) Directive.

The S2100-DS2 DeltaStor appliance is priced at $75,000 for up to 100TB of capacity. Pricing for the S2100-ES2 Series 500 VTL starts at $59,000, while the S2100-DS2 VTL starts at less than $18,000. DeltaStor upgrades are also available for existing S2100 customers.

This article was originally published on March 01, 2007