In Search of a Better Data Archive: From Tape to the Cloud

By Drew Robb

Archiving has been done for centuries. Ancient manuscripts have proven invaluable in preserving a link between the present and the distant past. In the enterprise, archives provide access to long-inactive data that suddenly becomes important. At the very least, they offer a means to comply with a plethora of data retention regulations.

“Less than 5% of data is ever analyzed or touched again,” said Fred Moore, an analyst with Horison Information Strategies. “Most data reaches archival status in less than 30 days.”

But how exactly should you set up a digital archive? Various approaches have been employed over the years. Some organizations ran endless backups and kept a mountain of backup tapes in a room. This approach tended to keep dozens if not hundreds of copies of the same file, and finding the right tape, never mind the file within it, was much too cumbersome.

Then came innovations like tape libraries and autoloaders, which added some automation to the tape equation. But even then, access times often proved too slow. If there was no hurry for the file, tape did fine. But if the file was wanted immediately, a wait of minutes if not hours was unacceptable.

Another data archiving option was optical disk. Many have tried to erect an optical library over the last decade or so, but it hasn’t really caught on. Facebook is the most recent proponent. While response times are a little faster than tape, the relatively low capacity per disk limits its value as an archiving platform.

Next candidate: disk. Hard disk drive (HDD) vendors’ chances of establishing disk as the go-to platform for archiving hit the wall when data volumes escalated. Once you get into the PB range, disk becomes so expensive that it is a tough sell.

So what does that leave? Moore advocates the Active Archive, which is a combo of disk and tape. 

“An Active Archive combines the simplicity and performance of disk with the economics of tape,” said Moore. “It can scale to billions of files for frequently accessed archival purposes.”

He sees it working like this within a storage tiering setup: Flash is the top tier, reserved for high-performance, very low volumes of data. The next tier is enterprise disk, again with a fairly low capacity. The remaining approximately 80% of data is housed in two tiers: slower but high-capacity SATA disk holds the data that has the highest chance of being requested, providing it with relatively fast archival response; below that, 43% to 60% of all data is housed on tape.

“As data ages, it must go into lower tiers as it is not cost effective to keep it on expensive media,” said Moore. “HDD and flash are much better for IOPS, while tape is better for data rate.”
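The age-based tiering Moore describes can be sketched as a simple policy function. This is only an illustration, not Moore's method or any vendor's product: the tier names mirror the four tiers above, but the `FileRecord` type, the `choose_tier` function and the day thresholds are all hypothetical. Only the 30-day mark loosely echoes his remark that most data reaches archival status in less than 30 days.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical thresholds, in days since last access. These values are
# assumptions for illustration; the article gives no cut-off points
# other than noting most data goes archival in under 30 days.
TIER_THRESHOLDS = [
    (7, "flash"),             # very active, very low volume
    (30, "enterprise_disk"),  # still active
    (365, "sata_disk"),       # archival, but likely to be requested
]
COLDEST_TIER = "tape"         # everything older than the last threshold


@dataclass
class FileRecord:
    path: str
    last_access: date


def choose_tier(record: FileRecord, today: date) -> str:
    """Return the tier a file belongs on, based only on its age."""
    age_days = (today - record.last_access).days
    for max_age_days, tier in TIER_THRESHOLDS:
        if age_days <= max_age_days:
            return tier
    return COLDEST_TIER


if __name__ == "__main__":
    today = date(2016, 1, 14)
    for rec in [
        FileRecord("report.docx", date(2016, 1, 10)),
        FileRecord("q2-results.xlsx", date(2015, 6, 1)),
        FileRecord("old-project.tar", date(2013, 1, 1)),
    ]:
        print(rec.path, "->", choose_tier(rec, today))
```

Real data management software would likely weigh more than last-access age, such as file size, retention rules or access frequency, but the principle of pushing aging data down to cheaper media is the same.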

His vision for the archive of the future is one that combines HDD, tape, the Linear Tape File System (LTFS), data management and the cloud. It would make use of onsite and cloud storage to cope with the demands of archiving, and utilize data management software to automate tiering and manage information retrieval.

Tom Coughlin, an analyst at Coughlin Associates, agrees. He sees the cloud as an obvious place for an archive.

“Cloud storage will be used mostly for archiving going forward,” he said. 

Per his surveys, tape currently accounts for 40% of all archiving and had the highest growth rate, 59%, in 2015.

“Magnetic tape is cost effective when combined with HDD or flash to make an Active Archive,” said Coughlin.

Old, Not Obsolete

“I am old, but not obsolete,” said Arnold Schwarzenegger’s Terminator character in the most recent edition of the movie franchise. Jon Toigo, an analyst with Toigo Partners, thinks this very much applies to tape, particularly as an archiving platform.  

“Archived data needs to be preserved on portable media that can be retained for an extended time,” said Toigo. “Entrusting it all to a disk-based cloud may not be wise.”

He argues that disk reliability and error rates, along with the choked data rates that result when archives must travel from the cloud over the network, point to tape as a better repository for long-term data.

He also proposed a novel architecture as a possible archiving platform of the future. At the top would sit an enterprise-class storage infrastructure using parallel IO-enhanced commodity disk and flash. Toigo said this would cost about $1.5 million per PB and would have a throughput in excess of 450,000 IOPS. It would host only very active data. Below that would sit a tape subsystem to host archive data, using an LTO-7, 12U, 160-slot library with a front-end Network Attached Storage (NAS) gateway appliance. The entire arrangement would cost about $40,000.
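To put the two price points on the same footing, the rough arithmetic below estimates a per-PB cost for the tape tier. It rests on assumptions the article does not state: that all 160 slots hold LTO-7 cartridges at their native (uncompressed) capacity of 6 TB, and that the $40,000 figure covers that full capacity, media included. Treat the result as a back-of-the-envelope sketch, not Toigo's own calculation.

```python
TB_PER_PB = 1000  # decimal units

# Figures quoted in the article.
DISK_FLASH_COST_PER_PB = 1_500_000  # USD per PB, top tier
TAPE_SYSTEM_COST = 40_000           # USD, library plus NAS gateway
LIBRARY_SLOTS = 160

# Assumption: every slot holds one LTO-7 cartridge at native capacity.
LTO7_NATIVE_TB = 6


def tape_capacity_pb(slots: int = LIBRARY_SLOTS,
                     tb_per_cartridge: float = LTO7_NATIVE_TB) -> float:
    """Total native capacity of a fully populated library, in PB."""
    return slots * tb_per_cartridge / TB_PER_PB


def tape_cost_per_pb(system_cost: float = TAPE_SYSTEM_COST) -> float:
    """System cost divided by native capacity, in USD per PB."""
    return system_cost / tape_capacity_pb()


if __name__ == "__main__":
    capacity = tape_capacity_pb()
    cost = tape_cost_per_pb()
    ratio = DISK_FLASH_COST_PER_PB / cost
    print(f"Tape capacity: {capacity:.2f} PB")
    print(f"Tape cost:     ${cost:,.0f} per PB")
    print(f"Disk/flash is roughly {ratio:.0f}x the cost per PB")
```

Under those assumptions the library holds a little under 1 PB, which puts tape at roughly $42,000 per PB, about one thirty-sixth of the quoted disk and flash figure. If the $40,000 excludes cartridges, the gap narrows accordingly.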

This article was originally published on January 14, 2016