Data archiving in the cloud involves storing large amounts of data for many years, and many cloud storage companies offer storage services specifically for data archives. It's a growing sector that is expected to explode from $3bn in 2014 to $7bn in 2017, according to research from Oracle. Here are five things to look at when you compare cloud storage for archive services.
1. How many copies of the data are there?
Best practices suggest that cloud archive services make at least three copies of your archived data. But when you carry out a cloud storage comparison you may find that the exact details of how your archived data will be stored are not easy to come by – particularly from the larger cloud storage providers.
For example, Amazon's Glacier cloud archive service boasts average annual durability of 99.999999999% for an archive, and says "the service redundantly stores data in multiple facilities and on multiple devices within each facility. To increase durability, Amazon Glacier synchronously stores your data across multiple facilities before confirming a successful upload."
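To put a durability figure such as the one quoted above into practical terms, it can help to convert it into an expected number of lost archives per year. The following is a minimal sketch, assuming the percentage can be read as an independent per-archive, per-year survival probability (a simplification that the provider does not spell out). The function names and the archive count of 10 million are illustrative assumptions, not anything published by Amazon.

```python
from decimal import Decimal


def annual_loss_probability(durability_percent: str) -> Decimal:
    """Convert an annual durability percentage into a per-archive loss probability.

    Decimal is used instead of float so that eleven nines are not
    distorted by binary rounding.
    """
    return Decimal(1) - Decimal(durability_percent) / Decimal(100)


def expected_annual_losses(archive_count: int, durability_percent: str) -> Decimal:
    """Expected number of archives lost in one year, assuming independent losses."""
    return Decimal(archive_count) * annual_loss_probability(durability_percent)


def survival_probability(durability_percent: str, years: int) -> Decimal:
    """Probability that a single archive survives the given number of years,
    assuming the same durability applies independently every year."""
    return (Decimal(durability_percent) / Decimal(100)) ** years


if __name__ == "__main__":
    quoted_durability = "99.999999999"  # figure quoted for Glacier in the article
    archive_count = 10_000_000          # hypothetical number of stored archives

    print("Loss probability per archive per year:",
          annual_loss_probability(quoted_durability))
    print("Expected archives lost per year:",
          expected_annual_losses(archive_count, quoted_durability))
    print("Chance one archive survives 25 years:",
          survival_probability(quoted_durability, 25))
```

A calculation like this only describes the provider's stated design target. It says nothing about how many copies exist or where they are held, which is why the questions in this section are still worth asking.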
By contrast, smaller cloud archiving companies may be more explicit. For example, UK-based archiving service provider Arkivum explicitly states that its cloud-based Arkivum/100 service stores data at two separate data centers, with a third copy stored offline on tape in a third party escrow facility.
2. How easy is it to get your data back?
This matters in a cloud storage comparison because you may need to retrieve your data if the cloud storage company stops offering an archiving service or goes out of business.
"We are not seeing so many 'data hostage' situations now, but some vendors make it difficult and costly to get your data back so you definitely want to know what format it will be in," says Garth Landers, a research director at Gartner.
But precise storage information is not always available in cloud storage comparisons: for example Amazon does not reveal how data in its Glacier service is physically stored, while Google says only that data stored on its Nearline service is stored on the same infrastructure as its other cloud storage services.
For maximum "retrievability" should a cloud storage service shut down with a limited time-window for retrieving data, some service providers ensure that offline copies are stored on commonly used media such as LTO tape, using an open standard file system such as LTFS, for which open source data access tools are readily available.
That means that once you have the tape in your possession you will be able to access your data without requiring any proprietary hardware or software from your provider.
"If customers want to move their data, or our company fails, customers can go to the escrow provider and get a copy of their data back on LTO tape and they will be able to read it easily," says Matthew Addis, CTO at Arkivum.
3. Where is encryption carried out?
A cloud storage comparison will reveal that almost without exception every cloud archive service provider stores data in encrypted form.
But an important question to ask is: where does the encryption take place? Some cloud archiving providers like Amazon encrypt your data as it arrives in the cloud for archiving, managing the encryption keys for you. In that case it will usually also be temporarily encrypted "in flight" as it is transmitted from you to the provider.
You can also encrypt your data before uploading it, allowing you to keep control of at least one set of encryption keys. A common way to achieve this is to use a cloud gateway appliance deployed at your premises that handles all the encryption before your data is sent to the cloud.
By managing your own encryption keys you keep complete control over the encryption and avoid having to trust your service provider to manage the keys securely. But key management can be complex, and a service provider is more likely to have the staff, skills and processes in place to perform it securely than most organizations.
4. What happens if your archived data is lost?
Losing archived data can have a huge impact on the viability of your business, and if you are required to archive certain types of data for specific lengths of time for regulatory compliance reasons then data loss could lead to fines and other sanctions for falling out of compliance.
But if a cloud archive service provider loses your data, it is still your problem, warns Gartner's Landers. "It's your data and your decision on how to store it, so using a third party does not get you off the hook," he says.
He adds that the most you can often realistically hope for if a cloud provider loses your data is compensation in the form of service credits. "We don't see providers taking responsibility if you fall out of compliance – the scope is just too broad."
However, a cloud storage comparison will reveal that some archive providers like Arkivum do include professional indemnity insurance as part of their service. "If we couldn't get your data back then that would cover certain costs," says Arkivum's Matthew Addis. "It is also an indication to customers that the insurance company has audited our operations, so it is evidence that we are insurable."
An Arkivum customer that values this is Loughborough University in England. "If we are archiving data that was generated as a result of a (financial) grant and the data gets lost, then we could have to pay the money back," says Gary Brewerton, the university's middleware and library systems manager. "With this insurance we won't have to pay, and that gives us confidence to use the service."
Of course being insurable doesn't mean that there is no risk of data loss – but the fact that the insurance can be provided at a sensible rate indicates that the risk is perceived to be low. That allows a smaller company like Arkivum to have credibility in the face of far larger and more established companies like Google or Amazon, Brewerton believes.
5. Cloud vs in-house archiving
As part of any cloud storage comparison to evaluate different archiving services, it also makes sense to establish whether using a cloud service rather than archiving your data in-house is the best way to go.
A key benefit of handling archiving in-house is that you keep control of your data and don't have to trust a third party to keep it secure and available.
But if you want to keep copies of archive data at multiple locations and have the integrity of the data and the storage media checked regularly – as best practices dictate – then this can be hard to achieve in a low-cost manner, especially if yours is a small organization.
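The regular integrity checking mentioned above is often done by recording a checksum when data is archived and re-computing it later for every copy. The following is a minimal sketch of that idea, assuming each copy is reachable as an ordinary file path and that SHA-256 is an acceptable checksum. The function names and the three-site layout in the usage note are hypothetical, not taken from any provider's tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks so that
    large archive files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_damaged_copies(expected_checksum: str, copies: list[Path]) -> list[Path]:
    """Return the copies that are missing or whose checksum no longer
    matches the value recorded when the data was archived."""
    damaged = []
    for copy in copies:
        if not copy.is_file() or sha256_of(copy) != expected_checksum:
            damaged.append(copy)
    return damaged
```

In use, the checksum would be recorded once at archive time and then passed to `find_damaged_copies` on a schedule, together with the paths of the copies held at, say, two data centers and an offline store. Any path it returns would need to be repaired from a copy that still verifies. Running this across multiple sites on a regular basis is part of the operational cost that makes in-house archiving hard for small organizations.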
For larger organizations the major difficulty with in-house archiving efforts is ensuring that the archiving solution can scale to accommodate large amounts of archive data in the future. This in turn means that the ability to predict future storage requirements accurately is critical – but this is not always possible.
"We have found it very hard to predict the volume of data we need to archive," says Brewerton. "We have spreadsheets, raw data, high definition video, and it is very unpredictable so we needed something quickly scalable. We didn't want to archive locally as we would have been bound to get our sums wrong."
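A short calculation shows why getting the sums wrong is so easy. The sketch below assumes simple compound annual growth, which is only one possible model. The starting volume and growth rates are made-up illustrative values, not figures from Loughborough University or any other source.

```python
def projected_capacity_tb(current_tb: float, annual_growth: float, years: int) -> float:
    """Project archive size assuming a constant compound annual growth rate.

    annual_growth is a fraction, e.g. 0.3 for 30% growth per year.
    """
    return current_tb * (1 + annual_growth) ** years


if __name__ == "__main__":
    current_tb = 100.0   # hypothetical archive size today, in terabytes
    horizon_years = 5    # hypothetical planning horizon

    # Compare a few plausible-looking growth assumptions side by side.
    for growth in (0.2, 0.4, 0.6):
        size = projected_capacity_tb(current_tb, growth, horizon_years)
        print(f"{growth:.0%} annual growth -> {size:,.0f} TB after {horizon_years} years")
```

Because growth compounds, a modest difference in the assumed rate leads to a large difference in the capacity that has to be bought and deployed up front, which is the planning risk that an elastic cloud service avoids.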
Even if you can accurately predict your storage needs, archiving infrastructure planning and deployment is also time consuming and difficult to get right, warns Landers.
"That completely goes away with a cloud archiving solution," he concludes.