Archival Data vs. Archive

By Henry Newman

There needs to be more emphasis on a tier of storage that is rarely accessed but must be constantly available. Archival data is data that you need to keep but might not access for long periods of time, or ever. This type of data exists in many fields: Sarbanes-Oxley compliance data, weather forecasting, Landsat satellite images, medical records, even my childhood pictures. None of this data is likely to be needed every day or even every week, but we still need to keep it.

It would be nice if there were a way to keep this data safe and secure (whether I am a home user, a company, a large site or a government site) that scales, is easy to manage, and has a reasonable price. Archives today require people with hierarchical storage management (HSM) skills and a significant investment in hardware and software. Even then, there is NO guarantee your data is safe: no one I am aware of calculates the reliability of the data in the archive, which I know from firsthand experience is not easy to determine. I believe this is a problem ripe for solving. There are, of course, a number of problems that need to be addressed, such as reliability, which is where the liability lies. Lawyers will have a field day in case of data loss, and there will be some data loss. No storage medium is 100% reliable, even with multiple copies.
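
To put rough numbers on the point that multiple copies reduce the risk but never remove it, here is a minimal sketch in Python. It rests on a simplifying assumption that is mine, not a measured property of any medium: each copy of an object is lost independently with the same probability over some period. The function names and any figures you feed in are purely illustrative; real archives have correlated failures (same site, same firmware, same operator), so treat the result as an optimistic floor.

```python
def loss_probability(per_copy_loss: float, copies: int) -> float:
    """Chance that every copy of one object is lost.

    Assumes (hypothetically) that copies fail independently and with the
    same probability, so the chances simply multiply together.
    """
    if not 0.0 <= per_copy_loss <= 1.0:
        raise ValueError("per_copy_loss must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return per_copy_loss ** copies


def expected_losses(objects: int, per_copy_loss: float, copies: int) -> float:
    """Expected number of objects lost outright in an archive of the given size."""
    return objects * loss_probability(per_copy_loss, copies)
```

With an illustrative per-copy loss chance of 1 in 1,000, two copies bring the chance of losing a given object down to 1 in a million. Across a billion archived objects, that still works out to roughly a thousand expected losses, which is why "some data loss" is the realistic expectation at scale.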

The more I think about this problem, the more I think that we need to reset expectations for those who archive data and then expect, and demand, to get all of it back from the archive.

This article was originally published on October 17, 2011