More Thoughts on Data Integrity

By Henry Newman

Let's assume you've accepted the fact that you are not going to get 100 percent of your data back 100 percent of the time. What should your expectations be?

My view is that someone should calculate the expected data loss and provide some kind of guarantee. The question I have been thinking about is how such a guarantee applies to someone storing less data than the volume the guarantee is quoted against. Let's say a vendor provides a ten-nines (99.99999999 percent) guarantee against data loss. Another way to look at it is 112,590 bytes lost per petabyte (1024*1024*1024*1024*1024 bytes). If I have only 40 TB of data in the archive, should I expect 40/1024*112,590, or about 4,398 bytes, of data loss?
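The arithmetic above can be sketched in a few lines of Python. The loss fraction of 1e-10 corresponds to the ten-nines guarantee; the byte counts use binary units (1 PB = 1024^5 bytes), matching the article's figures.

```python
# Expected data loss under a ten-nines (99.99999999%) durability guarantee.
LOSS_FRACTION = 1e-10          # 1 - 0.9999999999
PETABYTE = 1024 ** 5           # 1,125,899,906,842,624 bytes
TERABYTE = 1024 ** 4

# Bytes expected lost per petabyte stored.
loss_per_pb = PETABYTE * LOSS_FRACTION
print(round(loss_per_pb))      # about 112,590 bytes

# Scaled down to a 40 TB archive.
archive_bytes = 40 * TERABYTE
loss_40tb = archive_bytes * LOSS_FRACTION
print(round(loss_40tb))        # about 4,398 bytes
```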

That expectation is totally unrealistic. What are those 112,590 bytes? Was it a single byte in each of 112,590 PDF or JPEG headers? That could mean 112,590*8 MB (the average size of a PDF or JPEG) of effectively lost data, or even worse if the corrupted headers belong to MPEGs. We are now talking about over 879 GB. Clearly, in my 40 TB archive that would be totally unacceptable. But what is acceptable, and how could this become auditable? That is the big question. First of all, the audit must be independent, and it must be based on a documented set of standards agreed to in much the same way as an ISO standard. The problem with ISO audits is that many rely on people inside the company being audited. That will not work in the case of data loss; too much is at stake.
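The amplification argument above can also be made concrete. This sketch assumes the worst case the article describes: every one of the 112,590 lost bytes lands in the header of a distinct 8 MB file, rendering the whole file unreadable. The 8 MB average file size is the article's illustrative figure, not a measured value.

```python
# Worst case: each lost byte corrupts the header of a different file,
# so the effective loss is whole files, not single bytes.
corrupted_files = 112590       # one lost byte per file header
avg_file_mb = 8                # article's assumed average PDF/JPEG size

effective_loss_mb = corrupted_files * avg_file_mb
effective_loss_gb = effective_loss_mb / 1024
print(round(effective_loss_gb, 1))   # about 879.6 GB of unreadable data
```

This is why a raw bytes-lost figure is a poor guarantee on its own: the same 112,590 bytes can mean anywhere from 112,590 bytes of loss to hundreds of gigabytes, depending on where they fall.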

I do not see this situation being easily resolved, but there are many examples in other markets and industries, such as external accounting audits when something bad happens, that could be used as models. I am not sure that our industry is ready for the complex regulation that exists in accounting, but by the same token I am not sure that archives, archive vendors, and the people who use them do not need the protection.

This article was originally published on October 24, 2011