So what does it mean when a data storage professional says they want cold storage?
It usually means that the data is not going to be accessed very often – or maybe never – but they want to keep the data.
Yet how much access makes data cold means different things to different people. And of course the term “cold storage” is thrown around casually, even though it has no agreed-upon definition. I think there needs to be a clear definition so that the industry can start architecting products for cold storage, and system architects can develop well-defined architectures that meet the service level agreements users expect for access to cold data. Vendors need to be able to define cold storage clearly.
All of this of course requires agreement from everyone, but since no one has put forth a proposal, here is mine.
Cold Levels
The term cold data should be broken into four levels with different meanings. Here’s my attempt (including a bit of humor). Additionally, I think that we should stop talking about data and start talking about collections:
· Polar Data Storage Collection
· Icy Data Storage Collection
· Cold Data Storage Collection
· Chilly Data Storage Collection
Polar Collection
This is data that will likely never be used, but you never know. I would like to suggest that a polar collection will need only 0.5% or less of the collection back over a one-year period. So for a 10 PB collection, the most retrieved over a year would be 50 TB.
Icy Collection
This is data that will be used, but not very often. I would like to suggest that an icy collection will need between 0.5% and 2% of the collection back over a one-year period. So for a 10 PB collection, the most retrieved over a year would be 200 TB.
Cold Collection
I would like to suggest that a cold collection will need between 2% and 5% of the collection back over a one-year period. So for a 10 PB collection, the most retrieved over a year would be 500 TB.
Chilly Collection
A chilly collection will need between 5% and 10% of the collection back over a one-year period. So for a 10 PB collection, the most retrieved over a year would be 1 PB.
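To make the four levels concrete, here is a minimal sketch of how they could be expressed in code. The names `TIERS`, `classify`, and `max_annual_retrieval_tb` are hypothetical, chosen only for illustration. The sketch assumes decimal units (1 PB = 1000 TB, which matches the 10 PB to 50 TB example), treats each upper bound as inclusive, and treats anything above 10% as falling outside the proposal; none of those boundary choices are spelled out in the definitions themselves.

```python
# Hypothetical tier table: (name, maximum percent of the collection
# retrieved per year). Upper bounds are assumed to be inclusive.
TIERS = [
    ("polar", 0.5),
    ("icy", 2),
    ("cold", 5),
    ("chilly", 10),
]

TB_PER_PB = 1000  # assumption: decimal units, as in the 10 PB -> 50 TB example


def max_annual_retrieval_tb(tier, collection_pb):
    """Most data (in TB) a collection of this tier should return in one year."""
    percent = dict(TIERS)[tier]
    return percent * collection_pb * TB_PER_PB / 100


def classify(retrieved_tb, collection_pb):
    """Return the coldest tier whose yearly limit covers the retrieved amount.

    Returns None when more than 10% of the collection came back, which this
    sketch assumes is not cold storage at all.
    """
    if retrieved_tb < 0 or collection_pb <= 0:
        raise ValueError("retrieved_tb must be >= 0 and collection_pb > 0")
    for name, _ in TIERS:
        if retrieved_tb <= max_annual_retrieval_tb(name, collection_pb):
            return name
    return None


if __name__ == "__main__":
    for name, _ in TIERS:
        print(name, max_annual_retrieval_tb(name, 10), "TB per year for 10 PB")
    print(classify(120, 10))  # 1.2% of 10 PB retrieved in a year
```

A vendor or architect could use a table like this to state a service level as "tier plus collection size" rather than an undefined "cold" label.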
People may well disagree with these numbers and names, but it is a starting point.