A friend of mine over at InsideHPC suggested I read the book The Human Face of Big Data. (The link goes to a 10-minute video based on the book, and it is definitely worth your time.)
It’s interesting how author Rick Smolan discusses big data analysis and collection. We in the computer science field think about algorithms, and about which algorithm needs to be applied to which type of data to get the answer we are looking for. Smolan takes it up a few levels. He looks at what the impact of getting the answers will be, and at how the data is going to be collected from the billions of people on the planet connected via cellular networks.
One of the most interesting points is the amount of data that is going to be collected. I am sure that this will please the storage companies, but it leaves me with the sinking feeling that storage capacity is not going to keep up with the storage requirements of the analysis that is going to be needed.
As I’ve said before, we need to save data as we do not know everything about that data, and quite possibly in the future we will be able to extract new information about data that has been collected. This is true whether that data is genetic data, climate data or something like seismic traces to find oil.
All of these are excellent examples of archived data from which people have later extracted new information. Had that older data not been archived, it would likely have been lost or would have been costly to duplicate.
The tradeoff will be: what is the cost of storing the data compared with the cost of re-collecting it? I have ordered the book.