Archives and Big Data

By Henry Newman

My good friend Rich Brueckner over at InsideHPC posted a talk I did at the IDC HPC User Forum on why archives are important to big data analysis.

I believe I made the case that organizations are going to need to keep far more archive data than they think they will, and they are going to need to keep the raw data, not just the processed data. Organizations are going to have to plan for this data with the right budgets and the right people to manage it because this data is important to their future.

As the saying goes, we do not know what we do not know about the data we have. That means we are going to have to go back to the original data and reprocess it to extract new information. This has been done for decades in the oil and gas industry, where new algorithms are developed to better understand where to find new oil and gas. It has also been done for decades with genetics and medical data.

I make the case in the talk that it will be critical for all kinds of businesses to keep their data in-house rather than outsource it to a cloud provider, given network performance and the need to get back to the data to extract new information. I think this will be needed in all types of industries, from retail to the sciences to manufacturing.

Planning for archives will become more important in the future, not less, because archives are going to be critical to many industries, from retail to medical to you name it.

I hope you enjoy the video.

This article was originally published on June 27, 2013