I have been asked to attend a discussion group on "Big Data" covering two topics: Big Data--Architectures for Storage and Distribution, and Big Data--Analytic tools to extract value from the data. What is missing from most discussions on "Big Data" is a definition.
At one end of the spectrum, think of the output from lots of telescopes, where the number of files might be small, but each file is very large. There is generally not a huge amount of metadata associated with each file, and searching and indexing are not that difficult, given the size of the metadata. At the other end of the spectrum are, say, credit card transactions and the applications that use them, such as fraud detection. The number of transactions is almost unimaginably large. Two totally different types of data, one single name for it--Big Data. I think our industry must come up with some agreed-upon terms for the various types of Big Data, and the associated metadata.
Not all big data is the same, and it should not be treated the same. Some data requires significant metadata processing, given the size of the metadata and the relationships that must be built and rebuilt. Other data types do not have these requirements. With some data, you do not mind losing an entry or two, while with other data, such as an MR scan of a cancer patient from his visit six months ago, a loss is a big deal.
I think we should classify big data by these areas, and likely a few others that I have not thought of at 5 a.m. on a Saturday.
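As a rough illustration of what such a classification might look like, here is a minimal Python sketch that records the areas mentioned above: how many objects there are, how large each one is, how much metadata processing is needed, and whether losing an entry is acceptable. The names `Level`, `DataProfile`, and `describe` are hypothetical, not an agreed industry vocabulary. Values the examples above do not state are either left as unknown or marked as assumptions in the comments.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Level(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass(frozen=True)
class DataProfile:
    """Hypothetical descriptor for one kind of big data."""
    name: str
    object_count: Level                     # how many files or records
    object_size: Level                      # how large each file or record is
    metadata_load: Optional[Level] = None   # None means "not stated"
    loss_tolerant: Optional[bool] = None    # None means "not stated"


# Human-readable label for each (count, size) combination.
SHAPES = {
    (Level.LOW, Level.LOW): "few small objects",
    (Level.LOW, Level.HIGH): "few large objects",
    (Level.HIGH, Level.LOW): "many small objects",
    (Level.HIGH, Level.HIGH): "many large objects",
}


def describe(profile: DataProfile) -> str:
    """Summarize a profile in one line."""
    shape = SHAPES[(profile.object_count, profile.object_size)]
    if profile.metadata_load is None:
        metadata = "metadata load unknown"
    else:
        metadata = f"{profile.metadata_load.value} metadata load"
    if profile.loss_tolerant is None:
        loss = "loss tolerance unknown"
    elif profile.loss_tolerant:
        loss = "can tolerate losing an entry"
    else:
        loss = "must not lose data"
    return f"{profile.name}: {shape}, {metadata}, {loss}"


# Few large files with little metadata, as described above.
telescope = DataProfile("telescope output", Level.LOW, Level.HIGH,
                        metadata_load=Level.LOW)
# Huge transaction count; the small per-record size is an assumption.
card = DataProfile("credit card transactions", Level.HIGH, Level.LOW)
# Losing a scan is a big deal; count and size here are assumptions.
scan = DataProfile("patient MR scan", Level.LOW, Level.HIGH,
                   loss_tolerant=False)

for profile in (telescope, card, scan):
    print(describe(profile))
```

Two levels per area are certainly too coarse for real use. The point is only that telescope output and card transactions end up with visibly different profiles rather than sharing one label.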
Labels: Storage,big data,data storage management
posted by: Henry Newman