IBM just released a new product that provides high-IOPS support for data analysis in clouds. This is far different from the approach, proposed by many, of using flash to store all data, which makes no sense given the latency over the Internet compared with the latency of disk drives and flash (flash likely gives you at best a 50% reduction in latency).
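To see why the Internet, not the storage, dominates in that scenario, here is a rough back-of-the-envelope sketch. All the numbers are assumptions chosen only for illustration (a cross-Internet round trip of tens of milliseconds, a disk access of a few milliseconds, a flash access of a fraction of a millisecond), not measurements of any particular product:

```python
# Assumed, illustrative latencies in milliseconds -- not measured values.
WAN_RTT_MS = 50.0       # assumed round trip over the Internet to a cloud region
DISK_LATENCY_MS = 8.0   # assumed average disk access latency
FLASH_LATENCY_MS = 0.1  # assumed flash access latency

def end_to_end_ms(storage_latency_ms: float) -> float:
    """Total latency a remote client sees: network round trip plus storage."""
    return WAN_RTT_MS + storage_latency_ms

disk_total = end_to_end_ms(DISK_LATENCY_MS)
flash_total = end_to_end_ms(FLASH_LATENCY_MS)
improvement = (disk_total - flash_total) / disk_total

print(f"disk: {disk_total:.1f} ms, flash: {flash_total:.1f} ms, "
      f"end-to-end improvement: {improvement:.0%}")
```

With these assumed numbers the end-to-end gain from flash is only around 14%, because the network round trip swamps the storage access time; that is the sense in which storing everything on flash behind an Internet connection buys little.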
Using flash for data analytics makes complete sense, especially for anyone using the MapReduce algorithm, either within Hadoop or with other commercial products built on MapReduce. Flash is going to be very important for high-speed data-ingest problems such as collecting security logs from thousands of connections with deep packet inspection, gathering point-of-sale information for a very large retailer, or doing failure analysis across a large network of systems. I think this is a great idea, but someone forgot one big issue.
If you have high-speed ingest, how are you going to get the data to the cloud for analysis? In my opinion, flash matters for analytics when it addresses one or more of the following types of problems:
1. High-speed ingest problems where the processors need higher performance to keep up with the incoming data rate.
2. Environments running lots of different experiments, with many different users and workloads moving data in and out.
3. Workloads where the ratio of ingest, correlation, and shuffle (distribution of the correlated data to the nodes in the cluster for searching) is far greater than the number of searches done over time.
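The third case is exactly the pattern MapReduce formalizes: a map step over the ingested records, a shuffle step that groups intermediate results by key across the cluster, and a reduce step that searches or aggregates them. A minimal single-process sketch of those three phases, using made-up security-log lines (the log format and the DENY/ALLOW actions are hypothetical, chosen only to echo the security-log example above):

```python
from collections import defaultdict
from itertools import chain

# Hypothetical ingested records: "source-ip action" security-log lines.
logs = [
    "10.0.0.1 DENY", "10.0.0.2 ALLOW", "10.0.0.1 DENY", "10.0.0.3 DENY",
]

def map_phase(line):
    """Map: emit a (key, 1) pair for each denied connection."""
    ip, action = line.split()
    if action == "DENY":
        yield (ip, 1)

def shuffle_phase(pairs):
    """Shuffle: group intermediate values by key -- in a real cluster this
    is the network-heavy distribution step described above."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the grouped values, here a count per source IP."""
    return {key: sum(values) for key, values in groups.items()}

pairs = chain.from_iterable(map_phase(line) for line in logs)
result = reduce_phase(shuffle_phase(pairs))
print(result)  # {'10.0.0.1': 2, '10.0.0.3': 1}
```

In a real deployment the map and shuffle phases touch every ingested byte, while only the reduced output is ever searched, which is why flash helps most when that ingest-and-shuffle volume dwarfs the query volume.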
My view is that if you do not have one or more of these issues, flash is not going to make much of a difference in the cloud. And from a cost point of view, it is far too expensive compared with disk. Amazon and others, I am sure, are doing this, but we shall see whether the market is big enough to justify the costs of using flash.
Photo courtesy of Shutterstock.