By Henry Newman

My previous blog entry talked about the need for new types of people who can understand both algorithms and data layout to optimize the generation of actionable information for business and industry. The question I have is: Are schools teaching the current and next generation the right kinds of information so they can develop these skills for the job market?

The data analysis skills themselves (MapReduce, graph analysis, statistics and the like) are difficult enough. Add in the data layout for the information to be processed, which requires an understanding of the application reading or writing the data, the operating system, kernel, drivers, RAID controllers and the file system, and you have a pretty complex ecosystem, one that few people fully understand or know how to evaluate. Something is going to have to give to make it easier.

The move to file system appliances across a broad spectrum of the market, from my home PC with an external four-disk iSCSI RAID to large parallel file systems moving into the appliance market, gives me hope that people are working on solving this problem. These file system scaling and data layout problems are not going to be solved overnight. Most file systems are still stuck looking at files and have little understanding of the layout those files need in order to be processed into information. I think it will take a very special person to be able to understand all that is needed in the current environment.

In the meantime, the best thing organizations can do is develop teams with the right domain expertise. Pairing a new data analysis graduate with someone who knows the data path is likely the best anyone can do. Of course, there will be a handful of people who understand both. I hope I can hire one.

This article was originally published on April 27, 2012