We have all heard of the MapReduce algorithm and its implementation in Hadoop, the open source framework for distributed storage and data processing (common utilities, a distributed file system, and the MapReduce parallel processing engine), which grew out of work on open source search indexing. In my real job, my staff and I have been doing a lot of Hadoop storage architecture work, looking at things like performance and reliability. One of the big issues that always comes up in this area is the amount of power being used. For most Hadoop environments, the most significant power component is not storage, but CPUs and memory.

This got me thinking about CPU architecture and design. The metric that really matters for Hadoop is operations per watt, since this is a highly parallel application that can run across many thousands of cores. What I realized is that the MapReduce style of data processing often requires little to no floating point computation: for a search workload, all it is doing is indexing the information so you can search it, and most of that data is not floating point. The question I started to ponder is how much of a modern CPU's power goes to its floating point units. My guess is that floating point accounts for a pretty significant portion of a chip's power budget.
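To make the integer-only point concrete, here is a minimal word-count sketch in the MapReduce style. The function names and the in-memory "shuffle" are illustrative assumptions, not Hadoop's actual API; the point is simply that an indexing-type workload like this touches only strings and integers, never the floating point units.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    # Emit (word, 1) pairs: string handling and integer constants only.
    for word in text.lower().split():
        yield (word, 1)

def reduce_phase(word, counts):
    # Integer addition is the only arithmetic performed.
    return (word, sum(counts))

def run_job(documents):
    # Shuffle: group mapper output by key. Done in memory here;
    # Hadoop performs this step across the cluster.
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for word, count in map_phase(doc_id, text):
            grouped[word].append(count)
    return dict(reduce_phase(w, c) for w, c in grouped.items())

docs = {1: "the quick brown fox", 2: "the lazy dog and the fox"}
index = run_job(docs)
# e.g. index["the"] == 3, index["fox"] == 2
```

Nothing in the map, shuffle, or reduce steps above needs a floating point instruction, which is why a chip with a small (or power-gated) FPU could, in principle, run this class of job just as well.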
I know a number of organizations are looking at low-power processors, such as the Intel Atom line, to address the MapReduce power problem. Low-power processors with reduced floating point capability might become a real option for MapReduce applications. That still leaves the storage performance, reliability and power problems, but those are thoughts for another flight, as that is where I do most of my writing.
Labels: Hadoop, Storage, floating point, energy consumption
posted by: Henry Newman