Here are some more thoughts about MapReduce, this time on storage. Let's say you have 128 nodes with one disk each, for a total of 128 disks. The bandwidth to each disk is likely over 100 MB/sec, based on specification sheets from various disk vendors. That gives us 128 * 100 MB/sec = 12,800 MB/sec, or about 12.5 GB/sec (counting 1,024 MB per GB), which exceeds the performance of just about all RAID controllers.
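The arithmetic above can be sketched in a few lines of Python. This is only an illustration using the figures from the post (128 nodes, one disk per node, 100 MB/sec per disk); the function and variable names are made up for the example, and real disks will not all stream at their rated speed at the same time.

```python
# Aggregate streaming bandwidth of node-local disks, using the post's figures.
NODES = 128
DISKS_PER_NODE = 1
MB_PER_SEC_PER_DISK = 100  # the post's "likely over 100 MB/sec" floor


def aggregate_mb_per_sec(nodes, disks_per_node, mb_per_sec_per_disk):
    """Best-case total bandwidth if every disk streams at once."""
    return nodes * disks_per_node * mb_per_sec_per_disk


total_mb = aggregate_mb_per_sec(NODES, DISKS_PER_NODE, MB_PER_SEC_PER_DISK)

print(total_mb)         # 12800 MB/sec
print(total_mb / 1000)  # 12.8 GB/sec with 1,000 MB per GB
print(total_mb / 1024)  # 12.5 GB/sec with 1,024 MB per GB
```

The 12.5 figure comes from dividing by 1,024; with decimal units the same total is 12.8 GB/sec. Either way, the point stands: the total scales linearly with the node count.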
Add to that the overhead of drives being in RAID groups, which reduces the theoretical number of seeks that can be done, plus the latency between the servers and the storage, and you likely have one of the big reasons that RAID storage is not used. The other big reason, of course, is cost. Each server needs a connection to shared storage, which for large RAID storage today means either 10 GbE or Fibre Channel. Even if you have an extra 10 GbE port on the motherboard, you are still going to need cables and switch ports, which cost $$.
With RAID, you get reliability, which you do not get with a single copy of your data. The common way around this is replication of your nodes along with their storage. This is not cheap either, in hardware cost or in power cost. What someone really needs to do is compare the reliability and cost of replicating cheap nodes with cheap storage versus cheap nodes with expensive storage. I am not sure who would do that in an unbiased way.
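To make the shape of that comparison concrete, here is a rough Python sketch. It is not a real study: every function, parameter, and number in it is a hypothetical placeholder, and the reliability line assumes copies fail independently, which real clusters often do not.

```python
def replicated_cost(nodes, copies, node_cost, node_power_cost):
    """Cost of keeping `copies` full sets of cheap nodes with local disks."""
    return nodes * copies * (node_cost + node_power_cost)


def shared_raid_cost(nodes, node_cost, node_power_cost, port_cost, raid_cost):
    """Cost of one set of nodes, each with a port into shared RAID storage."""
    return nodes * (node_cost + node_power_cost + port_cost) + raid_cost


def all_copies_lost(p_single_copy_lost, copies):
    """Chance that every copy is lost, ASSUMING independent failures."""
    return p_single_copy_lost ** copies


# Placeholder inputs only; none of these come from the post or from a vendor.
nodes = 128
cheap = replicated_cost(nodes, copies=3, node_cost=2000, node_power_cost=500)
raid = shared_raid_cost(nodes, node_cost=2000, node_power_cost=500,
                        port_cost=800, raid_cost=300000)
print("replicated cheap nodes:", cheap)
print("nodes plus shared RAID:", raid)
print("all 3 copies lost:", all_copies_lost(0.01, 3))
```

The output says nothing until real prices, power costs, and measured failure rates go in, and whoever picks those inputs decides which side wins. That is exactly the bias problem.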
Labels: power and cooling,Hadoop,Storage
posted by: Henry Newman