MapReduce
---------

MapReduce is a programming model for processing and generating large data
sets.  See for information.
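A MapReduce job is described by two functions: a *map* function that turns
each input record into intermediate key/value pairs, and a *reduce* function
that combines all the values sharing a key.  Between the two steps the
framework groups the intermediate pairs by key (the "shuffle").  As a rough,
single-machine sketch of the classic word-count job, assuming a POSIX shell,
the pipeline below plays each role with a standard Unix tool.  The function
name ``wordcount_local`` is hypothetical, and nothing here is distributed::

  # Hypothetical helper: a single-machine imitation of MapReduce word count.
  # Reads text on stdin and prints "word<TAB>count" lines, sorted by word.
  wordcount_local() {
    # Map: emit each whitespace-separated word as a key (implicit value 1).
    awk '{ for (i = 1; i <= NF; i++) print $i }' |
    # Shuffle: sort so that identical keys end up next to each other
    # (LC_ALL=C gives plain byte order, independent of the locale).
    LC_ALL=C sort |
    # Reduce: count each run of identical keys.
    uniq -c |
    # Rewrite uniq's "  count word" layout as "word<TAB>count".
    awk '{ printf "%s\t%s\n", $2, $1 }'
  }

For example, ``printf 'the cat\nthe dog\n' | wordcount_local`` prints the
words ``cat``, ``dog`` and ``the`` with counts 1, 1 and 2.  In a real
MapReduce framework the map and reduce steps run in parallel on many
machines, and the grouping by key happens inside the framework.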

MapReduce is typically run on a parallel computing cluster, using a framework
such as Hadoop to manage the distributed computing platform.  See for more
information.

EECS Instruction has received grants from Google and Intel to build a
26-node computing cluster, which will be available to EECS classes in
Spring 2008.  The cluster is called the "Icluster".  We are installing Hadoop,
including its MapReduce implementation, there.

MapReduce on the Icluster
-------------------------

The Instructional "Icluster" is still under development (as of October 2007).
Some users are already developing programs for classes on it.  Information
about logging in to the cluster and running programs will be added here later.

Here is a simple test of the MapReduce implementation on the Icluster, using
the wordcount example program that ships with Hadoop::

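  # Point the shell at the shared Hadoop installation and configuration.
  # The variables must be exported so the hadoop script can see them.
  export HADOOP_HOME=/home/aa/projects/hadoop
  export HADOOP_INSTALL=$HADOOP_HOME/hadoop
  export HADOOP_CONF_DIR=$HADOOP_HOME/hadoop-conf
  export PATH=$HADOOP_INSTALL/bin:$PATH

  # Copy the sample Gutenberg texts into the distributed file system.
  hadoop dfs -copyFromLocal $HADOOP_HOME/sample/gutenberg gutenberg
  # Remove output left by an earlier run; the job refuses to overwrite it.
  hadoop dfs -rmr gutenberg-output
  # Run the wordcount example from the Hadoop examples jar.
  hadoop jar $HADOOP_INSTALL/hadoop-0.14.1-examples.jar wordcount \
      gutenberg gutenberg-output
  # List the output files, then print the word counts.
  hadoop dfs -ls gutenberg-output
  hadoop dfs -cat gutenberg-output/part-00000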
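Each line of the wordcount output has the form ``word<TAB>count``, ordered
by word, so finding the most frequent words takes one more sort.  The
sketch below, assuming a POSIX shell, wraps that in a hypothetical helper
named ``top_words``.  It only post-processes the text printed by
``hadoop dfs -cat`` and does not touch the cluster itself::

  # Hypothetical helper: print the N most frequent words (default 10)
  # from "word<TAB>count" lines read on stdin.
  top_words() {
    n="${1:-10}"
    tab=$(printf '\t')
    # Sort on field 2 (the count) numerically, largest first; break ties
    # on field 1 (the word) so the result is deterministic.
    LC_ALL=C sort -t "$tab" -k2,2nr -k1,1 | head -n "$n"
  }

  # Example use on the Icluster, once the job has finished:
  #   hadoop dfs -cat gutenberg-output/part-00000 | top_words 20

Doing this final ranking locally is reasonable because the reduced output
is small compared with the input texts.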

References
----------

Googling for "hadoop run map-reduce" reveals programming examples.