University of California at Berkeley
Department of Electrical Engineering & Computer Sciences
Instructional Support Group

/share/b/pub/mapreduce.help
Oct 26, 2007

MapReduce
---------

MapReduce is a programming model for processing and generating large data
sets.  See http://labs.google.com/papers/mapreduce.html for more information.

MapReduce programs are typically run on a parallel computing cluster, using
a framework such as Hadoop to manage the distributed computing platform.
See http://lucene.apache.org/hadoop/ for more information.

EECS Instruction has received grants from Google and Intel for the creation
of a 26-node computing cluster, which will be available to EECS classes in
Spring 2008.  The cluster is called the "Icluster".  We are installing
Hadoop and its MapReduce implementation there.

MapReduce on the Icluster
-------------------------

The Instructional "Icluster" is still under development (Oct 2007).  Some
users are already developing programs on it for their classes.  Information
about logging onto the cluster and running programs will be added here later.

Here is a simple test of the MapReduce implementation on the Icluster.  It
runs the wordcount example program on the sample input in
$HADOOP_HOME/sample/gutenberg.  The commands use sh/bash syntax:

    # Set up the environment.  The variables are exported so that the
    # hadoop script picks up HADOOP_CONF_DIR.
    export HADOOP_HOME=/home/aa/projects/hadoop
    export HADOOP_INSTALL=$HADOOP_HOME/hadoop
    export HADOOP_CONF_DIR=$HADOOP_HOME/hadoop-conf
    export PATH=$HADOOP_INSTALL/bin:$PATH

    # Copy the sample input into the distributed filesystem (DFS).
    hadoop dfs -copyFromLocal $HADOOP_HOME/sample/gutenberg gutenberg

    # Remove the output of any previous run; the job will not start
    # if its output directory already exists.
    hadoop dfs -rmr gutenberg-output

    # Run the wordcount example job.
    hadoop jar $HADOOP_INSTALL/hadoop-0.14.1-examples.jar wordcount \
        gutenberg gutenberg-output

    # List and display the results.
    hadoop dfs -ls gutenberg-output
    hadoop dfs -cat gutenberg-output/part-00000

References
----------

http://code.google.com/edu/content/submissions/mapreduce-minilecture/listing.html

A web search for "hadoop run map-reduce" turns up further programming
examples.

EECS Instructional Support
378/384/386 Cory, 333 Soda
inst@eecs.berkeley.edu
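For readers new to the model, the map, shuffle/sort and reduce phases that the Hadoop wordcount example distributes across the Icluster can be imitated on a single machine with ordinary Unix tools.  The following is only a sketch for illustration, not part of Hadoop: the function names (wc_map, wc_shuffle, wc_reduce, local_wordcount) are hypothetical, and tokenization simply splits on spaces and tabs, so its counts may not match the Hadoop job's part-00000 output exactly.

```shell
#!/bin/sh
# Hypothetical single-machine sketch of a word count done in
# MapReduce style.  Not Hadoop; for illustration only.

# Map phase: emit "word<TAB>1" for every space/tab-separated token.
wc_map() {
    awk '{ for (i = 1; i <= NF; i++) print $i "\t1" }'
}

# Shuffle/sort phase: bring identical keys next to each other.
# LC_ALL=C gives a plain byte-order sort.
wc_shuffle() {
    LC_ALL=C sort
}

# Reduce phase: sum the counts of each run of identical keys.
# Keys are compared as strings ($1 "") so that tokens such as
# "1" and "1.0" are not merged by awk's numeric comparison.
wc_reduce() {
    awk 'BEGIN { FS = "\t" }
         NR == 1 { key = $1 "" }
         ($1 "") != key { print key "\t" sum; key = $1 ""; sum = 0 }
         { sum += $2 }
         END { if (NR > 0) print key "\t" sum }'
}

# Usage: local_wordcount [FILE...]   (reads standard input if no files)
local_wordcount() {
    cat "$@" | wc_map | wc_shuffle | wc_reduce
}
```

After sourcing the functions, `printf 'the cat the dog\n' | local_wordcount` prints one tab-separated "word count" line per distinct word, in sorted order.  On the cluster, Hadoop performs the same three phases, but runs many map and reduce tasks in parallel and does the sorting between them.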