CS61C Fall 2013 HW 1

TAs: Sung Roa Yoon, Kelvin Chou

Due Sunday, September 8, 2013 @ 11:59pm

Goals

This assignment is designed to get you back into the swing of things and get you thinking about Warehouse Scale Computing and MapReduce.

Obtaining the Assignment

Copy the hw1 files into your homework directory. To do so, go to your hw directory, run git init if it is not already a git repository, then type:

$ git pull ~cs61c/hw/01 master

If you are unfamiliar with git and are having issues with this, please look at lab 1 first!

If you don't have a class account, please download the hw files (hw1.txt and hw1.c) here, and then email your submission to Sagar!

Problem 1: Warehouse Scale Computers - True or False

For each statement, answer true or false and explain your choice.

  1. The latency between nodes within a large SMP server is much smaller than the latency between low-end PC-class servers.
  2. A large SMP server is more cost-efficient than a low-end PC-class server.
  3. The Power Usage Effectiveness (PUE) value of a WSC can never be below 1.
  4. Most of the power usage within a WSC goes to the IT equipment.
  5. The power efficiency of a server scales linearly with load on the server.

Problem 2: MapReduce Questions

  1. What is a combiner? (Yes, you will probably need to look online for these problems.)
  2. Why must the input key-value pairs of the combiner have the same types as its output key-value pairs?
  3. Can combiners start working before all the map tasks finish?
  4. Is the Google search engine a good application for MapReduce?
  5. Is there any data shared within a single phase (map phase, reduce phase, etc) between the workers in MapReduce? Explain why or why not.

Problem 3: MapReduce Programming

Your boss asks you to measure how strongly certain words correlate with spam emails.
The input key is a boolean indicating whether the email was spam, and the input value is the text content of the email.
Explain or write pseudocode for an efficient way to compute and output a good correlation measure. The correlation output can be whatever you deem most critical, as long as you give a reasonable explanation for why your output is valid. You should explain each part: map, combiner, and reduce.

  1. What would be a good way to represent the correlation between the words and the spam emails?
  2. What would happen in the map part of the code, and what would the key and value of the output of map be?
  3. What can be written in the combiner part of the code to speed up the program?
  4. How would you finalize the computation in the reduce part of the code, and what would the key and value of the output of reduce be?

Problem 4: C Programming

The goal of this short C programming exercise is to get you familiar with writing C code, compiling it, and testing it yourself. Open hw1.c and fill in the function reverseInt.

The function reverseInt is given an integer, and your job is to reverse the digits of that integer (for example, 123 becomes 321).
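As a warm-up for the digit manipulation this exercise relies on, here is a minimal sketch of a related but different task: summing the decimal digits of an integer. The helper name sumDigits is hypothetical and is not part of hw1.c; it only illustrates how % 10 and / 10 peel digits off an int, and it is not the reverseInt solution.

```c
/* Hypothetical warm-up helper (not part of hw1.c):
 * returns the sum of the decimal digits of n, ignoring the sign. */
int sumDigits(int n) {
    int sum = 0;
    while (n != 0) {
        /* Last decimal digit; in C99 the result of % takes the sign
         * of the dividend, so it is negative when n is negative. */
        int digit = n % 10;
        if (digit < 0) {
            digit = -digit;
        }
        sum += digit;
        /* Drop the last digit; integer division truncates toward zero. */
        n /= 10;
    }
    return sum;
}
```

Calling sumDigits(1234) yields 10. The loop never negates n itself, which is one way to avoid overflow on the most negative int; you may want to think about similar edge cases when writing reverseInt.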

Submission Guideline

Please enter all your text answers in hw1.txt, and make sure to leave the text file in the same directory as your hw1.c. Once you have finished your assignment, go to that directory and type:

$ submit hw1

You should not need to attach anything extra, at least not for this particular assignment.