CS61C Fall 2012 HW 1

TA: Sung Roa Yoon

Due Sunday, September 2, 2012 @ 11:59pm

Goals

This assignment is designed to get you back into the swing of things, thinking about Warehouse Scale Computing, and to help you make sure your account works so that you can submit assignments.

Submitting Your Solution

Submit your solution by creating a directory named "hw1" inside your working repository. In this directory, create a file named "hw1.txt" and a file named cs61c-XX.jpg (see problem 0) and put your answers inside. It is important that you place your submission for hw1 inside this directory and not somewhere else, as when we pull submissions we will look for your submission there. Then run
git add -A
git commit -m "hw1 submission"
git tag -f hw1
git push --tags origin master


The submission program works now!

In order to avoid blowing through our disk quota we are requiring that the cs61c-XX.jpg's submitted for problem 0 be of size less than approximately 250 kB.

Problem 0: Photograph

Acquire a jpeg of yourself in which your face is clearly visible. Submit this file as cs61c-XX.jpg, where cs61c-XX is your account login (these will be distributed in lab). Additionally, we request that the image you submit not be much larger than 250 kB in size. Note also that your submission can go through without this file, so make sure that you include it in your submission.

Problem 1: Basics Potpourri

  1. Match these words to their definitions (each is used exactly once): desktop computers, servers, low-end servers, supercomputers, terabyte, petabyte, datacenters, embedded computers, multicore processors, RAM, CPU, operating system, compiler, bit, instruction, assembly language, machine language, C, assembler, high-level language, system software, application software.
    1. Computer used to run large problems and usually accessed via a network
    2. 2^50 bytes
    3. Computers composed of hundreds to thousands of processors and terabytes of memory
    4. Part of a computer called central processor unit
    5. A kind of memory called random access memory
    6. Desktop computer without screen or keyboard usually accessed via a network
    7. A microprocessor containing several processors in the same chip
    8. Thousands of processors forming a large cluster
    9. Currently the largest class of computer that runs one application or one set of related applications
    10. Program that translates statements in high-level language to assembly language
    11. Personal computer delivering good performance to single users at low cost
    12. Program that translates symbolic instructions to binary instructions
    13. Commands that the processors understand
    14. Binary language that the processor can understand
    15. Symbolic representation of machine instructions
    16. Interface between user's program and hardware providing a variety of services and supervision functions
    17. Software/programs developed by the users
    18. Software layer between the application software and the hardware that includes the operating system and the compilers
    19. High-level language used to write application and system software
    20. Portable language composed of words and algebraic expressions that must be translated into assembly language before run in a computer
    21. 2^40 bytes
    22. Binary digit (0 or 1)
  2. Administrative Questions (Refer to the first lecture if you can't answer them)
    1. What are the 6 Great Ideas in Computer Architecture?
    2. Which projects are the only projects you can have partners in?
    3. How many slip days are you given for this class?
    4. How much penalty do you receive for late projects?
    5. How much penalty do you receive for late homeworks?
  3. Warehouse Scale Computers
    1. Why can Power Usage Effectiveness (PUE) never go below 1?
    2. If the building for the WSC uses total of 5 Megawatt and PUE equals 2, how much power are the servers + networking using?

Problem 2: MapReduce

For the parts that ask you to design an algorithm, feel free to use pseudocode, explaining at a high level what the algorithm is.

  1. Give an example of a problem that, while in some sense parallellizable, is not a good fit for MapReduce. Justify your answer.

  2. Assume you have a list of key-value pairs of the form (document, text), associating a body of text with the name of a document. Give the format of intermediate key-value pairs, and the pseudocode for the mapper and reducer, that you could use to calculate for each word, how many times it was used across all documents.

  3. A chess association was curious whether there are any correlation between specific board positions and the probability of a particular side winning, and also if it is affected by the players' skill level. Therefore, the association gave you a list of key value pairs, in which the keys represent the board positions and the values represent the skill tier of the two players and whether the white side won that match or not. The association wants you to build a MapReducer that has the output of the win rate of the white side, depending on the board state and the tier of the players' skill. Explain in detail how you would do it, or write a pseudocode.

    The winrate represents the total number of wins over the total number of games played under the specific tier and board position.

    The board position is an 64 element string (denoting the board left to right bottom to top as seen from the white side) whose values take on a combination of (white W or black B) and (pawn P, knight N, bishop B, rook R, queen Q, king K) or have 0 for empty tiles. For example, the starting state (image obtained from wikipedia) for a standard chess game would be:
    { WR WN WB WQ WK WB WN WR WP WP WP ... 0 0 0 0 BP BP BP BP ... BB BN BR }

    Here are some examples of the inputs and desired outputs, following the format of {key, value}:
    Inputs:
    { boardstate1, (amateur, white-won) }
    { boardstate2, (professional, black-won) }
    { boardstate3, (amateur, white-won) }
    { boardstate4, (master, black-won) }
    { boardstate1, (grandmaster, white-won) }
    { boardstate5, (amateur, black-won) }
    { boardstate20, (professional, white-won) }
    etc...
    Outputs:
    { (boardstate1, amateur), .5}
    { (boardstate2, master), 1}
    { (boardstate3, amateur), .9}
    { (boardstate4, grandmaster), .4}
    { (boardstate5, amateur), .2}
    { (boardstate20, professional), .1}
    etc...