CS61C Fall 2014 Project 3 Competition: Performance Optimization

TAs: Andrew Luo, David Adams, Fred Hong
Part 1: Due 11/23 @ 23:59:59

Updates

Clarifications/Reminders

Background

Refer to the Project 3 Part 1 website, Project 3 Part 2 website, and the Project 1 website.

Architecture

Your code will be tested on the hive machines, so keep that in mind while choosing the values to use for certain optimizations. The following is detailed information about the hive workstations; you may or may not find it useful.

They are Dell Precision T5500 machines in a Xeon DP configuration on an Intel 5520 chipset. They are packing not one, but two Intel Xeon E5620 CPUs. Each CPU has 4 cores, for a total of 8 cores, and each core runs at a clock rate of 2.40 GHz. This system also features 6GB of 1333MHz DDR3 ECC Registered memory, in a 6x1GB hex-channel configuration (note that the Intel 5520 is NUMA, so 3x1GB is local to one physical processor and the other 3x1GB is local to the second physical processor).
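As one way to put the 8 cores to work, the sketch below splits the rows of an image across threads with OpenMP. It is only an illustration under assumptions: scale_rows and its parameters are hypothetical names that are not part of the project framework, the choice of 8 threads simply mirrors the core count above, and the pragma takes effect only when the compiler is invoked with OpenMP enabled (for example -fopenmp with gcc); otherwise the loop runs serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the project framework): multiplies every
 * pixel of a row-major height x width image by factor. Each row is
 * independent, so rows can be handed to different threads safely. */
void scale_rows(float *image, int height, int width, float factor)
{
    assert(image != NULL && height >= 0 && width >= 0);

    /* Assumption: 8 threads, one per core on a hive machine. */
    #pragma omp parallel for num_threads(8)
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            image[(size_t)y * width + x] *= factor;
        }
    }
}
```

Parallelizing the outer loop keeps each thread on its own contiguous rows, which tends to avoid two threads writing to the same cache block.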

Each core comes with 16 general purpose integer registers (though a few are reserved such as the stack pointer) and 8 floating point registers. They also have 16 XMM registers per core for use with the SIMD intrinsics.
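To illustrate how the XMM registers are used, here is a minimal sketch of a sum of squared differences computed four floats at a time with SSE intrinsics. The function name squared_distance and its signature are assumptions made for this example, not part of the provided code; the intrinsics themselves come from xmmintrin.h. Unaligned loads are used so the sketch makes no assumption about how the arrays are aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_sub_ps, ... */

/* Hypothetical helper: returns the sum over i of (a[i] - b[i])^2. */
float squared_distance(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);

    __m128 acc = _mm_setzero_ps();   /* four running partial sums */
    size_t i = 0;

    /* Main loop: one XMM register holds four 32-bit floats. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 d  = _mm_sub_ps(va, vb);
        acc = _mm_add_ps(acc, _mm_mul_ps(d, d));
    }

    /* Combine the four partial sums into one scalar. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float total = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* Tail loop: handle the remaining 0 to 3 elements. */
    for (; i < n; i++) {
        float d = a[i] - b[i];
        total += d * d;
    }
    return total;
}
```

The tail loop matters for correctness: the competition inputs are not guaranteed to have sizes that are multiples of four.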

All caches deal with block sizes of 64 bytes. Each core has an L1 instruction and L1 data cache, both of 32 Kibibytes. A core also has a unified L2 cache (same cache for instructions and data) of 256 Kibibytes. The 4 cores on a microprocessor share an L3 cache of 8 Mebibytes. The associativities for the L1 instruction, L1 data, unified L2, and shared L3 caches are 4-way, 8-way, 8-way, and 16-way set associative, respectively.
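The figures above determine how many sets each cache has: capacity divided by (associativity times block size). The small sketch below works through that arithmetic; cache_sets is a hypothetical helper written for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of sets in a set-associative cache.
 * sets = capacity / (ways * block size) */
size_t cache_sets(size_t capacity_bytes, size_t ways, size_t block_bytes)
{
    assert(ways > 0 && block_bytes > 0);
    assert(capacity_bytes % (ways * block_bytes) == 0);
    return capacity_bytes / (ways * block_bytes);
}
```

With 64-byte blocks this gives 128 sets for the L1 instruction cache (32 KiB, 4-way), 64 sets for the L1 data cache (32 KiB, 8-way), 512 sets for the L2 cache (256 KiB, 8-way), and 8192 sets for the L3 cache (8 MiB, 16-way). A 64-byte block also holds 16 four-byte floats, which can be a useful number when choosing block or tile sizes.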

Competition (Due 12/04 @ 11:59:59 AM aka NOON, not midnight)

Optimize your depth map generator using any techniques you know. As usual, we have provided the same utilities for you to check the correctness of your code. Please remember, however, that the code provided is not a guarantee of correctness and we expect you to test your code yourself. Code that is incorrect will be disqualified from the competition.

The submissions with the highest performance will receive extra credit (including EPA) and winners will be announced in class. We will be running your code on a variety of images and features of different sizes, which are not guaranteed to be similar to those in benchmark.c.

Getting started

Copy the files in the directory ~cs61c/proj/03/competition/ to your proj3/competition directory by entering the following commands:

mkdir -p ~/proj3/competition
cp -r ~cs61c/proj/03/competition/* ~/proj3/competition
cp -r ~/proj3/part2/calcDepthOptimized.c ~/proj3/competition

The only file you need to modify and submit is calcDepthOptimized.c.

You should copy your code from part 2 and replace the provided calcDepthOptimized.c using the commands above.

The rest of the files are part of the framework. It may be helpful to look at and modify some of the other files to more thoroughly test your code. A description of these files is provided below:

Test your changes by compiling your files with make and running the benchmark or check executable.

make
./check
./benchmark

Submission

Competition submissions are due Thursday 12/04 @ 11:59:59 AM (noon). Late submissions will not be accepted, and please make sure only ONE partner submits. To submit proj3-competition, enter the following command:

submit proj3-competition