University of California Berkeley
College of Engineering
Department of Electrical Engineering and Computer Science

CS61C, Summer 2006

HW 7

Submit your solution online by 11:59pm on August 17. Do this by creating a directory named hw7 that contains a text file named hw7.txt . From within that directory, type "submit hw7".

This Assignment is Optional

Your grade for this assignment will only be factored into your semester grade if it helps you. If you don't feel that you need this grade, you don't need to do this assignment.


1. A computer architect needs to design the pipeline of a new microprocessor. She has an example workload program core with 10^6 (1,000,000) instructions. Each instruction takes 100 ps to finish.

a) How long does it take to execute this program core on a nonpipelined processor?

b) The current state-of-the-art microprocessor has about 20 pipeline stages. Assume it is perfectly pipelined. How much speedup will it achieve compared to the nonpipelined processor?

c) Real pipelining isn't perfect, since implementing pipelining introduces some overhead per pipeline stage. Will this overhead affect instruction latency, instruction throughput, or both?

2. Identify all of the data dependencies in the following code. Which dependencies are data hazards that will be resolved via forwarding? Which dependencies are data hazards that will cause a stall?

add $3, $4, $2
sub $5, $3, $1
lw $6, 200($3)
add $7, $3, $6

3. The following C program is run (with no optimizations) on a processor with a cache that has eight-word (32-byte) blocks and holds 256 bytes of data:

int i, j, c, stride, array[512];
for (i = 0; i < 10000; i++)
for (j = 0; j < 512; j = j + stride)
c = array[j] + 17;

If we consider only the cache activity generated by references to the array and we assume tha tintegers are words, what is the expected miss rate when the cache is direct mapped and stride = 256? How about if stride = 255? Would either of these change if the cache were two-way set associative?

4. Consider a virtual memory system with the following properties:

40-bit virtual byte address
16 KB pages
36-bit physical byte address

What is the total size of the page table for each process on this processor, assuming that the valid, protection, dirty, and use bits take a total of 4 bits and that all the virtual pages are in use? (Assume that disk addresses are not stored in the page table.)

Problems adapted from P&H.