CS61C Lab 11

Pipelining

Exercises

Info

The following exercises are based on the following code segment:

addi $t0 $t1 100   # Line 1
lw   $t2 4($t0)    # Line 2
add  $t3 $t1 $t2   # Line 3
sw   $t3 8($t0)    # Line 4
lw   $t5 0($t6)    # Line 5
or   $t5 $t0 $t3   # Line 6

Exercise 1

Given the perfectly pipelined representation below (F = "Instruction Fetch", D="Decode", A="Execute", M="Memory Access", W="Write Back")

addi $t0 $t1 100   F D A M W
lw   $t2 4($t0)      F D A M W
add  $t3 $t1 $t2       F D A M W
sw   $t3 8($t0)          F D A M W
lw   $t5 0($t6)            F D A M W
or   $t5 $t0 $t3             F D A M W

Draw directed arrows to indicate data dependencies between stages. The arrow should start at the stage where the data is available and end at the stage where the data is needed. (You can either print this page out and edit our ASCII picture above or draw your own with more room.)

Exercise 2

Based on your table above, you might notice that some arrows are "forward" arrows, indicating the values are needed after they are available, which is what is intended. Some arrows are not -- use these arrows to fill in the table below:

For each instruction, list the instructions which are read-after-write dependant despite forwarding (which cause stalls) on that instruction.

________ is dependant on instruction 1

________ is dependant on instruction 2

________ is dependant on instruction 3

________ is dependant on instruction 4

________ is dependant on instruction 5

Exercise 3

Rewrite the table from exercise 1 to include stalls. "Stalls" means that nop instructions will be inserted between relevant instructions, and no gap between the stages will be introduced.

Exercise 4

How many clock cycles does it take to finish executing all six instructions on a CPU with forwarding? How long for a CPU without forwarding?

Exercis 5

Now reorder the instructions to get rid of the backward arrows from exercise 1, while still keeping the final result of running the MIPS code the same (All the registers should contain the same values when running the reordered MIPS code as compared to running the unmodified MIPS code). How many clock cycles does it take now to run the code segment, assuming a CPU with forwarding?