







































**Peer Instruction (1/2)**

- Assume 1 instr/clock, delayed branch, 5 stage pipeline, forwarding, interlock on unresolved load hazards (after 10<sup>3</sup> loops, so pipeline full)

<pre>
Loop: lw    \$t0, 0(\$s1)
      addu  \$t0, \$t0, \$s2
      sw    \$t0, 0(\$s1)
      addiu \$s1, \$s1, -4
      bne   \$s1, \$zero, Loop
      nop
</pre>

- How many pipeline stages (clock cycles) per loop iteration to execute this code?

Answer choices: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
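One way to reason about the question is to lay out which instruction issues in which cycle once the pipeline is full. The sketch below is a simplified model, not a full pipeline simulator. It assumes the only stall source is the load-use interlock named on the slide (one bubble when the instruction right after a `lw` reads the loaded register), and that forwarding covers every other dependence, including the `bne` operand. The tuple encoding and the name `issue_schedule` are illustrative choices, not part of the original slide.

```python
# Each entry: (mnemonic, register written or None, registers read).
LOOP = [
    ("lw",    "$t0", ["$s1"]),
    ("addu",  "$t0", ["$t0", "$s2"]),
    ("sw",    None,  ["$t0", "$s1"]),
    ("addiu", "$s1", ["$s1"]),
    ("bne",   None,  ["$s1", "$zero"]),
    ("nop",   None,  []),            # branch delay slot
]

def issue_schedule(loop):
    """Return one steady-state iteration as a list with one entry per cycle."""
    schedule = []
    prev = loop[-1]  # the delay slot of the previous iteration precedes the loop top
    for instr in loop:
        prev_op, prev_dest, _ = prev
        # Load-use hazard: the loaded value is not ready for the very next
        # instruction, so the interlock inserts one bubble (assumed model).
        if prev_op == "lw" and prev_dest in instr[2]:
            schedule.append("(stall)")
        schedule.append(instr[0])
        prev = instr
    return schedule

schedule = issue_schedule(LOOP)
for cycle, entry in enumerate(schedule, start=1):
    print(cycle, entry)
```

Under these assumptions the loop occupies 7 cycles per iteration: six instructions plus one bubble between `lw` and `addu`.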

**Peer Instruction (2/2)**

- Assume 1 instr/clock, delayed branch, 5 stage pipeline, forwarding, interlock on unresolved load hazards (after 10<sup>3</sup> loops, so pipeline full). Rewrite this code to reduce pipeline stages (clock cycles) per loop to as few as possible.

<pre>
Loop: lw    \$t0, 0(\$s1)
      addu  \$t0, \$t0, \$s2
      sw    \$t0, 0(\$s1)
      addiu \$s1, \$s1, -4
      bne   \$s1, \$zero, Loop
      nop
</pre>

- How many pipeline stages (clock cycles) per loop iteration to execute this code?

Answer choices: 1, 2, 3, 4, 5, 6, 7, 8, 9
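A possible rescheduling, offered here as a sketch rather than the slide's official answer, moves `addiu` into the slot after the load and moves `sw` into the branch delay slot, with the store offset changed to 4 to compensate for the earlier decrement. The Python below checks two things under simplifying assumptions: that both versions leave memory in the same state, and how many cycles each takes per iteration if a load-use bubble is the only stall. The tuple encoding, the helper names `run`, `reads`, and `cycles`, and the test data are all hypothetical, and 32-bit wraparound is ignored.

```python
ORIGINAL = [
    ("lw",    ("$t0", 0, "$s1")),
    ("addu",  ("$t0", "$t0", "$s2")),
    ("sw",    ("$t0", 0, "$s1")),
    ("addiu", ("$s1", "$s1", -4)),
    ("bne",   ("$s1", "$zero")),
    ("nop",   ()),                      # branch delay slot
]
RESCHEDULED = [
    ("lw",    ("$t0", 0, "$s1")),
    ("addiu", ("$s1", "$s1", -4)),      # fills the slot after the load
    ("addu",  ("$t0", "$t0", "$s2")),
    ("bne",   ("$s1", "$zero")),
    ("sw",    ("$t0", 4, "$s1")),       # delay slot; offset 4 undoes the decrement
]

def run(loop, regs, mem):
    """Execute the loop functionally; the last instruction is the delay slot."""
    regs, mem = dict(regs), dict(mem)
    while True:
        taken = False
        for op, a in loop:
            if op == "lw":
                regs[a[0]] = mem[regs[a[2]] + a[1]]
            elif op == "sw":
                mem[regs[a[2]] + a[1]] = regs[a[0]]
            elif op == "addu":
                regs[a[0]] = regs[a[1]] + regs[a[2]]
            elif op == "addiu":
                regs[a[0]] = regs[a[1]] + a[2]
            elif op == "bne":
                taken = regs[a[0]] != regs[a[1]]  # delay slot still executes
        if not taken:
            return mem

def reads(op, a):
    """Registers an instruction reads (covers only the ops used here)."""
    if op == "addu":
        return {a[1], a[2]}
    if op == "addiu":
        return {a[1]}
    if op == "lw":
        return {a[2]}
    if op == "sw":
        return {a[0], a[2]}
    if op == "bne":
        return {a[0], a[1]}
    return set()

def cycles(loop):
    """Instructions per iteration plus one bubble per load-use pair."""
    pairs = zip(loop, loop[1:] + loop[:1])  # wrap around to the next iteration
    stalls = sum(1 for (op, a), (nop, na) in pairs
                 if op == "lw" and a[0] in reads(nop, na))
    return len(loop) + stalls

REGS = {"$s1": 12, "$s2": 10, "$t0": 0, "$zero": 0}
MEM = {4: 1, 8: 2, 12: 3}
print(run(ORIGINAL, REGS, MEM) == run(RESCHEDULED, REGS, MEM))
print(cycles(ORIGINAL), cycles(RESCHEDULED))
```

In this model the rescheduled loop removes both the load-use bubble and the `nop`, going from 7 cycles per iteration to 5 while producing the same memory contents.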



- Delayed branch helps with control hazard in 5 stage pipeline
- Load delay slot / interlock necessary
- More aggressive performance:
  - Superscalar
  - Out-of-order execution

**Administrivia**

- Assignments
  - HW7 due 8/2
  - Proj3 due 8/5
- Midterm regrades due today
- Logisim in lab is now 2.1.6
  - `java -jar ~cs61c/bin/logisim`
- Valerie's OH on Thursday moved to 10-11 for this week























- The memory hierarchy presents the processor with the illusion of a very large, very fast memory.