

#### inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 29 - CPU Design : Pipelining to Improve Performance II

2010-04-07

Dan Garcia

### IS 3D BAD FOR YOU? MANY HAVE EYESTRAIN!

Cal researcher Marty Banks has put together a system to help with the eyestrain many viewers experience with 3D content on a small screen: the vergence/accommodation conflict.




www.technologyreview.com/computing/24976

# **Review**

- Pipelining is a BIG idea
- Optimal Pipeline
  - Each stage is executing part of an instruction each clock cycle.
  - One instruction finishes during each clock cycle.
  - On average, programs execute far more quickly, because throughput rises even though no single instruction finishes faster.
- What makes this work?
  - Similarities between instructions allow us to use same stages for all instructions (generally).
  - Each stage takes about the same amount of time as all others: little wasted time.
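To make the throughput claim concrete, here is a minimal Python sketch that compares total execution time with and without pipelining. The per-stage delays are hypothetical picosecond values chosen for illustration, and the model assumes an ideal pipeline with no hazards or stalls. It shows both why finishing one instruction per cycle wins and why unbalanced stages waste time: the clock period is set by the slowest stage.

```python
from __future__ import annotations


def unpipelined_time(n_instr: int, stage_ps: list[int]) -> int:
    """Each instruction runs start to finish before the next one begins."""
    return n_instr * sum(stage_ps)


def pipelined_time(n_instr: int, stage_ps: list[int]) -> int:
    """Ideal pipeline: no hazards, one instruction enters per clock cycle."""
    clock_ps = max(stage_ps)              # slowest stage sets the clock period
    cycles = len(stage_ps) + n_instr - 1  # fill the pipeline, then 1 finishes per cycle
    return cycles * clock_ps


if __name__ == "__main__":
    unbalanced = [200, 100, 200, 200, 100]  # hypothetical stage delays (ps)
    balanced = [160, 160, 160, 160, 160]    # same total work, split evenly
    n = 1000
    for name, stages in (("unbalanced", unbalanced), ("balanced", balanced)):
        t_seq = unpipelined_time(n, stages)
        t_pipe = pipelined_time(n, stages)
        print(f"{name}: {t_seq} ps vs {t_pipe} ps, speedup {t_seq / t_pipe:.2f}x")
```

With these hypothetical numbers the balanced pipeline gets close to the ideal 5x for five stages, while the unbalanced one stays just under 4x because its 100 ps stages sit idle for half of every 200 ps cycle. Note also that a single instruction takes 1000 ps pipelined versus 800 ps unpipelined: pipelining improves throughput, not latency.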






## Structural Hazard #2: Registers (1/2)

*Figure: pipeline diagram with Time (clock cycles) across and Instr. Order down, showing Instr 1 through Instr 4, each passing through the I\$, Reg, ALU, D\$, and Reg stages.*

- Can we read and write to registers simultaneously?
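Whether the register file is read and written in the same cycle can be checked with a short Python sketch. It assumes the 5-stage order I\$, Reg, ALU, D\$, Reg, one instruction entering per clock cycle, and no stalls; `STAGES`, `stage_of`, and `regfile_conflicts` are illustrative names, not part of any course tool. It reports every cycle in which one instruction writes the register file while a later one reads it.

```python
from __future__ import annotations

# Stage order assumed from the diagram; the register file appears twice.
STAGES = ["I$", "Reg", "ALU", "D$", "Reg"]
READ_STAGE = 1   # index of the Reg stage that reads source registers
WRITE_STAGE = 4  # index of the Reg stage that writes the result


def stage_of(instr: int, cycle: int) -> str | None:
    """Stage occupied by 1-based `instr` during 1-based `cycle`, or None if idle."""
    idx = cycle - instr  # Instr k is fetched in cycle k
    return STAGES[idx] if 0 <= idx < len(STAGES) else None


def regfile_conflicts(n_instr: int) -> list[tuple[int, int, int]]:
    """(cycle, writer, reader) for every cycle where the RegFile is both written and read."""
    conflicts = []
    for cycle in range(1, n_instr + len(STAGES)):
        writer = cycle - WRITE_STAGE
        reader = cycle - READ_STAGE
        if 1 <= writer <= n_instr and 1 <= reader <= n_instr:
            conflicts.append((cycle, writer, reader))
    return conflicts


print(regfile_conflicts(4))  # [(5, 1, 4)]: Instr 1 writes while Instr 4 reads
```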

## Structural Hazard #2: Registers (2/2)

- Two different solutions have been used:
  - 1) RegFile access is VERY fast: takes less than half the time of ALU stage
    - Write to Registers during first half of each clock cycle
    - Read from Registers during second half of each clock cycle
  - 2) Build RegFile with independent read and write ports
- Result: can perform Read and Write during same clock cycle
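Solution 1 can be modeled in a few lines of Python: within one call to `cycle`, the write happens before the reads, so a read in the same cycle sees the freshly written value. `SplitCycleRegFile` is a hypothetical toy class, not a description of real hardware; it uses MIPS register numbers (for example, `$t0` is register 8) and keeps register 0 fixed at zero, as MIPS does for `$zero`.

```python
from __future__ import annotations


class SplitCycleRegFile:
    """Toy RegFile: write in the first half of a cycle, read in the second half."""

    def __init__(self, n_regs: int = 32) -> None:
        self.regs = [0] * n_regs

    def cycle(self, write: tuple[int, int] | None = None,
              reads: tuple[int, ...] = ()) -> list[int]:
        # First half of the clock cycle: perform the (optional) register write.
        if write is not None:
            rd, value = write
            if rd != 0:  # register 0 ($zero) is hardwired to 0
                self.regs[rd] = value
        # Second half: reads observe the value written in the first half.
        return [self.regs[r] for r in reads]


rf = SplitCycleRegFile()
print(rf.cycle(write=(8, 42), reads=(8, 9)))  # [42, 0]
```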



# Control Hazard: Branching (2/9)


  - We had put branch decision-making hardware in the ALU stage
  - therefore two more instructions after the branch will always be fetched, whether or not the branch is taken
- Desired functionality of a branch
  - if we do not take the branch, don't waste any time and continue executing normally
  - if we take the branch, don't execute any instructions after the branch, just go to the desired label
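A rough cost model for this branch behavior, as a Python sketch: it assumes the pipeline keeps fetching sequentially after a branch and, when the branch turns out to be taken, discards the instructions fetched before the ALU stage resolves it. That flush policy is an assumption made for illustration, and `branch_cycles` and its trace labels are hypothetical. The trace lists only the instructions that actually execute.

```python
from __future__ import annotations


def branch_cycles(trace: list[str], n_stages: int = 5, resolve_stage: int = 3) -> int:
    """Cycles to run `trace`, a list of 'op', 'taken', or 'not_taken' entries.

    Stages are numbered from 1 (fetch); resolve_stage=3 is the ALU stage.
    """
    # Instructions fetched after a branch before it resolves: 2 for the ALU stage.
    penalty = resolve_stage - 1
    taken = trace.count("taken")
    # Ideal pipeline: fill it (n_stages - 1 cycles), then one instruction per cycle.
    ideal = n_stages + len(trace) - 1
    # A not-taken branch costs nothing; each taken branch flushes `penalty` fetches.
    return ideal + penalty * taken


print(branch_cycles(["op", "not_taken", "op", "op"]))  # 8
print(branch_cycles(["op", "taken", "op", "op"]))      # 10
```

Under this model, moving the branch decision earlier (a smaller `resolve_stage`) shrinks the cost of every taken branch.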















## Data Hazards (1/2)

- Consider the following sequence of instructions, in which every instruction after the `add` reads `$t0`:
  - `add $t0, $t1, $t2`
  - `sub $t4, $t0, $t3`
  - `and $t5, $t0, $t6`
  - `or $t7, $t0, $t8`
  - `xor $t9, $t0, $t10`
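Which of these instructions would actually read a stale `$t0`? A small Python sketch can answer that under stated assumptions: the 5-stage pipeline with no forwarding and no stalls, registers read in stage 2 and written in stage 5, and a register file that writes in the first half of a cycle and reads in the second (solution 1 of Structural Hazard #2). `parse` handles only the three-register form `op rd, rs, rt`, and all names here are hypothetical.

```python
from __future__ import annotations

PROGRAM = [
    "add $t0, $t1, $t2",
    "sub $t4, $t0, $t3",
    "and $t5, $t0, $t6",
    "or $t7, $t0, $t8",
    "xor $t9, $t0, $t10",
]


def parse(line: str) -> tuple[str, str, list[str]]:
    """Split 'op rd, rs, rt' into (op, rd, [rs, rt])."""
    op, rest = line.split(None, 1)
    regs = [r.strip() for r in rest.split(",")]
    return op, regs[0], regs[1:]


def raw_hazards(program: list[str], read_stage: int = 2,
                write_stage: int = 5) -> list[tuple[int, int, str]]:
    """(producer, consumer, register) for every read that happens before the write.

    Instruction i is in stage s during cycle i + s (one issue per cycle, no stalls).
    A read in the same cycle as the write is fine: the RegFile writes first.
    """
    decoded = [parse(line) for line in program]
    hazards = []
    for p, (_, dest, _) in enumerate(decoded):
        for c in range(p + 1, len(decoded)):
            _, c_dest, c_srcs = decoded[c]
            if dest in c_srcs and c + read_stage < p + write_stage:
                hazards.append((p, c, dest))
            if c_dest == dest:  # a later write supersedes the producer's value
                break
    return hazards


print(raw_hazards(PROGRAM))  # [(0, 1, '$t0'), (0, 2, '$t0')]
```

Under these assumptions `or` and `xor` read `$t0` late enough, so only `sub` and `and` need forwarding or stalls.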































## Peer Instruction (2/2)

- Assume 1 instr/clock, delayed branch, 5-stage pipeline, forwarding, interlock on unresolved load hazards (after 10 loops, so pipeline full). Rewrite this code to reduce pipeline stages (clock cycles) per loop to as few as possible.

| Label | Instruction |
|-------|-------------|
| `Loop:` | `lw $t0, 0($s1)` |
| | `addu $t0, $t0, $s2` |
| | `sw $t0, 0($s1)` |
| | `addiu $s1, $s1, -4` |
| | `bne $s1, $zero, Loop` |

- How many pipeline stages (clock cycles) per loop iteration to execute this code? Answer choices: 1 through 10.
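For checking an answer, here is a rough Python counter of clock cycles per loop iteration under the slide's assumptions: one instruction issued per clock, a full pipeline, and forwarding, so the only stall modeled is the one-cycle load-use interlock when an instruction immediately reads the register a `lw` just loaded. It also checks the wrap-around from the last instruction back to `Loop`. The tuple encoding and `cycles_per_iteration` are hypothetical, and the loop encoding follows the reconstructed listing, whose `lw`, `addu`, and `bne` opcodes were inferred from a partly garbled original.

```python
from __future__ import annotations


def cycles_per_iteration(body: list[tuple[str, str | None, tuple[str, ...]]]) -> int:
    """Steady-state cycles per iteration: one per instruction plus load-use stalls.

    Each entry is (op, destination register or None, source registers).
    Assumes full forwarding, so the only stall is one cycle when an instruction
    reads the register loaded by the lw immediately before it.
    """
    stalls = 0
    for i, (_, _, srcs) in enumerate(body):
        prev_op, prev_dest, _ = body[i - 1]  # i == 0 wraps to the loop's last instr
        if prev_op == "lw" and prev_dest in srcs:
            stalls += 1
    return len(body) + stalls


# Loop body as reconstructed above; no branch-delay-slot instruction is listed.
loop = [
    ("lw", "$t0", ("$s1",)),
    ("addu", "$t0", ("$t0", "$s2")),
    ("sw", None, ("$t0", "$s1")),
    ("addiu", "$s1", ("$s1",)),
    ("bne", None, ("$s1", "$zero")),
]
print(cycles_per_iteration(loop))  # 6: five instructions plus one lw-to-addu stall
```

With a delayed branch, whatever instruction sits in the slot after `bne` also executes every iteration, so append it to `body` to count it.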

