













### **General Definitions**

- Latency: time to completely execute a certain task
  - for example, time to read a sector from disk is disk access time or disk latency
- Throughput: amount of work that can be done over a period of time







# Steps in Executing MIPS

1) IFtch: Instruction Fetch, Increment PC
2) Dcd: Instruction Decode, Read Registers
3) Exec:
   - Mem-ref: Calculate Address
   - Arith-log: Perform Operation
4) Mem:
   - Load: Read Data from Memory
   - Store: Write Data to Memory
5) WB: Write Data Back to Register
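To see how these five steps overlap once they become pipeline stages, here is a minimal sketch. It assumes an ideal pipeline with no hazards, where instruction i enters IFtch in cycle i and advances one stage per cycle. The names `pipeline_schedule` and `total_cycles` are illustrative only.

```python
# Stage names follow the five steps listed above.
STAGES = ["IFtch", "Dcd", "Exec", "Mem", "WB"]


def pipeline_schedule(num_instructions):
    """Return {cycle: [(instr_index, stage_name), ...]} for an ideal pipeline.

    Assumption: no hazards, so instruction i enters IFtch in cycle i
    and moves to the next stage every cycle.
    """
    schedule = {}
    for i in range(num_instructions):
        for s, stage in enumerate(STAGES):
            schedule.setdefault(i + s, []).append((i, stage))
    return schedule


def total_cycles(num_instructions):
    """Cycles needed to finish all instructions in the ideal pipeline."""
    if num_instructions == 0:
        return 0
    # The first instruction needs len(STAGES) cycles; each later one adds one.
    return len(STAGES) + num_instructions - 1
```

With three instructions, cycle 2 has instruction 0 in Exec, instruction 1 in Dcd, and instruction 2 in IFtch, and all three finish after 7 cycles instead of 15.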







# **Example**

- Suppose 2 ns for memory access, 2 ns for ALU operation, and 1 ns for register file read or write; compute the instruction rate
- Nonpipelined Execution:
  - lw: IF + Read Reg + ALU + Memory + Write Reg = 2 + 1 + 2 + 2 + 1 = 8 ns
  - add: IF + Read Reg + ALU + Write Reg = 2 + 1 + 2 + 1 = 6 ns (recall 8 ns for single-cycle processor)
- Pipelined Execution:
  - Max(IF, Read Reg, ALU, Memory, Write Reg) = 2 ns per stage, so one instruction can complete every 2 ns once the pipeline is full
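The arithmetic above can be checked with a short sketch. The delays come from the example; the dictionary and function names are assumptions made for illustration, and the pipelined total assumes five stages and no stalls.

```python
# Component delays from the example, in nanoseconds.
DELAY_NS = {"IF": 2, "RegRead": 1, "ALU": 2, "Mem": 2, "RegWrite": 1}

# Components each instruction uses when executed without pipelining.
USES = {
    "lw": ["IF", "RegRead", "ALU", "Mem", "RegWrite"],
    "add": ["IF", "RegRead", "ALU", "RegWrite"],
}


def nonpipelined_ns(instr):
    """Time for one instruction when its components run back to back."""
    return sum(DELAY_NS[c] for c in USES[instr])


def pipelined_cycle_ns():
    """The clock must be long enough for the slowest stage."""
    return max(DELAY_NS.values())


def pipelined_total_ns(n, stages=5):
    """Time for n instructions: fill the pipeline, then one finishes per cycle.

    Assumption: no stalls of any kind.
    """
    return (stages + n - 1) * pipelined_cycle_ns()
```

Under these assumptions a single `lw` takes 10 ns through the pipeline (longer than its 8 ns nonpipelined time), while 1000 instructions take 2008 ns versus 8000 ns for 1000 nonpipelined `lw`s: latency gets slightly worse, throughput gets much better.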



CS61C L29 CPU Design : Pipelining to Improve Performance

Garcia, Spring 2007 © UCB



# **Administrivia**

- Want to redo your autograded assignments for more credit?
  - We may have an opportunity for you...
- Performance Competition Up!
  - Rewrite HW2 to be as fast as possible
  - It'll be run on a real MIPS machine (PS2)
  - You can optimize C or MIPS or BOTH!!
  - Do it for pride, fame (& EPA points)
  - Two competitions
    - Traditional (same spec as HW2)
    - Unbounded (same HW2 Extra for Experts spec)




# **Problems for Pipelining CPUs**

- Limits to pipelining: <u>Hazards</u> prevent next instruction from executing during its designated clock cycle
  - <u>Structural hazards</u>: HW cannot support some combination of instructions (single person to fold and put clothes away)
  - <u>Control hazards</u>: Pipelining of branches causes later instruction fetches to wait for the result of the branch
  - <u>Data hazards</u>: Instruction depends on result of prior instruction still in the pipeline (missing sock)
- These might result in pipeline stalls or "bubbles" in the pipeline.
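As an illustration of the data hazard case, the sketch below flags instructions that read a register written by a nearby earlier instruction. The `Instr` record and the `window` parameter are hypothetical: how many earlier instructions can still be "in the pipeline" depends on the design (forwarding, register file timing), so `window=2` is only an assumed value.

```python
from collections import namedtuple

# Hypothetical minimal instruction record: text, destination register, sources.
Instr = namedtuple("Instr", ["text", "dest", "srcs"])


def data_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) triples.

    A pair is flagged when a later instruction reads a register that an
    earlier instruction, at most `window` positions before it, writes.
    `window` is an assumption about the pipeline, not a fixed property.
    """
    found = []
    for j, consumer in enumerate(program):
        for i in range(max(0, j - window), j):
            producer = program[i]
            if producer.dest is not None and producer.dest in consumer.srcs:
                found.append((i, j, producer.dest))
    return found


program = [
    Instr("add $t0, $t1, $t2", "$t0", ("$t1", "$t2")),
    Instr("sub $t4, $t0, $t3", "$t4", ("$t0", "$t3")),  # reads $t0 too early
    Instr("and $t5, $t6, $t7", "$t5", ("$t6", "$t7")),
]
```

Here `data_hazards(program)` reports that the `sub` depends on the `add` through `$t0`; a real pipeline would need to stall or otherwise resolve that dependence.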




# Structural Hazard #1: Single Memory (2/2)

### Solution:

- infeasible and inefficient to create a second memory
- so simulate this by having two Level 1 Caches (a temporary, smaller copy of memory, usually of the most recently used data)
  - (We'll learn about this more next week)
- have both an L1 <u>Instruction Cache</u> and an L1 <u>Data Cache</u>
- need more complex hardware to control when both caches miss
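The conflict that this solution removes can be sketched as follows. The sketch assumes an ideal 5-stage pipeline in which instruction i is fetched in cycle i and reaches its Mem stage in cycle i + 3, and it ignores cache misses entirely. `memory_conflicts` and its `split_caches` flag are illustrative names, not part of any real tool.

```python
# Instructions that access data memory in their Mem stage.
MEM_OPS = {"lw", "sw"}


def memory_conflicts(opcodes, split_caches=False):
    """Cycles in which a single shared memory would see two accesses.

    Assumption: ideal 5-stage pipeline, so the instruction fetched in
    cycle c overlaps with the Mem stage of the instruction fetched in
    cycle c - 3.
    """
    if split_caches:
        # Fetches go to the instruction cache, loads/stores to the data cache.
        return []
    return [c for c in range(3, len(opcodes)) if opcodes[c - 3] in MEM_OPS]
```

For the sequence `lw, add, add, add, add`, a single memory is asked for both the `lw` data read and the fourth instruction's fetch in cycle 3; with separate instruction and data caches the list of conflicts is empty.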






# Structural Hazard #2: Registers (2/2)

- Two different solutions have been used:
  - 1) RegFile access is *VERY* fast: takes less than half the time of ALU stage
    - Write to Registers during first half of each clock cycle
    - Read from Registers during second half of each clock cycle
  - 2) Build RegFile with independent read and write ports
- Result: can perform Read and Write during same clock cycle
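A toy model of solution 1 is sketched below: within one cycle the write is applied first, then the reads are served, so a value written back in a cycle is visible to a read in that same cycle. The `RegFile` class and its `clock_cycle` method are hypothetical names; real hardware does this with timing, not sequential code.

```python
class RegFile:
    """Toy 32-entry register file; register 0 is hardwired to zero as in MIPS."""

    def __init__(self):
        self.regs = [0] * 32

    def clock_cycle(self, write=None, reads=()):
        """Model one cycle: write in the first half, reads in the second half.

        write: optional (register_number, value) pair from the WB stage.
        reads: register numbers requested by the Dcd stage.
        """
        if write is not None:
            reg, value = write
            if reg != 0:  # writes to register 0 are ignored
                self.regs[reg] = value
        return [self.regs[r] for r in reads]
```

Calling `RegFile().clock_cycle(write=(8, 42), reads=(8, 9))` returns `[42, 0]`: the read of register 8 already sees the value written in the same cycle.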


### **Peer Instruction**

- A. Thanks to pipelining, I have <u>reduced the time</u> it took me to wash my shirt.
- B. Longer pipelines are <u>always a win</u> (since less work per stage & a faster clock).
- C. We can <u>rely on compilers</u> to help us avoid data hazards by reordering instrs.

Answer choices (T/F for A, B, C in order): 0: FFF, 1: FFT, 2: FTF, 3: FTT, 4: TFF, 5: TFT, 6: TTF, 7: TTT


# **Things to Remember**

- Optimal Pipeline
  - Each stage is executing part of an instruction each clock cycle.
  - One instruction finishes during each clock cycle.
  - On average, execute far more quickly.
- What makes this work?
  - Similarities between instructions allow us to use same stages for all instructions (generally).
  - Each stage takes about the same amount of time as all others: little wasted time.
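Why balanced stages matter can be shown with a small calculation. This is a sketch under strong assumptions: every instruction uses every stage, there are no hazards, and the pipeline is already full. `steady_state_speedup` is an illustrative name.

```python
def steady_state_speedup(stage_ns):
    """Speedup of a full, hazard-free pipeline over running stages back to back.

    Unpipelined, an instruction takes the sum of the stage times.
    Pipelined, one instruction completes every max(stage_ns), because
    the clock is set by the slowest stage.
    """
    return sum(stage_ns) / max(stage_ns)
```

Five perfectly balanced 2 ns stages give a speedup of 5.0, while stage times of 2, 1, 2, 2, 1 ns give 4.0, since the 1 ns stages sit idle for half of each 2 ns cycle.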


