# EECS150 - Digital Design: Lecture 08 - Project Introduction, Part 1

# Feb 9, 2012 John Wawrzynek

# **Project Overview**

- A. Pipelined CPU review
- B. MIPS150 pipeline structure
- C. Serial interface
- D. Later:
  - Memories, project memories, and FPGAs
  - Video subsystem
  - Project specification and grading standard

# **MIPS 5-stage Pipeline Review**



# MIPS 5-stage Pipeline

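### Control Hazard Example

| Instruction | Cycle 1 | Cycle 2 | Cycle 3 | Cycle 4 | Cycle 5 | Cycle 6 | Cycle 7 |
|---|---|---|---|---|---|---|---|
| beq \$1, \$2, L1 | IF | ID | EX | DM | WB | | |
| add \$5, \$3, \$4 | | IF | ID | EX | DM | WB | |
| L1: sub \$5, \$3, \$4 | | | IF | ID | EX | DM | WB |

- The branch address is not ready until late in the pipeline, but it is needed at the IF of the target instruction (sub).
- Register values are known in ID, so move the branch compare and target-address generation to ID.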

Even with this change, one cycle of branch delay remains. The "architected branch delay slot" on MIPS lets the compiler deal with the delay: the instruction immediately after a branch always executes. Processors without an architected branch delay slot use branch predictors or stall the pipeline.
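To make the delay-slot rule concrete, here is a minimal Python sketch on a hypothetical mini-ISA. The tuple encoding, the register names `r1` to `r7`, and the `run` function are illustrative assumptions, not part of MIPS150. It differs from the slide's example in one way: an extra instruction sits between the delay slot and `L1`, which makes the slot's effect visible. With a delay slot, the instruction right after a taken `beq` still executes.

```python
# Minimal sketch of branch-delay-slot semantics on a hypothetical mini-ISA
# (not a real MIPS simulator): the instruction right after a branch always
# executes; a taken branch only redirects the PC after that instruction.

def run(program, regs, delay_slot=True, max_steps=100):
    """Execute `program` on the register dict `regs` (modified in place) and return it.

    Hypothetical encoding: ("add", rd, rs, rt), ("sub", rd, rs, rt),
    ("beq", rs, rt, label), and ("label", name) markers that take no cycle.
    """
    code, labels = [], {}
    for ins in program:                  # resolve labels to instruction indices
        if ins[0] == "label":
            labels[ins[1]] = len(code)
        else:
            code.append(ins)

    pc, pending_target = 0, None
    for _ in range(max_steps):
        if pc >= len(code):
            break
        op, *args = code[pc]
        next_pc = pc + 1
        if pending_target is not None:   # this instruction is in the delay slot
            next_pc, pending_target = pending_target, None
        if op == "add":
            rd, rs, rt = args
            regs[rd] = regs[rs] + regs[rt]
        elif op == "sub":
            rd, rs, rt = args
            regs[rd] = regs[rs] - regs[rt]
        elif op == "beq" and regs[args[0]] == regs[args[1]]:
            if delay_slot:
                pending_target = labels[args[2]]  # redirect after the slot
            else:
                next_pc = labels[args[2]]         # redirect immediately
        pc = next_pc
    return regs


prog = [
    ("beq", "r1", "r2", "L1"),
    ("add", "r5", "r3", "r4"),   # branch delay slot: always executes
    ("add", "r6", "r3", "r4"),   # skipped when the branch is taken
    ("label", "L1"),
    ("sub", "r7", "r3", "r4"),
]
init = {"r1": 1, "r2": 1, "r3": 7, "r4": 2, "r5": 0, "r6": 0, "r7": 0}
with_slot = run(prog, dict(init))                  # r5 == 9, r6 == 0, r7 == 5
no_slot = run(prog, dict(init), delay_slot=False)  # r5 == 0, r6 == 0, r7 == 5
```

The compiler's job is to place a useful instruction (or a `nop`) in that slot, because it runs on both the taken and the not-taken path.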

# MIPS 5-stage Pipeline

### Data Hazard Example



Logic must be added to detect when such a hazard exists and to control multiplexors that forward the correct value to the ALU. The only alternative is to stall the pipeline (thus hurting performance).
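Below is a minimal sketch of the forwarding control for one ALU operand in the 5-stage pipeline. It is written in Python for brevity rather than HDL. It follows the usual textbook conditions: compare the destination register held in the EX/MEM and MEM/WB pipeline registers against the source register of the instruction in EX. The field names and the select encoding are assumptions. Loads are not treated specially here; the load hazard is discussed next.

```python
from dataclasses import dataclass

# Hypothetical encoding of one ALU operand mux select.
FROM_REGFILE, FROM_EX_MEM, FROM_MEM_WB = 0, 1, 2


@dataclass
class PipeReg:
    reg_write: bool  # the instruction in this pipeline register writes a register
    rd: int          # its destination register number (0 is $zero)


def forward_select(src: int, ex_mem: PipeReg, mem_wb: PipeReg) -> int:
    """Pick the source of the ALU operand that reads register `src`."""
    # Never forward into $zero; it always reads as 0.
    if src == 0:
        return FROM_REGFILE
    # The younger result (in EX/MEM) has priority over the older one (in MEM/WB).
    if ex_mem.reg_write and ex_mem.rd == src:
        return FROM_EX_MEM
    if mem_wb.reg_write and mem_wb.rd == src:
        return FROM_MEM_WB
    return FROM_REGFILE
```

Consider `add $5, $3, $4` followed by `sub $6, $5, $2`. When `sub` is in EX, the `add` sits in EX/MEM, so `forward_select(5, PipeReg(True, 5), PipeReg(False, 0))` selects `FROM_EX_MEM`. In hardware the same function drives one mux per ALU input.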

# MIPS 5-stage Pipeline

### Load Hazard Example



The "architected load delay slot" on MIPS lets the compiler deal with the delay. Note that the register file still needs to be bypassed.

Without the delay slot, the only alternative is stalling.
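Handling the load delay slot is the compiler's job. The Python sketch below shows a check that such a pass might run. The `(op, dest, sources)` tuple encoding and the function names are assumptions. The check flags any instruction that sits directly after a `lw` and reads the register being loaded. As the simplest fix, it then inserts a `nop` there.

```python
# Hypothetical instruction encoding: (op, dest_reg, source_regs).
NOP = ("nop", 0, ())  # on MIPS, nop is encoded as sll $0, $0, 0


def uses_load_in_delay_slot(prev, ins):
    """True if `ins` reads the register that the lw in `prev` is still loading."""
    op, dest, _ = prev
    return op == "lw" and dest != 0 and dest in ins[2]


def load_delay_violations(code):
    """Indices of instructions in a load delay slot that read the loaded register."""
    return [i for i in range(1, len(code)) if uses_load_in_delay_slot(code[i - 1], code[i])]


def pad_load_delay_slots(code):
    """Simplest compiler fix: put a nop in every offending load delay slot."""
    out = []
    for ins in code:
        if out and uses_load_in_delay_slot(out[-1], ins):
            out.append(NOP)
        out.append(ins)
    return out


code = [
    ("lw", 8, (29,)),      # lw  $t0, 0($sp)
    ("add", 9, (8, 10)),   # add $t1, $t0, $t2  <- reads $t0 in the delay slot
]
print(load_delay_violations(code))                        # [1]
print(load_delay_violations(pad_load_delay_slots(code)))  # []
```

A better scheduler would move an independent instruction into the slot instead of spending a cycle on a `nop`.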

# **Processor Pipelining**

### Deeper Pipeline Example

| Instr. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| i | IF1 | IF2 | ID | X1 | X2 | M1 | M2 | WB | |
| i+1 | | IF1 | IF2 | ID | X1 | X2 | M1 | M2 | WB |

Deeper pipelines => less logic per stage => higher clock rate.

But deeper pipelines\* => more hazards => more cost and/or higher CPI. Cycles per instruction might go up because of unresolvable hazards.

### Remember, Performance = 1 / Execution time = Frequency<sub>clk</sub> / (# instructions × CPI)
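Execution time is # instructions × CPI / Frequency<sub>clk</sub>, and performance is its inverse. The small Python calculation below illustrates the trade-off described above. Its design points are made up and are not measurements of any real pipeline. A deeper pipeline wins only if its clock-rate gain outpaces its CPI increase.

```python
def exec_time_s(instructions: int, cpi: float, f_clk_hz: float) -> float:
    """Execution time = instructions x CPI / clock frequency (performance is its inverse)."""
    return instructions * cpi / f_clk_hz


n = 1_000_000  # hypothetical instruction count
# Hypothetical design points, for illustration only:
shallow = exec_time_s(n, cpi=1.0, f_clk_hz=50e6)         # 0.02 s
deep = exec_time_s(n, cpi=1.5, f_clk_hz=100e6)           # 0.015 s: faster despite higher CPI
deep_worse_cpi = exec_time_s(n, cpi=2.0, f_clk_hz=100e6)  # 0.02 s: CPI ate the clock gain
print(f"deep vs shallow speedup: {shallow / deep:.2f}x")  # 1.33x
```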

\*Many designs have included pipelines as long as 7, 10, and even 20 stages (as in the Intel Pentium 4). The later "Prescott" and "Cedar Mill" Pentium 4 cores (and their Pentium D derivatives) had a 31-stage pipeline.

### How about shorter pipelines? Less cost, less performance.

Spring 2012

EECS150 - Lec08-proj1

# **MIPS150** Pipeline

The blocks in the datapath with the greatest delay are IMEM, ALU, and DMEM. Allocate one pipeline stage to each:

| Stage | Function |
|---|---|
| I | Use the PC register to address IMEM and retrieve the next instruction. The instruction gets stored in a pipeline register, also called the "instruction register" in this case. |
| X | Use the ALU to compute a result or a memory address, or to compare registers for a branch. |
| M | Access data memory or an I/O device for a load or store. Allow for setup time for the register file write. |

You will need to work out most of the details yourself. Some details follow; in particular, let's look at hazards.
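Allocating one slow block per stage pays off because the clock period is set by the slowest stage rather than by the sum of all blocks. The Python sketch below uses hypothetical delays, not measured FPGA numbers. It ignores decode, register-file access, muxes, and wiring, which a real design must also place in some stage.

```python
def clock_period_ns(stages, delays_ns, reg_overhead_ns=0.0):
    """Minimum clock period: slowest stage's logic delay plus pipeline-register overhead
    (e.g., clock-to-q plus setup time)."""
    return max(sum(delays_ns[block] for block in stage) for stage in stages) + reg_overhead_ns


# Hypothetical block delays in ns, for illustration only (not measured numbers).
delays = {"IMEM": 8.0, "ALU": 9.0, "DMEM": 8.0}

single_cycle = clock_period_ns([["IMEM", "ALU", "DMEM"]], delays)   # 25.0 ns -> 40 MHz
three_stage = clock_period_ns([["IMEM"], ["ALU"], ["DMEM"]], delays)  # 9.0 ns -> ~111 MHz
```

Putting two of the slow blocks into one stage (for example ALU and DMEM) would push the period back to that stage's combined delay.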

# MIPS 3-stage Pipeline



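### Control Hazard

The branch compare and target-address generation happen in X. The branch is therefore resolved while only one following instruction has been fetched, and that instruction occupies the architected branch delay slot. Therefore no extra logic is required.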

# MIPS 3-stage Pipeline

### Load Hazard



The "architected load delay slot" on MIPS lets the compiler deal with the delay. No register-file bypassing is needed here, assuming the register file is "write before read".

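In a write-before-read register file, a register that is written and read in the same cycle returns the newly written value on the read. An instruction that reads a register in the cycle the load writes it therefore needs no bypass mux. The Python class below is a behavioral sketch, and its class and parameter names are hypothetical. For contrast, it can also model read-before-write.

```python
class RegFile:
    """Behavioral sketch of a 32 x 32-bit register file; register 0 is hard-wired to 0."""

    def __init__(self, write_before_read=True):
        self.regs = [0] * 32
        self.write_before_read = write_before_read

    def cycle(self, read_addrs, write_addr=None, write_data=0):
        """One clock cycle: an optional write plus any number of reads; returns read values."""
        def do_write():
            if write_addr is not None and write_addr != 0:  # $zero stays 0
                self.regs[write_addr] = write_data & 0xFFFFFFFF

        if self.write_before_read:
            do_write()                                   # reads see the new value
            return [self.regs[a] for a in read_addrs]
        values = [self.regs[a] for a in read_addrs]      # reads see the old value
        do_write()
        return values


rf = RegFile()
print(rf.cycle([5], write_addr=5, write_data=42))  # [42]: no bypass needed
```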
# MIPS 3-stage Pipeline

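### Data Hazard

| Instruction | Cycle 1 | Cycle 2 | Cycle 3 | Cycle 4 |
|---|---|---|---|---|
| add \$5, \$3, \$4 | I | X | M | |
| add \$7, \$6, \$5 | | I | X | M |

The second add needs the value of register 5 in its X stage (cycle 3), but the first add updates register 5 only at the end of its M stage (also cycle 3).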

## Ways to fix:

1. Stall the pipeline behind the first add to wait for the result to appear in the register file. NOT ALLOWED this semester.
2. Selectively forward the ALU result back to the input of the ALU.
   - Need to add a mux at the ALU input and control logic to sense when to activate it. A bit complex to design; check the book for details.
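Option 2 can be sketched as follows. In the 3-stage pipeline, the only forwarding path runs from the ALU result held for the instruction in M back to the ALU inputs of the instruction in X. The Python below is a behavioral sketch with hypothetical field names, not the project's required design. It does not forward for loads, for two reasons: the instruction right after a load sits in the load delay slot, and the value the ALU computed for a load is an address, not the loaded data.

```python
from dataclasses import dataclass


@dataclass
class XInstr:
    """Instruction currently in the X stage (hypothetical fields)."""
    rs: int  # first ALU source register
    rt: int  # second ALU source register


@dataclass
class MInstr:
    """Instruction currently in the M stage (hypothetical fields)."""
    reg_write: bool  # will write a register at the end of M
    is_load: bool    # loads are not forwarded (load delay slot)
    rd: int          # destination register number


def alu_forward(x: XInstr, m: MInstr) -> tuple:
    """Return (forward_a, forward_b); True selects the M-stage ALU result over the regfile value."""
    can_forward = m.reg_write and not m.is_load and m.rd != 0
    return (can_forward and m.rd == x.rs, can_forward and m.rd == x.rt)


# add $5, $3, $4 is in M while add $7, $6, $5 is in X: forward into operand B.
print(alu_forward(XInstr(rs=6, rt=5), MInstr(reg_write=True, is_load=False, rd=5)))
# -> (False, True)
```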

# **Project CPU Pipelining Summary**

| Stage | I | X | M |
|---|---|---|---|
| 3-stage pipeline | instruction fetch | execute | access data memory |

- Pipeline rules:
  - Writes/reads to/from DMem use leading edge of "M"
  - Writes to RegFile use trailing edge of "M"
  - Instruction Decode and Register File access is up to you.
- 1 Load Delay Slot, 1 Branch Delay Slot
  - No Stalling may be used to accommodate pipeline hazards (in final version).
- Other:
  - Target frequency to be announced later (50-100MHz)
  - Minimize cost
  - Posedge clocking only

# **Background for Lab #5**

# Final Project: Spring 2011



- Executes most commonly used MIPS instructions.
- Pipelined (high performance) implementation.
- Serial console interface for shell interaction, debugging.
- Ethernet interface for high-speed file transfer.
- Video interface for display with 2-D vector graphics acceleration.
- Supported by a C language compiler.


# **Board-level Physical Serial Port**

Figure: board-level serial port, showing the DB-9 connector and the RS-232 transmitter/receiver (+3.3V input).

The RS-232 transmitter/receiver implements the standard signaling voltage levels for serial communication, which allows the FPGA board to communicate with any other RS-232 device.




# **FPGA Serial Port**



# MIPS Uses Memory-Mapped I/O

- Certain addresses are not regular memory
- Instead, they correspond to registers in I/O devices
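An address decoder decides which loads and stores go to memory and which go to a device register. The Python sketch below uses the serial register addresses from the polling code later in these notes. Decoding on the upper 16 bits is an assumption for illustration; the actual address map is whatever the project specification defines.

```python
IO_BASE = 0xFFFF0000  # from "lui $t0, 0xffff" in the polling code
IO_REGS = {
    0x0: "receiver control",
    0x4: "receiver data",
    0x8: "transmitter control",
    0xC: "transmitter data",
}


def decode(addr: int) -> str:
    """Say whether a load/store address selects memory or a device register."""
    addr &= 0xFFFFFFFF
    if (addr >> 16) == (IO_BASE >> 16):   # assumption: upper half 0xffff = I/O region
        return IO_REGS.get(addr & 0xFFFF, "unmapped I/O")
    return "memory"


print(decode(0xFFFF0004))  # receiver data
print(decode(0x00001000))  # memory
```

In hardware, the same comparison would gate the DMEM write enable and steer the load-data mux between memory and the device registers.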



# **Processor Checks Status before Acting**

- Path to device generally has 2 registers:
  - <u>Control Register</u>, says it's OK to read/write (I/O ready) [think of a flagman on a road]
  - <u>Data Register</u>, holds data for transfer
- Processor reads from the Control Register in a loop, waiting for the device to set the Ready bit in the Control Register (0  $\Rightarrow$  1) to say it's OK
- Processor then loads from (input) or writes to (output) data register

# **MIPS150 Serial Line Interface**

- Serial-Line Interface is a memory-mapped device.
- Modeled after SPIM terminal/keyboard interface.
  - Read from keyboard (receiver); 2 device regs
  - Write to terminal (transmitter); 2 device regs



# Serial I/O

- Control register rightmost bit (bit 0): Ready
  - Receiver: Ready==1 means the character in the Data Register has not yet been read;
    - $1 \Rightarrow 0$  when data is read from the Data Register
  - Transmitter: Ready==1 means the transmitter is ready to accept a new character;
    - $0 \Rightarrow$  transmitter still busy writing the last character
  - Interrupt Enable (I.E.) bit: not used in our implementation
- Data register rightmost byte holds the data
  - Receiver: last character from the serial port; rest = 0
  - Transmitter: writing the rightmost byte sends that character to the serial port
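The Ready-bit rules above can be captured in a small behavioral model. The `SerialPort` class below is a hypothetical Python sketch, not the project's serial-interface RTL. Its register offsets follow the polling code: 0x0/0x4 for the receiver and 0x8/0xC for the transmitter. The `receive` and `tx_done` hooks stand in for the serial line.

```python
class SerialPort:
    """Behavioral sketch of the serial interface's four device registers.

    Offsets from the base address 0xffff0000 (as in the polling code):
    0x0 receiver control, 0x4 receiver data,
    0x8 transmitter control, 0xC transmitter data.
    """

    def __init__(self):
        self.rx_ready = 0   # 1 => character in receiver data register not yet read
        self.rx_char = 0
        self.tx_ready = 1   # 1 => transmitter can accept a new character
        self.sent = []      # characters handed to the serial line

    # Serial-line side (hypothetical hooks standing in for the real UART).
    def receive(self, ch: int) -> None:
        self.rx_char = ch & 0xFF
        self.rx_ready = 1

    def tx_done(self) -> None:
        self.tx_ready = 1

    # Processor side: lw/sw at base + offset.
    def load(self, offset: int) -> int:
        if offset == 0x0:
            return self.rx_ready          # Ready in bit 0, other bits 0
        if offset == 0x4:
            self.rx_ready = 0             # reading the data clears Ready (1 => 0)
            return self.rx_char           # char in the rightmost byte, rest = 0
        if offset == 0x8:
            return self.tx_ready
        return 0                          # other offsets (incl. transmitter data) read as 0

    def store(self, offset: int, value: int) -> None:
        if offset == 0xC:                 # only the rightmost byte goes out
            self.sent.append(value & 0xFF)
            self.tx_ready = 0             # busy until tx_done(); software should wait for Ready
```

A polling routine spins on `load(0x0) & 1` (or `load(0x8) & 1` for output), just like the MIPS loops, and then accesses the data register.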

# "Polling" MIPS code

- Input: read from keyboard into \$v0

```asm
# Note: on the project CPU (1 load and 1 branch delay slot), the andi right after
# each lw and the instruction after beq fall in delay slots; fill them (e.g., nop).
            lui   $t0, 0xffff            # $t0 = 0xffff0000
Waitloop1:  lw    $t1, 0($t0)            # receiver control register
            andi  $t1, $t1, 0x1          # isolate Ready (bit 0)
            beq   $t1, $zero, Waitloop1  # spin until Ready == 1
            lw    $v0, 4($t0)            # receiver data register (clears Ready)
```

- Output: write to display from \$a0

```asm
            lui   $t0, 0xffff            # $t0 = 0xffff0000
Waitloop2:  lw    $t1, 8($t0)            # transmitter control register
            andi  $t1, $t1, 0x1          # isolate Ready (bit 0)
            beq   $t1, $zero, Waitloop2  # spin until Ready == 1
            sw    $a0, 12($t0)           # transmitter data register
```