Levels of Representation/Interpretation

<table>
<thead>
<tr>
<th>High Level Language Program (e.g., C)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Compiler</td>
</tr>
<tr>
<td>Assembly Language Program (e.g., MIPS)</td>
</tr>
<tr>
<td>Assembler</td>
</tr>
<tr>
<td>Machine Language Program (MIPS)</td>
</tr>
</tbody>
</table>

Anything can be represented as a number, i.e., data or instructions

```
0000 1001 1100 0110 1010 1111 0101 1000
1010 1111 0101 1000 0000 1001 1100 0111
1100 0110 1010 1111 0101 1000 0000 1011
0000 1001 1100 0110 1010 1111 0101 1000
```

Review

- D-Flip-Flops update output at rising edge of clock
  - Setup and Hold times important
- Critical Path constrains clock rate.
- Finite State Machines extremely useful
- Use muxes to select among input
  - S input bits selects 2S inputs
  - Each input can be n-bits wide, independent of S
- Build n-bit adder out of chained 1-bit adders.

Review: N x 1-bit Adders ⇒ 1 N-bit Adder

Connect Carry Out i-1 to Carry in i:

What about detecting overflow?

- Unsigned overflow - The carry out from the most significant bit.

- Signed overflow - A bit more complicated, two cases:
  - Overflow from adding “big” positive numbers.
  - Overflow from adding “big” negative numbers.

Detecting Signed Overflow (4-bit examples)

- From two positive numbers:
  - 0111 + 0111, 0111 + 0001, 0100 + 0100.
  - What do these have in common? Carry-out from the second highest bit (but not highest bit)
- From two negative numbers:
  - 1000 + 1000, 1000 + 1111, 1011 + 1011.
  - These have carry-out from the highest bit (but not second highest bit)
- Expression for signed overflow: $C_n \text{XOR} C_{n-1}$
Twos Complement Adder/Subtractor

Can subtract by adding the negative of the second number. To negate:

Flip the bits

And add one!

Agenda

- 5 Stages of the Datapath
- Administrivia
- Quick Datapath Walkthrough
- Processor Design Process
  - Determine datapath requirements based on instruction set
  - Select datapath components
  - Assemble the datapath

The Processor

- **Processor** (CPU): Implements the instructions of the Instruction Set Architecture (ISA)
- **Datapath**: portion of the processor which contains hardware necessary to perform operations required by the processor (the brawn)
- **Control**: portion of the processor (also in hardware) which tells the datapath what needs to be done (the brain)

Five Components of a Computer

- Control
- Datapath
- Memory
- Input
- Output

Stages of the Datapath: Overview

- Break up the process of “executing an instruction” into stages or phases, and then connect the phases to create the whole datapath
  - Smaller phases are easier to design
  - Easy to optimize (change) one phase without touching the others
- Project 1 had 3 phases: Fetch, Decode, Execute. Here, we expand Execute into ALU, Memory Access, and Register Write.

Phases of the Datapath (1/5)

- Phase 1: *Instruction Fetch*
  - No matter what the instruction, the 32-bit instruction word must first be fetched from memory (the cache-memory hierarchy)
  - Also, this is where we increment PC (that is, PC = PC + 4, to point to the next instruction: byte addressing so + 4)
Phases of the Datapath (2/5)

- **Phase 2: Instruction Decode**
  - Upon fetching the instruction, we next gather data from the fields (decode all necessary instruction data).
  - Read the opcode and each of the possible fields from the 32 bits of the instruction.
  - Read data from all necessary registers (This differs from Project 1)
    - For add, read two registers
    - For addi, read one register
    - For jal, no reads necessary

Phases of the Datapath (3/5)

- **Phase 3: ALU (Arithmetic-Logic Unit)**
  - Real work of most instructions is done here: arithmetic (+, -, *, /), shifting, logic (&, |), comparisons (slt)
  - What about loads and stores?
    - eg, lw $t0, 40($t1)
    - Memory Address = Offset + Value in $t1
    - We perform this addition in this stage.

Phases of the Datapath (4/5)

- **Phase 4: Memory Access**
  - Only the load and store instructions do anything during this phase; the others remain idle or skip this phase all together.
  - Since these instructions have a unique step, we need this extra phase to account for them.
  - As a result of the cache system, this phase is expected to be fast.

Phases of the Datapath (5/5)

- **Phase 5: Register Write**
  - Most instructions write the result of some computation into a register.
  - E.g., arithmetic, logical, shifts, loads, slt, jal(!)
  - What about stores, branches, jumps?
    - Don’t write anything into a register at the end
    - These remain idle during this fifth phase or skip it all together.

Why Five Stages?

- Could we have a different number of stages?
  - Yes, and other architectures do.
- So why does MIPS have five if instructions tend to idle for at least one stage?
  - The five stages are the union of all the operations needed by all the instructions.
  - There is one instruction that uses all five stages: the load.

### Generic Steps of Datapath

1. Instruction Fetch
2. Decode/ Register Read
3. Execute
4. Memory
5. Reg. Write
Agenda

• 5 Stages of the Datapath
• Administrivia
• Quick Datapath Walkthrough
• Processor Design Process
  – Determine datapath requirements based on instruction set
  – Select datapath components
  – Assemble the datapath

Administrivia

• HW3 Due Wednesday at midnight
  – Should be able to answer last two questions after today’s lecture.
• Project 2 Part 2 due Sunday.
• Lab 11 posted.
• Lab 12 cancelled!
  – Replaced with free study session where you can catch up on labs / work on project 2.
  – The TA’s will still be there.

cs61c in…Minecraft!

• Person builds 16-bit ALU in Minecraft.

Datapath Walkthroughs (1/3)

  – Stage 1: fetch this instruction, inc. PC
  – Stage 2: decode to find it’s an add, then read registers $1 and $2
  – Stage 3: add the two values retrieved in Stage 2
  – Stage 4: idle (nothing to write to memory)
  – Stage 5: write result of Stage 3 into register $3

Example: add Instruction
Datapath Walkthroughs (2/3)

- `slti $3, $1, 17`
  - Stage 1: fetch this instruction, inc. PC
  - Stage 2: decode to find it's an `slti`, then read register $1
  - Stage 3: compare value retrieved in Stage 2 with the integer 17
  - Stage 4: idle
  - Stage 5: write the result of Stage 3 in register $3

Example: `slti` Instruction

Datapath Walkthroughs (3/3)

- `sw $3, 17($1)`
  - Stage 1: fetch this instruction, inc. PC
  - Stage 2: decode to find it's a `sw`, then read registers $1 and $3
  - Stage 3: add 17 to value in register $1 (retrieved in Stage 2)
  - Stage 4: write value in register $3 (retrieved in Stage 2) into memory address computed in Stage 3
  - Stage 5: idle (nothing to write into a register)

Example: `sw` Instruction

Agenda

- 5 Stages of the Datapath
- Administrivia
- Quick Datapath Walkthrough
- Processor Design Process
  - Determine datapath requirements based on instruction set
  - Select datapath components
  - Assemble the datapath

Processor Design Process

- Five steps to design a processor:
  - Step 1: Analyze instruction set to determine datapath requirements
  - Step 2: Select set of datapath components & establish clocking methodology
  - Step 3: Assemble datapath components to meet the requirements
  - Step 4: Analyze implementation of each instruction to determine setting of control points that realizes the register transfer
  - Step 5: Assemble the control logic
Processor Design Process

- Five steps to design a processor:
  - Step 1: Analyze instruction set to determine datapath requirements
  - Step 2: Select set of datapath components & establish clocking methodology
  - Step 3: Assemble datapath components to meet the requirements
  - Step 4: Analyze implementation of each instruction to determine setting of control points that realizes the register transfer
  - Step 5: Assemble the control logic

Register Transfer Language (RTL)

- RTL gives the meaning of the instructions

\[
\text{Instruction Fetches} \\
\begin{align*}
\text{Add} & : & \text{MEM} \rightarrow \text{R} \\
\text{Sub} & : & \text{MEM} \rightarrow \text{R} \\
\text{Or} & : & \text{MEM} \rightarrow \text{R} \\
\text{Load} & : & \text{MEM} \rightarrow \text{R} \\
\text{Store} & : & \text{MEM} \rightarrow \text{R} \\
\text{Branch} & : & \text{MEM} \rightarrow \text{R} \\
\end{align*}
\]

Step 1a: The MIPS-lite Subset for today

- **ADDU and SUBU**
  - \( \text{addu} \; \text{rd}, \text{rs}, \text{rt} \)
  - \( \text{subu} \; \text{rd}, \text{rs}, \text{rt} \)

- **OR Immediate**
  - \( \text{ori} \; \text{rt}, \text{rs}, \text{imm16} \)

- **LOAD and STORE Word**
  - \( \text{lw} \; \text{rt}, \text{rs}, \text{imm16} \)
  - \( \text{sw} \; \text{rt}, \text{rs}, \text{imm16} \)

- **BRANCH**
  - \( \text{beq} \; \text{rs}, \text{rt}, \text{imm16} \)

Required ALU Operations

- Addition, subtraction, logical OR, ==:
  - ADDU \( R[rd] = R[rs] + R[rt] \)
  - SUBU \( R[rd] = R[rs] - R[rt] \)
  - ORI \( R[rt] = R[rs] \lor \text{zero_ext}(\text{imm16}) \)
  - BEQ if \( R[rs] == R[rt] \)

- Test to see if output == 0 for any ALU operation gives == test. How?
- Full MIPS also adds AND, Set Less Than (1 if \( A < B \), 0 otherwise)
- ALU from Appendix C, section C.5
Storage Element: Idealized Memory

- Memory (idealized)
  - One input bus: Data In
  - One output bus: Data Out
- Memory word is found by:
  - Address selects the word to put on Data Out
  - Write Enable = 1: address selects the memory word to be written via the Data In bus
- Clock input (CLK)
  - CLK input is a factor ONLY during write operation
  - During read operation, behaves as a combinational logic block:
    - Address valid \( \Rightarrow \) Data Out valid after “access time”

Storage Element: Register (Building Block)

- Similar to D Flip Flop except
  - N-bit input and output
  - Write Enable input
- Write Enable:
  - Negated (or deasserted) (0): Data Out will not change
  - Asserted (1): Data Out will become Data In on rising edge of clock

Storage Element: Register File

- Register File consists of 32 registers:
  - Two 32-bit output busses: busA and busB
  - One 32-bit input bus: busW
- Register is selected by:
  - RA (number) selects the register to put on busA (data)
  - RB (number) selects the register to put on busB (data)
  - RW (number) selects the register to be written via busW (data) when Write Enable is 1
- Clock input (clk)
  - CLK input is a factor ONLY during write operation
  - During read operation, behaves as a combinational logic block:
    - RA or RB valid \( \Rightarrow \) busA or busB valid after “access time.”

Clocking Methodology (1/2)

- Single Cycle CPU: All stages of an instruction are completed within one long clock cycle.
  - The clock cycle is made sufficient long to allow each instruction to complete all stages without interruption and within one cycle.

Clocking Methodology (2/2)

- Multiple-cycle CPU: Only one stage of instruction per clock cycle.
  - The clock is made as long as the slowest stage.

Step 3: Assemble Datapath Meeting Requirements

Register Transfer Requirements \( \Rightarrow \) Assembly of Datapath

- In common between all instructions:
  - Fetch the Instruction: \( \text{mem}[PC] \)
  - Update the program counter:
    - Sequential Code: \( PC \leftarrow PC + 4 \)
    - Branch and Jump: \( PC \leftarrow ”something else” \)

- Several significant advantages over single cycle execution: Unused stages in a particular instruction can be skipped OR instructions can be pipelined (overlapped).
Add & Subtract
• $R[rd] = R[rs] \text{ op } R[rt]$ (addu, cd, rs, rt)
  - $Ra$, $Rb$, and $Rw$ come from instruction’s $Rs$, $Rt$, and $Rd$ fields
  - ALUctr and RegWr: control logic after decoding the instruction

Logical Operations with Immediate
• $R[rt] = R[rs] \text{ op } \text{ZeroExt}[imm16]$
  - Already defined the register file & ALU
  - ... Already defined the register file & ALU
  - ... Already defined the register file & ALU

Load Operations
• $R[rt] = \text{Mem}[R[rs] + \text{SignExt}[imm16]]$
  - Example: $\text{lw } rt, rs, imm16$
  - Already defined 32-bit MUX; Zero Ext?

Store Operations
• $\text{Mem} [R[rs] + \text{SignExt}[imm16]] = R[rt]$
  - Example: $\text{sw } rt, rs, imm16$
**Store Operations**


**Ex.:** sw rt, rs, imm16

<table>
<thead>
<tr>
<th>op</th>
<th>rs</th>
<th>rt</th>
<th>immediate</th>
</tr>
</thead>
<tbody>
<tr>
<td>31</td>
<td>26</td>
<td>21</td>
<td>16</td>
</tr>
</tbody>
</table>

6 bits | 5 bits | 5 bits | 16 bits

**The Branch Instruction**

- beq rs, rt, imm16

- mem[PC] Fetch the instruction from memory
- Zero = R[rs] - R[rt] Calculate branch condition
- if (Zero) Calculate the next instruction’s address
  - PC = PC + 4 + ( SignExt(imm16) x 4 )
  - else
  - PC = PC + 4

**Datapath for Branch Operations**

- beq rs, rt, imm16
- Datapath generates condition (zero)

**Putting it All Together: A Single Cycle Datapath**

**An Abstract View of the Implementation**