You are Here!

- **Software**
  - **Parallel Requests**
    - Assigned to computer
    - e.g., Search "Katz"
  - **Parallel Threads**
    - Assigned to core
    - e.g., Lookup, Ads
  - **Parallel Instructions**
    - >1 instruction @ one time
    - e.g., 5 pipelined instructions
  - **Parallel Data**
    - >1 data item @ one time
    - e.g., Add of 4 pairs of words
- **Hardware**
  - **Harness Parallelism & Achieve High Performance**
  - **Warehouse Scale Computer**
  - **Logic Gates**
  - **Core**
  - **Memory**
  - **Instruction Unit(s)**
  - **Functional Unit(s)**
  - **Cache Memory**
  - **Computation (Today)**
- **Programming Languages**
  - All gates @ one time
Levels of Representation/Interpretation

- **High Level Language Program (e.g., C)**
- **Assembly Language Program (e.g., MIPS)**
- **Machine Language Program (MIPS)**

### Compiler
- `temp = v[k];`
- `v[k] = v[k+1];`
- `v[k+1] = temp;`

### Assembler
- `lw $t0, 0($2)`
- `lw $t1, 4($2)`
- `sw $t1, 0($2)`
- `sw $t0, 4($2)`

### Machine Language Program (MIPS)

#### Machine Interpretation
- **Hardware Architecture Description (e.g., block diagrams)**

#### Architecture Implementation
- **Logic Circuit Description (Circuit Schematic Diagrams)**
- **Register File**
- **ALU**

### Agenda

- Timing and State Machines
- Datapath Elements: Mux + ALU
- MIPS-lite Datapath
- CPU Timing
- MIPS-lite Control
- And, in Conclusion, ...

```
0000 1001 1100 0110 1010 1111 0101 1000
1010 1111 0101 1000 0000 1001 1100 0110
1100 0110 1010 1111 0101 1000 0000 1001
0101 1000 0000 1001 1100 0110 1010 1111
```
Agenda

• Timing and State Machines
• Datapath Elements: Mux + ALU
• MIPS-lite Datapath
• CPU Timing
• MIPS-lite Control
• And, in Conclusion, ...

Last Time: Summation Circuit

Register is used to hold up the transfer of data to adder

Square wave clock sets when things change

Rough timing ...

Xi must be ready before clock edge due to adder delay

High (1)
Low (0)

High (1)
Low (0)

High (1)
Low (0)

Time
Register Internals

- n instances of a “Flip-Flop”
- Flip-flop name because the output flips and flops between 0 and 1
- D is “data input”, Q is “data output”
- Also called “D-type Flip-Flop”

Camera Analogy Timing Terms

- Want to take a portrait – timing right before and after taking picture
- Set up time – don’t move since about to take picture (open camera shutter)
- Hold time – need to hold still after shutter opens until camera shutter closes
- Time click to data – time from open shutter until can see image on output (viewfinder)
Hardware Timing Terms

- **Setup Time:** when the input must be stable *before* the edge of the CLK
- **Hold Time:** when the input must be stable *after* the edge of the CLK
- **“CLK-to-Q” Delay:** how long it takes the output to change, measured from the edge of the CLK

FSM Maximum Clock Frequency

- What is the maximum frequency of this circuit?

Max Delay = Setup Time + CLK-to-Q Delay + CL Delay

**Hint:**
Frequency = 1/Period
Another Great (Theory) Idea: Finite State Machines (FSM)

- You may have seen FSMs in other classes (e.g., CS70)
- Same basic idea
- Function can be represented with a “state transition diagram”
- With combinational logic and registers, any FSM can be implemented in hardware

Example: 3 Ones FSM

FSM to detect the occurrence of 3 consecutive 1’s in the Input

Assume state transitions are controlled by the clock: On each clock cycle the machine checks the inputs and moves to a new state and produces a new output ...
Hardware Implementation of FSM

Register needed to hold a representation of the machine’s state.
Unique bit pattern for each state.

Combinational logic circuit is used to implement a function maps from present state (PS) and input to next state (NS) and output.

The register is used to break the feedback path between Next State (NS) and Prior State (PS), controlled by the clock.

Hardware for FSM: Combinational Logic

Can look at its functional specification, truth table form

Truth table ...

<table>
<thead>
<tr>
<th>PS</th>
<th>Input</th>
<th>NS</th>
<th>Output</th>
</tr>
</thead>
<tbody>
<tr>
<td>00</td>
<td>0</td>
<td>00</td>
<td>0</td>
</tr>
<tr>
<td>00</td>
<td>1</td>
<td>01</td>
<td>0</td>
</tr>
<tr>
<td>01</td>
<td>0</td>
<td>00</td>
<td>0</td>
</tr>
<tr>
<td>01</td>
<td>1</td>
<td>10</td>
<td>0</td>
</tr>
<tr>
<td>10</td>
<td>0</td>
<td>00</td>
<td>0</td>
</tr>
<tr>
<td>10</td>
<td>1</td>
<td>00</td>
<td>1</td>
</tr>
</tbody>
</table>
Hardware for FSM: Combinational Logic

Truth table ...

<table>
<thead>
<tr>
<th>PS</th>
<th>Input</th>
<th>NS</th>
<th>Output</th>
</tr>
</thead>
<tbody>
<tr>
<td>00</td>
<td>0</td>
<td>00</td>
<td>0</td>
</tr>
<tr>
<td>00</td>
<td>1</td>
<td>01</td>
<td>0</td>
</tr>
<tr>
<td>01</td>
<td>0</td>
<td>00</td>
<td>0</td>
</tr>
<tr>
<td>01</td>
<td>1</td>
<td>10</td>
<td>0</td>
</tr>
<tr>
<td>10</td>
<td>0</td>
<td>00</td>
<td>0</td>
</tr>
<tr>
<td>10</td>
<td>1</td>
<td>00</td>
<td>1</td>
</tr>
</tbody>
</table>

Hardware for FSM: Combinational Logic

Alternative Truth Table format: list only cases where value is a 1. Then restate as logic equations using PS1, PS0, Input

Truth table ...

<table>
<thead>
<tr>
<th>PS</th>
<th>Input</th>
<th>NS</th>
<th>Output</th>
</tr>
</thead>
<tbody>
<tr>
<td>00</td>
<td>0</td>
<td>00</td>
<td>0</td>
</tr>
<tr>
<td>00</td>
<td>1</td>
<td>01</td>
<td>0</td>
</tr>
<tr>
<td>01</td>
<td>0</td>
<td>00</td>
<td>0</td>
</tr>
<tr>
<td>01</td>
<td>1</td>
<td>10</td>
<td>0</td>
</tr>
<tr>
<td>10</td>
<td>0</td>
<td>00</td>
<td>0</td>
</tr>
<tr>
<td>10</td>
<td>1</td>
<td>00</td>
<td>1</td>
</tr>
</tbody>
</table>
Hardware for FSM: Combinational Logic

Alternative Truth Table format: list only cases where value is a 1. Then restate as logic equations using PS1, PS0, Input

Truth table ...

<table>
<thead>
<tr>
<th>PS</th>
<th>Input</th>
<th>NS</th>
<th>Output</th>
</tr>
</thead>
<tbody>
<tr>
<td>00</td>
<td>0</td>
<td>00</td>
<td>0</td>
</tr>
<tr>
<td>00</td>
<td>1</td>
<td>01</td>
<td>0</td>
</tr>
<tr>
<td>01</td>
<td>0</td>
<td>00</td>
<td>0</td>
</tr>
<tr>
<td>01</td>
<td>1</td>
<td>10</td>
<td>0</td>
</tr>
<tr>
<td>10</td>
<td>0</td>
<td>00</td>
<td>0</td>
</tr>
<tr>
<td>10</td>
<td>1</td>
<td>01</td>
<td>1</td>
</tr>
</tbody>
</table>

NS bit 0 is 1

<table>
<thead>
<tr>
<th>PS</th>
<th>Input</th>
<th>Output</th>
</tr>
</thead>
<tbody>
<tr>
<td>00</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

NS bit 1 is 1

<table>
<thead>
<tr>
<th>PS</th>
<th>Input</th>
<th>Output</th>
</tr>
</thead>
<tbody>
<tr>
<td>01</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

Output is 1

<table>
<thead>
<tr>
<th>PS</th>
<th>Input</th>
<th>Output</th>
</tr>
</thead>
<tbody>
<tr>
<td>10</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>
Administrivia

Agenda

• Timing and State Machines
• Datapath Elements: Mux + ALU
• MIPS-lite Datapath
• CPU Timing
• MIPS-lite Control
• And, in Conclusion, ...
Design Hierarchy

Conceptual MIPS Datapath
Data Multiplexer
(e.g., 2-to-1 x n-bit-wide)

N Instances of 1-bit-Wide Mux

\[ c = \overline{s}ab + \overline{s}ab + s\overline{a}b + sab \]
\[ = \overline{s}(a\overline{b} + ab) + s(\overline{a}b + ab) \]
\[ = \overline{s}(a(b + b)) + s((\overline{a} + a)b) \]
\[ = \overline{s}(a(1) + s((1)b) \]
\[ = \overline{s}a + sb \]

<table>
<thead>
<tr>
<th>s</th>
<th>ab</th>
<th>c</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>00</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>01</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>10</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>11</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>00</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>01</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>10</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>11</td>
<td>1</td>
</tr>
</tbody>
</table>
How Do We Build a 1-bit-Wide Mux (in Logisim)?

4-to-1 Multiplexer

How many rows in TT?

\[ e = \overline{s_1}s_0a + \overline{s_1}s_0b + s_1\overline{s_0}c + s_1s_0d \]
Alternative Hierarchical Approach (in Logisim)

Logisim
Subcircuits

- Subcircuit: Logisim equivalent of procedure or method
  - Every project is a hierarchy of subcircuits

N-bit-wide Data Multiplexer
(in Logisim + tunnel)
Arithmetic and Logic Unit

- Most processors contain a special logic block called “Arithmetic and Logic Unit” (ALU)
- We’ll show you an easy one that does ADD, SUB, bitwise AND, bitwise OR

```
when S=00, R=A+B
when S=01, R=A-B
when S=10, R=A AND B
when S=11, R=A OR B
```
Adder/Subtractor: One-bit adder Least Significant Bit

\[
\begin{array}{cccc}
  a_3 & a_2 & a_1 & a_0 \\
+ & b_3 & b_2 & b_1 & b_0 \\
  s_3 & s_2 & s_1 & s_0 \\
\end{array}
\]

<table>
<thead>
<tr>
<th>a_0</th>
<th>b_0</th>
<th>s_0</th>
<th>c_1</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>

\[ s_0 = a_0 \ XOR \ b_0 \]
\[ c_1 = a_0 \ AND \ b_0 \]

---

Adder/Subtractor: One-bit adder (1/2)

\[
\begin{array}{cccc}
  a_3 & a_2 & a_1 & a_0 \\
+ & b_3 & b_2 & b_1 & b_0 \\
  s_3 & s_2 & s_1 & s_0 \\
\end{array}
\]

\[
\begin{array}{cccc}
  a_i & b_i & c_i & s_i & c_{i+1} \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 \\
0 & 1 & 1 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 1 \\
1 & 1 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 & 1 \\
\end{array}
\]

\[ s_i = \]
\[ c_{i+1} = \]
Adder/Subtractor: One-bit Adder (2/2)

\[ s_i = \text{XOR}(a_i, b_i, c_i) \]
\[ c_{i+1} = \text{MAJ}(a_i, b_i, c_i) = a_i b_i + a_i c_i + b_i c_i \]

\[ a_i \quad b_i \quad c_i \quad s_i \]
\[ a_i \quad b_i \quad c_i \quad C_{i+1} \]

N x 1-bit Adders \( \Rightarrow \) 1 N-bit Adder

Connect Carry Out \( i-1 \) to Carry in \( i \):
Twos Complement Adder/Subtractor

Critical Path

- When setting clock period in synchronous systems, must allow for worst case
- Path through combinational logic that is worst case called “critical path”
  - Can be estimated by number of “gate delays”: Number of gates must go through in worst case
- Idea: Doesn’t matter if speedup other paths if don’t improve the critical path
- What might critical path of ALU?
Agenda

• Multiplexer
• ALU Design
• **MIPS-lite Datapath**
• CPU Timing
• MIPS-lite Control
• And, in Conclusion, ...

---

Agenda

• Timing and State Machines
• Datapath Elements: Mux + ALU
• **MIPS-lite Datapath**
• CPU Timing
• MIPS-lite Control
• And, in Conclusion, ...
Processor Design Process

- Five steps to design a processor:
  1. Analyze instruction set ➔ datapath requirements
  2. Select set of datapath components & establish clock methodology
  3. Assemble datapath meeting the requirements
  4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
  5. Assemble the control logic
     - Formulate Logic Equations
     - Design Circuits

The MIPS-lite Subset

- **ADDU and SUBU**
  - addu rd, rs, rt
  - subu rd, rs, rt

- **OR Immediate:**
  - ori rt, rs, imm16

- **LOAD and STORE Word**
  - lw rt, rs, imm16
  - sw rt, rs, imm16

- **BRANCH:**
  - beq rs, rt, imm16
Register Transfer Language (RTL)

• RTL gives the meaning of the instructions
  \{op, rs, rt, rd, shamt, funct\} ← MEM[ PC ]
  \{op, rs, rt, Imm16\} ← MEM[ PC ]
• All start by fetching the instruction

  Inst  Register Transfers
  ADDU  R[rd] ← R[rs] + R[rt]; PC ← PC + 4
  SUBU  R[rd] ← R[rs] − R[rt]; PC ← PC + 4
  ORI   R[rt] ← R[rs] | zero_ext(Imm16); PC ← PC + 4
  LOAD  R[rt] ← MEM[ R[rs] + sign_ext(Imm16)]; PC ← PC + 4
  STORE MEM[ R[rs] + sign_ext(Imm16) ] ← R[rt]; PC ← PC + 4
  BEQ   if ( R[rs] == R[rt] )
         then PC ← PC + 4 + (sign_ext(Imm16) || 00)
         else PC ← PC + 4

Step 1: Requirements of the Instruction Set

• Memory (MEM)
  – Instructions & data (will use one for each: really caches)
• Registers (R: 32 x 32)
  – Read rs
  – Read rt
  – Write rt or rd
• PC
• Extender (sign/zero extend)
• Add/Sub/OR unit for operation on register(s) or extended immediate
• Add 4 (+ maybe extended immediate) to PC
• Compare if registers equal?
Generic Steps of Datapath

1. Instruction Fetch
2. Decode/Register Read
3. Execute
4. Memory
5. Register Write

Step 2: Components of the Datapath

- Combinational Elements
- State Elements + Clocking Methodology
- Building Blocks
ALU Needs for MIPS-lite + Rest of MIPS

- Addition, subtraction, logical OR, ==:
  
  \[
  \text{ADDU} \quad R[rd] = R[rs] + R[rt]; \ldots \\
  \text{SUBU} \quad R[rd] = R[rs] - R[rt]; \ldots \\
  \text{ORI} \quad R[rt] = R[rs] | \text{zero_ext(Imm16)} \ldots \\
  \text{BEQ} \quad \text{if } ( R[rs] == R[rt] ) \ldots \\
  \]

- Test to see if output == 0 for any ALU operation gives == test. How?
- P&H also adds AND, Set Less Than (1 if A < B, 0 otherwise)
- ALU from Appendix C, section C.5

Storage Element: Idealized Memory

- Memory (idealized)
  - One input bus: Data In
  - One output bus: Data Out
- Memory word is found by:
  - Address selects the word to put on Data Out
  - Write Enable = 1: address selects the memory word to be written via the Data In bus
- Clock input (CLK)
  - CLK input is a factor ONLY during write operation
  - During read operation, behaves as a combinational logic block: Address valid \(\Rightarrow\) Data Out valid after “access time”
Storage Element: Register (Building Block)

- Similar to D Flip Flop except
  - N-bit input and output
  - Write Enable input

- Write Enable:
  - Negated (or deasserted) (0): Data Out will not change
  - Asserted (1): Data Out will become Data In on rising edge of clock

Storage Element: Register File

- Register File consists of 32 registers:
  - Two 32-bit output busses: busA and busB
  - One 32-bit input bus: busW

- Register is selected by:
  - RA (number) selects the register to put on busA (data)
  - RB (number) selects the register to put on busB (data)
  - RW (number) selects the register to be written via busW (data) when Write Enable is 1

- Clock input (clk)
  - Clock input is a factor ONLY during write operation
  - During read operation, behaves as a combinational logic block:
    - RA or RB valid \(\Rightarrow\) busA or busB valid after “access time.”
Step 3: Assemble DataPath Meeting Requirements

- Register Transfer Requirements ⇒ Datapath Assembly
- Instruction Fetch
- Read Operands and Execute Operation
- Common RTL operations
  - Fetch the Instruction: mem[PC]
  - Update the program counter:
    - Sequential Code: PC ← PC + 4
    - Branch and Jump: PC ← “something else”

Step 3: Add & Subtract

- \( R[rd] = R[rs] \text{ op } R[rt] \) (addu rd,rs,rt)
  - Ra, Rb, and Rw come from instruction’s Rs, Rt, and Rd fields

- ALUctr and RegWr: control logic after decoding the instruction

- ... Already defined the register file & ALU
Agenda

- Timing and State Machines
- Datapath Elements: Mux + ALU
- MIPS-lite Datapath
- CPU Timing
- MIPS-lite Control
- And, in Conclusion, ...

Clocking Methodology

- Storage elements clocked by same edge
- “Critical path” (longest path through logic) determines length of clock period
- Have to allow for Clock-to-Q and Setup Times too
- This lecture (and P&H sections) 4.3-4.4 do whole instruction in 1 clock cycle for pedagogic reasons
  - Project 4 will do it in 2 clock cycles via simple pipelining
  - Soon explain pipelining and use 5 clock cycles per instruction
Register-Register Timing: One Complete Cycle

Agenda

- Timing and State Machines
- Datapath Elements: Mux + ALU
- MIPS-lite Datapath
- CPU Timing
- MIPS-lite Control
- And, in Conclusion, ...
Logical Operations with Immediate

- $R_{rt} = R_{rs} \text{ op} \text{ ZeroExt}[\text{imm16}]$

```
<table>
<thead>
<tr>
<th></th>
<th>op</th>
<th>rs</th>
<th>rt</th>
<th>immediate</th>
</tr>
</thead>
<tbody>
<tr>
<td>31</td>
<td>26</td>
<td>21</td>
<td>16</td>
<td>15</td>
</tr>
<tr>
<td>16</td>
<td>15</td>
<td>16</td>
<td>16</td>
<td>16</td>
</tr>
<tr>
<td>15</td>
<td>15</td>
<td>16</td>
<td>16</td>
<td>16</td>
</tr>
<tr>
<td>15</td>
<td>15</td>
<td>16</td>
<td>16</td>
<td>16</td>
</tr>
<tr>
<td>15</td>
<td>15</td>
<td>16</td>
<td>16</td>
<td>16</td>
</tr>
<tr>
<td>15</td>
<td>15</td>
<td>16</td>
<td>16</td>
<td>16</td>
</tr>
<tr>
<td>15</td>
<td>15</td>
<td>16</td>
<td>16</td>
<td>16</td>
</tr>
<tr>
<td>15</td>
<td>15</td>
<td>16</td>
<td>16</td>
<td>16</td>
</tr>
</tbody>
</table>
```

But we’re writing to Rt register??
And immediate ALU input??
Logical Operations with Immediate

- \( R[rt] = R[rs] \text{ op } \text{ZeroExt}[\text{imm16}] \)

![Diagram of logical operations]

Load Operations

- \( R[rt] = \text{Mem}[R[rs] + \text{SignExt}[\text{imm16}]] \)

Example: \( \text{lw } rt, rs, \text{imm16} \)

![Diagram of load operations]

What sign extending??
And where is Mem??
Load Operations

- \( R[rt] = Mem[R[rs] + \text{SignExt}[\text{imm16}]] \)

Example: \( lw \ rt, rs, \text{imm16} \)

```
   31 26 21 16 0
   op   rs  rt  immediate
   6 bits 5 bits 16 bits
```

RTL: The **Add** Instruction

```
add rd, rs, rt
- MEM[PC] Fetch the instruction from memory
- PC = PC + 4 Calculate the next instruction’s address
```
Instruction Fetch Unit at Beginning of Add

- Fetch the instruction from Instruction memory:
  Instruction = MEM[PC]
  - same for all instructions

Single Cycle Datapath during Add

R[rd] = R[rs] + R[rt]
Instruction Fetch Unit at End of Add

- PC = PC + 4
  - Same for all instructions except: Branch and Jump

Single Cycle Datapath during OR Immediate

- R[rt] = R[rs] OR ZeroExt[Imm16]
Single Cycle Datapath during **OR Immediate**

- $R[rt] = R[rs] \text{ OR ZeroExt}[imm16]$

Single Cycle Datapath during **Load**

- $R[rt] = \text{ Data Memory } \{ R[rs] + \text{SignExt}[imm16] \}$
Single Cycle Datapath during **Load**

- \( R_{[rt]} = \text{Data Memory} \{ R_{[rs]} + \text{SignExt}[\text{imm16}] \} \)

Single Cycle Datapath during **Store**

- \( \text{Data Memory} \{ R_{[rs]} + \text{SignExt}[\text{imm16}] \} = R_{[rt]} \)
Single Cycle Datapath during **Store**

- Data Memory \[R[rs] + \text{SignExt}[imm16]\] = \[R[rt]\]

```
\begin{array}{cccccc}
\hline
\text{op} & \text{rs} & \text{rt} & \text{immediate} \\
\hline
\end{array}
```

```
\begin{array}{cccc}
\hline
\text{InstrucVon} & <31:0> \\
\hline
\text{<21:25>} & \text{<16:20>} & \text{<11:15>} & \text{<0:15>} \\
\hline
\end{array}
```

Single Cycle Datapath during **Branch**

- if \((R[rs] - R[rt] == 0)\) then Zero = 1; else Zero = 0

```
\begin{array}{cccc}
\hline
\text{op} & \text{rs} & \text{rt} & \text{immediate} \\
\hline
\end{array}
```

```
\begin{array}{cccc}
\hline
\text{InstrucVon} & <31:0> \\
\hline
\text{<21:25>} & \text{<16:20>} & \text{<11:15>} & \text{<0:15>} \\
\hline
\end{array}
```
Single Cycle Datapath during Branch

- if \( (R[rs] - R[rt] == 0) \) then \( \text{Zero} = 1 \); else \( \text{Zero} = 0 \)

Instruction Fetch Unit at the End of Branch

- if \( (\text{Zero} == 1) \) then \( \text{PC} = \text{PC} + 4 + \text{SignExt}[\text{imm16}] \times 4 \); else \( \text{PC} = \text{PC} + 4 \)

- What is encoding of \( \text{nPC}_\text{sel} \)?
  - Direct MUX select?
  - Branch inst. / not branch

- Let’s pick 2nd option

Q: What logic gate?
Summary: Datapath’s Control Signals

- **ExtOp:** “zero”, “sign”
- **ALUsrc:** 0 ⇒ regB;
  1 ⇒ immed
- **ALUctr:** “ADD”, “SUB”, “OR”
- **MemWr:** 1 ⇒ write memory
- **MemtoReg:** 0 ⇒ ALU; 1 ⇒ Mem
- **RegDst:** 0 ⇒ “rt”; 1 ⇒ “rd”
- **RegWr:** 1 ⇒ write register

Agenda

- Timing and State Machines
- Datapath Elements: Mux + ALU
- MIPS-lite Datapath
- CPU Timing
- MIPS-lite Control
- And, in Conclusion, ...
And. in Conclusion, ...

Single-Cycle Processor

- Use muxes to select among input
  - S input bits selects $2^S$ inputs
  - Each input can be n-bits wide, independent of S
- Can implement muxes hierarchically
- Arithmetic circuits are a kind of combinational logic

- Five steps to processor design:
  1. Analyze instruction set $\rightarrow$ datapath requirements
  2. Select set of datapath components & establish clock methodology
  3. Assemble datapath meeting the requirements
  4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
  5. Assemble the control logic
     - Formulate Logic Equations
     - Design Circuits