## Single Cycle CPU Design

Here we have a single cycle CPU diagram. Answer the following questions:

1. Name each component.
2. Name each datapath stage and explain its functionality.

| Stage | Functionality |
| :---: | :--- |
| Instruction <br> Fetch | Send an address to the instruction memory <br> Read the instruction (MEM[PC]) |
| Decode / <br> Register Read | Generate the control signal values using the opcode & funct fields <br> Read the register values with the rs & rt fields <br> Sign / zero extend the immediate |
| Execute | Perform arithmetic / logical operations |
| Memory | Read from / write to the data memory |
| Register Write | Write back the ALU result / the memory load to the register file |

3. Provide data inputs and control signals to the next PC logic.
4. Implement the next PC logic.
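The worksheet's own answer to questions 3 and 4 lives in the diagram, which is not reproduced here. As a rough sketch only, the next PC logic of a 32-bit MIPS-style datapath can be modeled as below. The function names `sign_extend16` and `next_pc` and the argument names are hypothetical; the inputs assumed are the current PC, the 16-bit immediate, the 26-bit jump target, and the `Jump`, `Branch`, and `Zero` signals.

```python
def sign_extend16(imm16: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc(pc: int, imm16: int, target26: int,
            jump: int, branch: int, zero: int) -> int:
    """Return the next PC (hypothetical model, standard MIPS assumptions)."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    if jump:
        # j: upper 4 bits of PC+4, then the 26-bit target, then two zero bits
        return (pc_plus_4 & 0xF0000000) | ((target26 & 0x3FFFFFF) << 2)
    if branch and zero:
        # beq taken: PC+4 plus the sign-extended word offset (shifted left by 2)
        return (pc_plus_4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF
    # Every other case: fall through to the next sequential instruction
    return pc_plus_4
```

In hardware this corresponds to two adders and a pair of MUXes selected by `Jump` and by `Branch AND Zero`; the `if` chain above only mirrors that selection.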


## Single Cycle CPU Control Logic

Note: The Zero signal from the ALU is just one way to implement the branch condition.
The reasoning for using "Zero" is that, for the instructions we need to support (listed in the table below), we only want to branch if two values are equal. We can check this by subtracting one from the other and outputting 1 if the result equals 0 (hence the "Zero" signal).

Fill out the values for the control signals from the previous CPU diagram.

| Instr. | Jump | Branch | RegDst | ExtOp | ALUSrc | ALUCtr | MemWr | MemtoReg | RegWr |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| add | 0 | 0 | 1 | X | 0 | 0010 | 0 | 0 | 1 |
| ori | 0 | 0 | 0 | 0 | 1 | 0001 | 0 | 0 | 1 |
| lw | 0 | 0 | 0 | 1 | 1 | 0010 | 0 | 1 | 1 |
| sw | 0 | 0 | X | 1 | 1 | 0010 | 1 | X | 0 |
| beq | 0 | 1 | X | 1 | 0 | 0110 | 0 | X | 0 |
| j | 1 | X | X | X | X | XXXX | 0 | X | 0 |

X: don't care value (either 0 or 1 is fine).

This table shows the ALUCtr values for each operation of the ALU:

| Operation | AND | OR | ADD | SUB | SLT | NOR |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ALUCtr | 0000 | 0001 | 0010 | 0110 | 0111 | 1100 |
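To make the ALUCtr encoding and the Zero output concrete, here is a minimal behavioral sketch of a 32-bit ALU. The names `alu`, `to_signed32`, and `MASK32` are hypothetical, and treating SLT as a signed comparison is an assumption based on the usual MIPS `slt` semantics; the operation codes are taken from the table above.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Reinterpret a 32-bit pattern as a signed two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(alu_ctr: str, a: int, b: int) -> tuple:
    """Return (result, zero) for the given 4-bit ALUCtr string."""
    a &= MASK32
    b &= MASK32
    if alu_ctr == "0000":      # AND
        result = a & b
    elif alu_ctr == "0001":    # OR
        result = a | b
    elif alu_ctr == "0010":    # ADD
        result = a + b
    elif alu_ctr == "0110":    # SUB
        result = a - b
    elif alu_ctr == "0111":    # SLT (assumed signed comparison)
        result = 1 if to_signed32(a) < to_signed32(b) else 0
    elif alu_ctr == "1100":    # NOR
        result = ~(a | b)
    else:
        raise ValueError(f"unsupported ALUCtr: {alu_ctr}")
    result &= MASK32           # keep only the low 32 bits
    zero = 1 if result == 0 else 0
    return result, zero
```

For `beq`, the control table selects SUB (`0110`), so `alu("0110", rs_value, rt_value)` yields `zero == 1` exactly when the two register values are equal.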

## Clocking Methodology

- The input signal to each state element must stabilize before each rising edge.
- Critical path: Longest delay path between state elements in the circuit.
- $\mathrm{t}_{\text{clk}} \geq \mathrm{t}_{\text{clk-to-q}} + \mathrm{t}_{\text{CL}} + \mathrm{t}_{\text{setup}}$, where $\mathrm{t}_{\text{CL}}$ is the delay of the critical path through the combinational logic.
- If we place registers in the critical path, we can shorten the period by reducing the amount of logic between registers.
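The effect of placing a register in the critical path can be illustrated with a small calculation. This is only a sketch: the function names are hypothetical and the delays in the usage note are made-up illustrative numbers, not values from this worksheet.

```python
def min_clock_period(t_clk_to_q, t_cl, t_setup):
    """Minimum period for one block of combinational logic between registers."""
    return t_clk_to_q + t_cl + t_setup


def min_period_with_stages(t_clk_to_q, stage_delays, t_setup):
    """Minimum period when registers split the logic into several stages.

    The slowest stage sets the period, because every stage must finish
    within one clock cycle.
    """
    return min_clock_period(t_clk_to_q, max(stage_delays), t_setup)
```

With illustrative delays of 30 ps clk-to-q, 20 ps setup, and 800 ps of logic, `min_clock_period(30, 800, 20)` gives 850 ps. Splitting the logic evenly with one extra register, `min_period_with_stages(30, [400, 400], 20)` gives 450 ps. An uneven split such as `[600, 200]` only reaches 650 ps, which is why balanced stages matter.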


## Single Cycle CPU Performance Analysis

The delays of circuit elements are given as follows:

| Element | Register clk-to-q | Register Setup | MUX | ALU | Mem Read | Mem Write | RegFile Read | RegFile Setup |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Parameter | $\mathrm{t}_{\text{clk-to-q}}$ | $\mathrm{t}_{\text{setup}}$ | $\mathrm{t}_{\text{mux}}$ | $\mathrm{t}_{\text{ALU}}$ | $\mathrm{t}_{\text{MEMread}}$ | $\mathrm{t}_{\text{MEMwrite}}$ | $\mathrm{t}_{\text{RFread}}$ | $\mathrm{t}_{\text{RFsetup}}$ |
| Delay (ps) | 30 | 20 | 25 | 200 | 250 | 200 | 150 | 20 |

1. Give an instruction that exercises the critical path.

Load word (`lw`).

2. What is the critical path in the single cycle CPU?

The red dashed line in the diagram: PC → instruction memory → register file read → ALU → data memory → MUX → register file write.

3. What are the minimum clock period, $\mathrm{t}_{\mathrm{clk}}$, and the maximum clock frequency, $\mathrm{f}_{\mathrm{clk}}$?

Assume $\mathrm{t}_{\text{clk-to-q}} >$ hold time.

$\mathrm{t}_{\text{clk}} \geq \mathrm{t}_{\text{PC,clk-to-q}} + \mathrm{t}_{\text{IMEMread}} + \mathrm{t}_{\text{RFread}} + \mathrm{t}_{\text{ALU}} + \mathrm{t}_{\text{DMEMread}} + \mathrm{t}_{\text{mux}} + \mathrm{t}_{\text{RFsetup}}$

$= 30 + 250 + 150 + 200 + 250 + 25 + 20 = 925\ \mathrm{ps}$

$\mathrm{f}_{\text{clk}} = 1 / \mathrm{t}_{\text{clk}} \leq 1 / (925\ \mathrm{ps}) \approx 1.08\ \mathrm{GHz}$
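The arithmetic above can be checked with a short script. This is a sketch: the dictionary keys and the names `LW_PATH`, `path_delay_ps`, and `max_frequency_ghz` are hypothetical labels, while the delay values and the order of elements follow the table and the sum above.

```python
# Element delays in picoseconds, copied from the delay table.
DELAYS_PS = {
    "clk_to_q": 30,
    "setup": 20,
    "mux": 25,
    "alu": 200,
    "mem_read": 250,
    "mem_write": 200,
    "rf_read": 150,
    "rf_setup": 20,
}

# Elements on the lw critical path, in the order used in the sum above.
LW_PATH = ["clk_to_q", "mem_read", "rf_read", "alu",
           "mem_read", "mux", "rf_setup"]


def path_delay_ps(path):
    """Sum the delays of the elements along a path."""
    return sum(DELAYS_PS[element] for element in path)


def max_frequency_ghz(period_ps):
    """Convert a clock period in ps to a frequency in GHz (1/ps = 1000 GHz)."""
    return 1000.0 / period_ps


t_clk = path_delay_ps(LW_PATH)    # 925 ps
f_clk = max_frequency_ghz(t_clk)  # about 1.08 GHz
```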
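4. Why is a single cycle CPU inefficient?

- Not all instructions exercise the critical path, yet every instruction takes the full clock period.
- It is not parallelized: each instruction occupies the whole datapath for the entire cycle, even though the components could be active concurrently on different instructions.

5. How can you improve its performance? What is the purpose of pipelining?

Pipelining: put pipeline registers between adjacent datapath stages. $\rightarrow$ This reduces the clock period.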
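Both answers can be made concrete with a rough estimate. The sketch below rests on several assumptions that are not part of the worksheet: the datapath is cut exactly at the five stage boundaries, the per-stage logic delays are the ones from the `lw` sum (with only the MUX in write-back), pipeline registers have the same 30 ps clk-to-q and 20 ps setup as in the delay table, and `add` follows the same simplified path model as `lw` but skips the data memory. All names are hypothetical.

```python
T_CLK_TO_Q = 30  # ps, register clk-to-q from the delay table
T_SETUP = 20     # ps, register and RegFile setup are both 20 ps in the table

# Assumed combinational delay of each stage, in ps.
STAGE_LOGIC_PS = {"IF": 250, "ID": 150, "EX": 200, "MEM": 250, "WB": 25}


def single_cycle_delay_ps(stages_used):
    """Delay an instruction actually needs in the single cycle datapath."""
    logic = sum(STAGE_LOGIC_PS[s] for s in stages_used)
    return T_CLK_TO_Q + logic + T_SETUP


def pipelined_period_ps():
    """Estimated clock period once each stage sits between registers."""
    return T_CLK_TO_Q + max(STAGE_LOGIC_PS.values()) + T_SETUP


lw_delay = single_cycle_delay_ps(["IF", "ID", "EX", "MEM", "WB"])  # 925 ps
add_delay = single_cycle_delay_ps(["IF", "ID", "EX", "WB"])        # 675 ps
pipe_period = pipelined_period_ps()                                # 300 ps
```

Under these assumptions `add` needs only 675 ps but still waits for the 925 ps clock, and the pipelined period drops to about 300 ps because the slowest stage (a memory read) sets the clock. Each instruction still passes through all five stages, so pipelining improves throughput rather than the latency of a single instruction.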

