Pipelining
RISC-V
### ‘Sequential’ RISC-V Datapath

<table>
<thead>
<tr>
<th>Phase</th>
<th>Pictogram</th>
<th>$t_{\text{step}}$ Serial</th>
</tr>
</thead>
<tbody>
<tr>
<td>Instruction Fetch</td>
<td></td>
<td>200 ps</td>
</tr>
<tr>
<td>Reg Read</td>
<td></td>
<td>100 ps</td>
</tr>
<tr>
<td>ALU</td>
<td></td>
<td>200 ps</td>
</tr>
<tr>
<td>Memory</td>
<td></td>
<td>200 ps</td>
</tr>
<tr>
<td>Register Write</td>
<td></td>
<td>100 ps</td>
</tr>
<tr>
<td>$t_{\text{instruction}}$</td>
<td></td>
<td>800 ps</td>
</tr>
</tbody>
</table>

#### Instruction Sequence

- `add t0, t1, t2`
- `or t3, t4, t5`
- `sll t6, t0, t3`

---

Garcia, Nikolić

RISC-V (30)
### Pipelined RISC-V Datapath

<table>
<thead>
<tr>
<th>Phase</th>
<th>Pictogram</th>
<th>$t_{step}$ Serial</th>
<th>$t_{cycle}$ Pipelined</th>
</tr>
</thead>
<tbody>
<tr>
<td>Instruction Fetch</td>
<td></td>
<td>200 ps</td>
<td>200 ps</td>
</tr>
<tr>
<td>Reg Read</td>
<td></td>
<td>100 ps</td>
<td>200 ps</td>
</tr>
<tr>
<td>ALU</td>
<td></td>
<td>200 ps</td>
<td>200 ps</td>
</tr>
<tr>
<td>Memory</td>
<td></td>
<td>200 ps</td>
<td>200 ps</td>
</tr>
<tr>
<td>Register Write</td>
<td></td>
<td>100 ps</td>
<td>200 ps</td>
</tr>
<tr>
<td>$t_{instruction}$</td>
<td></td>
<td>800 ps</td>
<td>1000 ps</td>
</tr>
</tbody>
</table>

Instruction sequence:
- `add t0, t1, t2`
- `or t3, t4, t5`
- `sll t6, t0, t3`
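
As a rough check on the table, the total time for this three-instruction sequence can be worked out under the assumption of an ideal pipeline (no stalls, data dependencies ignored). The symbols $t_{\text{serial}}$ and $t_{\text{pipelined}}$ are introduced here for illustration only:

$$
\begin{aligned}
t_{\text{serial}} &= 3 \times 800\,\text{ps} = 2400\,\text{ps} \\
t_{\text{pipelined}} &= (5 + 3 - 1) \times 200\,\text{ps} = 1400\,\text{ps}
\end{aligned}
$$

The first instruction needs all five cycles; each later instruction finishes one cycle after its predecessor.
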
### Pipelined RISC-V Datapath

<table>
<thead>
<tr>
<th></th>
<th>Single Cycle</th>
<th>Pipelined</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Timing</strong></td>
<td>$t_{\text{step}} = 100 \ldots 200 \text{ ps}$</td>
<td>$t_{\text{cycle}} = 200 \text{ ps}$</td>
</tr>
<tr>
<td></td>
<td>Register access only 100 ps</td>
<td>All cycles same length</td>
</tr>
<tr>
<td><strong>Instruction time, $t_{\text{instruction}}$</strong></td>
<td>$= t_{\text{cycle}} = 800 \text{ ps}$</td>
<td>$1000 \text{ ps}$</td>
</tr>
<tr>
<td><strong>CPI (Cycles Per Instruction)</strong></td>
<td>$\sim 1$ (ideal)</td>
<td>$\sim 1$ (ideal), $&gt; 1$ (actual)</td>
</tr>
<tr>
<td><strong>Clock rate, $f_s$</strong></td>
<td>$1/800 \text{ ps} = 1.25 \text{ GHz}$</td>
<td>$1/200 \text{ ps} = 5 \text{ GHz}$</td>
</tr>
<tr>
<td><strong>Relative speed</strong></td>
<td>$1 \times$</td>
<td>$4 \times$</td>
</tr>
</tbody>
</table>
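
The numbers in the comparison can be reproduced from the stage latencies alone. The following is a minimal sketch, assuming the five stage times from the tables above; the names `STAGE_PS`, `single_cycle_period`, `pipelined_period` and `clock_ghz` are made up for this illustration.

```python
# Stage latencies in picoseconds, taken from the tables above.
STAGE_PS = {
    "IF": 200,   # instruction fetch
    "ID": 100,   # register read
    "EX": 200,   # ALU
    "MA": 200,   # memory access
    "WB": 100,   # register write
}

def single_cycle_period(stages):
    """One long cycle must cover every stage back to back."""
    return sum(stages.values())

def pipelined_period(stages):
    """Every stage gets the same time slot, so the slowest stage sets the clock."""
    return max(stages.values())

def clock_ghz(period_ps):
    """Convert a clock period in ps to a frequency in GHz (1000 ps = 1 GHz)."""
    return 1000 / period_ps

t_single = single_cycle_period(STAGE_PS)     # single-cycle clock period
t_cycle = pipelined_period(STAGE_PS)         # pipelined clock period
t_instruction = len(STAGE_PS) * t_cycle      # pipelined latency of one instruction
speedup = t_single / t_cycle                 # ideal throughput gain

print(t_single, t_cycle, t_instruction)
print(clock_ghz(t_single), clock_ghz(t_cycle), speedup)
```

Note that the latency of a single instruction gets worse (1000 ps instead of 800 ps); the gain comes from throughput, because one instruction completes every 200 ps once the pipeline is full.
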
Sequential vs. Simultaneous

- What happens sequentially and what simultaneously?

Instruction sequence:
- `add t0, t1, t2`
- `or t3, t4, t5`
- `sll t6, t0, t3`
- `sw t0, 4(t3)`
- `lw t0, 8(t3)`
- `addi t2, t2, 1`

Resource use over time:
- $t_{\text{instruction}} = 1000\,\text{ps}$
- $t_{\text{cycle}} = 200\,\text{ps}$

Resource use in a particular time slot:
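
The two views can be made concrete with a small sketch that lists which instruction occupies which stage in each cycle for the six-instruction sequence above. It assumes an ideal pipeline (one instruction enters per cycle, no stalls); `stage_of` and `busy_stages` are hypothetical helper names.

```python
STAGES = ["IF", "ID", "EX", "MA", "WB"]

PROGRAM = [
    "add t0, t1, t2",
    "or t3, t4, t5",
    "sll t6, t0, t3",
    "sw t0, 4(t3)",
    "lw t0, 8(t3)",
    "addi t2, t2, 1",
]

def stage_of(instr_index, cycle):
    """Stage occupied by instruction `instr_index` in `cycle` (both 0-based).

    Assumes an ideal pipeline: one instruction issues per cycle, no stalls.
    Returns None when the instruction is not in the pipeline in that cycle.
    """
    offset = cycle - instr_index
    return STAGES[offset] if 0 <= offset < len(STAGES) else None

def busy_stages(cycle, program=PROGRAM):
    """Map each busy stage in `cycle` to the instruction that occupies it."""
    busy = {}
    for i, text in enumerate(program):
        stage = stage_of(i, cycle)
        if stage is not None:
            busy[stage] = text
    return busy

total_cycles = len(PROGRAM) + len(STAGES) - 1   # fill plus drain, ideal case
for cycle in range(total_cycles):
    print(cycle, busy_stages(cycle))
```

Following one instruction across cycles shows the sequential view (it visits IF, ID, EX, MA, WB in order); looking at a single cycle shows the simultaneous view (up to five different instructions, each in a different stage).
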
Pipelining Datapath
Single-Cycle RV32I Datapath

Instruction Fetch (IF)

Instruction Decode/Register Read (ID)

ALU Execute (EX)

Memory Access (MA)

Write Back (WB)
Pipelined RV32I Datapath

Recalculate PC+4 in the MA stage to avoid sending both PC and PC+4 down the pipeline

Must pipeline instruction along with data, so control operates correctly in each stage
Pipelined RV32I Datapath

Pipeline registers separate stages, hold data for each instruction in flight

Pipelined Control

- Control signals derived from instruction
  - As in single-cycle implementation
  - Information is stored in pipeline registers for use by later stages
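
A minimal sketch of this idea follows: control signals are derived once in decode and then shifted through the pipeline registers so that each later stage sees the signals of the instruction it is currently handling. The `Control` fields are a reduced, illustrative subset rather than the full RV32I control word, and `decode` and `run` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Control:
    """A reduced, illustrative set of control signals (not the full RV32I set)."""
    reg_write: bool = False   # consumed in WB
    mem_read: bool = False    # consumed in MA
    mem_write: bool = False   # consumed in MA

BUBBLE = Control()  # all signals off: the stage does nothing

def decode(mnemonic):
    """Derive control signals once, in the decode stage."""
    if mnemonic == "lw":
        return Control(reg_write=True, mem_read=True)
    if mnemonic == "sw":
        return Control(mem_write=True)
    if mnemonic in {"add", "sub", "or", "xor", "sll", "slt", "addi"}:
        return Control(reg_write=True)
    raise ValueError(f"mnemonic not covered by this sketch: {mnemonic}")

def run(mnemonics):
    """Shift control bundles through the ID/EX, EX/MA and MA/WB registers.

    Returns, per clock edge, the controls seen by the EX, MA and WB stages.
    """
    id_ex = ex_ma = ma_wb = BUBBLE
    trace = []
    # Three extra bubbles drain the last instruction out of the pipeline.
    for m in list(mnemonics) + [None] * 3:
        # Clock edge: each register latches the value of the one before it,
        # and ID/EX latches the controls decoded for the instruction leaving ID.
        ma_wb, ex_ma = ex_ma, id_ex
        id_ex = decode(m) if m is not None else BUBBLE
        trace.append({"EX": id_ex, "MA": ex_ma, "WB": ma_wb})
    return trace

trace = run(["lw", "sw", "add"])
for step in trace:
    print(step)
```

In the third entry of the trace, `add` is in EX, `sw` is in MA with `mem_write` set, and `lw` is in WB with `reg_write` set, each stage using the signals that travelled with its own instruction.
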
Pipeline Hazards
Pipelining Hazards

A *hazard* is a situation that prevents starting the next instruction in the next clock cycle

1) *Structural hazard*
   - A required resource is busy (e.g. needed in multiple stages)

2) *Data hazard*
   - Data dependency between instructions
   - Need to wait for previous instruction to complete its data read/write

3) *Control hazard*
   - Flow of execution depends on previous instruction
Structural Hazard

- **Problem**: Two or more instructions in the pipeline compete for access to a single physical resource

- **Solution 1**: Instructions take turns using the resource; some instructions have to stall

- **Solution 2**: Add more hardware to machine

- Can always solve a structural hazard by adding more hardware
Regfile Structural Hazards

- Each instruction:
  - Can read up to two operands in decode stage
  - Can write one value in writeback stage
- Avoid structural hazard by having separate “ports”
  - Two independent read ports and one independent write port
- Three accesses per cycle can happen simultaneously
Structural Hazard: Memory Access

- Instruction and data memory used simultaneously
  - Use two separate memories

Instruction sequence:

- `add t0, t1, t2`
- `lw t0, 8(t3)`
- `slt t6, t0, t3`
- `sw t0, 4(t3)`
- `addi t0, t1, t2`
Instruction and Data Caches

- Fast, on-chip memory, separate for instructions and data

![Diagram of processor and memory architecture](image)
Structural Hazards – Summary

- Conflict for use of a resource
- In RISC-V pipeline with a single memory
  - Load/store requires data access
  - Without separate memories, instruction fetch would have to stall for that cycle
    - All other operations in pipeline would have to wait
- Pipelined datapaths require separate instruction/data memories
  - Or separate instruction/data caches
- RISC ISAs (including RISC-V) designed to avoid structural hazards
  - e.g. at most one memory access/instruction
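
To see where a single shared memory would cause trouble, the sketch below finds pairs of instructions that would need the memory in the same cycle. It assumes an ideal schedule in which instruction `i` is fetched in cycle `i`, and it only looks at mnemonics; `memory_conflicts` is a hypothetical helper.

```python
MEMORY_OPS = {"lw", "sw"}   # instructions that use memory in the MA stage
IF_TO_MA = 3                # MA comes three stages after IF (IF, ID, EX, MA)

def memory_conflicts(mnemonics):
    """Return (older, younger) index pairs that need the memory in the same cycle.

    Assumes an ideal schedule: instruction i is fetched in cycle i and reaches
    MA in cycle i + 3, which is the cycle in which instruction i + 3 is fetched.
    """
    conflicts = []
    for i, op in enumerate(mnemonics):
        younger = i + IF_TO_MA
        if op in MEMORY_OPS and younger < len(mnemonics):
            conflicts.append((i, younger))
    return conflicts

program = ["add", "lw", "slt", "sw", "addi"]
print(memory_conflicts(program))
```

For this five-instruction sequence the `lw` (index 1) is in MA while the `addi` (index 4) is being fetched. With separate instruction and data memories both accesses proceed in the same cycle.
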
Data Hazards
Data Hazard: Register Access

- Separate ports, but what if one instruction writes a register in the same cycle that another reads it?

Does `sw` in the example fetch the old or new value?

- `add t0, t1, t2`
- `or t3, t4, t5`
- `slt t6, t0, t3`
- `sw t0, 4(t3)`
- `addi t0, t1, t2`
Data Hazard: Register Access

- Exploit high speed of register file (100 ps)
  1) WB updates value
  2) ID reads new value
- Indicated in diagram by shading

Might not always be possible to write then read in same cycle, especially in high-frequency designs. Check assumptions in any question.
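
The write-then-read ordering can be modelled in a few lines. This is a sketch under the assumption stated above (the WB write lands in the first part of the cycle, the ID read happens in the second); the class `RegFile` and its `cycle` method are hypothetical names.

```python
class RegFile:
    """32 registers, two read ports, one write port, x0 hardwired to zero."""

    def __init__(self):
        self.regs = [0] * 32

    def cycle(self, rs1, rs2, rd=None, wdata=0):
        """Model one clock cycle: the WB write lands first, then ID reads."""
        if rd is not None and rd != 0:      # writes to x0 are discarded
            self.regs[rd] = wdata
        return self.regs[rs1], self.regs[rs2]

rf = RegFile()
T0, T3 = 5, 28                      # ABI names t0 and t3 are x5 and x28
rf.cycle(rs1=0, rs2=0, rd=T3, wdata=0x100)

# WB of an older instruction writes t0 while a younger one reads t0 in ID.
a, b = rf.cycle(rs1=T0, rs2=T3, rd=T0, wdata=42)
print(a, b)
```

The read of `t0` returns the value written in the same cycle. A design that cannot write and read within one cycle would return the old value here and would need a stall or a forwarding path instead.
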
Data Hazard: ALU Result

Value of s0

- `add s0, t0, t1`
- `sub t2, s0, t0`
- `or t6, s0, t3`
- `xor t5, t1, s0`
- `sw s0, 8(t3)`

Without some fix, **sub** and **or** will calculate the wrong result!
Solution 1: Stalling

- **Problem:** Instruction depends on result from previous instruction
  
  ```
  add s0, t0, t1
  sub t2, s0, t3
  ```

- **Bubble:**
  - Effectively *nop*: Affected pipeline stages do “nothing”
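
How many bubbles are needed depends on how far the consumer sits behind the producer. The sketch below assumes the five-stage pipeline, no forwarding, and a register file that writes in WB and reads in ID within the same cycle; `bubbles_needed` is a hypothetical helper.

```python
def bubbles_needed(distance):
    """Bubbles required before a dependent instruction, without forwarding.

    `distance` is how many instructions after the producer the consumer sits
    (1 = immediately after). Assumes a 5-stage pipeline and a register file
    that writes in WB and reads in ID within the same cycle.
    """
    if distance < 1:
        raise ValueError("the consumer must come after the producer")
    # The producer reaches WB 4 cycles after its fetch; the consumer reaches
    # ID 1 cycle after its own fetch, so a gap of 3 instructions is enough.
    return max(0, 3 - distance)

for d in range(1, 5):
    print(d, bubbles_needed(d))
```

Under these assumptions the `sub` directly after the `add` needs two bubbles, an instruction two slots later needs one, and anything three or more slots later needs none.
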
Stalls and Performance

- Stalls reduce performance
  - But stalls are required to get correct results
- Compiler can arrange code or insert `nop`s (`addi x0, x0, 0`) to avoid hazards and stalls
  - Requires knowledge of the pipeline structure
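
The cost of stalls can be estimated with a short calculation. The instruction count $N$ and the number of stall cycles below are hypothetical values chosen for illustration, and the pipeline fill time is ignored:

$$
\text{CPI} = 1 + \frac{\text{stall cycles}}{N}, \qquad t_{\text{program}} \approx N \cdot \text{CPI} \cdot t_{\text{cycle}}
$$

For example, with $N = 100$ instructions and 20 stall cycles, $\text{CPI} = 1.2$ and $t_{\text{program}} \approx 100 \times 1.2 \times 200\,\text{ps} = 24\,\text{ns}$, compared with $20\,\text{ns}$ for the same program without stalls.
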
Solution 2: Forwarding

Value of s0

- `add s0, t0, t1`
- `sub t2, s0, t0`
- `or t6, s0, t3`
- `xor t5, t1, s0`
- `sw s0, 8(t3)`

Forwarding: grab operand from pipeline stage, rather than register file.

Forwarding (aka Bypassing)

- Use result when it is computed
  - Don’t wait for it to be stored in a register
  - Requires extra connections in the datapath

- `add s0, t0, t1`
- `sub t2, s0, t3`
Data Needed for Forwarding (Example)

- Compare destination of older instructions in pipeline with sources of new instruction in decode stage.
- Must ignore writes to x0!

```
add t0, t0, t1
sub t3, t0, t5
sub t6, t0, t3
```
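
The comparison described above can be sketched as a small selection function. This is an illustration rather than the actual datapath logic: `forward_select` and the `REG` table are hypothetical names, and the older instructions are represented only by their destination register numbers.

```python
REG = {"t0": 5, "t1": 6, "t3": 28, "t5": 30, "t6": 31}  # RISC-V ABI register numbers

def forward_select(rs, older):
    """Pick where operand register `rs` should come from.

    `older` lists the destination registers of the instructions ahead in the
    pipeline, youngest first (None = that instruction writes no register).
    Returns the index into `older` to forward from, or None to use the
    value read from the register file.
    """
    if rs == 0:                 # x0 always reads as zero, never forward it
        return None
    for age, rd in enumerate(older):
        if rd is not None and rd != 0 and rd == rs:
            return age          # the youngest matching producer wins
    return None

# Third instruction of the example: sub t6, t0, t3
older = [REG["t3"], REG["t0"]]  # sub t3, ... is one ahead, add t0, ... is two ahead
sel_a = forward_select(REG["t0"], older)   # operand A (rs1)
sel_b = forward_select(REG["t3"], older)   # operand B (rs2)
print(sel_a, sel_b)
```

For the third instruction, operand A (`t0`) is taken from the `add` two instructions ahead and operand B (`t3`) from the `sub` immediately ahead, so both operands need their own forwarding path.
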
Pipelined RV32I Datapath

Remember to forward operand B as well!