# **EECS251B: Advanced Digital Circuits and Systems**

### Lecture 24 – Clock Gating, Leakage, Sleep

Borivoje Nikolić

WCH CH32V003 - 0.1\$ RISC-V 32-bit microcontroller : weekend dieshot

**February 1, 2024.** Nanjing Qinheng Microelectronics (WCH) CH32V003 is a 0.1-0.15\$ 32-bit RISC-V microcontroller (16KiB flash, 2KiB SRAM). This specific die is from the J4M6 variant in an SO-8 package, but the die is clearly universal, as it has far more than 8 pads. There was suspicion that external flash is used, as in the GD32, but this is not the case, at least in the V003 line. It is now also supported by the Arduino platform with open-source tools.

Die size is 1732x1172 µm. The smallest features visible on the top layer are 250 nm, but the technology node is likely much finer. Compared to the time-proven STM32F100C4T6B, the die area is 4.39x smaller, so it could be around 90 nm.







#### **Announcements**

- Homework 5 due next week
- Quiz 4 today
- Project
  - Pay attention to integration with other teams!
  - Final presentations: May 2, 9am-12pm
- Final exam: April 26, in class



## Clock Gating (Take Two)

### Power / Energy Optimization Space

|         | Constant Throughput/Latency                                                        |                                                                               | Variable Throughput/Latency    |
|---------|------------------------------------------------------------------------------------|-------------------------------------------------------------------------------|--------------------------------|
| Energy  | Design Time                                                                        | Sleep Mode                                                                    | Run Time                       |
| Active  | Logic design<br>Scaled V<sub>DD</sub><br>Trans. sizing<br>Multi-V<sub>DD</sub>     | Clock gating                                                                  | DFS, DVS                       |
| Leakage | Stack effects<br>Trans. sizing<br>Scaling V<sub>DD</sub><br>+ Multi-V<sub>Th</sub> | Sleep T's<br>Multi-V<sub>DD</sub><br>Variable V<sub>Th</sub><br>Input control | DVS<br>Variable V<sub>Th</sub> |

#### Flop activity



Lots of flops don't change their state very frequently

Wimer, TVLSI'14

#### Clock gating

Selective shut-down of a part of a clock tree



## Clock gating cell types

Latch Free
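The figure for this slide did not survive extraction. As a rough illustration of why the cell type matters, the sketch below compares a latch-free gate (a plain AND of clock and enable) with a latch-based gate. It assumes an active-high clock and a latch that is transparent while the clock is low. The waveforms, the function names, and the four-samples-per-cycle resolution are all made up for the example.

```python
def gate_latch_free(clk, en):
    """Latch-free gating: the gated clock is clk AND en, sample by sample."""
    return [c & e for c, e in zip(clk, en)]


def gate_latch_based(clk, en):
    """Latch-based gating: en passes through a latch that is transparent
    while clk is low, so the AND gate sees a stable enable while clk is high."""
    latched, out = 0, []
    for c, e in zip(clk, en):
        if c == 0:
            latched = e          # latch is transparent in the low phase
        out.append(c & latched)  # enable is frozen during the high phase
    return out


def pulse_widths(wave):
    """Width (in samples) of every high pulse in a waveform."""
    widths, run = [], 0
    for v in wave:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


# Hypothetical waveforms: 3 clock cycles, 4 samples each (low, low, high, high).
clk = [0, 0, 1, 1] * 3
# Enable falls in the middle of the first high phase and rises in the middle
# of the second one, i.e. it changes while the clock is high.
en = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1]

print(pulse_widths(gate_latch_free(clk, en)))   # [1, 1, 2]: two runt pulses
print(pulse_widths(gate_latch_based(clk, en)))  # [2, 2]: only full-width pulses
```

In this toy model the latch-free gate emits a truncated pulse and a late glitch whenever the enable moves during the high phase, so it is only safe if the enable is guaranteed to settle while the clock is low.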







#### Value-based clock gating

- If both input operands of an **add** have all zeros in their top 48 bits, these bits do not have to be latched and sent to the functional units
- Zeros can be multiplexed onto the top 48 bits of the result bus rather than computed by the adder
- The low 16 bits are always latched normally
- The high 48 bits are selectively latched based on a zero48 signal that accompanies each input operand from the reservation stations or the bypass network
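A minimal behavioral sketch of the zero48 idea, assuming 64-bit operands. The function names are hypothetical, and the handling of a carry out of bit 15 is an assumption, since the slide does not say how that case is treated.

```python
MASK64 = (1 << 64) - 1
LOW16 = 0xFFFF


def zero48(x):
    """True when the top 48 bits of a 64-bit operand are all zero."""
    return (x & MASK64) >> 16 == 0


def add_with_value_gating(a, b):
    """Return (sum, upper_latched).

    upper_latched is False when the high 48 bits of both operands stay
    gated and only the 16-bit slice of the datapath is exercised.
    """
    if zero48(a) and zero48(b):
        low = (a & LOW16) + (b & LOW16)
        # Assumption: a carry out of bit 15 is forwarded as bit 16 of the
        # result; all higher result bits are the multiplexed zeros.
        return low, False
    # At least one wide operand: latch all 64 bits and use the full adder.
    return (a + b) & MASK64, True


print(add_with_value_gating(3, 4))        # (7, False)
print(add_with_value_gating(1 << 20, 1))  # (1048577, True)
```

The gated path returns the same sum as the full adder; only the amount of clocked and switched logic differs.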



#### Local Clock Gating





'Clock on demand' Flip-flop
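The flip-flop schematic is missing here. The sketch below assumes the usual data-driven arrangement, in which an XOR of D and Q serves as the local clock enable, so the flop receives a clock pulse only when its state would actually change. The function name and the input stream are hypothetical.

```python
def clock_on_demand(d_stream, q0=0):
    """Simulate a flop whose clock is enabled by (D XOR Q).

    Returns the Q trace after each cycle and the number of clock pulses
    that actually reached the flop.
    """
    q, pulses, trace = q0, 0, []
    for d in d_stream:
        if d != q:       # XOR of D and Q: a clock pulse is needed
            pulses += 1
            q = d
        trace.append(q)  # with no pulse, Q simply holds its value
    return trace, pulses


d_stream = [0, 0, 1, 1, 1, 0, 0, 0]
trace, pulses = clock_on_demand(d_stream)
print(trace == d_stream, pulses, len(d_stream))  # True 2 8
```

Q follows D exactly as an ungated flop would, but only 2 of the 8 cycles deliver a clock pulse in this example. The comparison logic is the price paid for that saving.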

#### Clock gating does not come for free



- Increases the number of critical paths

Wimer, TVLSI'14



## Lowering Leakage During Design: Multiple Thresholds


#### Using Multiple Thresholds

- Cell-by-cell V<sub>T</sub> assignment (not block level)
- Allows us to minimize leakage
- Achieves all-low-V<sub>T</sub> performance
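One way to picture cell-by-cell assignment is a toy model in which each timing path is an independent chain of identical cells, and cells are moved to high V<sub>T</sub> only where slack allows. Every number below (delays in ps, relative leakage, path lengths) is a made-up assumption, and real flows must also handle cells shared between paths.

```python
def assign_vt(path_cells, t_clk_ps, d_lvt_ps=100, d_hvt_ps=130):
    """Per path, how many cells can move to high V_T without violating t_clk_ps.

    path_cells maps a path name to its cell count; paths are assumed disjoint.
    """
    plan = {}
    for name, n in path_cells.items():
        slack = t_clk_ps - n * d_lvt_ps
        if slack < 0:
            raise ValueError(f"path {name!r} fails timing even with all low-V_T cells")
        # Each swap to high V_T costs (d_hvt_ps - d_lvt_ps) of slack.
        plan[name] = min(n, slack // (d_hvt_ps - d_lvt_ps))
    return plan


def total_leakage(path_cells, plan, leak_lvt=10.0, leak_hvt=1.0):
    """Total leakage in arbitrary units for a given high-V_T plan."""
    return sum((n - plan[p]) * leak_lvt + plan[p] * leak_hvt
               for p, n in path_cells.items())


paths = {"critical": 10, "medium": 8, "short": 5}
plan = assign_vt(paths, t_clk_ps=1000)
all_lvt = {p: 0 for p in paths}
print(plan)                           # {'critical': 0, 'medium': 6, 'short': 5}
print(total_leakage(paths, all_lvt))  # 230.0
print(total_leakage(paths, plan))     # 131.0
```

The critical path stays entirely low-V<sub>T</sub>, so the clock period matches the all-low-V<sub>T</sub> design, while the non-critical paths absorb the high-V<sub>T</sub> cells.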





Yano, SSTCW'00

#### Typical Technologies

- 2-3 thresholds
  - To choose from 4-6 in a node
  - In bulk and FinFET, but not in FDSOI (unless doped)
- Each threshold voltage step changes leakage by ~5-10x
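As a rough check on the 5-10x range, assume subthreshold leakage falls exponentially with threshold voltage, with a subthreshold swing $S$. The value $S \approx 85$ mV/decade used below is an assumed, order-of-magnitude figure and does not come from the slides.

$$
\frac{I_{\text{leak}}(V_{T1})}{I_{\text{leak}}(V_{T2})} \approx 10^{(V_{T2}-V_{T1})/S}
\quad\Longrightarrow\quad
\Delta V_T \approx S \,\log_{10}\!\left(\frac{I_{\text{leak}}(V_{T1})}{I_{\text{leak}}(V_{T2})}\right)
$$

Under that assumption, a 5x leakage ratio corresponds to $\Delta V_T \approx 85 \times 0.70 \approx 59$ mV and a 10x ratio to $\Delta V_T \approx 85$ mV.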



## Lowering Leakage During Design: Longer Channels


#### **Longer Channels**



- 10% longer gates reduce leakage by 35% (in 130 nm)
- Increases switching energy by 21% with W/L = const.
- Attractive when W does not have to be increased (e.g., memory)
- Doubling L reduces leakage by 3x (in 0.13 µm)
- Much stronger effect in e.g. 28 nm!
- Effect improves with shorter-channel devices
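The 21% figure is consistent with gate capacitance scaling as W·L: holding W/L constant while lengthening L by 10% also widens W by 10%. The short sketch below spells out that arithmetic. It assumes switching energy tracks gate capacitance at a fixed supply, and the function name is hypothetical.

```python
def relative_gate_cap(l_scale, keep_w_over_l=True):
    """Relative gate capacitance (taken as W * L) after scaling L by l_scale.

    With W/L held constant, W scales with L, so capacitance scales as l_scale**2.
    If W is left alone (e.g. a memory cell), it scales only as l_scale.
    """
    w_scale = l_scale if keep_w_over_l else 1.0
    return w_scale * l_scale


print(round(relative_gate_cap(1.1), 4))                       # 1.21
print(round(relative_gate_cap(1.1, keep_w_over_l=False), 4))  # 1.1
```

When W can stay fixed, the capacitance penalty drops from 21% to 10%, which is why the technique is attractive for memory.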

### Poly Bias

- 28FDSOI example







## Lowering Leakage During Design: Transistor Stacking


#### Stack Effect



Leakage reduction (in 0.13 µm):

|        | High $V_t$           | Low $V_t$ |
|--------|----------------------|-----------|
| 2 NMOS | 10.7X                | 9.96X     |
| 3 NMOS | 21.1X                | 18.8X     |
| 4 NMOS | 31.5X                | 26.7X     |
| 2 PMOS | 8.6X                 | 7.9X      |
| 3 PMOS | 16.1X                | 13.7X     |
| 4 PMOS | 23.1X                | 18.7X     |

Narendra, ISLPED'01

#### Stack Forcing – Gate replacement



#### Tradeoffs:

- Stack of two W/2 devices: 1/3 of the drive current, same loading
- Stack of two 1.5W devices: 3x loading, same drive current

Narendra, ISLPED'01



## Lowering Leakage: Sleep Mode


#### **DVFS vs Gating**





Software Impact to Platform Energy-Efficiency - Intel 2011

The more resources are turned off, the longer it takes to turn them back on and the more transition energy is spent.

#### Putting the processor to sleep during idle events



Power gating is worthwhile only if its overheads (energy cost, delay) are smaller than the leakage savings
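The condition above can be turned into a break-even idle time: the block must stay asleep long enough for the saved leakage energy to repay the transition energy. The sketch below assumes constant leakage power in each state and lumps all entry and exit costs into one energy figure. The names and numbers are illustrative only.

```python
def break_even_time(e_overhead_j, p_leak_on_w, p_leak_sleep_w):
    """Idle time (s) at which leakage savings equal the sleep/wake energy overhead."""
    saved_power = p_leak_on_w - p_leak_sleep_w
    if saved_power <= 0:
        raise ValueError("sleep mode must leak less than the powered-on idle state")
    return e_overhead_j / saved_power


def net_saving(idle_s, e_overhead_j, p_leak_on_w, p_leak_sleep_w):
    """Net energy saved (J) by gating for idle_s seconds; negative means a loss."""
    return (p_leak_on_w - p_leak_sleep_w) * idle_s - e_overhead_j


# Hypothetical block: 1.1 mW idle leakage, 0.1 mW when gated, 2 uJ per sleep/wake pair.
t_be = break_even_time(2e-6, 1.1e-3, 1e-4)
print(f"break-even idle time: {t_be * 1e3:.2f} ms")
print(net_saving(1e-3, 2e-6, 1.1e-3, 1e-4) > 0)  # False: idle period too short
print(net_saving(5e-3, 2e-6, 1.1e-3, 1e-4) > 0)  # True: gating pays off
```

The wake-up delay imposes a separate constraint that this energy-only model ignores: the system must also tolerate the added latency.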

#### **Gating Sequences**



Sequence of steps:

- Gate clock
- Isolate inputs
- Save (scan out)
- Reset
- Gate power
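The ordering matters: for example, power must not be removed before the state has been saved. A power controller that enforces the listed order could be sketched as follows. The class and step names are hypothetical, and the wake-up sequence is deliberately left out because the slide does not list it.

```python
POWER_DOWN_STEPS = ["gate_clock", "isolate_inputs", "save_state", "reset", "gate_power"]


class PowerDownSequencer:
    """Accepts power-down steps only in the order given on the slide."""

    def __init__(self):
        self.done = []

    def step(self, name):
        if len(self.done) == len(POWER_DOWN_STEPS):
            raise RuntimeError("block is already powered down")
        expected = POWER_DOWN_STEPS[len(self.done)]
        if name != expected:
            raise RuntimeError(f"expected {expected!r}, got {name!r}")
        self.done.append(name)

    @property
    def powered_off(self):
        return self.done == POWER_DOWN_STEPS


seq = PowerDownSequencer()
for step_name in POWER_DOWN_STEPS:
    seq.step(step_name)
print(seq.powered_off)  # True
```

Calling `step("gate_power")` on a fresh sequencer raises `RuntimeError`, which is the software analogue of a controller refusing to cut power while the clock is still running.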

#### Hierarchical Power Gating





| Cache | CPU   | MAC | VFP | Power State                          |
|-------|-------|-----|-----|--------------------------------------|
| (OFF) | (OFF) | -   | -   | Shutdown (Cache cleaned, VDDCPU off) |
| ON    | OFF   | -   | -   | Deep Sleep (Cache preserved)         |
| ON    | ON    | OFF | OFF | Normal Operation                     |
| ON    | ON    | ON  | OFF | DSP workload                         |
| ON    | ON    | OFF | ON  | Graphics workload                    |
| ON    | ON    | ON  | ON  | Intensive multimedia mode            |

| Cache | CPU   | MAC   | VFP   | Power State                          |
|-------|-------|-------|-------|--------------------------------------|
| (OFF) | (OFF) | (OFF) | (OFF) | Shutdown (Cache cleaned, VDDCPU off) |
| ON    | OFF   | OFF   | OFF   | Deep Sleep (Cache preserved)         |
| ON    | ON    | OFF   | OFF   | Normal Operation                     |
| ON    | ON    | ON    | OFF   | DSP workload                         |
| ON    | ON    | OFF   | ON    | Graphics workload                    |
| ON    | ON    | ON    | ON    | Intensive multimedia mode            |
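Read as a hierarchy, the table suggests that the CPU is only ever on when the cache is on, and that MAC and VFP are only ever on when the CPU is on. The sketch below encodes the second table and checks that rule. The nesting rule is an inference from the listed states, not something the slide states outright.

```python
# (cache, cpu, mac, vfp) for each state in the second table; True means ON.
POWER_STATES = {
    "Shutdown":                  (False, False, False, False),
    "Deep Sleep":                (True,  False, False, False),
    "Normal Operation":          (True,  True,  False, False),
    "DSP workload":              (True,  True,  True,  False),
    "Graphics workload":         (True,  True,  False, True),
    "Intensive multimedia mode": (True,  True,  True,  True),
}


def is_legal(cache, cpu, mac, vfp):
    """Assumed nesting: CPU needs the cache on; MAC and VFP need the CPU on."""
    cpu_ok = cache or not cpu
    units_ok = cpu or not (mac or vfp)
    return cpu_ok and units_ok


print(all(is_legal(*s) for s in POWER_STATES.values()))  # True
print(is_legal(True, False, True, False))                # False: MAC on with CPU off
```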

#### Power Gating with Sleep Transistors

- Key components:
  - Power gates (& controller)
    - Leakage vs size
    - Switched capacitance
  - Slew-rate/rush current
  - State preservation
  - Energy overhead of sleep/wake-up transitions

#### How to Size the Sleep Transistor?

- Don't need both header and footer
- Circuits in active mode see the sleep transistor as extra power line resistance
  - The wider the sleep transistor, the better
- Wide sleep transistors cost area and are slow to turn on/off
  - Minimize the size of the sleep transistor for a given ripple (e.g. 5%)
- Need to find the worst-case vector
- The sleep transistor is not free: it degrades performance in active mode
- Charging and discharging the virtual rails costs energy
- Need to wake up sequentially
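A first-order sizing estimate treats the sleep transistor as a resistor in the supply path and picks the smallest width that keeps the worst-case IR drop within the allowed ripple. The sketch assumes the device stays in the linear region with an on-resistance inversely proportional to width. The 1000 Ω·µm figure and the peak current are placeholders, not process data.

```python
def size_sleep_transistor(i_peak_a, vdd_v, max_drop_frac=0.05, r_on_ohm_um=1000.0):
    """Minimum total sleep-transistor width in micrometers.

    i_peak_a      worst-case current drawn through the switch (A)
    vdd_v         nominal supply voltage (V)
    max_drop_frac allowed virtual-rail drop as a fraction of VDD (5% by default)
    r_on_ohm_um   assumed on-resistance of 1 um of device width (ohm * um)
    """
    r_max = max_drop_frac * vdd_v / i_peak_a  # largest tolerable switch resistance
    return r_on_ohm_um / r_max                # R_on = r_on_ohm_um / W


width_um = size_sleep_transistor(i_peak_a=0.1, vdd_v=1.0)
print(f"required width: {width_um:.0f} um")
```

The worst-case current is the hard part in practice, which is why the slide stresses finding the worst-case vector.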

#### **Sleep Transistor**

- High-V<sub>TH</sub> transistor (many in parallel) has to be very large for low resistance in the linear region
- Low-V<sub>TH</sub> transistor needs much less area for the same resistance

|                         | MTCMOS | Boosted Sleep | Non-Boosted Sleep |
|-------------------------|--------|---------------|-------------------|
| Sleep-TR size           | 5.1%   | 2.3%          | 3.2%              |
| Leakage power reduction | 1450X  | 3130X         | 11.5X             |
| Virtual supply bounce   | 60 mV  | 59 mV         | 58 mV             |

Courtesy: R. Krishnamurthy, Intel

#### Sleep Transistor Layout



Sleep transistor cells

| Sleep transistor | Area overhead |
|------------------|---------------|
| PMOS             | 6%            |
| NMOS             | 3%            |

Tschanz, ISSCC'03

#### Sleep in Standard Cells





#### Sleep Transistor Grid

Figure panels: no sleep transistor; PMOS & NMOS sleep transistors



#### **Power Gating**

Figure panels: no power gating; "ideal" power gating transient; realistic profile

Keating, et al, Low Power Methodology Manual, 2009.



#### **Preserving State**

- Virtual supply collapse in sleep mode will cause the loss of state in registers
- Putting the registers at nominal VDD would preserve the state
  - These registers leak
  - The second supply needs to be routed as well
- Can lower VDD in sleep
  - Some impact on robustness, noise and SEU immunity
- State preservation and recovery

#### Scan-Based Retention

Scan-out/scan-in state to preserve/restore state



Keating, et al, Low Power Methodology Manual, 2009.
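The save and restore steps can be pictured as shifting the register contents out through the scan chain into always-on memory, then shifting them back in after power returns. The model below is a behavioral sketch with a single chain and hypothetical names. It ignores scan clocking, multiple chains, and where the saved bits are physically stored.

```python
from collections import deque


class ScanChain:
    """Flops connected head-to-tail; scan data enters at the head, leaves at the tail."""

    def __init__(self, bits):
        self.flops = deque(bits)

    def shift(self, scan_in):
        """One scan clock: push scan_in at the head, return the bit leaving the tail."""
        self.flops.appendleft(scan_in)
        return self.flops.pop()


def scan_out(chain):
    """Shift the whole state out (tail bit first) for storage in always-on memory."""
    return [chain.shift(0) for _ in range(len(chain.flops))]


def scan_in(chain, saved):
    """Shift the saved bits back in, in the order they came out."""
    for bit in saved:
        chain.shift(bit)


chain = ScanChain([1, 0, 1, 1, 0])
saved = scan_out(chain)            # state now lives outside the gated domain
chain.flops = deque([0] * 5)       # model the state lost while power is off
scan_in(chain, saved)
print(list(chain.flops))           # [1, 0, 1, 1, 0]
```

Each save or restore takes as many scan clocks as the chain has flops, so chain length sets the sleep and wake-up latency in this scheme.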

#### Retention Register Design



[Mutoh95]

#### Summary

- Clock Gating
- Multiple thresholds
- Longer channels
- Sleep modes

#### Next Lecture

Clock generation and distribution

