inst.eecs.berkeley.edu/~ee241b

# **EE241B : Advanced Digital Circuits**

# Lecture 23 – Sleep Modes

Borivoje Nikolić


### Wave Computing and MIPS Wave Goodbye

by Mike Gianfagna on 04-19-2020 at 8:00

Word on the virtual street is that Wave Computing is closing down. The company has reportedly let all employees go and filed for Chapter 11. As one of the many promising new companies in the field of AI, Wave Computing was founded in 2008 with the mission "to revolutionize deep learning with real-time AI solutions that scale from the edge to the datacenter."

https://www.semiwiki.com



### Announcements

- Assignment 4 due on Friday.
- Reading
  - Rabaey, LPDE, Chapter 8





### Outline

- Module 5
  - Sleep modes
  - Optimal thresholds and supplies















































# 5.J Lowering Leakage During Design: Transistor Stacking





# Power / Energy Optimization Space

|         | Constant Throughput/Latency |  | Variable Throughput/Latency |
|---------|-----------------------------|--|-----------------------------|
| Energy  | Design Time | Sleep Mode | Run Time |
| Active  | Logic design<br>Scaled V<sub>DD</sub><br>Trans. sizing<br>Multi-V<sub>DD</sub> | Clock gating | DFS, DVS |
| Leakage | Stack effects<br>Trans. sizing<br>Scaling V<sub>DD</sub><br>+ Multi-V<sub>Th</sub> | Sleep T's<br>Multi-V<sub>DD</sub><br>Variable V<sub>Th</sub><br>+ Input control | + Variable V<sub>Th</sub> |







### Stack Effect

Narendra, ISLPED'01

Leakage reduction factor (in 0.13 µm):

|        | High V<sub>t</sub> | Low V<sub>t</sub> |
|--------|--------------------|-------------------|
| 2 NMOS | 10.7X              | 9.96X             |
| 3 NMOS | 21.1X              | 18.8X             |
| 4 NMOS | 31.5X              | 26.7X             |
| 2 PMOS | 8.6X               | 7.9X              |
| 3 PMOS | 16.1X              | 13.7X             |
| 4 PMOS | 23.1X              | 18.7X             |

(~100 in recent processes.)
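The origin of the stack effect can be illustrated with a first-order subthreshold model. The sketch below is not taken from the cited paper: it assumes a simple exponential leakage model with a DIBL coefficient `dibl` and slope factor `n`, ignores body effect, and uses illustrative parameter values throughout. It solves for the intermediate node voltage of two stacked off NMOS devices by bisection and compares the stack current to that of a single off device.

```python
import math

V_T = 0.0259  # thermal voltage kT/q at room temperature, in volts


def i_sub(v_gs, v_ds, dibl=0.1, n=1.5, i0=1.0):
    """Normalized subthreshold current (assumed first-order model, no body effect)."""
    return i0 * math.exp((v_gs + dibl * v_ds) / (n * V_T)) * (1.0 - math.exp(-v_ds / V_T))


def stack_of_two(vdd=1.2, iters=100):
    """Return (v_x, reduction) for two series NMOS devices with both gates at 0 V."""
    lo, hi = 0.0, vdd
    for _ in range(iters):
        v_x = 0.5 * (lo + hi)
        i_top = i_sub(-v_x, vdd - v_x)   # top device: its source sits at v_x
        i_bot = i_sub(0.0, v_x)          # bottom device: its drain sits at v_x
        if i_top > i_bot:
            lo = v_x                     # node is still charging, v_x must rise
        else:
            hi = v_x
    v_x = 0.5 * (lo + hi)
    i_single = i_sub(0.0, vdd)           # one off device across the full supply
    return v_x, i_single / i_sub(0.0, v_x)


v_x, reduction = stack_of_two()
print(f"intermediate node: {v_x * 1e3:.0f} mV, leakage reduction: {reduction:.1f}x")
```

In this model the intermediate node settles at a small positive voltage, which gives the top device a negative gate-source voltage and both devices a reduced drain-source voltage; the two effects together produce the reduction.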







# 5.L Power Gating




# Power / Energy Optimization Space

|         | Constant Throughput/Latency |  | Variable Throughput/Latency |
|---------|-----------------------------|--|-----------------------------|
| Energy  | Design Time | Sleep Mode | Run Time |
| Active  | Logic design<br>Scaled V<sub>DD</sub><br>Trans. sizing<br>Multi-V<sub>DD</sub> | Clock gating | DFS, DVS |
| Leakage | Stack effects<br>Trans. sizing<br>Scaling V<sub>DD</sub><br>+ Multi-V<sub>Th</sub> | Sleep T's<br>Multi-V<sub>DD</sub><br>Variable V<sub>Th</sub><br>+ Input control | + Variable V<sub>Th</sub> |





### Power Gating with Sleep Transistors

- Key components:
  - Power gates (& controller)
    - Leakage vs size
    - Switched cap
  - Slew-rate/rush current
  - State preservation
  - Energy overhead of sleep/wake-up transitions

[Figure: power-gated block schematic; labels VDD, PEN, Cint.]
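The energy overhead of sleep/wake-up transitions sets a minimum idle time below which power gating does not pay off. A minimal sketch, assuming a fixed energy cost per sleep/wake cycle and constant leakage in each state; the function name and all numbers are hypothetical.

```python
def break_even_time_s(e_overhead_j, p_leak_on_w, p_leak_sleep_w):
    """Minimum idle time for which power gating saves net energy."""
    saved_power = p_leak_on_w - p_leak_sleep_w
    if saved_power <= 0:
        raise ValueError("sleep mode must leak less than the idle powered-on mode")
    return e_overhead_j / saved_power


# Hypothetical block: 10 mW idle leakage, 0.1 mW when gated,
# 2 uJ spent on each sleep/wake cycle (rail charging, controller, state restore)
t_be = break_even_time_s(2e-6, 10e-3, 0.1e-3)
print(f"break-even idle time: {t_be * 1e6:.0f} us")
```

Idle periods shorter than the break-even time cost more energy in transitions than they save in leakage, so the power-gate controller should only enter sleep when the expected idle time exceeds it.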



### How to Size the Sleep Transistor?

- Don't need both header and footer
- Circuits in active mode see the sleep transistor as extra power-line resistance
  - The wider the sleep transistor, the better
- Wide sleep transistors cost area and are slow to turn on/off
  - Minimize the size of the sleep transistor for a given ripple (e.g. 5%)
- Need to find the worst-case vector
- The sleep transistor is not free; it degrades performance in active mode
- Charging and discharging the virtual rails costs energy
- Need to wake up sequentially
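The sizing rule above (smallest transistor that keeps the virtual-rail ripple within a budget) can be sketched as a one-line calculation. This assumes the sleep transistor behaves as a linear resistor whose on-resistance scales inversely with width; the on-resistance per unit width and the worst-case current are hypothetical values, not data from the lecture.

```python
def sleep_transistor_width_um(i_peak_a, vdd_v, ripple_frac, r_on_ohm_um):
    """Smallest width that keeps the virtual-rail drop within ripple_frac * VDD."""
    max_drop_v = ripple_frac * vdd_v
    r_max_ohm = max_drop_v / i_peak_a    # largest tolerable on-resistance
    return r_on_ohm_um / r_max_ohm       # R_on scales as 1/W


# Hypothetical numbers: 100 mA worst-case current, 1.0 V supply, 5% ripple,
# 500 ohm*um on-resistance per unit width in the linear region
w = sleep_transistor_width_um(0.1, 1.0, 0.05, 500.0)
print(f"required sleep transistor width: {w:.0f} um")
```

The worst-case current is the input to this calculation, which is why finding the worst-case vector matters: underestimating it undersizes the switch and the ripple budget is violated.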






### **Sleep Transistor**

High-VTH transistor (many in parallel) has to be very large for low resistance in linear region. Low-VTH transistor needs much less area for the same resistance.

|                         | MTCMOS | Boosted Sleep | Non-Boosted Sleep |
|-------------------------|--------|---------------|-------------------|
| Sleep-TR size           | 5.1%   | 2.3%          | 3.2%              |
| Leakage power reduction | 1450X  | 3130X         | 11.5X             |
| Virtual supply bounce   | 60 mV  | 59 mV         | 58 mV             |

$V_{DD} = 1.2\,\text{V}$



### Sleep Transistor Layout



Tschanz, ISSCC'03





### Sleep in Standard Cells



Uvieghara, ISSCC'04







### **Preserving State**

- Virtual supply collapse in sleep mode will cause the loss of state in registers
- Putting the registers at nominal VDD would preserve the state
  - These registers leak
  - The second supply needs to be routed as well
- Can lower VDD in sleep
  - Some impact on robustness, noise and SEU immunity
- State preservation and recovery









### Scan-Based Retention

• Scan-out/scan-in state to preserve/restore state



Keating, et al, Low Power Methodology Manual, 2009.
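A behavioral sketch of scan-based retention, assuming a single scan chain modeled as a Python list with the scan output at the last element; the function names are hypothetical and the always-on memory is just a list.

```python
def scan_out(chain):
    """Shift every bit out of the scan chain; returns the bits in shift order."""
    saved = []
    for _ in range(len(chain)):
        saved.append(chain[-1])          # bit currently at the scan output
        chain[:] = [0] + chain[:-1]      # shift by one; scan input tied to 0
    return saved


def scan_in(chain, saved):
    """Shift the saved bits back in, restoring the original register contents."""
    for bit in saved:
        chain[:] = [bit] + chain[:-1]


state = [1, 0, 1, 1, 0, 0, 1, 0]         # flip-flop contents before sleep
original = list(state)
memory = scan_out(state)                 # kept in always-on memory
state[:] = [0] * len(state)              # power-gated domain loses its state
scan_in(state, memory)
print("restored correctly:", state == original)
```

Both save and restore take one clock cycle per flip-flop in the chain, so the sleep/wake latency and energy grow with chain length; splitting the state across several parallel chains shortens both.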











### **Hierarchical Power Gating**





| Cache | CPU   | MAC | VFP | Power State                          |
|-------|-------|-----|-----|--------------------------------------|
| (OFF) | (OFF) | -   | -   | Shutdown (Cache cleaned, VDDCPU off) |
| ON    | OFF   | -   | -   | Deep Sleep (Cache preserved)         |
| ON    | ON    | OFF | OFF | Normal Operation                     |
| ON    | ON    | ON  | OFF | DSP workload                         |
| ON    | ON    | OFF | ON  | Graphics workload                    |
| ON    | ON    | ON  | ON  | Intensive multimedia mode            |

| Cache | CPU   | MAC   | VFP   | Power State                          |
|-------|-------|-------|-------|--------------------------------------|
| (OFF) | (OFF) | (OFF) | (OFF) | Shutdown (Cache cleaned, VDDCPU off) |
| ON    | OFF   | OFF   | OFF   | Deep Sleep (Cache preserved)         |
| ON    | ON    | OFF   | OFF   | Normal Operation                     |
| ON    | ON    | ON    | OFF   | DSP workload                         |
| ON    | ON    | OFF   | ON    | Graphics workload                    |
| ON    | ON    | ON    | ON    | Intensive multimedia mode            |

Keating, et al, Low Power Methodology Manual, 2009.
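The power-state table can be read as a nesting rule: a domain may be on only if its parent domain is on. The sketch below encodes that reading; the rule itself is an inference from the table, and the state names and `is_legal` helper are hypothetical.

```python
from itertools import product

# Power states from the table: (Cache, CPU, MAC, VFP), True = ON
POWER_STATES = {
    "shutdown":   (False, False, False, False),
    "deep_sleep": (True,  False, False, False),
    "normal":     (True,  True,  False, False),
    "dsp":        (True,  True,  True,  False),
    "graphics":   (True,  True,  False, True),
    "multimedia": (True,  True,  True,  True),
}


def is_legal(cache, cpu, mac, vfp):
    """Assumed hierarchy: CPU needs Cache on; MAC and VFP need CPU on."""
    if cpu and not cache:
        return False
    if (mac or vfp) and not cpu:
        return False
    return True


# Enumerate all 16 on/off combinations and keep the ones the hierarchy allows
legal = [combo for combo in product([False, True], repeat=4) if is_legal(*combo)]
print(len(legal), "legal combinations")
```

Under this rule the legal combinations are exactly the six rows of the table, which is what makes the gating hierarchical: the power-gate controller never has to handle a child domain that is powered while its parent is off.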



### Project reports

- Due May 4, up to 6 pages
- Presentations on May 4 in the afternoon
  - 12 min + 3 min Q&A (10-12 content slides)
  - 15 min for 3-person teams
- Report outline: title, authors, abstract (5 sentences), state of the art, analysis





# 5.M Dynamic Threshold Scaling





### Power / Energy Optimization Space

|         | Constant Throughput/Latency |  | Variable Throughput/Latency |
|---------|-----------------------------|--|-----------------------------|
| Energy  | Design Time | Sleep Mode | Run Time |
| Active  | Logic design<br>Scaled V<sub>DD</sub><br>Trans. sizing<br>Multi-V<sub>DD</sub> | Clock gating | DFS, DVS |
| Leakage | Stack effects<br>Trans. sizing<br>Scaling V<sub>DD</sub><br>+ Multi-V<sub>Th</sub> | Sleep T's<br>Multi-V<sub>DD</sub><br>Variable V<sub>Th</sub><br>+ Input control | DVS<br><mark>Variable V<sub>Th</sub></mark> |







### **Dynamic Body Bias**

- Similar concept to dynamic voltage scaling
- Control loop adjusts the substrate bias to meet the timing/leakage goal
  - Can be used as a run-time or a sleep-mode technique
- Limited range of threshold adjustment in bulk (<100 mV)
  - Limited leakage reduction (<10x)
- Works well in FDSOI (80-85 mV/V, with $\sim 1.8\,\text{V}$ range)
- No delay penalty
  - Can increase speed by forward bias
- Energy cost of charging/discharging the substrate capacitance
  - but doesn't need a regulator
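The control loop mentioned above can be sketched as a simple step controller. This is a toy illustration, not the lecture's implementation: the linear delay model, its sensitivity, and the step size are all assumptions, and a real loop would read a replica path or timing monitor instead of calling `delay_ns`.

```python
def delay_ns(v_bb, d0=1.10, sensitivity=0.08):
    """Toy critical-path delay model: forward bias (v_bb > 0) speeds the path up."""
    return d0 * (1.0 - sensitivity * v_bb)


def settle_body_bias(target_ns, v_min=0.0, v_max=1.8, step=0.05, max_iters=200):
    """Raise the forward bias in small steps until the timing target is met."""
    v_bb = v_min
    for _ in range(max_iters):
        if delay_ns(v_bb) <= target_ns:
            break                        # timing met with the least forward bias
        if v_bb + step > v_max:
            break                        # bias range exhausted, target not reachable
        v_bb += step
    return v_bb


v_bb = settle_body_bias(target_ns=1.0)
print(f"settled at V_BB = {v_bb:.2f} V, delay = {delay_ns(v_bb):.3f} ns")
```

Stopping at the smallest bias that meets timing matters because forward bias lowers the threshold and therefore raises leakage; the loop trades the two against each other at run time.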




### FDSOI and Bulk



- Bulk CMOS
  - Leakage paths through bulk
  - RDF dominates local variability
  - Diodes and B2B tunneling limit back-bias range





- UTBB FD-SOI
  - Thin body for short-channel control
  - No channel doping, hence less RDF
  - Extended back-bias range



### **FDSOI Wells and Back Bias**



- Typical (RVT)
  - $GND_{S,nom} = 0\,\text{V}$, $V_{DDS,nom} = V_{DD}$
  - Reverse body bias, $V_{BSN} < 0\,\text{V}$
  - $(-3\,\text{V}) < GND_{S} < V_{DD}/2 + 0.3\,\text{V}$
    - Limit due to diodes, BOX
  - Can reverse bias 2-3 V each

- Flip-well (LVT)
  - $V_{DDS,nom} = GND_{S,nom} = 0\,\text{V}$
  - Forward body bias, $V_{BSN} > 0\,\text{V}$
  - $-0.3\,\text{V} < GND_{S} < (3\,\text{V})$
    - Limit due to diodes, BOX
  - Can forward bias 2-3 V each

EECS241B L23 SLEEP







### Back-Bias in FDSOI



- $\gamma = 85\,\text{mV/V}$ body coefficient, and extended voltage range
- Lower coefficient and voltage range in bulk, finFET

D. Jacquet, JSSC 4/14
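To get a feel for what an 85 mV/V body coefficient buys, the sketch below converts a back-bias voltage into a threshold shift and a subthreshold leakage multiplier. The linear threshold shift uses the coefficient quoted above; the 80 mV/decade subthreshold swing and the sign convention (positive `v_bb` means forward bias) are assumptions for illustration.

```python
GAMMA_MV_PER_V = 85.0     # body coefficient quoted for FDSOI
SWING_MV_PER_DEC = 80.0   # assumed subthreshold swing, not from the lecture


def vth_shift_mv(v_bb):
    """Threshold shift for back bias v_bb in volts; v_bb > 0 is forward bias."""
    return -GAMMA_MV_PER_V * v_bb


def leakage_multiplier(v_bb):
    """Subthreshold leakage relative to zero back bias."""
    return 10.0 ** (-vth_shift_mv(v_bb) / SWING_MV_PER_DEC)


for v_bb in (-1.0, 1.0):
    print(f"V_BB = {v_bb:+.1f} V: dVth = {vth_shift_mv(v_bb):+.0f} mV, "
          f"leakage x{leakage_multiplier(v_bb):.2f}")
```

Under these assumptions each volt of reverse bias cuts leakage by roughly one decade, and each volt of forward bias raises it by the same factor in exchange for speed.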







### Multi-V<sub>Th</sub>

- No channel implant in 28FDSOI
  - No multi-$V_{Th}$
- Can't abut wells
- RVT and LVT require different well biases

D. Jacquet, JSSC 4/14





### Back Bias in FDSOI



- Triple well (deep N-Well, DNW) allows for separate back bias
- Layout penalty; capacitance to drive





### **Digital Logic: UPF**

- Supply, back-bias defined in Universal Power Format (UPF)
  - Or Common Power Format (CPF)
- Handled by synthesis, place and route tools


UPF description of `PD_TOP` with GND, VDD, GNDS and VDDS supplies:

    create_power_domain PD_TOP

    create_supply_port GND
    create_supply_port VDD
    create_supply_net GND -domain PD_TOP
    connect_supply_net GND -ports { GND }
    create_supply_net VDD -domain PD_TOP
    connect_supply_net VDD -ports { VDD }

    set_domain_supply_net PD_TOP -primary_power_net VDD -primary_ground_net GND

    # Body-bias specification
    create_supply_port VDDS
    create_supply_port GNDS
    create_supply_net VDDS -domain PD_TOP
    connect_supply_net VDDS -ports { VDDS vddgndvdds*/VDDSCORE }
    create_supply_net GNDS -domain PD_TOP
    connect_supply_net GNDS -ports { GNDS gnds*/VDDCORE1V8 }

    create_supply_set back_bias_set \
        -function {nwell VDDS} \
        -function {pwell GNDS} \
        -reference_gnd {GND}

    create_power_domain PD_TOP -update -supply bias
    associate_supply_set back_bias_set -handle PD_TOP.bias

M. Blagojevic, Ph.D. Dissertation, ISEP 2017

### **Digital Logic - Implementation**

- Well taps added explicitly
  - Difference from bulk



### **Back bias straps**

- Low DC current
  - Except for very fast transitions


