inst.eecs.berkeley.edu/~ee241b

## **EE241B : Advanced Digital Circuits**

## Lecture 23 – Sleep Modes

Borivoje Nikolić


### Wave Computing and MIPS Wave Goodbye

by Mike Gianfagna on 04-19-2020 at 8:00

Word on the virtual street is that Wave Computing is closing down. The company has reportedly let all employees go and filed for Chapter 11. As one of the many promising new companies in the field of AI, Wave Computing was founded in 2008 with the mission "to revolutionize deep learning with real-time AI solutions that scale from the edge to the datacenter."

https://www.semiwiki.com



## Announcements

• Assignment 4 due on Friday.

- Reading
  - Rabaey, LPDE, Chapter 8





## Outline

- Module 5
  - Sleep modes
  - Optimal thresholds and supplies





## 5.J Lowering Leakage During Design: Transistor Stacking





## Power / Energy Optimization Space

| Energy  | Design Time (constant throughput/latency) | Sleep Mode (constant throughput/latency) | Run Time (variable throughput/latency) |
|---------|-------------------------------------------|------------------------------------------|----------------------------------------|
| Active  | Logic design<br>Scaled V<sub>DD</sub><br>Trans. sizing<br>Multi-V<sub>DD</sub> | Clock gating | DFS, DVS |
| Leakage | Stack effects<br>Trans. sizing<br>Scaling V<sub>DD</sub><br>+ Multi-V<sub>Th</sub> | Sleep T's<br>Multi-V<sub>DD</sub><br>Variable V<sub>Th</sub><br>+ Input control | + Variable V<sub>Th</sub> |





Narendra, ISLPED'01

EECS241B L23 SLEEP

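[Figure: transistor stack schematic, labels $V_{dd}$ and $I_{device}$]

Leakage reduction from stacking off transistors, relative to a single off device:

| Stack  | High V<sub>t</sub> | Low V<sub>t</sub> |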
|--------|---------------------|--------------------|
| 2 NMOS | 10.7X               | 9.96X              |
| 3 NMOS | 21.1X               | 18.8X              |
| 4 NMOS | 31.5X               | 26.7X              |
| 2 PMOS | 8.6X                | 7.9X               |
| 3 PMOS | 16.1X               | 13.7X              |
| 4 PMOS | 23.1X               | 18.7X              |




## Stack Effect
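
The stack effect can be illustrated numerically. The sketch below is not from the lecture: it assumes a simple subthreshold current model with DIBL (`eta`) and a linearized body effect (`k_body`), and all parameter values are hypothetical. It finds the intermediate node voltage of a two-transistor off stack by bisection, then compares the stack leakage with that of a single off device.

```python
import math


def subthreshold_current(vgs, vds, vsb, *, i0=1.0, vth0=0.3, n=1.5,
                         eta=0.1, k_body=0.2, v_t=0.026):
    """Subthreshold current (arbitrary units); model and values are assumptions."""
    vth = vth0 - eta * vds + k_body * vsb          # DIBL lowers, body effect raises Vth
    return i0 * math.exp((vgs - vth) / (n * v_t)) * (1.0 - math.exp(-vds / v_t))


def two_stack_leakage(vdd, **params):
    """Bisect on the intermediate node voltage until both currents match."""
    lo, hi = 0.0, vdd
    for _ in range(100):
        vx = 0.5 * (lo + hi)
        # Top device: gate at 0 V, source at vx, so Vgs < 0 and Vsb > 0.
        i_top = subthreshold_current(-vx, vdd - vx, vx, **params)
        # Bottom device: gate and source at 0 V, drain at vx.
        i_bot = subthreshold_current(0.0, vx, 0.0, **params)
        if i_top > i_bot:
            lo = vx          # node is still being charged up
        else:
            hi = vx
    vx = 0.5 * (lo + hi)
    return vx, subthreshold_current(0.0, vx, 0.0, **params)


vdd = 1.0
i_single = subthreshold_current(0.0, vdd, 0.0)
vx, i_stack = two_stack_leakage(vdd)
reduction = i_single / i_stack
print(f"V_x = {vx * 1e3:.0f} mV, leakage reduction = {reduction:.1f}X")
```

The reduction comes from three effects acting on the top device at once: negative gate-source voltage, reverse body bias, and reduced drain-source voltage (less DIBL). The exact factor depends strongly on the assumed parameters.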



## Stack Forcing



### Tradeoffs:

- Two stacked devices of width W/2 each: about 1/3 of the drive current, same input loading
- Two stacked devices of width 1.5W each: 3x the input loading, same drive current

Narendra, ISLPED'01





## 5.L Power Gating



## Power / Energy Optimization Space

| Energy  | Design Time (constant throughput/latency) | Sleep Mode (constant throughput/latency) | Run Time (variable throughput/latency) |
|---------|-------------------------------------------|------------------------------------------|----------------------------------------|
| Active  | Logic design<br>Scaled V<sub>DD</sub><br>Trans. sizing<br>Multi-V<sub>DD</sub> | Clock gating | DFS, DVS |
| Leakage | Stack effects<br>Trans. sizing<br>Scaling V<sub>DD</sub><br>+ Multi-V<sub>Th</sub> | Sleep T's<br>Multi-V<sub>DD</sub><br>Variable V<sub>Th</sub><br>+ Input control | + Variable V<sub>Th</sub> |





## Power Gating with Sleep Transistors

- Key components:
  - Power gates (& controller)
    - Leakage vs size
    - Switched cap
  - Slew-rate/rush current
  - State preservation
  - Energy overhead of sleep/wake-up transitions
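
The energy overhead of the sleep/wake-up transitions sets a minimum idle time below which power gating does not pay off. A minimal sketch of that break-even estimate follows; the function name and all numbers are hypothetical and only illustrate the arithmetic.

```python
def break_even_time_s(e_transition_j, p_leak_on_w, p_leak_sleep_w):
    """Idle time at which saved leakage energy equals the transition overhead."""
    saved_power = p_leak_on_w - p_leak_sleep_w
    if saved_power <= 0:
        raise ValueError("sleep mode must leak less than the active-idle state")
    return e_transition_j / saved_power


# Hypothetical block: 10 nJ to enter and leave sleep (virtual rail recharge,
# gate drive of the power switches), 1 mW idle leakage, 10 uW when gated.
t_be = break_even_time_s(10e-9, 1e-3, 10e-6)
print(f"break-even idle time: {t_be * 1e6:.2f} us")
```

Idle periods shorter than this estimate cost more energy than they save, which is one reason a controller is listed among the key components.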



## How to Size the Sleep Transistor?

- Don't need both header and footer
- Circuits in active mode see the sleep transistor as extra power line resistance
  - The wider the sleep transistor, the better
- Wide sleep transistors cost area and are slow to turn on/off
  - Minimize the size of the sleep transistor for given ripple (e.g. 5%)
- Need to find the worst case vector
- The sleep transistor is not free: it degrades performance in active mode
- Charging and discharging the virtual rails costs energy
- Need to sequentially wake up
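
The sizing rule above (smallest switch that keeps the virtual-rail ripple within a budget) can be sketched as follows. This is a first-order estimate, assuming the sleep transistor behaves as a linear resistor and that its on-resistance scales inversely with width; `r_on_ohm_um` and the current value are hypothetical, not data from the lecture.

```python
def sleep_transistor_width_um(i_peak_a, vdd_v, ripple_frac, r_on_ohm_um):
    """Return (width in um, maximum allowed on-resistance in ohms)."""
    dv_max = ripple_frac * vdd_v      # allowed drop on the virtual rail
    r_max = dv_max / i_peak_a         # resistance that just meets the budget
    width = r_on_ohm_um / r_max       # R_on = r_on_ohm_um / W
    return width, r_max


# Worst-case vector draws 100 mA; allow 5% ripple on a 1 V supply.
width, r_max = sleep_transistor_width_um(0.1, 1.0, 0.05, 500.0)
print(f"R_on <= {r_max:.2f} ohm, W >= {width:.0f} um")
```

The peak current must come from the worst-case input vector, which is why finding that vector appears in the list.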





## **Sleep Transistor**

High-VTH transistor (many in parallel) has to be very large for low resistance in linear region. Low-VTH transistor needs much less area for the same resistance.

|                         | MTCMOS | Boosted Sleep | Non-Boosted Sleep |
|-------------------------|--------|---------------|-------------------|
| Sleep-TR size           | 5.1%   | 2.3%          | 3.2%              |
| Leakage power reduction | 1450X  | 3130X         | 11.5X             |
| Virtual supply bounce         | 60 mV  | 59 mV   | 58 mV   |

Courtesy: R. Krishnamurthy, Intel









## Sleep in Standard Cells





Uvieghara, ISSCC'04







## Preserving State

- Virtual supply collapse in sleep mode will cause the loss of state in registers
- Putting the registers at nominal VDD would preserve the state
  - These registers leak
  - The second supply needs to be routed as well
- Can lower VDD in sleep
  - Some impact on robustness, noise and SEU immunity
- State preservation and recovery





## Scan-Based Retention

• Scan-out/scan-in state to preserve/restore state



Keating, et al, Low Power Methodology Manual, 2009.







## Gating Sequences



- Sequence of steps:
  - Gate clock
  - Isolate inputs
  - Save (scan out)
  - Reset
  - Gate power

Keating, et al, Low Power Methodology Manual, 2009.
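
The ordering constraint in the sequence above can be modeled with a small state machine. This is an illustrative sketch only: `SleepSequencer` is a hypothetical class, not part of any power-management tool flow, and it only checks that the sleep-entry steps are applied in the listed order.

```python
from enum import Enum


class Step(Enum):
    GATE_CLOCK = "gate clock"
    ISOLATE_INPUTS = "isolate inputs"
    SAVE_STATE = "save (scan out)"
    RESET = "reset"
    GATE_POWER = "gate power"


# Enum members iterate in definition order, which matches the list above.
SLEEP_ENTRY = list(Step)


class SleepSequencer:
    """Hypothetical controller model that accepts steps only in order."""

    def __init__(self):
        self.completed = []

    @property
    def asleep(self):
        return len(self.completed) == len(SLEEP_ENTRY)

    def apply(self, step):
        if self.asleep:
            raise RuntimeError("domain is already power gated")
        expected = SLEEP_ENTRY[len(self.completed)]
        if step is not expected:
            raise RuntimeError(f"expected '{expected.value}', got '{step.value}'")
        self.completed.append(step)


seq = SleepSequencer()
for step in SLEEP_ENTRY:
    seq.apply(step)
print("asleep:", seq.asleep)
```

Gating power before the clock is stopped or the state is saved raises an error in this model, mirroring why the order matters in hardware.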



## **Hierarchical Power Gating**





| Cache | CPU   | MAC | VFP | Power State                          |
|-------|-------|-----|-----|--------------------------------------|
| (OFF) | (OFF) | -   | -   | Shutdown (Cache cleaned, VDDCPU off) |
| ON    | OFF   | -   | -   | Deep Sleep (Cache preserved)         |
| ON    | ON    | OFF | OFF | Normal Operation                     |
| ON    | ON    | ON  | OFF | DSP workload                         |
| ON    | ON    | OFF | ON  | Graphics workload                    |
| ON    | ON    | ON  | ON  | Intensive multimedia mode            |

Keating, et al, Low Power Methodology Manual, 2009.
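
The power-state table can be encoded as a lookup, and a controller can then pick the lowest-power state that keeps the required blocks on. The sketch below is an assumption-laden illustration: `select_state` is a hypothetical helper, and the "-" entries of the table are treated as OFF because the enclosing CPU domain is off.

```python
BLOCKS = ("Cache", "CPU", "MAC", "VFP")

# True = ON. The "-" entries in the table are modeled as OFF (assumption).
POWER_STATES = {
    "Shutdown":                  (False, False, False, False),
    "Deep Sleep":                (True,  False, False, False),
    "Normal Operation":          (True,  True,  False, False),
    "DSP workload":              (True,  True,  True,  False),
    "Graphics workload":         (True,  True,  False, True),
    "Intensive multimedia mode": (True,  True,  True,  True),
}


def select_state(needed):
    """Return the state with the fewest blocks ON that powers every needed block."""
    candidates = [
        (sum(on), name)
        for name, on in POWER_STATES.items()
        if all(on[BLOCKS.index(block)] for block in needed)
    ]
    return min(candidates)[1]


print(select_state({"MAC"}))
print(select_state({"Cache"}))
```

The hierarchy shows up in the data: MAC or VFP is never on without the CPU, and the CPU is never on without the cache.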




## Project reports

- Due May 4, up to 6 pages
- Presentations on May 4 in the afternoon
  - 12min + 3 min Q&A
  - 15min for 3-person teams





## 5.M Dynamic Threshold Scaling





## Power / Energy Optimization Space

| Energy  | Design Time (constant throughput/latency) | Sleep Mode (constant throughput/latency) | Run Time (variable throughput/latency) |
|---------|-------------------------------------------|------------------------------------------|----------------------------------------|
| Active  | Logic design<br>Scaled V<sub>DD</sub><br>Trans. sizing<br>Multi-V<sub>DD</sub> | Clock gating | DFS, DVS |
| Leakage | Stack effects<br>Trans. sizing<br>Scaling V<sub>DD</sub><br>+ Multi-V<sub>Th</sub> | Sleep T's<br>Multi-V<sub>DD</sub><br><mark>Variable V<sub>Th</sub></mark><br>+ Input control | + <mark>Variable V<sub>Th</sub></mark> |







## **Dynamic Body Bias**

- Similar concept to dynamic voltage scaling
- Control loop adjusts the substrate bias to meet the timing/leakage goal
  - Can be used just as runtime/sleep
- Limited range of threshold adjustments in bulk (<100mV)
  - Limited leakage reduction (<10x)
- Works well in FDSOI (80-85mV/V, with  $\sim 1.8V$  range)
- No delay penalty
  - Can increase speed by forward bias
- Energy cost of charging/discharging the substrate capacitance
  - but doesn't need a regulator
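
The numbers above can be tied together with a short calculation. The sketch assumes a subthreshold swing of 100 mV/decade, a value chosen here only because it makes a 100 mV threshold shift correspond to the 10x bulk figure quoted in the list; the function name is hypothetical.

```python
def leakage_reduction(delta_vth_v, swing_v_per_decade=0.1):
    """Leakage ratio for a threshold increase, assuming an exponential subthreshold law."""
    return 10 ** (delta_vth_v / swing_v_per_decade)


# Bulk: threshold adjustment limited to about 100 mV.
bulk = leakage_reduction(0.100)

# FDSOI: 85 mV/V body coefficient over a ~1.8 V bias range.
fdsoi_delta_vth = 0.085 * 1.8
fdsoi = leakage_reduction(fdsoi_delta_vth)

print(f"bulk:  dVth = 100 mV -> {bulk:.0f}x")
print(f"FDSOI: dVth = {fdsoi_delta_vth * 1e3:.0f} mV -> {fdsoi:.0f}x")
```

Under these assumptions the wider FDSOI bias range buys roughly an extra half decade of leakage control compared with bulk.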




## FDSOI and Bulk



[Figure: NMOS and PMOS device cross-sections showing the BOX, P-well, N-well, and P-substrate, with back-bias terminals GND<sub>S</sub> and V<sub>DDS</sub>]

- Bulk CMOS
  - Leakage paths through bulk
  - RDF dominates local variability
  - Diodes and B2B tunneling limit back-bias range

- UTBB FD-SOI
  - Thin body for short-channel control
  - No channel doping, so less RDF
  - Extended back-bias range



## **FDSOI Wells and Back Bias**



- Typical (RVT)
  - $GND_{S,nom} = 0V$, $V_{DDS,nom} = V_{DD}$
  - Reverse body bias, $V_{BSN} < 0V$
  - $(-3V) < GND_{S} < V_{DD}/2 + 0.3V$
    - Limit due to diodes, BOX
  - Can reverse bias 2-3V each



- Flip-well (LVT)
  - $V_{DDS,nom} = GND_{S,nom} = 0V$
  - Forward body bias, $V_{BSN} > 0V$
  - $0.3V < GND_{S} < (3V)$
    - Limit due to diodes, BOX
  - Can forward bias 2-3V each

## **Back-Bias in FDSOI**



•  $\gamma = 85 \text{mV/V}$  body coefficient, and extended voltage range

• Lower coefficient and voltage range in bulk, finFET

D. Jacquet, JSSC 4/14







## Multi V<sub>Th</sub>

- No channel implant in 28FDSOI
  - No multi $V_{Th}$
- RVT and LVT require different well biases
- Can't abut wells

D. Jacquet, JSSC 4/14







## Back Bias in FDSOI

- Triple well (deep N-well, DNW) allows for separate back bias
- Layout penalty; capacitance to drive





## **Digital Logic: UPF**

- Supply, back-bias defined in Universal Power Format (UPF)
  - Or Common Power Format (CPF)
- Handled by synthesis, place and route tools


UPF description of PD_TOP with GND, VDD, GNDS and VDDS supplies:

    create_power_domain PD_TOP

    create_supply_port GND
    create_supply_port VDD
    create_supply_net GND -domain PD_TOP
    connect_supply_net GND -ports {GND}
    create_supply_net VDD -domain PD_TOP
    connect_supply_net VDD -ports { VDD }

    set_domain_supply_net PD_TOP -primary_power_net VDD -primary_ground_net GND

    # Body-bias specification
    create_supply_port VDDS
    create_supply_port GNDS
    create_supply_net VDDS -domain PD_TOP
    connect_supply_net VDDS -ports { VDDS vddgndvdds*/VDDSCORE }
    create_supply_net GNDS -domain PD_TOP
    connect_supply_net GNDS -ports { GNDS gnds*/VDDCORE1V8 }

    create_supply_set back_bias_set \
        -function {nwell VDDS} \
        -function {pwell GNDS} \
        -reference_gnd {GND}

    create_power_domain PD_TOP -update -supply bias
    associate_supply_set back_bias_set -handle PD_TOP.bias

M.Blagojevic, Ph.D. Dissertation, ISEP 2017



## **Digital Logic - Implementation**

- Well taps added explicitly
  - Difference from bulk



### **Back bias straps**

- Low DC current
  - Except for very fast transitions




## Dynamic Body Bias (Bulk)



- **Active mode:** forward body bias (FBB), local V<sub>CC</sub> tracking
- **Idle mode:** reverse body bias (RBB), triple well needed

Tschanz, ISSCC'03



## Body Bias Layout

[Figure: body bias layout with ALU core LBGs and sleep transistor LBGs marked]

| Parameter                       | Value |
|---------------------------------|-------|
| Number of ALU core LBGs         |       |
| Number of sleep transistor LBGs |       |
| PMOS device width               | 13    |
| Area overhead                   | 8     |





## **Total Active Power Savings**

(Fixed activity:  $\alpha = 0.05$ )



Reference: 450mV FBB to core with clock gating, 1.28V, 4.05GHz, 75°C




## **Generating Back-Bias**

- Tradeoff speed of charging and discharging well caps
- Often measure V<sub>BB</sub> indirectly (leakage)
- Challenge: Generating  $-V_{SS}$
- 28nm FDSOI implementation







## **Generating Back Bias**

Fast and wide voltage range back-bias in FDSOI



Switched capacitors generate negative bias and pump substrate







## Supply/Process Compensation



• Able to track ~200mV supply droops and maintain constant frequency (measured by a replica) by back-bias adjustments









## 5.N Dynamic Threshold Scaling and Variations





## **Body Biasing and Variations**

- Body biasing with a local control loop can be used to lower the impact of process variations
- Used to limit die-to-die and within-die variations




## Self-Adjusting Threshold-Voltage Scheme (SATS)

Older bulk technologies had stronger body effect



- Low Vth $\rightarrow$ large leakage $\rightarrow$ SSB ON $\rightarrow$ deep VBB $\rightarrow$ high Vth
- High Vth $\rightarrow$ little leakage $\rightarrow$ SSB OFF $\rightarrow$ shallow VBB $\rightarrow$ low Vth



- control Vth to adjust leakage current
- compensate Vth fluctuation
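
The feedback described above can be sketched as a bang-bang loop. This is a behavioral illustration, not the published SATS circuit: the leakage law, the body coefficient `k_body`, the step size, and the leakage target are all assumed values.

```python
def simulate_sats(vth0, *, i_target=1e-3, i0=1.0, swing=0.1, k_body=0.2,
                  step=0.005, iterations=2000, vbb_max=2.0):
    """Return (final Vth, reverse-bias magnitude) for a die with zero-bias threshold vth0."""
    vbb = 0.0                                  # magnitude of the reverse body bias
    for _ in range(iterations):
        vth = vth0 + k_body * vbb              # deeper bias raises the threshold
        leakage = i0 * 10 ** (-vth / swing)    # leakage monitor reading
        if leakage > i_target:
            vbb = min(vbb + step, vbb_max)     # SSB ON: pump the bias deeper
        else:
            vbb = max(vbb - step, 0.0)         # SSB OFF: bias relaxes toward shallow
    return vth0 + k_body * vbb, vbb


for vth0 in (0.20, 0.25):
    vth, vbb = simulate_sats(vth0)
    print(f"Vth0 = {vth0:.2f} V -> Vbb = {vbb:.2f} V, Vth = {vth:.3f} V")
```

Both dies settle near the same threshold (about 0.3 V for the assumed target), which is the compensation effect. A die whose zero-bias threshold is already above the target stays at zero bias in this model, since reverse bias can only raise the threshold.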





## **Dynamic Frequency Loop in FDSOI**



Quelen, ISSCC'18






Tschanz, JSSC 11/02





## Effectiveness of Substrate Bias

### **Die-to-die variations**



- NBB: No body bias
- ABB: Adaptive body bias



## **Effectiveness of Substrate Bias**

## Within-die variations



• ABB with multiple within die (WID) sensors



## Techniques Summary (around 130nm node)



## Power / Energy Optimization Space

| Energy  | Design Time (constant throughput/latency) | Sleep Mode (constant throughput/latency) | Run Time (variable throughput/latency) |
|---------|-------------------------------------------|------------------------------------------|----------------------------------------|
| Active  | Logic design<br>Scaled V<sub>DD</sub><br>Trans. sizing<br>Multi-V<sub>DD</sub> | Clock gating | DFS, DVS |
| Leakage | Stack effects<br>Trans. sizing<br>Scaling V<sub>DD</sub><br>+ Multi-V<sub>Th</sub> | Sleep T's<br>Multi-V<sub>DD</sub><br>Variable V<sub>Th</sub><br>+ Input control | + Variable V<sub>Th</sub> |







## 5.O Optimal $V_{DD}$, $V_{Th}$



## **Dynamic Voltage Scaled Microprocessor**







[Figure: Adapting $V_{DD}$ and $V_{TH}$. Power (µW) vs. frequency (MHz), comparing dynamic voltage scaling with adaptive supply and body bias]

Miyazaki, ISSCC'02



## Optimal $V_{DD}$ , $V_{Th}$

- Adjusting $V_{DD}$, $V_{Th}$ trades off energy and delay
- We studied energy-limited design
  - And alternate ways for optimizing energy and delay together
  - E.g. energy-delay product (EDP)
  - Or  $E^{n}D^{m}$ , n,m > 1
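
How the choice of metric moves the optimum can be seen with a small sweep. The sketch below is a simplification under stated assumptions: switching energy proportional to $V_{DD}^2$, an alpha-power-law delay with exponent 2, a fixed threshold, and leakage ignored. The function names are hypothetical.

```python
def metric(vdd, vth, n, m, alpha=2.0):
    """E^n * D^m in arbitrary units; switching energy only (assumption)."""
    energy = vdd ** 2
    delay = vdd / (vdd - vth) ** alpha
    return energy ** n * delay ** m


def best_vdd(vth, n, m, vdd_max=1.2, points=1000):
    """Grid search for the supply that minimizes E^n * D^m."""
    vdd_min = vth + 0.05
    grid = [vdd_min + i * (vdd_max - vdd_min) / (points - 1) for i in range(points)]
    return min(grid, key=lambda vdd: metric(vdd, vth, n, m))


vth = 0.3
vdd_edp = best_vdd(vth, n=1, m=1)    # energy-delay product
vdd_ed2 = best_vdd(vth, n=1, m=2)    # heavier weight on delay
print(f"EDP optimum:  VDD = {vdd_edp:.2f} V")
print(f"ED^2 optimum: VDD = {vdd_ed2:.2f} V")
```

With this model the EDP optimum sits at $V_{DD} = 3V_{Th}$, and weighting delay more heavily pushes the optimum to the upper end of the allowed supply range.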



## **Optimal EDP Contours**

• Plot of EDP curves in  $V_{DD}$ ,  $V_{Th}$  plane



Gonzalez, JSSC 8/97



## Sizing, Supply, Threshold Optimization

| <b>Reference Design:</b>                                                           | Topology                | Inverter | Adder | De |
|------------------------------------------------------------------------------------|-------------------------|----------|-------|----|
| D <sup>ref</sup> (V <sub>dd</sub> <sup>max</sup> ,V <sub>th</sub> <sup>ref</sup> ) | $(E_{Lk}/E_{Sw})^{ref}$ | 0.1%     | 1%    | 1  |

Large variation in optimal circuit parameters V<sub>dd</sub><sup>opt</sup>, V<sub>th</sub><sup>opt</sup>, w<sup>opt</sup>



**Technology parameters (V<sub>dd</sub><sup>max</sup>, V<sub>th</sub><sup>ref</sup>) rarely optimal** 







## **Result: E-D Tradeoff in an Adder**





## Energy-constrained delay

• Active power

$$P_{act} = \alpha f C V_{DD}^{2}$$

$$f = \frac{1}{L_D t_p}$$

• Leakage power

$$P_{leak} = I_0 e^{\frac{-(V_{Th} - \gamma V_{DD})}{S}} V_{DD}$$

• Eliminate one variable( $V_{Th}$ ) and find  $P_{min}(V_{DD})$ 
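
The elimination step can be sketched numerically. Everything beyond the two power expressions is an assumption made for illustration: an alpha-power-law gate delay $t_p = K V_{DD}/(V_{DD}-V_{Th})^{a}$, $L_D$ read as logic depth, leakage that grows with $V_{DD}$ through the $\gamma$ (DIBL) term, and made-up parameter values. For a fixed frequency, each $V_{DD}$ fixes the $V_{Th}$ that just meets timing, so total power becomes a function of $V_{DD}$ alone.

```python
import math

A_F_C = 1e-3     # alpha * f * C in W/V^2 (hypothetical)
KAPPA = 0.5      # K * L_D * f, lumped timing constant (hypothetical)
A_DELAY = 1.3    # alpha-power-law exponent (assumption)
I0 = 1e-2        # leakage prefactor in A (hypothetical)
GAMMA = 0.05     # DIBL coefficient (hypothetical)
S = 0.04         # exponential slope in V (hypothetical)


def vth_for_timing(vdd):
    """Threshold that makes L_D * t_p equal to 1/f at this supply."""
    return vdd - (KAPPA * vdd) ** (1.0 / A_DELAY)


def total_power(vdd):
    vth = vth_for_timing(vdd)
    p_act = A_F_C * vdd ** 2
    p_leak = I0 * math.exp(-(vth - GAMMA * vdd) / S) * vdd
    return p_act + p_leak


grid = [0.30 + 0.005 * i for i in range(181)]      # 0.30 V to 1.20 V
vdd_opt = min(grid, key=total_power)
p_min = total_power(vdd_opt)
print(f"P_min = {p_min * 1e3:.3f} mW at VDD = {vdd_opt:.3f} V, "
      f"VTh = {vth_for_timing(vdd_opt):.3f} V")
```

At low supply the required threshold is so low that leakage dominates; at high supply active power dominates. The minimum lies in between, where both terms matter.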





Minimum energy: $E_{Sw} = 2E_{Lk}$



- Large $(E_{Lk}/E_{Sw})^{opt}$
- Flat $E_{Op}$ minimum
- Topology dependent



## **Optimal designs have high leakage ($E_{Lk}/E_{Sw} \approx 0.5$)**





## Subthreshold Optimum





f = 30kHz

Minimum is independent of  $V_{T}$ 

Calhoun, JSSC 9/05





## Next Lecture

- We finished low-power design
- Next is clocks and supplies

