inst.eecs.berkeley.edu/~ee241b

# **EE241B : Advanced Digital Circuits**

# Lecture 22 – Reducing Leakage

**Borivoje Nikolić**



#### Sweetfarm.org/goat-2-meeting:

Invite a goat or a llama to a zoom meeting

https://www.sweetfarm.org/goat-2-meeting



# Announcements

- Assignment 4 due next Friday.

- Reading
  - Rabaey, LPDE, Chapter 8





# Outline

- Module 5
  - Clock gating
  - Leakage reduction during design time and runtime





# 5.G Reducing Switching Activity Through Logic Design





# Power/Energy Optimization Space

|         | Constant Throughput/Latency |  | Variable Throughput/Latency |
|---------|-----------------------------|--|-----------------------------|
| Energy  | Design Time | Sleep Mode | Run Time |
| Active  | Logic design<br>Scaled V<sub>DD</sub><br>Trans. sizing<br>Multi-V<sub>DD</sub> | Clock gating | DFS, DVS |
| Leakage | Stack effects<br>Trans. sizing<br>Scaling V<sub>DD</sub><br>+ Multi-V<sub>Th</sub> | Sleep T's<br>Multi-V<sub>DD</sub><br>Variable V<sub>Th</sub><br>+ Input control | DVS,<br>+ Variable V<sub>Th</sub> |







# **Basic Idea**

- $E \sim \alpha C V^2$
- Reduce switching activity,  $\alpha$ , through logic and architectural transformations
- Many options
  - Switching activity lower with deeper logic
  - Pipelining has significant effect
  - Reduce the number of clocked devices in a flip-flop
    - e.g. group generation of clk\_b
  - A few logic ideas follow
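The role of $\alpha$ in $E \sim \alpha C V^2$ can be illustrated with a minimal sketch. It assumes independent, equiprobable inputs and statistically independent consecutive cycles, which real logic rarely satisfies. The capacitance and supply values are illustrative assumptions, not numbers from the lecture.

```python
def transition_probability(p_one: float) -> float:
    """Probability of a 0->1 transition for a node that is 1 with probability
    p_one, assuming consecutive cycles are independent."""
    return (1.0 - p_one) * p_one


def and_output_p_one(input_p_ones):
    """Probability that an AND gate output is 1, assuming independent inputs."""
    p = 1.0
    for x in input_p_ones:
        p *= x
    return p


def switching_energy(alpha: float, c_farads: float, vdd: float) -> float:
    """Average switching energy per cycle, E = alpha * C * V^2."""
    return alpha * c_farads * vdd ** 2


# Activity at the output of a 2-input and a 4-input AND gate
alpha_and2 = transition_probability(and_output_p_one([0.5, 0.5]))
alpha_and4 = transition_probability(and_output_p_one([0.5] * 4))

# Assumed load and supply: 10 fF, 1.0 V (hypothetical values)
energy_and2 = switching_energy(alpha_and2, 10e-15, 1.0)
energy_and4 = switching_energy(alpha_and4, 10e-15, 1.0)
```

Under these assumptions the wider AND gate has the lower output activity, because its output is 1 less often.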



# **Circuit-Level Activity Encoding**



from [Stan94] (1994 International Workshop on Low-power Design)
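The figure for this slide did not survive extraction. Assuming it shows bus-invert coding, the scheme usually associated with [Stan94], a minimal sketch follows. The function names, the 8-bit width, the all-zero initial bus state, and the test pattern are all assumptions made for illustration.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width):
    """Send each word inverted when that toggles fewer data lines.
    Returns a list of (bus_value, invert_bit) pairs."""
    mask = (1 << width) - 1
    prev_bus = 0  # assumed initial bus state
    encoded = []
    for w in words:
        if hamming(prev_bus, w) > width / 2:
            bus, inv = (~w) & mask, 1
        else:
            bus, inv = w, 0
        encoded.append((bus, inv))
        prev_bus = bus
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words from (bus_value, invert_bit) pairs."""
    mask = (1 << width) - 1
    return [(~bus) & mask if inv else bus for bus, inv in encoded]


def toggles(seq, start=0):
    """Total number of bit toggles along a sequence of bus values."""
    total, prev = 0, start
    for v in seq:
        total += hamming(prev, v)
        prev = v
    return total


words = [0x00, 0xFF, 0x00, 0xFF]  # worst-case alternating pattern
encoded = bus_invert_encode(words, 8)
raw_toggles = toggles(words)
coded_toggles = toggles([b for b, _ in encoded]) + toggles([i for _, i in encoded])
```

The coded count includes the extra invert line, so the comparison with the raw bus is like for like.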







# Number Representation

Input signals are noise most of the time



- Sign-extension activity is significantly reduced by using sign-magnitude representation
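The sign-extension effect can be checked with a short sketch that counts bit toggles for a small-amplitude signal crossing zero. The 16-bit width and the sample sequence are assumptions chosen for illustration.

```python
WIDTH = 16  # assumed word width


def twos_complement(x: int, width: int) -> int:
    """Two's complement encoding of x in the given width."""
    return x & ((1 << width) - 1)


def sign_magnitude(x: int, width: int) -> int:
    """Sign-magnitude encoding: MSB is the sign, remaining bits hold |x|."""
    sign = 1 if x < 0 else 0
    return (sign << (width - 1)) | abs(x)


def toggles(codes) -> int:
    """Total bit toggles between consecutive code words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(codes, codes[1:]))


# Low-amplitude samples that keep changing sign (hypothetical data)
samples = [1, -1, 2, -2, 1, -1, 0, -1]

tc_toggles = toggles([twos_complement(s, WIDTH) for s in samples])
sm_toggles = toggles([sign_magnitude(s, WIDTH) for s in samples])
```

In two's complement every sign change flips all the sign-extension bits, while in sign-magnitude only the sign bit and a few low-order bits change.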





# 5.H Clock Gating



# Power/Energy Optimization Space

|         | Constant Throughput/Latency |  | Variable Throughput/Latency |
|---------|-----------------------------|--|-----------------------------|
| Energy  | Design Time | Sleep Mode | Run Time |
| Active  | Logic design<br>Scaled V<sub>DD</sub><br>Trans. sizing<br>Multi-V<sub>DD</sub> | <b>Clock gating</b> | DFS, DVS |
| Leakage | Stack effects<br>Trans. sizing<br>Scaling V<sub>DD</sub><br>+ Multi-V<sub>Th</sub> | Sleep T's<br>Multi-V<sub>DD</sub><br>Variable V<sub>Th</sub><br>+ Input control | DVS,<br>+ Variable V<sub>Th</sub> |












# Clock Gating

- The clock-enable signal needs to be synchronized

Sequential cell
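Why the enable must be synchronized can be seen in a minimal discrete-time sketch. It compares a plain AND gate with the common latch-based gate, where a latch that is transparent while the clock is low feeds the AND. Whether the missing figure shows exactly this cell is an assumption; the waveforms use two samples per clock phase and are made up for illustration.

```python
def and_gate_clock(clk, en):
    """Naive gating: gated clock is clk AND en, sample by sample."""
    return [c & e for c, e in zip(clk, en)]


def latch_gate_clock(clk, en):
    """Latch-based gating: the enable passes through a latch that is
    transparent only while clk is low, so it cannot change mid-pulse."""
    latched = 0
    out = []
    for c, e in zip(clk, en):
        if c == 0:
            latched = e
        out.append(c & latched)
    return out


def pulse_widths(wave):
    """Widths (in samples) of each run of 1s in the waveform."""
    widths, run = [], 0
    for v in wave:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


clk = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
en = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # changes while clk is high

naive = pulse_widths(and_gate_clock(clk, en))
latched = pulse_widths(latch_gate_clock(clk, en))
```

The naive gate produces two truncated pulses, while the latch-based gate produces a single full-width pulse.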





# **Clock Gating Efficiently Reduces Power**

## Without clock gating



90% of F/F's were clock-gated.

70% power reduction by clock gating alone.

Courtesy M. Ohashi, Matsushita, ISSCC 2002

EECS241B L22 LEAKAGE





896Kb SRAM


# **Clock Gating**

## ARM Cortex-A9 Technical Reference Manual:

#### Dynamic high level clock gating activity

When dynamic high level clock gating is enabled the clock of the integer core is cut in the following cases:

- the integer core is empty and there is an instruction miss causing a linefill
- the integer core is empty and there is an instruction TLB miss
- the integer core is full and there is a data miss causing a linefill
- the integer core is full and data stores are stalled because the linefill buffers are busy.

When dynamic clock gating is enabled, the clock of the system control block is cut in the following cases:

- there are no system control coprocessor instructions being executed
- there are no system control coprocessor instructions present in the pipeline
- performance events are not enabled
- debug is not enabled.

When dynamic clock gating is enabled, the clock of the data engine is cut when there is no data engine instruction in the data engine and no data engine instruction in the pipeline.
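The integer-core conditions quoted above reduce to a simple Boolean predicate. The sketch below only restates the list. The field names are hypothetical and do not correspond to any ARM signal or register.

```python
from dataclasses import dataclass


@dataclass
class CoreState:
    """Hypothetical status flags, named after the conditions in the quoted text."""
    core_empty: bool
    core_full: bool
    instr_miss_linefill: bool
    instr_tlb_miss: bool
    data_miss_linefill: bool
    stores_stalled_linefill_busy: bool


def integer_core_clock_cut(s: CoreState) -> bool:
    """True when any of the four listed cases for the integer core holds."""
    empty_case = s.core_empty and (s.instr_miss_linefill or s.instr_tlb_miss)
    full_case = s.core_full and (
        s.data_miss_linefill or s.stores_stalled_linefill_busy
    )
    return empty_case or full_case
```

For example, an empty core waiting on an instruction linefill satisfies the predicate, while the same miss with a non-empty core does not.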







# Local Clock Gating





'Clock on demand' Flip-flop
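The circuit figure is missing here. Assuming the usual form of this idea, in which the local clock reaches the storage element only when D differs from Q, a behavioral sketch looks as follows. The function name and the input stream are illustrative assumptions.

```python
def clock_on_demand(d_stream, q_init=0):
    """Behavioral model: one entry of d_stream per rising clock edge.
    The internal clock fires only when D != Q (an XOR comparison)."""
    q = q_init
    internal_clocks = 0
    outputs = []
    for d in d_stream:
        if d != q:
            internal_clocks += 1  # local clock pulse is generated
            q = d
        outputs.append(q)
    return outputs, internal_clocks


d_stream = [0, 0, 1, 1, 1, 0, 0, 0, 0, 1]
outputs, internal_clocks = clock_on_demand(d_stream)
```

The output still follows D on every edge, but the internal clock nodes toggle only on the edges where the data changes. The comparison logic adds its own power and delay, which this model ignores.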



# **Complex Designs**




### Fischer, ISSCC'05



# Power/Energy Optimization Space

|         | Constant Throughput/Latency |  | Variable Throughput/Latency |
|---------|-----------------------------|--|-----------------------------|
| Energy  | Design Time | Sleep Mode | Run Time |
| Active  | Logic design<br>Scaled V<sub>DD</sub><br>Trans. sizing<br>Multi-V<sub>DD</sub> | Clock gating | DFS, DVS |
| Leakage | Stack effects<br>Trans. sizing<br>Scaling V<sub>DD</sub><br>+ Multi-V<sub>Th</sub> | Sleep T's<br>Multi-V<sub>DD</sub><br>Variable V<sub>Th</sub><br>+ Input control | DVS,<br>+ Variable V<sub>Th</sub> |







# Plan For the Rest of the Semester

- 4 more lectures:
  - Finish low power (2 lectures)
  - Supplies, clocks and their interaction
- Homework 4 due on April 24<sup>th</sup>
  - Quiz 4 on April 28<sup>th</sup>
- Final on April 30<sup>th</sup>
  - 80 minutes, open everything
- Final presentations, May 4
  - Final reports due on May 4





# 5.I Lowering Leakage During Design: Multiple Thresholds





# Power/Energy Optimization Space

|         | Constant Throughput/Latency |  | Variable Throughput/Latency |
|---------|-----------------------------|--|-----------------------------|
| Energy  | Design Time | Sleep Mode | Run Time |
| Active  | Logic design<br>Scaled V<sub>DD</sub><br>Trans. sizing<br>Multi-V<sub>DD</sub> | Clock gating | DFS, DVS |
| Leakage | Stack effects<br>Trans. sizing<br>Scaling V<sub>DD</sub><br>+ Multi-V<sub>Th</sub> | Sleep T's<br>Multi-V<sub>DD</sub><br>Variable V<sub>Th</sub><br>+ Input control | DVS,<br>+ Variable V<sub>Th</sub> |





# **Technology Options**

- Multiple thresholds, each spaced 50-100 mV apart (5-10x less leakage)
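The spacing and leakage figures above are consistent with the standard subthreshold model, in which off-current drops by one decade for every $S$ millivolts of added threshold voltage. The sketch below back-computes the slope each pairing would imply. Pairing 50 mV with 5x and 100 mV with 10x is an assumption about how to read the slide.

```python
import math


def leakage_ratio(delta_vt_mv: float, slope_mv_per_decade: float) -> float:
    """Leakage reduction factor for a threshold increase of delta_vt_mv,
    using I_off proportional to 10^(-V_T / S)."""
    return 10 ** (delta_vt_mv / slope_mv_per_decade)


def implied_slope(delta_vt_mv: float, ratio: float) -> float:
    """Subthreshold slope (mV/decade) implied by a given spacing and ratio."""
    return delta_vt_mv / math.log10(ratio)


slope_for_50mv_5x = implied_slope(50, 5)
slope_for_100mv_10x = implied_slope(100, 10)
```

Both pairings imply slopes in the 70-100 mV/decade range, which is plausible for bulk CMOS at room temperature.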









# Using Multiple Thresholds

- Cell-by-cell V<sub>T</sub> assignment (not block level)
  - Allows us to minimize leakage
- Achieves all-low-V<sub>T</sub> performance







Figure legend: High V<sub>T</sub>, Low V<sub>T</sub>



# Typical Technologies

- 2-3 thresholds per design
  - Chosen from the 4-6 available in a node
  - In bulk and FinFET, but not in FDSOI (unless doped)
- Each threshold voltage step corresponds to $\sim$5-10x in leakage





# 5.I Lowering Leakage During Design: Longer Channels





# Power/Energy Optimization Space

|         | Constant Throughput/Latency |  | Variable Throughput/Latency |
|---------|-----------------------------|--|-----------------------------|
| Energy  | Design Time | Sleep Mode | Run Time |
| Active  | Logic design<br>Scaled V<sub>DD</sub><br>Trans. sizing<br>Multi-V<sub>DD</sub> | Clock gating | DFS, DVS |
| Leakage | Stack effects<br>Trans. sizing<br>Scaling V<sub>DD</sub><br>+ Multi-V<sub>Th</sub> | Sleep T's<br>Multi-V<sub>DD</sub><br>Variable V<sub>Th</sub><br>+ Input control | DVS,<br>+ Variable V<sub>Th</sub> |





# Longer Channels



- Attractive when W does not have to increase (e.g., memory)
- Doubling L reduces leakage by 3x (in 0.13 µm)
- Much stronger effect in 28 nm!
- Effect improves with shorter-channel devices




# Poly Bias

- 28 nm FDSOI example













# 5.J Lowering Leakage During Design: Transistor Stacking





# Power/Energy Optimization Space

|         | Constant Throughput/Latency |  | Variable Throughput/Latency |
|---------|-----------------------------|--|-----------------------------|
| Energy  | Design Time | Sleep Mode | Run Time |
| Active  | Logic design<br>Scaled V<sub>DD</sub><br>Trans. sizing<br>Multi-V<sub>DD</sub> | Clock gating | DFS, DVS |
| Leakage | Stack effects<br>Trans. sizing<br>Scaling V<sub>DD</sub><br>+ Multi-V<sub>Th</sub> | Sleep T's<br>Multi-V<sub>DD</sub><br>Variable V<sub>Th</sub><br>+ Input control | DVS,<br>+ Variable V<sub>Th</sub> |





# Stack Effect

Figure labels: $V_{dd}$, $I_{stack-u}$, $w_u$ (transistor stack schematic)

### Reduction (in 0.13 µm):

|        | High V<sub>t</sub> | Low V<sub>t</sub> |
|--------|--------------------|-------------------|
| 2 NMOS | 10.7X              | 9.96X             |
| 3 NMOS | 21.1X              | 18.8X             |
| 4 NMOS | 31.5X              | 26.7X             |
| 2 PMOS | 8.6X               | 7.9X              |
| 3 PMOS | 16.1X              | 13.7X             |
| 4 PMOS | 23.1X              | 18.7X             |

Narendra, ISLPED'01
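The origin of the stack effect can be sketched numerically. With two off transistors in series, the intermediate node rises until both carry the same current. The upper device then sees a negative $V_{GS}$ and less DIBL, and the lower device sees a small $V_{DS}$. The model below is a simplified subthreshold expression that ignores body effect. The slope, DIBL coefficient, and supply are assumed values, not fitted to the data above, so the computed factor will not match the table.

```python
import math

S = 0.100           # subthreshold slope, V/decade (assumed)
ETA = 0.10          # DIBL coefficient, V/V (assumed)
V_THERMAL = 0.026   # kT/q near room temperature, V
VDD = 1.2           # supply voltage, V (assumed)


def i_sub(vgs: float, vds: float) -> float:
    """Normalized subthreshold current of one device (body effect ignored)."""
    return 10 ** ((vgs + ETA * vds) / S) * (1.0 - math.exp(-vds / V_THERMAL))


def solve_stack(vdd: float = VDD, iterations: int = 60):
    """Bisect for the intermediate node voltage vx at which the upper and
    lower off devices conduct the same current. Returns (vx, stack current)."""
    lo, hi = 0.0, vdd
    for _ in range(iterations):
        vx = 0.5 * (lo + hi)
        i_top = i_sub(-vx, vdd - vx)   # gate at 0 V, source at vx
        i_bottom = i_sub(0.0, vx)      # gate and source at 0 V
        if i_top > i_bottom:
            lo = vx   # node is still being charged: vx must be higher
        else:
            hi = vx
    vx = 0.5 * (lo + hi)
    return vx, i_sub(0.0, vx)


vx, i_stack = solve_stack()
reduction = i_sub(0.0, VDD) / i_stack  # single off device vs. two-stack
```

With these assumed parameters the intermediate node settles at a small positive voltage, and the stack leaks roughly an order of magnitude less than a single off device.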



# Stack Forcing



#### Tradeoffs:

- Two stacked W/2 devices: 1/3 of the drive current, same loading
- Two stacked 1.5W devices: 3x the loading, same drive current

Narendra, ISLPED'01



# Next Lecture

- Low-power design
  - Power gating
  - Dynamic thresholds
  - Optimal supplies and thresholds

