# **EE241B: Advanced Digital Circuits**

# **Lecture 18 – Power-Performance Tradeoffs 2**

# Borivoje Nikolić

MarketWatch, March 28: Opinion: There's no returning to regular schooling as online learning goes mainstream, by Alex Hicks



When in-person education resumes, online learning tools and methods will be entrenched in the system



#### Announcements

- Project midterm reports due today, March 31
  - Please e-mail me the link to your web page
- Assignment 3 due Thursday, April 2.
  - Quiz next Tuesday
- Reading req'd
  - Rabaey et al, LPDE, Ch. 4

# Outline

- Module 5
  - Power-performance tradeoffs



# 5.B Power-Performance Tradeoffs

EECS241B L18 POWER-PERFORMANCE II


### **Know Your Enemy**

- Where does power go in CMOS?
- Switching (dynamic) power
  - Charging capacitors
- Leakage power
  - Transistors are imperfect switches
- Short-circuit power
  - Both pull-up and pull-down on during transition
- Static currents
  - Biasing currents

#### Summary of Power Dissipation Sources

$$P \sim \alpha \cdot (C_L + C_{CS}) \cdot V_{swing} \cdot V_{DD} \cdot f + (I_{DC} + I_{Leak}) \cdot V_{DD}$$

- $\alpha$: switching activity
- $C_L$: load capacitance
- $C_{CS}$: short-circuit "capacitance"
- $V_{swing}$: voltage swing
- $f$: frequency
- $I_{DC}$: static current
- $I_{Leak}$: leakage current

$$P = \frac{\text{energy}}{\text{operation}} \times \text{rate} + \text{static power}$$
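As a rough illustration of the power summary above, the following sketch evaluates the dynamic and static terms separately. All numeric values are hypothetical and chosen only to make the arithmetic easy to follow; the function name `total_power` is not from the lecture.

```python
def total_power(alpha, c_load, c_sc, v_swing, v_dd, freq, i_dc, i_leak):
    """Return (dynamic, static) power in watts.

    dynamic ~ alpha * (C_L + C_CS) * V_swing * V_DD * f
    static  ~ (I_DC + I_Leak) * V_DD
    """
    dynamic = alpha * (c_load + c_sc) * v_swing * v_dd * freq
    static = (i_dc + i_leak) * v_dd
    return dynamic, static


# Hypothetical block: 1 nF of switched load, 10% activity, 1 GHz, 1 V supply.
dyn, stat = total_power(
    alpha=0.1,        # switching activity
    c_load=1e-9,      # load capacitance [F]
    c_sc=0.1e-9,      # short-circuit "capacitance" [F]
    v_swing=1.0,      # full-rail swing [V]
    v_dd=1.0,         # supply [V]
    freq=1e9,         # clock frequency [Hz]
    i_dc=0.0,         # no biasing current assumed
    i_leak=10e-3,     # leakage current [A]
)
print(f"dynamic = {dyn:.3f} W, static = {stat:.3f} W")
```

With full-rail swing the dynamic term reduces to the familiar $\alpha C V_{DD}^2 f$ form.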

### **CMOS** Performance Optimization

- Reminder (sizing): optimal performance with equal fanout per stage



- Extendable to general logic cone through 'logical effort'
- Equal effective fanouts  $(g_iC_{i+1}/C_i)$  per stage
- Optimal fanout is around 4
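The sizing reminder can be sketched for a plain inverter chain (logical effort g = 1). This is a minimal sketch that picks the stage count whose per-stage fanout is closest to 4 and then sizes the stages geometrically. It ignores signal polarity (odd versus even stage count), and `size_inverter_chain` is a hypothetical helper name.

```python
import math


def size_inverter_chain(c_in, c_load, target_fanout=4.0):
    """Return (stage count, per-stage fanout, input capacitance of each stage)."""
    total_fanout = c_load / c_in
    # Number of stages that brings the per-stage fanout closest to the target.
    n_stages = max(1, round(math.log(total_fanout) / math.log(target_fanout)))
    # Equal fanout per stage: the N-th root of the overall fanout.
    stage_fanout = total_fanout ** (1.0 / n_stages)
    caps = [c_in * stage_fanout ** k for k in range(n_stages)]
    return n_stages, stage_fanout, caps


n_stages, stage_fanout, caps = size_inverter_chain(c_in=1.0, c_load=256.0)
print(n_stages, stage_fanout, caps)
```

For an overall fanout of 256 this gives four stages with a fanout of 4 each.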













Delay = 1/Performance

### The Design Abstraction Stack

A very rich set of design parameters to consider! It helps to consider options in relation to their abstraction layer





Achieve the highest performance under the power cap






How far away are we from the optimal solution?



Global optimum – best performance



Maximize throughput for a given energy, or minimize energy for a given throughput

- There are many sets of parameters to adjust
  - Tuning variables
  - Circuit (sizing, supply, threshold)
  - Logic style (std. cells, custom, ...)
  - Block topology (adder: CLA, CSA, ...)
  - Micro-architecture (parallel, pipelined)






Globally optimal power-performance curve for a given function

### **Energy-Delay Sensitivity**





### Solution: Equal Sensitivities

$$\Delta E = S_A \cdot (-\Delta D) + S_B \cdot \Delta D$$

At the solution point all sensitivities should be equal

[Figure: curves $f(A_0,B)$ and $f(A_1,B)$ plotted against delay]
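The equal-sensitivity condition can be checked numerically on a toy problem. The sketch below assumes a made-up design with two tuning variables A and B, energy E = A + B and delay D = 1/A + 4/B; these models are illustrative assumptions, not the lecture's. It sweeps A along the constant-delay line, finds the minimum-energy point, and compares the two sensitivity magnitudes there.

```python
def energy(a, b):
    return a + b                  # toy model: energy grows with both knobs


def delay(a, b):
    return 1.0 / a + 4.0 / b      # toy model: delay shrinks with both knobs


def sensitivity_a(a):
    # |dE/dA| / |dD/dA| = 1 / (1 / a^2)
    return a * a


def sensitivity_b(b):
    # |dE/dB| / |dD/dB| = 1 / (4 / b^2)
    return b * b / 4.0


D_TARGET = 1.0
best = None
for k in range(1, 20000):
    a = 1.0 + k * 0.001                   # a > 1 keeps b positive
    b = 4.0 / (D_TARGET - 1.0 / a)        # b that meets the delay target
    e = energy(a, b)
    if best is None or e < best[0]:
        best = (e, a, b)

e_opt, a_opt, b_opt = best
print(a_opt, b_opt, sensitivity_a(a_opt), sensitivity_b(b_opt))
```

Away from the optimum the sensitivities differ, so energy can be saved at the same delay by trading the variable with the larger sensitivity against the one with the smaller.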



# 5.C Architectural Optimization

#### **Optimal Processors**

- Processors used to be optimized for performance
  - Optimal logic depth was found to be 8-11 FO4 delays in superscalar processors
  - 1.8-3 FO4 in sequentials, rest in combinational logic
    - Kunkel, Smith, ISCA'86
    - Hrishikesh, Jouppi, Farkas, Burger, Keckler, Shivakumar, ISCA'02
    - Hartstein, Puzak, ISCA'02
    - Sprangle, Carmean, ISCA'02
- But those designs have very high power dissipation
  - Need to optimize for both performance and power/energy

### From System View: What is the Optimum?

- How do sensitivities relate to more traditional metrics?
  - Power per operation (MIPS/W, GOPS/W, TOPS/W)
  - Energy per operation (Joules per op)
  - Energy-delay product
- Can be reformulated as optimizing power × delay<sup>n</sup>
  - n = 0 minimize power per operation
  - n = 1 minimize energy per operation
  - n = 2 minimize energy-delay product
  - n = 3 minimize energy-(delay)<sup>2</sup> product

### **Optimization Problem**

- Set up optimization problem:
  - Maximize performance under energy constraints
  - Minimize energy under performance constraints
- Or minimize a composite function of E<sup>n</sup>D<sup>m</sup>
  - What are the right n and m?
  - n = 1, m = 1 is EDP, which improves at lower $V_{DD}$
  - n = 1, m = 2 is invariant to $V_{DD}$
    - $E \sim CV_{DD}^2$
    - $D \sim 1/V_{DD}$
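The supply dependence of these composite metrics follows directly from the two first-order scaling laws. A minimal sketch, with normalized constants (C = 1 and a delay constant of 1) assumed for illustration:

```python
def energy(v_dd, c=1.0):
    return c * v_dd ** 2          # E ~ C * V_DD^2


def delay(v_dd, k=1.0):
    return k / v_dd               # D ~ 1 / V_DD (first-order model)


def metric(v_dd, n, m):
    """Composite cost E^n * D^m at a given supply."""
    return energy(v_dd) ** n * delay(v_dd) ** m


supplies = [0.6, 0.8, 1.0, 1.2]
edp = [metric(v, 1, 1) for v in supplies]    # scales as V_DD
ed2 = [metric(v, 1, 2) for v in supplies]    # independent of V_DD

for v, a, b in zip(supplies, edp, ed2):
    print(f"V_DD = {v:.1f}  EDP = {a:.3f}  ED^2 = {b:.3f}")
```

Under this model EDP always rewards lowering the supply, while ED² does not change with supply, which makes it useful for comparing designs independently of their operating voltage.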

### Hardware Intensity

- Introduced by Zyuban and Strenski in 2002.
- Measures where the design sits on the energy-delay curve
- Parameter in cost function optimization

$$F_c = (E/E_0)(D/D_0)^{\eta}, \qquad 0 \le \eta < +\infty$$

$$\eta = -\left. \frac{D\partial E}{E\partial D} \right|_{V}$$



Slope of the optimal E-D curve at the chosen design point
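The link between the cost function and the slope of the curve can be illustrated numerically. The sketch below assumes a hypothetical normalized energy-delay curve E = 1/(D - 1), with E<sub>0</sub> = D<sub>0</sub> = 1. It minimizes F<sub>c</sub> along that curve for a chosen η and then measures the local relative slope -(D/E)(dE/dD) at the minimizer.

```python
def energy_on_curve(d):
    """Hypothetical optimal E-D curve (normalized), valid for d > 1."""
    return 1.0 / (d - 1.0)


def cost(d, eta):
    """F_c = (E/E0) * (D/D0)^eta with E0 = D0 = 1."""
    return energy_on_curve(d) * d ** eta


def local_intensity(d, h=1e-6):
    """Relative slope -(D/E) * dE/dD, estimated by central difference."""
    de_dd = (energy_on_curve(d + h) - energy_on_curve(d - h)) / (2.0 * h)
    return -d / energy_on_curve(d) * de_dd


def best_delay(eta):
    """Grid search for the delay that minimizes F_c on the curve."""
    candidates = [1.0 + k * 0.001 for k in range(1, 10000)]
    return min(candidates, key=lambda d: cost(d, eta))


d_opt = best_delay(2.0)
eta_at_opt = local_intensity(d_opt)
print(d_opt, eta_at_opt)
```

On this curve the minimizer for η = 2 sits where the measured relative slope is also about 2, so choosing η amounts to choosing a point on the curve.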

### **Optimum Across Hierarchy Layers**



Optimal logic depth in pipelined processors is ~18 FO4; the optimum is relatively flat in the 16-22 FO4 range.



# 5.D Circuit-Level Tradeoffs

### Alpha-Power Based Delay Model



$$D = \sum_{p_i} t_{p_i} = \sum_{i} \frac{K_d V_{DD}}{(V_{DD} - V_{Th})^{\alpha}} \left( 1 + \frac{W_{L,i}}{W_{in,i}} \right)$$

# **Energy Models**



$$E_{Sw} = \alpha_{0 \to 1} (C_{L,i} + C_{\text{int},i}) V_{DD}^{2}$$



**♦ Leakage** 

$$E_{Lk} = W_{in} I_0 e^{\frac{-(V_{Th}-\gamma V_{DD})}{nV_t}} V_{DD} D$$
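The delay and energy models can be written as three small functions. This is a sketch with illustrative parameter values (α = 1.3, n = 1.5, γ = 0.1, thermal voltage 26 mV, unit K<sub>d</sub> and I<sub>0</sub>), all of which are assumptions rather than values from the lecture.

```python
import math


def stage_delay(v_dd, v_th, w_in, w_load, k_d=1.0, alpha=1.3):
    """Alpha-power delay of one stage driving w_load from input width w_in."""
    return k_d * v_dd / (v_dd - v_th) ** alpha * (1.0 + w_load / w_in)


def switching_energy(v_dd, c_load, c_int, activity=1.0):
    """E_Sw = activity * (C_L + C_int) * V_DD^2."""
    return activity * (c_load + c_int) * v_dd ** 2


def leakage_energy(v_dd, v_th, w_in, delay, i0=1.0, gamma=0.1, n=1.5, v_t=0.026):
    """E_Lk = W_in * I0 * exp(-(V_Th - gamma*V_DD) / (n*V_t)) * V_DD * D."""
    i_leak = w_in * i0 * math.exp(-(v_th - gamma * v_dd) / (n * v_t))
    return i_leak * v_dd * delay


# Compare two supplies for a fanout-of-4 stage (hypothetical numbers).
for v_dd in (1.0, 0.8):
    d = stage_delay(v_dd, v_th=0.3, w_in=1.0, w_load=4.0)
    e_sw = switching_energy(v_dd, c_load=4.0, c_int=1.0)
    e_lk = leakage_energy(v_dd, v_th=0.3, w_in=1.0, delay=d)
    print(f"V_DD = {v_dd}: D = {d:.2f}, E_sw = {e_sw:.2f}, E_lk = {e_lk:.2e}")
```

Lowering the supply reduces switching energy quadratically while the delay grows, which is the basic tradeoff the rest of this section optimizes.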

### Sizing, Supply, Threshold Optimization

- Transistor sizing can yield large power savings with small delay penalties
  - Gate sizing
  - Beta-ratio adjustments, $\beta = W_p/W_n$
  - (Stack resizing)
- Supply voltage affects both active and leakage energy
- Threshold voltage primarily affects leakage

# Apply to Sizing of an Inverter Chain



Unconstrained energy: find min $D = \Sigma t_{pi}$

$$C_{gin,j} = \sqrt{C_{gin,j-1}C_{gin,j+1}}, \qquad W_j = \sqrt{W_{j-1}W_{j+1}}$$

Constrained energy: find min $D$ under $E < E_{max}$, where $E = \Sigma e_i$

## Constrained Optimization

- Find min(D) subject to $E = E_{max}$
  - Constrained function minimization
  - E.g. Lagrange multipliers

$$\Lambda(x) = D(x) + \lambda(E(x) - E_{max})$$

Or dual:

$$K(x) = E(x) + \lambda(D(x) - D_{max})$$

$$\frac{\partial \Lambda}{\partial x} = 0$$

- Can solve analytically for $x = W_i, V_{DD}, V_{Th}$

# Inverter Chain: Sizing Optimization






$$W_{j} = \sqrt{\frac{W_{j-1}W_{j+1}}{1 + \lambda W_{j-1}}}$$

[Ma, Franzon, IEEE JSSC, 9/94]
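The sizing recurrence can be applied as a fixed-point iteration over the interior stages. The sketch below makes several simplifying assumptions: delay is normalized to the sum of stage fanouts, energy is taken as the sum of interior widths, and λ is treated as a free non-negative knob rather than being computed from the sensitivity. The function names are hypothetical.

```python
import math


def size_chain(w_in, w_load, n_stages, lam, iters=500):
    """Iterate W_j = sqrt(W_{j-1} * W_{j+1} / (1 + lam * W_{j-1}))."""
    # Start from the geometric (equal-fanout) sizing; end widths stay fixed.
    w = [w_in * (w_load / w_in) ** (j / n_stages) for j in range(n_stages + 1)]
    for _ in range(iters):
        for j in range(1, n_stages):
            w[j] = math.sqrt(w[j - 1] * w[j + 1] / (1.0 + lam * w[j - 1]))
    return w


def chain_delay(w):
    """Normalized delay: sum of per-stage fanouts."""
    return sum(w[j + 1] / w[j] for j in range(len(w) - 1))


def chain_energy(w):
    """Normalized energy: total width of the interior (sizable) stages."""
    return sum(w[1:-1])


w_fast = size_chain(1.0, 64.0, n_stages=3, lam=0.0)   # minimum-delay sizing
w_lean = size_chain(1.0, 64.0, n_stages=3, lam=0.1)   # energy-constrained sizing
print([round(x, 2) for x in w_fast], chain_delay(w_fast), chain_energy(w_fast))
print([round(x, 2) for x in w_lean], chain_delay(w_lean), chain_energy(w_lean))
```

With λ = 0 the recurrence reduces to the geometric-mean rule and returns the equal-fanout chain. A positive λ shrinks the stages, trading extra delay for lower energy and giving a non-uniform taper.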

$$\lambda = -\frac{2KV_{DD}^{2}}{\tau_{nom}S_{W}}$$

$$S_W \propto \frac{e_j}{f_j - f_{j-1}}$$

- $e_j$: energy per stage
- $f_j$: fanout per stage

Stojanovic, ICCAD'02

- Variable taper achieves minimum energy
- Reduce number of stages at large d<sub>inc</sub>

# Sensitivity to Sizing and Supply

- Gate sizing ($W_i$)

$$-\frac{\partial E_{sw}}{\partial D} = \frac{e_j}{\tau_{nom} (f_j - f_{j-1})}$$

Sensitivity is $\infty$ for equal $f_{eff}$ (the $D_{min}$ point)

- Supply voltage ($V_{DD}$)

$$-\frac{\partial E_{sw}}{\partial D} = \frac{E_{sw}}{D} 2 \frac{1 - x_{v}}{\alpha - 1 + x_{v}}$$



$$x_{v} = (V_{Th} + \Delta V_{Th})/V_{DD}$$
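The supply-voltage sensitivity can be cross-checked against a finite-difference derivative of the alpha-power delay and switching-energy models. This is a sketch assuming ΔV<sub>Th</sub> = 0, a single lumped stage, and illustrative values V<sub>Th</sub> = 0.3 V and α = 1.3.

```python
def delay(v_dd, v_th=0.3, alpha=1.3, k_d=1.0):
    """Alpha-power delay with the fanout term lumped into k_d."""
    return k_d * v_dd / (v_dd - v_th) ** alpha


def switching_energy(v_dd, c=1.0):
    return c * v_dd ** 2


def closed_form_sensitivity(v_dd, v_th=0.3, alpha=1.3):
    """-dE_sw/dD = (E_sw / D) * 2 * (1 - x_v) / (alpha - 1 + x_v)."""
    x_v = v_th / v_dd                     # Delta V_Th taken as zero
    e = switching_energy(v_dd)
    d = delay(v_dd, v_th, alpha)
    return e / d * 2.0 * (1.0 - x_v) / (alpha - 1.0 + x_v)


def numeric_sensitivity(v_dd, h=1e-6):
    """-dE/dD along the supply sweep, by central differences in V_DD."""
    de = switching_energy(v_dd + h) - switching_energy(v_dd - h)
    dd = delay(v_dd + h) - delay(v_dd - h)
    return -de / dd


for v in (0.6, 0.8, 1.0, 1.2):
    print(v, closed_form_sensitivity(v), numeric_sensitivity(v))
```

The two columns should agree to within finite-difference error at each supply.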

# Sensitivity to $V_{Th}$

- Threshold voltage ($V_{Th}$)

$$-\left. \frac{\partial E}{\partial D} \right|_{\Delta V_{Th}} = P_{Lk} \left( \frac{V_{DD} - V_{Th} - \Delta V_{Th}}{\alpha n V_t} - 1 \right)$$

Low initial leakage ⇒ speedup comes for "free"
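The threshold-voltage sensitivity can be checked the same way, reading it as the energy change per unit delay change while ΔV<sub>Th</sub> is the tuning variable, in line with the sizing and supply sensitivities. The sketch assumes only leakage energy depends on the threshold, ignores the DIBL term (γ = 0), normalizes W·I<sub>0</sub> to 1, and uses illustrative device values.

```python
import math

# Illustrative operating point (assumed values, not from the lecture).
V_DD, V_TH, ALPHA, N_SLOPE, V_T = 1.0, 0.3, 1.3, 1.5, 0.026


def delay(dv_th):
    return V_DD / (V_DD - V_TH - dv_th) ** ALPHA


def leak_power(dv_th):
    # P_Lk = I_leak * V_DD, with W * I0 normalized to 1.
    return math.exp(-(V_TH + dv_th) / (N_SLOPE * V_T)) * V_DD


def leak_energy(dv_th):
    return leak_power(dv_th) * delay(dv_th)


def closed_form(dv_th):
    """P_Lk * ((V_DD - V_Th - dV_Th) / (alpha * n * V_t) - 1)."""
    return leak_power(dv_th) * (
        (V_DD - V_TH - dv_th) / (ALPHA * N_SLOPE * V_T) - 1.0
    )


def numeric(dv_th, h=1e-6):
    """-dE/dD along the threshold sweep, by central differences."""
    de = leak_energy(dv_th + h) - leak_energy(dv_th - h)
    dd = delay(dv_th + h) - delay(dv_th - h)
    return -de / dd


for dv in (-0.05, 0.0, 0.05):
    print(dv, closed_form(dv), numeric(dv))
```

Because the expression is proportional to P<sub>Lk</sub>, a design that starts with low leakage gives up very little energy per unit of delay gained by lowering the threshold.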



# Next Lecture

- Low-power design
  - Lowering supply voltage