# **EE241B: Advanced Digital Circuits**

# **Lecture 18 – Power-Performance Tradeoffs 2**

### **Borivoje Nikolić**

MarketWatch, March 28: Opinion: There's no returning to regular schooling as online learning goes mainstream, by Alex Hicks



When in-person education resumes, online learning tools and

### Announcements

- Project midterm reports due today, March 31
  - Please e-mail me the link to your web page
- Assignment 3 due Thursday, April 2.
  - Quiz next Tuesday
- Reading req'd
  - Rabaey et al, LPDE, Ch. 4

### Outline

- Module 5
  - Power-performance tradeoffs



### 5.B Power-Performance Tradeoffs

### Know Your Enemy

- Where does power go in CMOS?
  - Switching (dynamic) power
    - Charging capacitors
  - Leakage power
    - Transistors are imperfect switches
  - Short-circuit power
    - Both pull-up and pull-down on during transition
  - Static currents
    - Biasing currents

### Summary of Power Dissipation Sources

$$P \sim \alpha \cdot (C_L + C_{CS}) \cdot V_{swing} \cdot V_{DD} \cdot f + (I_{DC} + I_{Leak}) \cdot V_{DD}$$

- α: switching activity
- C<sub>L</sub>: load capacitance
- C<sub>CS</sub>: short-circuit "capacitance"
- V<sub>swing</sub>: voltage swing
- f: frequency
- I<sub>DC</sub>: static current
- I<sub>Leak</sub>: leakage current

$$P = \frac{\text{energy}}{\text{operation}} \times \text{rate} + \text{static power}$$
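A minimal sketch of evaluating the summary expression numerically. The function name and every parameter value below are hypothetical, chosen only to show the relative size of the dynamic and static terms.

```python
def total_power(alpha, c_load, c_sc, v_swing, v_dd, freq, i_dc, i_leak):
    """Evaluate P ~ alpha*(C_L + C_CS)*V_swing*V_DD*f + (I_DC + I_Leak)*V_DD."""
    dynamic = alpha * (c_load + c_sc) * v_swing * v_dd * freq
    static = (i_dc + i_leak) * v_dd
    return dynamic + static

# Hypothetical operating point (not from the lecture).
p = total_power(alpha=0.1, c_load=10e-15, c_sc=1e-15,
                v_swing=1.0, v_dd=1.0, freq=1e9,
                i_dc=0.0, i_leak=100e-9)
print(f"P = {p * 1e6:.2f} uW")
```

With full-swing logic, V<sub>swing</sub> = V<sub>DD</sub> and the dynamic term reduces to the familiar α·C·V<sub>DD</sub><sup>2</sup>·f.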

### **CMOS Performance Optimization**

- Reminder (sizing): optimal performance with equal fanout per stage



- Extendable to general logic cone through 'logical effort'
- Equal effective fanouts  $(g_iC_{i+1}/C_i)$  per stage
- Optimal fanout is around 4
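The "around 4" figure can be checked with a short sketch. It assumes the standard textbook inverter-chain model, which is not spelled out on the slide: per-stage delay proportional to γ + f, where γ (an assumed symbol here) is the ratio of self-loading to input capacitance. Minimizing total delay then gives the condition f = exp(1 + γ/f).

```python
import math

def optimal_fanout(gamma, iterations=100):
    """Solve f = exp(1 + gamma / f) by fixed-point iteration."""
    f = 4.0  # starting guess
    for _ in range(iterations):
        f = math.exp(1.0 + gamma / f)
    return f

def delay_per_ln_fanout(f, gamma):
    """Chain delay divided by ln(F), in units of the unit delay: (gamma + f) / ln(f)."""
    return (gamma + f) / math.log(f)

f_opt = optimal_fanout(gamma=1.0)
penalty = delay_per_ln_fanout(4.0, 1.0) / delay_per_ln_fanout(f_opt, 1.0)
print(f"optimal fanout ~ {f_opt:.2f}, delay penalty at f = 4: {100 * (penalty - 1):.2f}%")
```

Under these assumptions the optimum sits a little below 4 for γ = 1, and the delay curve is flat enough there that a fanout of 4 costs well under 1% in delay.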





### Performance Optimization

[Figure: energy curves for Microarchitecture A and Microarchitecture B]















Global optimum – best performance

Minimize energy for given throughput

### Power-Performance Optimization

- There are many sets of parameters to adjust
  - Tuning variables
  - Circuit (sizing, supply, threshold)
  - Logic style (std. cells, custom, ...)
  - Block topology (adder: CLA, CSA, ...)
  - Micro-architecture (parallel, pipelined)




Globally optimal power-performance curve for a given function

### **Energy-Delay Sensitivity**



### Solution: Equal Sensitivities



At the solution point all sensitivities should be equal

### 5.C Architectural Optimization

### **Optimal Processors**

- Processors used to be optimized for performance
  - Optimal logic depth was found to be 8-11 FO4 delays in superscalar processors
  - 1.8-3 FO4 in sequentials, rest in combinational logic
    - Kunkel, Smith, ISCA'86
    - Hrishikesh, Jouppi, Farkas, Burger, Keckler, Shivakumar, ISCA'02
    - Hartstein, Puzak, ISCA'02
    - Sprangle, Carmean, ISCA'02
- But those designs have very high power dissipation
  - Need to optimize for both performance and power/energy

### From System View: What is the Optimum?

- How do sensitivities relate to more traditional metrics:
  - Power per operation (MIPS/W, GOPS/W, TOPS/W)
  - Energy per operation (Joules per op)
  - Energy-delay product
- Can be reformulated as a goal of optimizing power × delay<sup>n</sup>
  - n = 0: minimize power per operation
  - n = 1: minimize energy per operation
  - n = 2: minimize energy-delay product
  - n = 3: minimize energy-delay<sup>2</sup> product

### Optimization Problem

- Set up optimization problem:
  - Maximize performance under energy constraints
  - Minimize energy under performance constraints
- Or minimize a composite function of $E^n D^m$
  - What are the right n and m?
- n = 1, m = 1 is EDP, which improves at lower $V_{DD}$
- n = 1, m = 2 is invariant to $V_{DD}$
  - $E \sim C V_{DD}^2$
  - $D \sim 1/V_{DD}$
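A quick numerical check of the two scaling claims, using the first-order models from the slide (E ~ C·V<sub>DD</sub><sup>2</sup>, D ~ 1/V<sub>DD</sub>). The constants are normalized to 1 and the voltages are arbitrary illustrative values.

```python
def energy(v_dd, c=1.0):
    """First-order switching energy, E ~ C * V_DD^2."""
    return c * v_dd ** 2

def delay(v_dd, k=1.0):
    """First-order delay, D ~ 1 / V_DD (threshold voltage ignored)."""
    return k / v_dd

def metric(v_dd, n, m):
    """Composite cost E^n * D^m."""
    return energy(v_dd) ** n * delay(v_dd) ** m

for v in (1.0, 0.8, 0.6):
    print(f"V_DD={v:.1f}  EDP={metric(v, 1, 1):.3f}  ED^2={metric(v, 1, 2):.3f}")
```

In this model EDP scales as V<sub>DD</sub>, so it always favors a lower supply, while ED<sup>2</sup> stays constant. That is why ED<sup>2</sup> is useful for comparing designs independently of the supply voltage each one happens to run at.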

### Hardware Intensity

- Introduced by Zyuban and Strenski in 2002.
- Measures where the design is on the energy-delay curve
- Parameter in cost function optimization







Slope of the optimal E-D curve at the chosen design point
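For reference, a hedged reminder of how the parameter is usually written. This is recalled from Zyuban and Strenski's formulation rather than taken from the slide, and $E_0$, $D_0$ are normalization constants assumed here. Hardware intensity $\eta$ weights delay against energy in the cost function, and minimizing that cost along the energy-delay curve makes $\eta$ equal to the relative slope of the curve at the chosen point:

```latex
F_c = \frac{E}{E_0} \left( \frac{D}{D_0} \right)^{\eta}, \qquad
\frac{d \ln F_c}{d \ln D} = 0
\;\Rightarrow\;
\eta = -\frac{D}{E} \, \frac{\partial E}{\partial D}
```

Read this way, $\eta = 0$ corresponds to minimizing energy alone, and larger $\eta$ corresponds to paying more energy for each percent of delay reduction.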

### **Optimum Across Hierarchy Layers**



Optimal logic depth in pipelined processors is ~18 FO4, and the optimum is relatively flat in the 16-22 FO4 range

Zyuban et al, TComp'04

### 5.D Circuit-Level Tradeoffs

### Alpha-Power Based Delay Model



$$D = \sum t_{pi} = \sum \frac{K_d V_{DD}}{(V_{DD} - V_{Th})^{\alpha}} \left(1 + \frac{W_{L,i}}{W_{in,i}}\right)$$

### **Energy Models**

- Switching

$$E_{Sw} = \alpha_{0 \to 1} (C_{L,i} + C_{\text{int},i}) V_{DD}^{2}$$

- Leakage

$$E_{Lk} = W_{in} I_0 e^{\frac{-(V_{Th} - \gamma V_{DD})}{nV_t}} V_{DD} D$$

## Sizing, Supply, Threshold Optimization

- Transistor sizing can yield large power savings with small delay penalties
  - Beta-ratio adjustments: $\beta = W_p/W_n$
  - (Stack resizing)
- Supply voltage affects both active and leakage energy
- Threshold voltage affects primarily the leakage

### Apply to Sizing of an Inverter Chain



Unconstrained energy: find min $D = \Sigma t_{pi}$

$$C_{gin,j} = \sqrt{C_{gin,j-1} C_{gin,j+1}}$$

$$W_j = \sqrt{W_{j-1} W_{j+1}}$$

Constrained energy: find min D under $E < E_{max}$, where $E = \Sigma e_i$

### Constrained Optimization

- Find min(D) subject to $E = E_{max}$
  - Constrained function minimization
  - E.g. Lagrange multipliers

$$\Lambda(x) = D(x) + \lambda (E(x) - E_{max})$$

$$\frac{\partial \Lambda}{\partial x} = 0$$

- Dual formulation (minimize E subject to $D = D_{max}$): $K(x) = E(x) + \lambda (D(x) - D_{max})$
- Can solve analytically for $x = W_j$, $V_{DD}$, $V_{Th}$
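A minimal numerical sketch of the constrained problem for a three-stage inverter chain. All values are hypothetical: delay is the sum of stage fanouts with parasitics omitted, and energy is taken as proportional to the total sized width W1 + W2. The constraint is enforced by substitution rather than by solving for λ explicitly. At the optimum, ∂Λ/∂x = 0 requires the delay-to-energy sensitivity of every tuning variable to be the same.

```python
W0, WL = 1.0, 64.0   # fixed input and load sizes (hypothetical values)
E_MAX = 12.0         # energy budget, in units of total sized width W1 + W2

def delay(w1, w2):
    """Normalized delay: sum of stage fanouts (parasitic delay omitted)."""
    return w1 / W0 + w2 / w1 + WL / w2

def golden_min(f, lo, hi, iters=200):
    """Golden-section search for the minimizer of a unimodal function."""
    g = (5 ** 0.5 - 1) / 2
    for _ in range(iters):
        c = hi - g * (hi - lo)
        d = lo + g * (hi - lo)
        if f(c) < f(d):
            hi = d
        else:
            lo = c
    return (lo + hi) / 2

# Enforce E = E_MAX by substitution (w2 = E_MAX - w1), then minimize D over w1.
w1 = golden_min(lambda w: delay(w, E_MAX - w), 0.1, E_MAX - 0.1)
w2 = E_MAX - w1

# Sensitivities (dD/dW_j) / (dE/dW_j); here dE/dW_j = 1 for both stages.
s1 = 1.0 / W0 - w2 / w1 ** 2
s2 = 1.0 / w1 - WL / w2 ** 2
print(f"W1={w1:.3f} W2={w2:.3f} D={delay(w1, w2):.3f}")
print(f"sensitivities: {s1:.4f}, {s2:.4f}")
```

Without the energy constraint this chain would size to W1 = 4, W2 = 16 with D = 12 and E = 20. The tighter budget gives a slower design in which both sensitivities agree, and their common value is −λ.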






### Sensitivity to Sizing and Supply

- Gate sizing ($W_j$)

$$-\frac{\partial E_{Sw} / \partial W_j}{\partial D / \partial W_j} = \frac{e_j}{\tau_{nom} (f_j - f_{j-1})}$$

- Supply voltage ($V_{DD}$)

$$-\frac{\partial E_{Sw} / \partial V_{DD}}{\partial D / \partial V_{DD}} = \frac{E_{Sw}}{D} \cdot 2 \frac{1 - x_v}{\alpha - 1 + x_v}$$

where $x_v = (V_{Th} + \Delta V_{Th})/V_{DD}$
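The supply-voltage sensitivity follows from differentiating E<sub>Sw</sub> ~ C·V<sub>DD</sub><sup>2</sup> and the alpha-power delay model. The sketch below checks the closed form against central differences. The parameter values are hypothetical, sizing is held fixed (the fanout factor is folded into `k_d`), and ΔV<sub>Th</sub> is taken as 0.

```python
def delay(v_dd, v_th=0.3, alpha=1.5, k_d=1.0):
    """Alpha-power-law delay with fixed sizing (fanout term folded into k_d)."""
    return k_d * v_dd / (v_dd - v_th) ** alpha

def e_sw(v_dd, c=1.0):
    """Switching energy, E_Sw ~ C * V_DD^2."""
    return c * v_dd ** 2

def supply_sensitivity_numeric(v_dd, h=1e-6):
    """-(dE_Sw/dV_DD) / (dD/dV_DD) by central differences."""
    de = (e_sw(v_dd + h) - e_sw(v_dd - h)) / (2 * h)
    dd = (delay(v_dd + h) - delay(v_dd - h)) / (2 * h)
    return -de / dd

def supply_sensitivity_formula(v_dd, v_th=0.3, alpha=1.5):
    """Closed form: (E_Sw / D) * 2 * (1 - x_v) / (alpha - 1 + x_v)."""
    x_v = v_th / v_dd  # Delta V_Th assumed to be 0
    return e_sw(v_dd) / delay(v_dd) * 2 * (1 - x_v) / (alpha - 1 + x_v)

for v in (0.6, 0.9, 1.2):
    print(f"V_DD={v:.1f}  numeric={supply_sensitivity_numeric(v):.5f}  "
          f"formula={supply_sensitivity_formula(v):.5f}")
```

The sensitivity shrinks as V<sub>DD</sub> approaches V<sub>Th</sub> (x<sub>v</sub> → 1), so lowering the supply saves the most energy per unit of added delay when the starting supply is high.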





### Inverter Chain: Sizing Optimization



$$W_j = \sqrt{\frac{W_{j-1}W_{j+1}}{1 + \lambda W_{j-1}}}$$

$$\lambda = -\frac{2KV_{DD}^{2}}{\tau_{nom}S_{W}}$$

$$S_{W} \propto \frac{e_{j}}{f_{j} - f_{j-1}}$$

- Variable taper achieves minimum energy
- Reduce number of stages at large d<sub>inc</sub>
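A small sketch of how the sizing recursion produces a variable taper. Here λ is treated simply as a nonnegative knob (its sign convention relative to S<sub>W</sub> is an assumption of this sketch), and the chain dimensions are hypothetical. The interior sizes are found by repeatedly applying the recursion until it settles.

```python
def size_chain(w_in, w_load, n_inner, lam, sweeps=500):
    """Iterate W_j = sqrt(W_{j-1} * W_{j+1} / (1 + lam * W_{j-1})) to a fixed point."""
    w = [w_in] + [w_in] * n_inner + [w_load]
    for _ in range(sweeps):
        for j in range(1, n_inner + 1):
            w[j] = (w[j - 1] * w[j + 1] / (1.0 + lam * w[j - 1])) ** 0.5
    return w

def fanouts(w):
    """Per-stage fanout W_{j+1} / W_j."""
    return [w[j + 1] / w[j] for j in range(len(w) - 1)]

uniform = size_chain(1.0, 64.0, n_inner=2, lam=0.0)   # minimum-delay sizing
tapered = size_chain(1.0, 64.0, n_inner=2, lam=0.1)   # energy-constrained sizing
print("lam=0.0:", [round(f, 2) for f in fanouts(uniform)])
print("lam=0.1:", [round(f, 2) for f in fanouts(tapered)])
```

With λ = 0 the recursion reduces to the geometric-mean rule and every stage has the same fanout. With λ > 0 the gates shrink and the fanout grows toward the output, where the largest (most energy-hungry) gates sit.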

### Sensitivity to V<sub>Th</sub>

- Threshold voltage ($V_{Th}$)

$$-\frac{\partial E / \partial \Delta V_{Th}}{\partial D / \partial \Delta V_{Th}} = P_{Lk} \left( \frac{V_{DD} - V_{Th} - \Delta V_{Th}}{\alpha n V_t} - 1 \right)$$

Low initial leakage ⇒ speedup comes for "free"
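The threshold sensitivity can be checked the same way. The sketch below uses the leakage-energy and alpha-power delay models from the earlier slides with a threshold shift ΔV<sub>Th</sub> added to V<sub>Th</sub>. All numeric values are hypothetical, and P<sub>Lk</sub> is taken as E<sub>Lk</sub>/D. Switching energy does not depend on ΔV<sub>Th</sub> in this model, so only the leakage term contributes to ∂E/∂ΔV<sub>Th</sub>.

```python
import math

V_DD, V_TH, ALPHA = 1.0, 0.3, 1.5      # hypothetical operating point
N_SUB, V_T, GAMMA = 1.5, 0.026, 0.1    # subthreshold factor n, thermal voltage, DIBL
W_I0, K_D = 1.0, 1.0                   # lumped width*I_0 and delay constants

def delay(dvth):
    """Alpha-power-law delay with the threshold shifted by dvth."""
    return K_D * V_DD / (V_DD - V_TH - dvth) ** ALPHA

def e_lk(dvth):
    """Leakage energy per operation: leakage current * V_DD * D."""
    i_lk = W_I0 * math.exp(-(V_TH + dvth - GAMMA * V_DD) / (N_SUB * V_T))
    return i_lk * V_DD * delay(dvth)

def vth_sensitivity_numeric(dvth, h=1e-7):
    """-(dE/d dVth) / (dD/d dVth) by central differences."""
    de = (e_lk(dvth + h) - e_lk(dvth - h)) / (2 * h)
    dd = (delay(dvth + h) - delay(dvth - h)) / (2 * h)
    return -de / dd

def vth_sensitivity_formula(dvth):
    """Closed form: P_Lk * ((V_DD - V_Th - dVth) / (alpha * n * V_t) - 1)."""
    p_lk = e_lk(dvth) / delay(dvth)
    return p_lk * ((V_DD - V_TH - dvth) / (ALPHA * N_SUB * V_T) - 1)

for dv in (-0.05, 0.0, 0.05):
    print(f"dVth={dv:+.2f}  numeric={vth_sensitivity_numeric(dv):.5g}  "
          f"formula={vth_sensitivity_formula(dv):.5g}")
```

Because the sensitivity is proportional to P<sub>Lk</sub>, a design that starts with little leakage has a small sensitivity, so lowering V<sub>Th</sub> buys speed at almost no energy cost.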



### Next Lecture

- Low-power design
  - Lowering supply voltage



