#### CS 150 Digital Design

## Lecture 24 – Power and Energy

#### 2012-4-17

#### Professor John Wawrzynek (today's lecture by John Lazzaro)

TAs: Shaoyi Cheng, Daiwei Li, James Parker

#### www-inst.eecs.berkeley.edu/~cs150/



Sad fact: Computers turn electrical energy into heat. Computation is a byproduct.

# **Energy and Performance**

Air or water carries heat away, or chip melts.



UC Regents Spring 2012 © UCB

The Joule: Unit of energy. Can also be expressed as Watt-Seconds. Burning 1 Watt for 100 seconds uses 100 Watt-Seconds of energy.

1 A

This is how electric tea pots work ...

1 Joule heats 1 gram of water by 0.24 degrees C.

1 Joule of heat energy per second

The Watt: Unit of power. The amount of energy burned in the resistor in 1 second.

1 Ohm Resistor

Watt

20 W rating: Maximum power the package is able to transfer to the air. Exceed rating and resistor burns.

CS 150 L24: Power and Energy

## **Cooling an iPod nano ...**



Like resistor on last slide, iPod relies on passive transfer of heat from case to the air.

Why? Users don't want fans in their pocket ...

To stay "cool to the touch" via passive cooling, power budget of 5 W.

If iPod nano used 5W all the time, its battery would last 15 minutes ...

## Powering an iPod nano (2005 edition)



1.2 W-hour battery: Can supply 1.2 watts of power for 1 hour.

1.2 W-hour / 5 W = 0.24 hour, about 15 minutes.

More W-hours require a bigger battery and thus a bigger "form factor" - it wouldn't be "nano" anymore :-).

Real specs for iPod nano : 14 hours for music, 4 hours for slide shows.

85 mW for music. 300 mW for slides.
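The battery arithmetic on the last two slides can be checked with a short sketch. The function names are illustrative and not from the lecture; the numbers are the ones quoted above.

```python
# Battery-life arithmetic for the 2005 iPod nano, using the slide numbers.
# Function names are illustrative assumptions, not from the lecture.

def battery_life_hours(capacity_wh: float, power_w: float) -> float:
    """Hours a battery of capacity_wh watt-hours lasts at a constant power_w watts."""
    return capacity_wh / power_w


def average_power_mw(capacity_wh: float, life_hours: float) -> float:
    """Average power in milliwatts implied by a battery capacity and a run time."""
    return 1000.0 * capacity_wh / life_hours


NANO_BATTERY_WH = 1.2

# At the 5 W passive-cooling limit: 0.24 hour, about 15 minutes.
print(round(60 * battery_life_hours(NANO_BATTERY_WH, 5.0), 1), "minutes")

# 14 hours of music implies about 86 mW (quoted as 85 mW on the slide).
print(round(average_power_mw(NANO_BATTERY_WH, 14)), "mW for music")

# 4 hours of slide shows implies 300 mW.
print(round(average_power_mw(NANO_BATTERY_WH, 4)), "mW for slides")
```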

## Finding the (2005) iPod nano CPU ...

A close relative ...

PP5020 SoC

Digital media management system-on-chip



Two 80 MHz CPUs. One CPU used for audio, one for slides.

Low-power ARM: roughly 1 mW per MHz ... variable clock, sleep modes.

**85 mW system** power realistic ...


PortalPlayer

## Year-to-year: continuous improvements

iPod nano 2005: 14 hours battery life (audio playback)

What changed inside?

iPod nano 2006: 24 hours battery life (audio playback)

Source: ifixit.com





iPod nano 2005: a C-shaped PC board, with a battery in the "C" opening.

iPod nano 2006: battery lies on top of PC board.

## How? Small IC packages, fewer parts

#### iPod nano 2006

#### iPod nano 2005





Source: arstechnica.com




## **Aluminum permits thinner case ...**



# What's happened since 2006?

Source: ilounge.com







#### 2010 Nano



0.74 ounces

#### 2010 Shuffle



0.44 ounces

2010 Nano: "up to" 24 hours audio playback

2010 Shuffle: "up to" 15 hours audio playback





#### 0.39 W Hr (33% of 2005 Nano)

Sources: iFixit, Apple



0.19 W Hr




Desired screen size sets smartphone W x L. Depth? Thin body vs. battery life.

|         | 2007                             | 2008                             | 2009                             | 2010                             | 2011                             | Today                            |
|---------|----------------------------------|----------------------------------|----------------------------------|----------------------------------|----------------------------------|----------------------------------|
|         | iPhone                           | iPhone 3G                        | iPhone 3GS                       | iPhone 4                         | iPhone 4 (CDMA)                  | iPhone 4S                        |
| Battery | Li-Ion Polymer,<br>3.7V, 1170mAh | Li-Ion Polymer,<br>3.7V, 1150mAh | Li-Ion Polymer,<br>3.7V, 1220mAh | Li-Ion Polymer,<br>3.7V, 1420mAh | Li-Ion Polymer,<br>3.7V, 1420mAh | Li-Ion Polymer,<br>3.7V, 1430mAh |

## 22% gain in battery energy over 5 iterations
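The 22% figure follows from the table: at a fixed 3.7 V, battery energy scales with the mAh rating. A minimal sketch, with an illustrative helper name that is not from the lecture:

```python
# Battery energy in W-h from the voltage and mAh ratings in the table.
# battery_energy_wh is an illustrative helper, not from the lecture.

def battery_energy_wh(voltage_v: float, capacity_mah: float) -> float:
    """Stored energy in watt-hours: volts times amp-hours."""
    return voltage_v * capacity_mah / 1000.0


first_wh = battery_energy_wh(3.7, 1170)   # 2007 iPhone
latest_wh = battery_energy_wh(3.7, 1430)  # iPhone 4S
gain_percent = 100.0 * (latest_wh / first_wh - 1.0)

print(f"{first_wh:.2f} W-h -> {latest_wh:.2f} W-h, gain {gain_percent:.0f}%")
```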





iPhone 4{,S} Battery L-shape Main Board

Metal frame acts as antenna

|                            | 2007: iPhone                |
|----------------------------|-----------------------------|
| Apps Processor             | Samsung S5L8900B01 ARM Core |
| Process Geometry           | 90nm                        |
| Die Size                   | 8.5 x 8.5 mm                |
| Pin Count                  | 424                         |
| ARM Core (Instruction Set) | ARM9 (ARMv5)                |
| Clock Speed                | ~600MHz                     |
| GPU                        | PowerVR MBX                 |
| SDRAM                      | 1Gb Mobile DDR              |

## In 4 years:

6.8x increase in transistor count

33% max clock speed increase

Attached DRAM: 128 MB -> 512 MB

6.8x transistors: used for dual CPU and GPU, and to save energy.




## Notebooks ... as designed in 2006 ...

#### 2006 Apple MacBook -- 5.2 lbs



#### 12.8 in

Performance: Must be "close enough" to desktop performance ... most people no longer used a desktop (even in 2006).

**Size and Weight**. Ideal: paper notebook.

**Heat**. No longer "laptops" -- top may get "warm", bottom "hot". Quiet fans OK.


## Battery: Set by size and weight limits ...

46x more energy than iPod nano battery. And iPod lets you listen to music for 14 hours!

Almost full 1 inch depth. Width and height set by available space, weight. Battery rating: 55 W-hour.

At 2.3 GHz, the Intel Core Duo CPU consumes 31 W running a heavy load: under 2 hours of battery life! And that is just for the CPU!

At 1 GHz, CPU consumes 13 Watts. "Energy saver" option uses this mode ...

55 W-hour battery stores the energy of 1/2 a stick of dynamite.

If battery short-circuits, catastrophe is possible ...



#### MacBook Air ... design the laptop like an iPod



#### 2011 Air: 11.8 in x 7.56 in x 0.68 in; 2.38 lbs



#### 2006 MacBook: 12.8 in x 8.9 in x 1 in; 5.2 lbs


#### Mainboard: fills about 25% of the laptop



35 W-h battery: 63% of 2006 MacBook's 55 W-h



#### 2011 Air: 35 W-h battery, 5 hour battery life\*

#### 2012 iPad: 42.5 W-h battery, 10 hour battery life\*

\*For a content-consumption workload.

![](_page_26_Picture_2.jpeg)

## Battery-Life-Hour/W-h: 1.7x iPad advantage
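The ratio in this title can be reproduced from the two battery ratings and run times quoted above. A minimal sketch; the helper name is illustrative:

```python
# Hours of battery life obtained per W-h of battery, from the slide numbers.
# hours_per_wh is an illustrative helper, not from the lecture.

def hours_per_wh(life_hours: float, capacity_wh: float) -> float:
    """Battery-life hours delivered per watt-hour of stored energy."""
    return life_hours / capacity_wh


air = hours_per_wh(5, 35)       # 2011 MacBook Air
ipad = hours_per_wh(10, 42.5)   # 2012 iPad
advantage = ipad / air          # about 1.65, quoted as 1.7x on the slide

print(f"Air: {air:.3f} h/W-h, iPad: {ipad:.3f} h/W-h, ratio {advantage:.2f}x")
```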

#### iPad: iPhone++


### A5X: 2 ARM Cortex-A9 Cores, expanded PowerVR GPU

![](_page_28_Picture_2.jpeg)

Up to 64 GB Flash

![](_page_28_Picture_4.jpeg)

![](_page_28_Picture_5.jpeg)

![](_page_28_Picture_6.jpeg)

Cellular front-end chips on separate board.

#### MacBook Air: Full PC

Thunderbolt I/O


#### Platform Controller Hub, CPU/GPU

![](_page_29_Picture_2.jpeg)


![](_page_29_Picture_3.jpeg)

#### Up to 4GB DRAM

![](_page_29_Picture_5.jpeg)

2011 Air: \$999 -- 64 GB SSD, 2 GB RAM, x86

2012 iPad: \$699 -- 64 GB SSD, 1 GB RAM, ARM

iPad 2012 CPU: iPhone 4S, with 2X GPU and RAM

![](_page_30_Picture_1.jpeg)

#### "Content Creation vs. Content Consumption"


iPad 2011→2012, battery W-hours increased by 70%

Weight increase of 0.11 pound, thicker by 0.03 inches.

Increase needed to double display resolution while keeping battery life @ 10 hours.

![](_page_31_Picture_3.jpeg)

## The CPU is only part of power budget!

2004-era notebook running a full workload.

![](_page_32_Figure_2.jpeg)

## "Amdahl's Law for Power"

If our CPU took no power at all to run, that would only double battery life!
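A rough sketch of the "Amdahl's Law for Power" arithmetic. The 50% CPU share is an assumption inferred from the "only double" statement, not a value read off the chart, and the function name is illustrative.

```python
# "Amdahl's Law for Power": battery life gain when only the CPU's power changes.
# The 0.5 CPU share below is an assumption inferred from the "only double" claim.

def battery_life_gain(cpu_fraction: float, cpu_power_scale: float) -> float:
    """Battery-life multiplier for a system at constant workload.

    cpu_fraction:    share of total system power drawn by the CPU (0 to 1).
    cpu_power_scale: new CPU power divided by old CPU power (0 = free CPU).
    """
    new_total = (1.0 - cpu_fraction) + cpu_fraction * cpu_power_scale
    return 1.0 / new_total


# A CPU that takes no power at all: battery life only doubles.
print(battery_life_gain(0.5, 0.0))

# A CPU that takes half the power: battery life improves by about 1.33x.
print(round(battery_life_gain(0.5, 0.5), 2))
```

The rest of the system (display, memory, disk, wireless) sets the ceiling, which is why the CPU is only part of the power budget.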

### iPad 2011 $\rightarrow$ 2012 power++ comes from this side.

Data courtesy Mahesri et al., U of Illinois, 2004

## Servers: Total Cost of Ownership (TCO)

![](_page_33_Picture_1.jpeg)

#### Reliability: running computers hot makes them fail more often.

Machine rooms are expensive. Removing heat dictates how many servers to put in a machine room.

**Electric bill adds** up! Powering the servers + powering the air conditioners is a big part of TCO.

2008+2009 laptops

Computations per W-h double every 1.6 years, going back to the first computer.

(Jonathan Koomey, Stanford).
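To get a feel for a 1.6-year doubling period, here is a small sketch of the compounding. The function name and the example time spans are illustrative assumptions.

```python
# Compounded gain in computations per W-h for a fixed doubling period.
# efficiency_gain and the example spans are illustrative, not from the lecture.

def efficiency_gain(years: float, doubling_period_years: float = 1.6) -> float:
    """Factor by which computations per W-h grow over the given number of years."""
    return 2.0 ** (years / doubling_period_years)


for span in (1.6, 8.0, 16.0):
    print(f"{span:4.1f} years -> {efficiency_gain(span):.0f}x")
```

At this rate, 16 years gives ten doublings, roughly a 1000x improvement.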

![](_page_34_Figure_3.jpeg)

# **Processors and Energy**

![](_page_35_Picture_1.jpeg)
## **Switching Energy: Fundamental Physics**



(3) Fewer circuits. But more transistors can do more work.

(4) Reduce C per node. One reason why we scale processes.



## Scaling switching energy per gate ...



Due to reducing V and C (the length and width of the capacitors decrease, but the plate distance also gets smaller, which offsets part of the reduction in C).

**Recent slope** more shallow because V is being scaled less aggressively.
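A sketch of how switching energy per gate changes with scaling, under an idealized parallel-plate assumption (C proportional to area over plate distance) and with every linear dimension shrinking by the same factor. The scale factors below are hypothetical, not taken from the plot.

```python
# Idealized scaling of switching energy, E = (1/2) C V^2, with C = eps * A / d.
# Assumes a parallel-plate capacitor and uniform shrink; numbers are hypothetical.

def switching_energy_scale(linear_scale: float, voltage_scale: float) -> float:
    """Ratio of new to old switching energy per gate.

    linear_scale:  new dimension / old dimension (length, width, plate distance).
    voltage_scale: new supply voltage / old supply voltage.
    """
    area_scale = linear_scale ** 2               # length and width both shrink
    distance_scale = linear_scale                # plates also move closer
    capacitance_scale = area_scale / distance_scale
    return capacitance_scale * voltage_scale ** 2


# Shrink dimensions and voltage together by 0.7: energy falls to about 0.34x.
print(round(switching_energy_scale(0.7, 0.7), 3))

# Shrink dimensions by 0.7 but hold voltage: energy only falls to 0.7x.
print(round(switching_energy_scale(0.7, 1.0), 3))
```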

From: "Facing the Hot Chips Challenge Again", Bill Holt, Intel, presented at Hot Chips 17, 2005.

#### Second Factor: Leakage Currents

Even when a logic gate isn't switching, it burns power.



Isub: Even when this nFet is off, it passes a leakage current, Ioff.

We can engineer any Ioff we like, but a lower Ioff also results in a lower Ion, and thus a lower maximum clock speed.

Igate: Ideal capacitors have zero DC current. But modern transistor gates are a few atoms thick, and are not ideal.

#### Intel's 2006 processor designs, leakage vs switching power





A lot of work was done to get a ratio this good ... 50/50 is common.

Bill Holt, Intel, Hot Chips 17

### Engineering "On" Current at 25 nm ...



#### Plot on a "Log" Scale to See "Off" Current







**From: Silicon Device Scaling to the Sub-10-nm Regime**, Meikei Ieong, Bruce Doris, Jakub Kedzierski, Ken Rim, Min Yang

#### **Customize processes for product types ...**



From: "Facing the Hot Chips Challenge Again", Bill Holt, Intel, presented at Hot Chips 17, 2005.

#### **Transistor physics revisited ...**





Away from the surface, the drain-induced charges remain even when the gate is off!


## Solution concept: Fully-depleted channel



We limit the depth of the channel so that the gate voltage "wins" over the drain voltage.

Done as shown, 5 to 7 nm depth for a 20 nm transistor. Requires expensive wafers.

"FD-SOI" -- Fully-Depleted Silicon-On-Insulator



Transistor channel is a raised fin. Gate controls channel from sides and top. Channel depth is fin width. 12-15nm for L=22nm.





#### Intel "Ivy Bridge" 22nm CPUs, first production parts



#### 3-D Tri-Gates

Sandy Bridge: 32nm planar, 1.16B transistors

Ivy Bridge: 22nm FinFET, 1.4B transistors

#### "Less than half the power @ same performance"



#### Long-term possibility: New devices

#### Electrostatic mechanical relays at the nanoscale



Electromechanical Computing at 500°C with Silicon Carbide. Te-Hao Lee, Swarup Bhunia, Mehran Mehregany

## Working inverter at 500 kHz ... for a while.





- (+) 10 fA leakage current
- (+) Works at 500 degrees C
- (-) Fails after 1-10 days of 500 kHz toggles.
- (-) Switching requires 6V V<sub>dd</sub>

## **Five low-power design techniques**

- Parallelism and pipelining
- Power-down idle transistors
- Slow down non-critical paths
- Clock gating
- Thermal management



Design Technique #1 (of 5)

# **Trading Hardware for Power**

#### via Parallelism and Pipelining ...






## Chandrakasan & Brodersen (UCB, 1992)

| Architecture       | Power (normalized) | Area (normalized) | Voltage |
|--------------------|--------------------|-------------------|---------|
| Simple             | 1                  | 1                 | 5V      |
| Parallel           | 0.36               | 3.4               | 2.9V    |
| Pipelined          | 0.39               | 1.3               | 2.9V    |
| Pipelined-Parallel | 0.2                | 3.7               | 2.0V    |
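The power numbers can be sanity-checked against the dynamic power relation P ~ C V^2 f. The sketch below back-calculates the switched capacitance each design would need, relative to the simple design. The clock ratios are assumptions (parallel designs run each copy at half the clock rate for the same throughput); they are not given in the tables.

```python
# Back-calculate relative switched capacitance from P ~ C * V^2 * f.
# Power and voltage come from the tables; the clock ratios are assumptions.

REFERENCE_VOLTAGE = 5.0  # supply voltage of the simple design

# name: (normalized power, supply voltage in V, assumed clock rate vs. simple)
designs = {
    "Simple": (1.0, 5.0, 1.0),
    "Parallel": (0.36, 2.9, 0.5),
    "Pipelined": (0.39, 2.9, 1.0),
    "Pipelined-Parallel": (0.2, 2.0, 0.5),
}


def implied_capacitance(power: float, voltage: float, clock_ratio: float) -> float:
    """Switched capacitance, relative to the simple design, implied by P ~ C V^2 f."""
    voltage_factor = (voltage / REFERENCE_VOLTAGE) ** 2
    return power / (voltage_factor * clock_ratio)


for name, (power, voltage, clock_ratio) in designs.items():
    c_rel = implied_capacitance(power, voltage, clock_ratio)
    print(f"{name:20s} implied C = {c_rel:.2f}x")
```

Under these assumptions the parallel design switches a little over 2x the capacitance and the pipelined design a little over 1x, which is consistent with the extra hardware shown in the area column. The savings come from the V^2 term.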











From: "Minimizing Power Consumption in CMOS Circuits", Anantha P. Chandrakasan, Robert W. Brodersen

# **Multiple Cores for Low Power**

#### Trade hardware for power, on a large scale ...



# Cell: The PS3 chip














## Cell (PS3 Chip): 1 CPU + 8 "SPUs"




#### **One Synergistic Processing Unit (SPU)**



SPU issues 2 inst/cycle (in order) to 7 execution units

256 KB Local Store, 128 128-bit Registers

SPU fills Local Store using DMA to DRAM and network

#### A "Shmoo" plot for a Cell SPU ...



#### Clock speed alone doesn't help E/op ...

But, lowering clock frequency while keeping voltage constant spreads the same amount of work over a longer time, so chip stays cooler ...

$E_{0\to 1} = \frac{1}{2} C V_{dd}^2 \qquad E_{1\to 0} = \frac{1}{2} C V_{dd}^2$
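A minimal sketch of why clock speed alone does not change energy per operation: the energy of one transition depends only on C and Vdd, while power also depends on how often transitions happen. The capacitance and rates below are hypothetical values chosen for round numbers.

```python
# Switching energy vs. power. C and the transition rates are hypothetical.

def switching_energy_joules(capacitance_f: float, vdd_v: float) -> float:
    """Energy for one 0->1 or 1->0 transition: (1/2) C Vdd^2."""
    return 0.5 * capacitance_f * vdd_v ** 2


def average_power_watts(capacitance_f: float, vdd_v: float,
                        transitions_per_second: float) -> float:
    """Average power: energy per transition times transitions per second."""
    return switching_energy_joules(capacitance_f, vdd_v) * transitions_per_second


C_LUMPED = 1e-9  # hypothetical total switched capacitance, in farads
VDD = 1.0        # volts

slow = average_power_watts(C_LUMPED, VDD, 1e9)
fast = average_power_watts(C_LUMPED, VDD, 2e9)

# Same energy per transition at either rate; only the power (heat rate) differs.
print(switching_energy_joules(C_LUMPED, VDD), "J per transition")
print(slow, "W at 1e9 transitions/s;", fast, "W at 2e9 transitions/s")
```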

[Figure: Shmoo plot for a Cell SPU. Each cell gives power (W) and die temperature (C) at one operating point. Axes: Vdd (Volt), 0.9 to 1.3; Freq (GHz).]

#### Scaling V and f does lower energy/op

#### 1 W to get 2.2 GHz performance. 26 C die temp.

#### 7 W to reliably get 4.4 GHz performance. 47 C die temp.

#### If a program that needs a 4.4 GHz CPU can be recoded to use two 2.2 GHz CPUs ... big win.
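The "big win" can be put in numbers using the two operating points quoted above. The sketch assumes the program splits perfectly across two cores, which real code rarely does; the function name is illustrative.

```python
# Energy per clock cycle at the two quoted Cell SPU operating points.
# Assumes perfect parallel speedup across two cores (an idealization).

def energy_per_cycle_nj(power_w: float, freq_ghz: float) -> float:
    """Energy per clock cycle in nanojoules (watts divided by GHz)."""
    return power_w / freq_ghz


slow_point = energy_per_cycle_nj(1.0, 2.2)   # 1 W at 2.2 GHz
fast_point = energy_per_cycle_nj(7.0, 4.4)   # 7 W at 4.4 GHz

one_fast_core_w = 7.0
two_slow_cores_w = 2 * 1.0

print(f"{slow_point:.2f} nJ/cycle vs {fast_point:.2f} nJ/cycle")
print(f"Two 2.2 GHz cores: {two_slow_cores_w} W; one 4.4 GHz core: {one_fast_core_w} W")
```

Each cycle at the fast operating point costs 3.5x the energy of a cycle at the slow one, so two slow cores deliver the same cycles per second for 2 W instead of 7 W.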




#### How iPod nano 2005 puts its 2 cores to use ...



#### Dual ARM Processors

- Dual 32-bit ARM7TDMI processors
- Up to 80 MHz processor operation per core with independent clock-skipping feature on COP
- Efficient cross-bar implementation providing zero wait state access to internal RAM
- Integrated 96KB of SRAM
- 8KB of unified cache per processor
- Six DMA channels

Two 80 MHz CPUs. Was used in several nano generations, with one CPU doing audio decoding, the other doing photos, etc.


Design Technique #2 (of 5)

# **Powering down idle circuits**



#### Add "sleep" transistors to logic ...



## Example: Floating point unit logic.

When running fixed-point instructions, put logic "to sleep".

+++ When "asleep", leakage power is dramatically reduced.

--- Presence of sleep transistors slows down the clock rate when the logic block is in use.



#### Intel example: Sleeping cache blocks



From: "Facing the Hot Chips Challenge Again", Bill Holt, Intel, presented at Hot Chips 17, 2005.

Design Technique #3 (of 5)

# Slow down "slack paths"



#### Fact: Most logic on a chip is "too fast"



From "The circuit and physical design of the POWER4 microprocessor", IBM J Res and Dev, 46:1, Jan 2002, J.D. Warnock et al.



## Use several supply voltages on a chip ...



Why use multi-Vdd? We can reduce dynamic power by using a lower Vdd for logic off the critical path.

What if we can't do a multi-Vdd design? In a multi-Vt process, we can reduce leakage power on the slow logic by using high-Vt transistors.
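A rough sketch of the multi-Vdd payoff, assuming dynamic power scales as V^2 with capacitance and clock rate unchanged and ignoring level-shifter overhead. The 0.8 V and 1.0 V supplies are borrowed from the ARM 1136JF-S example in this lecture; the 50% share of logic moved to the low supply is hypothetical.

```python
# Dynamic power of a two-supply design relative to a single-supply design.
# Assumes P ~ C V^2 f with C and f fixed; ignores level shifters.

def multi_vdd_dynamic_power(slack_fraction: float, v_low: float, v_high: float) -> float:
    """Relative dynamic power when slack_fraction of the switched
    capacitance is moved from v_high to v_low."""
    low_scale = (v_low / v_high) ** 2
    return (1.0 - slack_fraction) + slack_fraction * low_scale


# Per-gate dynamic energy at 0.8 V relative to 1.0 V: 0.64.
print(round(multi_vdd_dynamic_power(1.0, 0.8, 1.0), 2))

# Hypothetical: half of the logic is off the critical path and moves to 0.8 V.
print(round(multi_vdd_dynamic_power(0.5, 0.8, 1.0), 2))
```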

From: "Facing the Hot Chips Challenge Again", Bill Holt, Intel, presented at Hot Chips 17, 2005.

#### LOW POWER ARM 1136JF-S™ DESIGN

George Kuo, Anand Iyer Cadence Design Systems, Inc. San Jose, CA 95134, USA

Logical partition into 0.8V and 1.0V nets done manually to meet 350 MHz spec (90nm).

Level-shifter insertion and placement done automatically.

Dynamic power in 0.8V section cut 50% below baseline.

Leakage power in 1.0V section cut 70% below baseline.



From a chapter of a new book on ASIC design by Chinnery and Keutzer (UCB).

Design Technique #4 (of 5)

# Gating clocks to save power



## On a CPU, where does the power go?



#### So (gasp) gated clocks are a big win. But, done with CAD tools in a disciplined way.
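A toy model of why gating helps: a register's clock pin burns switching energy on every clock edge it receives, whether or not new data is loaded. The activity trace is hypothetical, and the model ignores the power of the gating cell itself and of the combinational logic.

```python
# Toy clock-gating model: count clock edges delivered to one register.
# The activity trace is hypothetical; gating-cell overhead is ignored.

def clock_edges_delivered(enables: list[bool], gated: bool) -> int:
    """Clock edges seen by the register over the trace.

    enables: one entry per cycle, True when the register must load new data.
    gated:   if True, the clock reaches the register only on enabled cycles.
    """
    if gated:
        return sum(1 for enabled in enables if enabled)
    return len(enables)


# Hypothetical trace: the register loads new data in 4 of 10 cycles.
trace = [True, False, False, True, False, True, False, False, True, False]

ungated_edges = clock_edges_delivered(trace, gated=False)
gated_edges = clock_edges_delivered(trace, gated=True)
clock_power_savings = 1.0 - gated_edges / ungated_edges

print(f"{ungated_edges} edges ungated, {gated_edges} gated, "
      f"{clock_power_savings:.0%} fewer clock edges at this register")
```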



From: Bose, Martonosi, Brooks: Sigmetrics-2001 Tutorial

## Synopsys Power Compiler can do this ...

"Up to 70% power savings at the block level, for applicable circuits" (Synopsys data sheet)


Design Technique #5 (of 5)

# **Thermal Management**




# Keep chip cool to minimize leakage power



Figure 3: I<sub>CCINTQ</sub> vs. Junction Temperature with Increase Relative to 25°C

Optimizing Designs for Power Consumption through Changes to the FPGA Environment

**XILINX**®

## **IBM Power 4: How does die heat up?**



4 dies on a multi-chip module

2 CPUs per die





## 115 Watts: Concentrated in "hot spots"



66.8 C == 152 F

82 C == 179.6 F

#### Idea: Monitor temperature, servo clock speed




TDP = Thermal Design Point
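A toy simulation of the "monitor temperature, servo clock speed" idea. Everything here is a hypothetical model, not a real chip: power is taken as proportional to clock frequency at fixed voltage, die temperature relaxes toward a steady state set by a single thermal resistance, and the controller steps the clock down when over the limit and up otherwise.

```python
# Toy thermal servo loop. All constants are hypothetical, chosen only so
# that full speed would overheat the die without the servo.

def simulate_thermal_servo(steps: int = 200, temp_limit_c: float = 80.0):
    """Return a list of (frequency_ghz, die_temp_c) after each control step."""
    ambient_c = 25.0
    thermal_resistance_c_per_w = 1.0  # steady-state degrees C rise per watt
    watts_per_ghz = 25.0              # power proportional to clock at fixed Vdd
    f_min, f_max, f_step = 1.0, 3.0, 0.1
    relax = 0.1                       # fraction of the gap to steady state closed per step

    freq_ghz = f_max
    temp_c = ambient_c
    history = []
    for _ in range(steps):
        # Thermal model: move part of the way toward the steady-state temperature.
        power_w = watts_per_ghz * freq_ghz
        steady_c = ambient_c + thermal_resistance_c_per_w * power_w
        temp_c += relax * (steady_c - temp_c)

        # Servo: slow the clock when too hot, speed it up when there is headroom.
        if temp_c > temp_limit_c:
            freq_ghz = max(f_min, freq_ghz - f_step)
        else:
            freq_ghz = min(f_max, freq_ghz + f_step)
        history.append((freq_ghz, temp_c))
    return history


history = simulate_thermal_servo()
peak_temp = max(temp for _, temp in history)
print(f"peak die temp {peak_temp:.1f} C, final clock {history[-1][0]:.1f} GHz")
```

In this model, running at the full 3 GHz would settle at 100 C. The servo lets the die overshoot the 80 C limit briefly, then the clock hovers near the frequency whose steady-state temperature equals the limit.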


### **Thursday's lecture: Graphics ...**


