

UC Berkeley EECS
Lecturer SOE
Dan Garcia

# CS10 The Beauty and Joy of Computing

**Lecture #8: Concurrency** 

2010-09-29

#### AT A RECENT FIRESIDE CHAT...

Nvidia CEO Jen-Hsun Huang said: "whatever capability you think you have today, it's nothing compared to what you're going to have in a couple of years ... due to supercomputers in the cloud". Now, cloud computing does what you could do on your PC. Imagine 40,000 times that!



www.theregister.co.uk/2010/09/24/huang\_muses\_at\_gtc/

## Concurrency & Parallelism, 10 mi up...

#### Intra-computer

- Today's lecture
- Multiple computing "helpers" are cores within one machine
- Aka "multi-core"
  - Although GPU parallism is also "intra-computer"

#### Inter-computer

- Week 12's lectures
- Multiple computing "helpers" are <u>different</u> <u>machines</u>
- Aka "distributed computing"
  - Grid & cluster computing











#### **Anatomy: 5 components of any Computer**



John von Neumann invented this architecture





Control ("brain")

Datapath ("brawn")

Memory

Devices
Input
Output

- a) Control
- b) Datapath
- c) Memory
- d) Input
- e) Output

What causes the most headaches for SW and HW designers with multi-core computing?

@ <u>0</u> 9 9

Garcia, Fall 2010

UC Berkeley CS10 "The Beauty and Joy of Computing": Concurrency (4)

#### **But what is INSIDE a Processor?**



**Bare Processor Die** 



Chip in Package

- Primarily Crystalline Silicon
- 1 mm 25 mm on a side
- 2009 "feature size" (aka process)
   45 nm = 45 x 10<sup>-9</sup> m
   (then 32, 22, and 16 [by yr 2013])
- 100 1000M transistors
- 3 10 conductive layers
- "CMOS" (complementary metal oxide semiconductor) - most common
- Package provides:
  - spreading of chip-level signal paths to board-level
  - heat dissipation.
- Ceramic or plastic with gold wires.



#### en.wikipedia.org/wiki/Moore's\_law

#### Moore's Law

#### Predicts: 2X Transistors / chip every 2 years





Gordon Moore Intel Cofounder B.S. Cal 1950!





#### Moore's Law and related curves





Garcia, Fall 2010



Data partially collected by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond

#### Moore's Law and related curves





Data partially collected by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond



## Power Density Prediction circa 2000





Source: S. Borkar (Intel)



## Going Multi-core Helps Energy Efficiency

- Power of typical integrated circuit  $\sim$  C  $V^2 f$ 
  - C = Capacitance, how well it "stores" a charge
  - V = Voltage
  - = f = frequency. I.e., how fast clock is (e.g., 3 GHz)





Activity Monitor
(on the lab Macs)
shows how active
your cores are



William Holt, HOT Chips 2005



## **Energy & Power Considerations**



Power = 
$$\frac{\text{Energy}}{\text{Second}}$$
 =  $\frac{\text{Energy}}{\text{Op}} \times \frac{\text{Ops}}{\text{Second}}$ 

#### **Power**

Chip Packaging Chip Cooling System Noise Case Temperature Data-Center Air Conditioning

#### **Energy**

**Battery Life Electricity Bill** Mobile Device Weight

# Energy per Operation









#### Parallelism again? What's different this time?

"This shift toward increasing parallelism is not a triumphant stride forward based on breakthroughs in novel software and architectures for parallelism; instead, this plunge into parallelism is actually a retreat from even greater challenges that thwart efficient silicon implementation of traditional uniprocessor architectures."

- Berkeley View, December 2006
- HW/SW Industry bet its future that breakthroughs will appear before it's too late

view.eecs.berkeley.edu





## **Background: Threads**

- A Thread stands for "thread of execution", is a single stream of instructions
  - A program / process can split, or fork itself into separate threads, which can (in theory) execute simultaneously.
  - An easy way to describe/think about parallelism
- A single CPU can execute many threads by *Time Division Multipexing*



 Multithreading is running multiple threads through the same hardware





Process

#### en.wikipedia.org/wiki/Amdahl's\_law

## Speedup Issues: Amdahl's Law

Applications can almost <u>never</u> be completely parallelized; some serial code remains



- s is serial fraction of program, P is # of cores (was processors)
- · Amdahl's law:

Speedup(P) = Time(1) / Time(P)  

$$\leq 1 / (s + [(1-s) / P)]$$
, and as P  $\rightarrow \infty$   
 $\leq 1 / s$ 

Even if the parallel portion of your application speeds up perfectly, your performance may be limited by the sequential portion



## Speedup Issues: Overhead

- Even assuming no sequential portion, there's...
  - Time to think how to divide the problem up
  - Time to hand out small "work units" to workers
  - All workers may not work equally fast

- Some workers may fail
- There may be contention for shared resources
- Workers could overwriting each others' answers
- You may have to wait until the last worker returns to proceed (the slowest / weakest link problem)
- There's time to put the data back together in a way that looks as if it were done by one





#### Life in a multi-core world...

 This "sea change" to multicore parallelism means that the computing community has to rethink:

a) Languages

b) Architectures

c) Algorithms

d) Data Structures

e) All of the above







## en.wikipedia.org/wiki/Concurrent\_computing But Concurrent programming is hard!

- What if two people were calling withdraw at the same time?
  - E.g., balance=100 and
     two withdraw 75 each
  - Can anyone see what the problem *could* be?
  - This is a race condition
- In most languages, this is a problem.
  - In Scratch, the system doesn't let two of these run at once.

```
withdraw amount

if balance > amount

set balance * to balance - amount

report true

report false
```





## Summary

- "Sea change" of computing because of inability to cool CPUs means we're now in multi-core world
- This brave new world offers lots of potential for innovation by computing professionals, but also challenges





