inst.eecs.berkeley.edu/~eecs251b

# EECS251B : Advanced Digital Circuits and Systems

# **Lecture 17 – Variability**

#### **Borivoje Nikolić**

#### **Cerebras' Third-Gen Wafer-Scale Chip Doubles Performance**

March 13, 2024, Sally Ward-Foxton, EETimes. Cerebras has unveiled a third generation of its wafer-scale chip, offering 125 PFLOPS (at FP16 precision) from a single device. Given a single day, a four-chip installation could fine-tune Llama2-70B, while the biggest installations of 2,048 chips would be able to train it from scratch in the same time.

The wafer-scale engine 3 (WSE3) doubles the large language model (LLM) training speed of the WSE2, in the same 15kW power envelope and at the same cost point, Cerebras CEO Andrew Feldman told EE Times.

... The WSE also features 42 GB of SRAM with 21 PBytes/s memory bandwidth.



Cerebras' third-gen wafer-scale engine. (Source: Cerebras)



### Announcements

- Project
  - Midterm reports due next week
  - Preliminary design review after Spring break
- Homework 3 due next week
  - Quiz 3 after Spring break



# Design Variability: Sources and Impact on Design


## Systematic and Random Device Variations

| Parameter                                     | Random                                                                                | Systematic                                                                                       |
|-----------------------------------------------|---------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|
| Channel Dopant<br>Concentration Nch           | Affects σ<sub>VT</sub> <sup>[1]</sup>                                                 | Non-uniformity in the process of dopant implantation, dosage, diffusion                          |
| Gate Oxide Thickness<br>Tox                   | Si/SiO<sub>2</sub> & SiO<sub>2</sub>/Poly-Si<br>interface roughness <sup>[2]</sup>    | Non-uniformity in the process of oxide growth                                                    |
| Threshold Voltage $V_T$ (non Nch related)     | Random anneal temperature and strain effects                                          | Non-uniform annealing temperature <sup>[5]</sup><br>(metal coverage over gate)<br>Biaxial strain |
| Mobility µ                                    | Random strain distributions                                                           | Systematic variation of strain in the Si due to STI, S/D area, contacts, gate density, etc       |
| Gate Length L                                 | Line edge roughness (LER) <sup>[3]</sup>                                              | Lithography and etching:<br>Proximity effects, orientation <sup>[4]</sup>                        |
| Fin geometry/<br>film thickness<br>variations | Rounding, etc., σ<sub>VT</sub>, mobility                                              | Systematic fin thickness<br>Systematic Si film/BOX variations                                    |

[1] D. Frank et al, VLSI Symposium, Jun. 1999.
[2] A. Asenov et al, IEEE Trans on Electron Devices, Jan. 2002.
[3] P. Oldiges et al, SISPAD 2000, Sept. 2000.
[4] M. Orshansky et al, IEEE Trans on CAD, May 2002.
[5] Tuinhout et al, IEDM, Dec 1996

# Dealing with Systematic Variations

Model-to-hardware correlation classifies unknown sources



Extraction/Compact modeling/Design techniques

## Chip Yield Depends on Inter-Gate Correlation



ρ = 0 gives highest yield through averaging

Non-correlated gates in a path reduce impact of variation

Bowman et al, JSSC, Feb 2002.
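The averaging effect can be sketched with a simple model. It assumes a path of `n_gates` identical gates whose delays share one mean, one standard deviation, and a uniform pairwise correlation `rho`; this is a simplification for illustration, not necessarily the model used by Bowman et al. The function name and the numbers are hypothetical.

```python
import math

def path_sigma_over_mu(n_gates, gate_sigma_over_mu, rho):
    """Relative delay spread (sigma/mu) of a path of n identical gates.

    Assumes every pair of gate delays has the same correlation rho, so
    Var(path) = n * sigma^2 * (1 + (n - 1) * rho) and Mean(path) = n * mu.
    """
    return gate_sigma_over_mu * math.sqrt((1 + (n_gates - 1) * rho) / n_gates)

# Illustrative values only: 16 gates, 10% sigma/mu per gate.
for rho in (0.0, 0.5, 1.0):
    print(f"rho={rho}: path sigma/mu = {path_sigma_over_mu(16, 0.10, rho):.4f}")
```

With `rho = 0` the relative spread shrinks by the square root of the gate count; with `rho = 1` there is no averaging and the path inherits the full per-gate spread.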


# Chip Yield Depends on Inter-Path Correlation



- Yield = Pr (max delay of K paths < clock period)
- K = 1 gives highest yield

**Correlated paths reduce impact of variation** 

Bowman et al, JSSC, Feb 2002.
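A minimal sketch of the yield expression, assuming the K path delays are independent and identically distributed normal variables (an assumption made here for illustration; the helper names and numbers are hypothetical). Under that assumption the yield is the single-path pass probability raised to the power K.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def yield_independent_paths(k_paths, t_clk, mu, sigma):
    """Pr(max delay of K paths < t_clk) for K independent normal path delays.

    Each path passes with probability Phi((t_clk - mu) / sigma); independent
    paths must all pass, so the probabilities multiply.
    """
    return normal_cdf((t_clk - mu) / sigma) ** k_paths

# Illustrative values only: clock period two sigma above the mean path delay.
for k in (1, 10, 100, 1000):
    print(k, round(yield_independent_paths(k, t_clk=1.10, mu=1.00, sigma=0.05), 4))
```

Fully correlated, identically distributed paths pass or fail together, so they behave like the K = 1 case, which is why correlation between paths helps yield.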




# Design Variability: Some Systematic Effects

## Layout: Poly Proximity Effects

• Gate CD is a function of its neighborhood



#### Gate length depends on

- Light intensity profile falling on the resist
- Resist: application of developer fluid<sup>[1]</sup>, post exposure bake (PEB) temperature<sup>[2]</sup>
- Dry etching: microscopic loading effects<sup>[3]</sup>

[1] J.Cain, M.S. Thesis, UC Berkeley

[2] D. Steele et al, SPIE, vol.4689, July 2002.

[3] J. D. Plummer, M.D. Deal, P.B. Griffin, Silicon VLSI Technology, Prentice-Hall, 2000.

# Layout: Proximity Test Structures

• 90nm experiments

![](_page_9_Picture_2.jpeg)

• 45nm experiments

No single gates allowed

![](_page_9_Figure_5.jpeg)

![](_page_9_Picture_6.jpeg)

#### L.T. Pang, CICC'08

• Ring oscillators and individual transistor leakage currents

## Results: Single Gates in 90nm

![](_page_10_Figure_1.jpeg)

- Max  $\Delta$ F between layouts > 10%
- Within-die  $3\sigma/\mu \sim 3.5\%$ , weak dependency on density

# Results: Single Gates in 45nm

![](_page_11_Figure_1.jpeg)

- Weak effect on performance.  $\Delta F \sim 2\%$
- Small shifts in NMOS leakage and bigger shifts in PMOS leakage

### Impact of Longer Diffusion in 45nm

![](_page_12_Figure_1.jpeg)

- Strongest effect measured in 45nm,  $\Delta F \sim 5\%$
- No significant shift in I<sub>LEAK</sub>

### Impact of Shallow Trench Isolation (STI)

![](_page_13_Figure_1.jpeg)

- $\Delta F \sim 3\%$ , small changes in I<sub>LEAK</sub>
- Due to STI-induced stress

### Patterning and process impact on FinFETs

![](_page_14_Figure_1.jpeg)


![](_page_15_Picture_0.jpeg)

# Design Variability: Some Random Effects

### **Random Dopant Fluctuations**

• Number of dopants is finite

![](_page_16_Figure_2.jpeg)

![](_page_16_Figure_3.jpeg)

Frank, IBM J R&D 2002 EECS251B L17 VARIABILITY
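A rough sketch of why a finite dopant count matters, assuming the number of dopants in the channel depletion volume follows Poisson statistics (so the standard deviation equals the square root of the mean). The function name, the box-shaped volume, and the numbers are illustrative assumptions, not values from the slide.

```python
import math

def dopant_count_stats(n_a_cm3, w_nm, l_nm, depth_nm):
    """Mean dopant count in a W x L x depth box and its relative spread.

    Assumes Poisson statistics: sigma_N = sqrt(N), so sigma_N / N = 1 / sqrt(N).
    """
    nm_to_cm = 1e-7
    volume_cm3 = (w_nm * nm_to_cm) * (l_nm * nm_to_cm) * (depth_nm * nm_to_cm)
    mean_count = n_a_cm3 * volume_cm3
    rel_sigma = 1.0 / math.sqrt(mean_count)
    return mean_count, rel_sigma

# Illustrative values only: 1e18 cm^-3 doping, shrinking gate dimensions.
for size_nm in (100, 50, 25):
    n, rel = dopant_count_stats(1e18, size_nm, size_nm, 20)
    print(f"W=L={size_nm} nm: ~{n:.0f} dopants, sigma_N/N ~ {rel:.1%}")
```

As the gate area shrinks, the mean count falls and the relative fluctuation grows as one over the square root of the count.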

# Processing: Line-Edge Roughness

![](_page_17_Picture_1.jpeg)

- Sources of line-edge roughness:
  - Fluctuations in the total dose due to quantization
  - Resist composition
  - Absorption positions
- Effect:
  - Variation (random) in leakage and power

# **Transistor Matching**

• V<sub>Th</sub> mismatch of geometrically identical transistors varies with size ($\sigma_{\Delta V_{Th}} \sim 1/\sqrt{WL}$) and distance

![](_page_18_Figure_2.jpeg)
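The size and distance dependence is commonly written in the form of Pelgrom's mismatch model, $\sigma^2(\Delta V_{Th}) = A_{VT}^2/(WL) + S_{VT}^2 D^2$. The sketch below evaluates that expression; the coefficient values are placeholders chosen for illustration, not data for any particular process.

```python
import math

def sigma_delta_vth_mv(a_vt_mv_um, w_um, l_um, s_vt_mv_per_um=0.0, distance_um=0.0):
    """Standard deviation of the Vth difference between two identical devices.

    Pelgrom-style model: an area term A_VT^2 / (W * L) plus a distance term
    (S_VT * D)^2, added as variances.
    """
    area_term = a_vt_mv_um ** 2 / (w_um * l_um)
    distance_term = (s_vt_mv_per_um * distance_um) ** 2
    return math.sqrt(area_term + distance_term)

# Placeholder coefficient: A_VT = 2 mV*um. Quartering the area doubles sigma.
print(sigma_delta_vth_mv(2.0, 1.0, 1.0))
print(sigma_delta_vth_mv(2.0, 0.5, 0.5))
```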

# FDSOI example

![](_page_19_Figure_1.jpeg)

• All the effects follow 1/sqrt(WL) dependence


Short channel effect: $\sigma_{L}$, $\sigma_{TSi}$

$$\sigma_{Vt,SCE} \propto \sqrt{\left(\frac{\partial V_{t}}{\partial L}\sigma_{L}\right)^{2} + \left(\frac{\partial V_{t}}{\partial T_{Si}}\sigma_{TSi}\right)^{2}} \quad (1)$$

$V_{t}$ long channel [18]:

$$V_{t} = \Delta \phi_{mi} + \frac{kT}{q} \ln\left(\frac{2 C_{ox} kT}{q^{2} n_{i} T_{Si}}\right) + \frac{\hbar^{2} \pi^{2}}{2 q m^{*} T_{Si}^{2}} \quad (2)$$

with $\Delta \phi_{mi}$ the gate workfunction with respect to intrinsic Si.

Oxide charges:

$$\sigma_{Vt,Qox} \propto \frac{q T_{ox}}{\varepsilon_{ox}} \frac{\sqrt{N_{it} + N_{ox}}}{\sqrt{WL}} \quad (3)$$

Oxide thickness and permittivity [19]: $\sigma_{Tox}$, $\sigma_{\varepsilon_{ox}}$

$$\sigma_{Vt,Tox} \propto \frac{kT}{q} \frac{\alpha}{\sqrt{WL}} \sqrt{\left(\frac{\sigma_{\varepsilon_{ox}}}{\varepsilon_{ox}}\right)^{2} + \left(\frac{\sigma_{Tox}}{T_{ox}}\right)^{2}} \quad (4)$$

$T_{Si}$ thickness: $\sigma_{TSi}$

$$\sigma_{Vt,TSi} \propto \frac{kT}{q} \frac{\beta}{\sqrt{WL}} \frac{\sigma_{TSi}}{T_{Si}} \quad (5)$$

Metal gate workfunction: $\sigma_{\phi m}$

$$\sigma_{Vt,\phi m} \propto \frac{\gamma}{\sqrt{WL}} \sigma_{\phi m} \quad (6)$$

with $\alpha, \beta, \gamma$ spatial correlation lengths of the fluctuations
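If the contributions in equations (1) and (3) to (6) are treated as statistically independent (an assumption made here, in the same spirit as the quadrature sum inside equation (1)), the total $V_t$ spread is the root-sum-square of the individual terms. The component values below are made-up placeholders, not measured data.

```python
import math

def total_sigma_vt(components_mv):
    """Combine independent sigma_Vt contributions in quadrature (root-sum-square)."""
    return math.sqrt(sum(s * s for s in components_mv.values()))

# Hypothetical per-source contributions in mV, for illustration only.
components = {
    "short_channel": 6.0,
    "oxide_charges": 4.0,
    "oxide_thickness": 2.0,
    "tsi_thickness": 5.0,
    "gate_workfunction": 8.0,
}
print(f"total sigma_Vt ~ {total_sigma_vt(components):.1f} mV")
```

Because variances add, the largest single contributor dominates the total, so reducing a minor source brings little benefit.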


# Negative Bias Temperature Instability

- PFET  $V_{Th}$ 's shift in time, at high negative bias and elevated temperatures
- The mechanism is thought to be the breaking of hydrogen-silicon bonds at the Si/SiO2 interface, creating surface traps and injecting positive hydrogen-related species into the oxide.
- Also other charge trapping and hot-carrier defect generation
- Systematic + random shifts

![](_page_20_Figure_5.jpeg)

# Random Telegraph Signal (RTS)

![](_page_21_Figure_1.jpeg)

• Trapping of a carrier in oxide traps modulates  $V_{th}$  or  $I_{ds}$ 

•  $\tau_e$  and  $\tau_c$  are random and follow exponential distributions
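A two-level telegraph waveform with exponentially distributed dwell times can be sketched as follows. The code assumes the usual convention that $\tau_c$ is the mean time spent waiting for capture (trap empty) and $\tau_e$ the mean time before emission (trap occupied); the function names and parameter values are hypothetical.

```python
import random

def simulate_rts(tau_c, tau_e, t_end, seed=0):
    """Return a list of (time, state) transitions; state 1 means trap occupied.

    Dwell times are drawn from exponential distributions with mean tau_c in
    the empty state and mean tau_e in the occupied state.
    """
    rng = random.Random(seed)
    t, state = 0.0, 0
    events = [(t, state)]
    while True:
        mean_dwell = tau_c if state == 0 else tau_e
        t += rng.expovariate(1.0 / mean_dwell)  # expovariate takes the rate
        if t >= t_end:
            break
        state = 1 - state
        events.append((t, state))
    return events

def occupied_fraction(events, t_end):
    """Fraction of the observation window during which the trap is occupied."""
    occupied = 0.0
    for (t0, state), (t1, _) in zip(events, events[1:] + [(t_end, None)]):
        if state == 1:
            occupied += t1 - t0
    return occupied / t_end

events = simulate_rts(tau_c=1.0, tau_e=3.0, t_end=20000.0)
print(f"occupied fraction ~ {occupied_fraction(events, 20000.0):.2f}")
```

Over a long window the occupied fraction approaches $\tau_e / (\tau_c + \tau_e)$; each occupied interval would shift $V_{th}$ (or $I_{ds}$) by the single-trap amplitude.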

N. Tega et al, IRPS 2008.

# RTS and Technology Scaling

• RTS exceeds RDF at 3 sigma with 20nm gates

$$\Delta V_{\text{th, RTS}} \sim \frac{1}{WL}$$
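The different area dependences explain why RTS overtakes random dopant fluctuations (RDF) at small gate sizes: with RTS amplitude scaling as $1/(WL)$ and RDF-induced spread scaling as $1/\sqrt{WL}$, their ratio grows as $1/\sqrt{WL}$. The sketch below uses unit prefactors `k_rts` and `k_rdf`, which are placeholders; only the scaling trend is meaningful, not the absolute values or the crossover point.

```python
import math

def rts_to_rdf_ratio(w_nm, l_nm, k_rts=1.0, k_rdf=1.0):
    """Ratio of an RTS shift (~ 1/(W*L)) to an RDF sigma (~ 1/sqrt(W*L)).

    k_rts and k_rdf are hypothetical technology-dependent prefactors.
    """
    area = w_nm * l_nm
    rts = k_rts / area
    rdf = k_rdf / math.sqrt(area)
    return rts / rdf

# Halving both W and L quarters the area and doubles the RTS/RDF ratio.
print(rts_to_rdf_ratio(20, 20) / rts_to_rdf_ratio(40, 40))
```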


![](_page_23_Picture_0.jpeg)

# Memory

# Random Access Memory Architecture

- Conceptual: Linear array of addresses
  - Each box holds some data
  - Not practical to physically realize
    - millions of 32b/64b words

- Create a 2-D array
  - Decode Row and Column address to get data

![](_page_24_Figure_7.jpeg)

![](_page_24_Figure_8.jpeg)
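The mapping from a linear address to a 2-D array position can be sketched as a simple bit split. Which address bits drive the row decoder and which drive the column decoder is a design choice; the version below assumes the upper bits select the row and the lower bits select the column, and all names are hypothetical.

```python
def split_address(addr, row_bits, col_bits):
    """Split a linear address into (row, column) indices of a 2-D array.

    Assumes the upper row_bits select the wordline and the lower col_bits
    select the column.
    """
    if not 0 <= addr < (1 << (row_bits + col_bits)):
        raise ValueError("address out of range")
    row = addr >> col_bits
    col = addr & ((1 << col_bits) - 1)
    return row, col

# 64-word array organized as 16 rows x 4 columns.
print(split_address(45, row_bits=4, col_bits=2))
```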


# Basic Memory Array (From 151/251A)

- Core
  - Wordlines to access rows
  - Bitlines to access columns
  - Data multiplexed onto columns
- Decoders
  - Addresses are binary
  - Row/column MUXes are 'one-hot': only one is active at a time
- Important to optimize the aspect ratio to balance the delays
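A behavioral sketch of the binary to one-hot decoding performed by the row and column decoders. This models only the logic function, not the circuit implementation; the function name is hypothetical.

```python
def one_hot_decode(value, n_bits):
    """Decode an n-bit binary value into 2**n select lines, exactly one active."""
    if not 0 <= value < (1 << n_bits):
        raise ValueError("value out of range")
    return [1 if i == value else 0 for i in range(1 << n_bits)]

# A 2-bit address activates one of four wordlines.
print(one_hot_decode(2, 2))
```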

![](_page_25_Figure_10.jpeg)


# Memory Banks

- Traditionally addressed by the LSB
  - Example two-bank memory
  - Odd and even banks
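LSB-based bank selection can be sketched as below, assuming the number of banks is a power of two so that the low address bits pick the bank and the remaining bits index within it. The function name is hypothetical.

```python
def bank_and_index(addr, n_banks=2):
    """Map a word address to (bank, index within bank) using the low bits.

    With two banks, even addresses go to bank 0 and odd addresses to bank 1,
    so consecutive addresses alternate between banks.
    """
    if n_banks <= 0 or n_banks & (n_banks - 1):
        raise ValueError("n_banks must be a power of two")
    bank = addr % n_banks
    index = addr // n_banks
    return bank, index

for addr in range(4):
    print(addr, bank_and_index(addr))
```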

![](_page_26_Figure_4.jpeg)

![](_page_26_Figure_5.jpeg)


![](_page_27_Figure_2.jpeg)

![](_page_27_Figure_3.jpeg)

# SRAM Scaling or Not?

TSMC at IEDM'19

![](_page_28_Figure_2.jpeg)

![](_page_28_Figure_3.jpeg)

![](_page_28_Figure_4.jpeg)

#### Bora's spreadsheet/WikiChip

![](_page_28_Figure_6.jpeg)

Technology Node (nm)

![](_page_28_Figure_8.jpeg)

# Summary

- Variability: Systematic and random
- Random, uncorrelated variations average out
- Identified random and systematic sources of variability

![](_page_30_Picture_0.jpeg)