#### CS 150 Digital Design

#### Lecture 12 – DRAM

#### 2012-2-23

#### Professor John Wawrzynek (today's lecture by John Lazzaro)

TAs: Shaoyi Cheng, Daiwei Li, James Parker





### **Today's Lecture: DRAM**

## DRAM, Xilinx, and You

## DRAM: Bottom-up

## DRAM: Top-down








#### **DDR2 SO-DIMM on ML505 Board**

#### DDR2: Double-Data Rate, 2nd generation



#### SO-DIMM: Small-Outline, Dual Inline Memory Module

CS 150 L12: DRAM



#### **DDR2 SO-DIMM Module**




#### **Project controller: Xilinx-supplied IP**



## DRAM, Xilinx, and You

To understand the DRAM controller, you need to understand how a DRAM chip works. Otherwise, it just seems like magic.

![](_page_5_Picture_3.jpeg)

![](_page_6_Picture_0.jpeg)

![](_page_6_Picture_1.jpeg)

### **Recall: Building a capacitor**

![](_page_7_Figure_1.jpeg)

#### Conducts electricity well. (metal, doped polysilicon)

An insulator. Does not conduct electricity at all. (air, glass (silicon dioxide))

#### Conducts electricity well (metal, doped polysilicon)

![](_page_7_Picture_5.jpeg)

#### **Recall: Capacitors in action**

Because the dielectric is an insulator, and does not conduct.

![](_page_8_Picture_2.jpeg)

#### After circuit "settles" ...

Q = C V = C \* 1.5 Volts (D cell)

**Q:** Charge stored on capacitor

C: The capacitance of the device: function of device shape and type of dielectric.
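As a quick numeric check of Q = C V, the sketch below computes the charge stored by the 1.5 V D cell. The 1 fF capacitance is an assumption borrowed from the DRAM cell value used later in the lecture, and the function name is illustrative only.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs per electron

def stored_charge(capacitance_farads, voltage_volts):
    """Charge on a capacitor once the circuit has settled: Q = C * V."""
    return capacitance_farads * voltage_volts

# Assumed example values: a 1 fF capacitor charged by a 1.5 V D cell.
q = stored_charge(1e-15, 1.5)
electrons = q / ELEMENTARY_CHARGE
print(f"Q = {q:.2e} C, roughly {electrons:.0f} electrons")
```

At femtofarad scale the stored state is only a few thousand electrons, which is why noise and leakage matter so much in what follows.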

![](_page_8_Picture_7.jpeg)

![](_page_8_Picture_8.jpeg)

#### **Storing computational state as charge**

State is coded as the amount of energy stored by a device.

![](_page_9_Picture_2.jpeg)


#### State is read by sensing the amount of energy

Problems: noise changes Q (up or down), parasitics leak or source Q. Fortunately, Q cannot change instantaneously, but that only gets us in the ballpark.

### **MOS Transistors**

Two diodes and a capacitor in an interesting arrangement. So, we begin with a diode review ...

![](_page_10_Picture_2.jpeg)

#### **Diodes in action ...**


![](_page_11_Picture_1.jpeg)

![](_page_11_Picture_2.jpeg)

#### **Diodes: Current vs Voltage**

![](_page_12_Figure_1.jpeg)

#### How to make a silicon diode ...

![](_page_13_Figure_1.jpeg)

#### **Note: IC Diodes are biased "off"!**

![](_page_14_Figure_1.jpeg)

V1, V2 > 0V. Diodes "off"; the only current is Io "leakage".

I = Io [exp(V/Vo) - 1]

Anodes of all diodes on wafer connected to ground.
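To see why a reverse-biased diode passes only the leakage current Io, the diode equation can be evaluated directly. This is a minimal sketch: the values Io = 1 pA and Vo = 26 mV (roughly the thermal voltage at room temperature) are assumptions, not numbers from the slide.

```python
import math

def diode_current(v, i_o=1e-12, v_o=0.026):
    """Ideal diode equation: I = Io * (exp(V / Vo) - 1).

    i_o and v_o are assumed illustrative values.
    """
    return i_o * (math.exp(v / v_o) - 1.0)

# Reverse bias: anode at ground, cathode at +1 V, so the diode sees V = -1 V.
reverse = diode_current(-1.0)
# Forward bias, for comparison.
forward = diode_current(0.6)
print(f"reverse: {reverse:.3e} A, forward: {forward:.3e} A")
```

With V well below zero the exponential term vanishes and the current settles at about -Io, which is the "leakage" the slide refers to.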

![](_page_14_Picture_3.jpeg)

### **MOS Transistors**

## Two diodes and a capacitor in an interesting arrangement ...

![](_page_15_Picture_2.jpeg)

#### What we want: the perfect switch.

![](_page_16_Figure_1.jpeg)


We want to turn a p-type region into an n-type region under voltage control.

We need electrons to fill valence holes and add conduction band electrons.

![](_page_16_Figure_4.jpeg)

### An n-channel MOS transistor (nFET)

![](_page_17_Figure_1.jpeg)

![](_page_17_Figure_2.jpeg)

Vg = 1V, small region near the surface turns from p-type to n-type.

nFet is on

#### Mask set for an n-Fet (circa 1986)

![](_page_18_Figure_1.jpeg)

Masks: #1: n+ diffusion; #2: poly (gate); #3: diff contact; #4: metal.

Layers to do p-Fet not shown. Modern processes have 6 to 10 metal layers (or more) (in 1986: 2).

### **Dynamic Memory Cells**

![](_page_19_Picture_1.jpeg)

#### **Recall: Capacitors in action**

Because the dielectric is an insulator, and does not conduct.

![](_page_20_Picture_2.jpeg)

After circuit "settles" ...

Q = C V = C \* 1.5 Volts (D cell)

**Q: Charge stored on capacitor** 

C: The capacitance of the device: function of device shape and type of dielectric.

![](_page_20_Picture_7.jpeg)

![](_page_20_Picture_8.jpeg)

#### **DRAM cell: 1 transistor, 1 capacitor**

![](_page_21_Figure_1.jpeg)

### A 4 x 4 DRAM array (16 bits) ....

![](_page_22_Figure_1.jpeg)


#### **Invented after SRAM, by Robert Dennard**

#### United States Patent Office

**3,387,286** Patented June 4, 1968

![](_page_23_Figure_3.jpeg)


tinent in disclosing various concepts and structures which have been developed in the application of field-effect transistors to different types of memory applications, the primary thrust up to this time in conventional read-write random access memories has been to connect a plurality of field-effect transistors in each cell in a latch configuration. Memories of this type require a large number of active devices in each cell and therefore each cell re-

![](_page_23_Picture_6.jpeg)

![](_page_23_Figure_7.jpeg)


#### **DRAM Circuit Challenge #1: Writing**

![](_page_24_Figure_1.jpeg)

Vgs = Vdd - Vc. When Vdd - Vc == Vth, charging effectively stops!
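The stall at Vdd - Vth can be illustrated with a crude time-stepped model. This is only a sketch under stated assumptions: the square-law charging step, the constant `k`, and the values Vdd = 1.8 V and Vth = 0.5 V are all hypothetical, chosen to show the shape of the behavior rather than real device numbers.

```python
def charge_cell(vdd, vth, steps=10000, k=0.01):
    """Charge the cell capacitor through an nFET with gate and bit line at Vdd.

    Each step adds a voltage increment proportional to (Vgs - Vth)^2,
    a rough stand-in for the transistor current. Charging slows to a
    crawl as Vc approaches Vdd - Vth.
    """
    vc = 0.0
    for _ in range(steps):
        overdrive = (vdd - vc) - vth   # Vgs - Vth, with Vgs = Vdd - Vc
        if overdrive <= 0:
            break                      # transistor is off, no more charging
        vc += k * overdrive ** 2
    return vc

# Assumed supply and threshold voltages.
vc_final = charge_cell(vdd=1.8, vth=0.5)
print(f"Vc after charging: {vc_final:.3f} V (limit is {1.8 - 0.5:.1f} V)")
```

The cell never reaches the full Vdd: it creeps toward Vdd - Vth and stays just below it.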

#### **DRAM Challenge #2: Destructive Reads**

![](_page_25_Figure_1.jpeg)

![](_page_25_Picture_2.jpeg)

#### **DRAM Circuit Challenge #3a: Sensing**


Assume Ccell = 1 fF.

Bit line may have 2000 nFet drains; assume bit line C of 100 fF, or 100\*Ccell.

Ccell holds Q = Ccell\*(Vdd-Vth).

When we dump this charge onto the bit line, what voltage do we see?

dV = [Ccell\*(Vdd-Vth)] / [100\*Ccell]

dV = (Vdd-Vth) / 100 = tens of millivolts!

In practice, scale array to get a 60mV signal.
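The charge-sharing arithmetic can be checked numerically. In this sketch the capacitances come from the slide, while Vdd = 1.8 V and Vth = 0.5 V are assumed values; the second function, which keeps Ccell in the denominator and assumes the bit line starts at 0 V, is an added refinement rather than the slide's formula.

```python
def bit_line_swing(c_cell, c_bit, v_cell):
    """Slide's estimate: dump Q = Ccell * Vcell onto the bit line, dV = Q / Cbit."""
    return c_cell * v_cell / c_bit

def bit_line_swing_shared(c_cell, c_bit, v_cell):
    """Same charge spread over both capacitors: dV = Q / (Cbit + Ccell)."""
    return c_cell * v_cell / (c_bit + c_cell)

C_CELL = 1e-15          # 1 fF, from the slide
C_BIT = 100 * C_CELL    # 100 fF, from the slide
VDD, VTH = 1.8, 0.5     # assumed values, not from the slide

dv = bit_line_swing(C_CELL, C_BIT, VDD - VTH)
dv_shared = bit_line_swing_shared(C_CELL, C_BIT, VDD - VTH)
print(f"dV = {dv * 1e3:.1f} mV (slide formula), {dv_shared * 1e3:.2f} mV (Ccell included)")
```

Either way the signal is on the order of ten millivolts, so the sense amplifier has very little to work with.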

### **DRAM Circuit Challenge #3b: Sensing**

![](_page_27_Figure_1.jpeg)


#### **DRAM Challenge #4: Leakage ...**

![](_page_28_Figure_1.jpeg)


#### **DRAM Challenge #5: Cosmic Rays ...**

![](_page_29_Figure_1.jpeg)


#### **DRAM Challenge #6: Yield**

![](_page_30_Figure_1.jpeg)

Solution: add extra bit lines (e.g. 80 when you only need 64). During testing, find the bad bit lines, and use high current to burn away "fuses" put on chip to remove them.
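The repair idea can be modeled in software as choosing 64 working lines out of 80. This is a hypothetical sketch of the bookkeeping only: on a real chip the selection is fixed by blown fuses, and the list of bad lines below is made up.

```python
def remap_bit_lines(total_lines, needed_lines, bad_lines):
    """Pick `needed_lines` working physical bit lines out of `total_lines`.

    Returns a list where index = logical bit line and value = physical
    bit line. Raises ValueError if too few lines work (the die is scrap).
    """
    bad = set(bad_lines)
    good = [line for line in range(total_lines) if line not in bad]
    if len(good) < needed_lines:
        raise ValueError("not enough working bit lines; die fails")
    return good[:needed_lines]

# 80 physical lines for 64 needed, as on the slide; bad lines are invented.
mapping = remap_bit_lines(80, 64, bad_lines=[3, 17, 42])
print("logical line 3 uses physical line", mapping[3])
```

With 16 spares, a die survives up to 16 bad bit lines in this model, which is how redundancy raises yield.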

![](_page_30_Figure_3.jpeg)


Extra bit lines.

#### **Moore's Law for CPUs and DRAMs**

![](_page_31_Figure_1.jpeg)

From: "Facing the Hot Chips Challenge Again", Bill Holt, Intel, presented at Hot Chips 17, 2005.


#### Main driver: device scaling ...

![](_page_32_Figure_1.jpeg)

![](_page_32_Figure_2.jpeg)

From: "Facing the Hot Chips Challenge Again", Bill Holt, Intel, presented at Hot Chips 17, 2005.


### **Process Scaling: Why chips don't fry**

![](_page_33_Figure_1.jpeg)

IC process scaling ("Moore's Law")

Due to reducing V and C (length and width of Cs decrease, but plate distance gets smaller).

Recent slope more shallow because V is being scaled less aggressively.

From: "Facing the Hot Chips Challenge Again", Bill Holt, Intel, presented at Hot Chips 17, 2005.

### **DRAM Challenge #7: Scaling**

Each generation of IC technology, we shrink width and length of cell.

If Ccell and drain capacitances scale together, number of bits per bit line stays constant.

dV = 60 mV = [Ccell\*(Vdd-Vth)] / [100\*Ccell]

Problem 1: Number of arrays per chip grows!
Problem 2: Vdd may need to scale down too!

**Solution: Constant Innovation of Cell Capacitors!**

#### **Poly-diffusion Ccell is ancient history**

![](_page_35_Figure_1.jpeg)

![](_page_35_Picture_2.jpeg)

### Early replacement: "Trench" capacitors

![](_page_36_Picture_1.jpeg)

SEM photomicrograph of 0.25-$\mu$m trench DRAM cell suitable for scaling to 0.15 $\mu$m and below. Reprinted with permission from [17]; © 1995 IEEE.

![](_page_36_Picture_3.jpeg)

#### **Final generation of trench capacitors**

![](_page_37_Picture_1.jpeg)

![](_page_37_Picture_2.jpeg)

![](_page_37_Picture_3.jpeg)

The companies that kept scaling trench capacitors for commodity **DRAM** chips went out of business.

#### Modern cells: "stacked" capacitors

![](_page_38_Figure_1.jpeg)


#### In the labs: Vertical cell transistors ...

![](_page_39_Figure_1.jpeg)


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 45, NO. 4, APRIL 2010

## A 31 ns Random Cycle VCAT-Based $4F^2$ DRAM With Manufacturability and Enhanced Cell Efficiency

Ki-Whan Song, Jin-Young Kim, Jae-Man Yoon, Sua Kim, Huijung Kim, Hyun-Woo Chung, Hyungi Kim, Kanguk Kim, Hwan-Wook Park, Hyun Chul Kang, Nam-Kyun Tak, Dukha Park, Woo-Seop Kim, *Member, IEEE*, Yeong-Taek Lee, Yong Chul Oh, Gyo-Young Jin, Jeihwan Yoo, Donggun Park, *Senior Member, IEEE*, Kyungseok Oh, Changhyun Kim, *Senior Member, IEEE*, and Young-Hyun Jun


#### Micron 50nm 1-Gbit DDR2 die photo

![](_page_40_Picture_1.jpeg)


UC Regents Spring 2012 © UCB

#### **Today's Lecture: DRAM**

## DRAM, Xilinx, and You

## DRAM: Bottom-up

## DRAM: Top-down

![](_page_41_Figure_4.jpeg)

![](_page_41_Picture_5.jpeg)

![](_page_42_Picture_0.jpeg)

![](_page_42_Picture_1.jpeg)

512Mb: x4, x8, x16 DDR2 SDRAM Features

![](_page_42_Picture_3.jpeg)

![](_page_43_Figure_0.jpeg)

#### A "bank" of 128 Mb (512Mb chip -> 4 banks)

![](_page_44_Figure_1.jpeg)

![](_page_44_Picture_2.jpeg)

#### **Recall DRAM Challenge #3b: Sensing**

![](_page_45_Figure_1.jpeg)


#### "Sensing" is row read into sense amps

![](_page_46_Figure_1.jpeg)


#### An ill-timed refresh may add to latency

![](_page_47_Figure_1.jpeg)


#### Latency is not the same as bandwidth!

![](_page_48_Figure_1.jpeg)

#### Sadly, it's rarely this good ...

![](_page_49_Figure_1.jpeg)

### **DRAM latency/bandwidth chip features**

Columns: Design the right interface for CPUs to request the subset of a column of data they wish:

16384 bits delivered by sense amps

Select requested bits, send off the chip

Interleaving: Design the right interface to the 4 memory banks on the chip, so several row requests run in parallel.
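The bank and row numbers above fit together arithmetically, as the sketch below shows. The 512 Mb chip size, 4 banks, and 16384 bits per row come from the slides; the 16-bit column width assumes the x16 organization of the part, which is only one of the three listed widths.

```python
CHIP_BITS = 512 * 2**20   # 512 Mb chip
BANKS = 4                 # banks per chip
ROW_BITS = 16384          # bits delivered by the sense amps per row
IO_WIDTH = 16             # assumed x16 organization

bank_bits = CHIP_BITS // BANKS
rows_per_bank = bank_bits // ROW_BITS
columns_per_row = ROW_BITS // IO_WIDTH

print(f"{bank_bits // 2**20} Mb per bank")
print(f"{rows_per_bank} rows per bank")
print(f"{columns_per_row} columns of {IO_WIDTH} bits per row")
```

Each row activation therefore fetches far more data than a single column read sends off the chip, which is why later reads to the same open row are cheap.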

![](_page_50_Picture_5.jpeg)

![](_page_50_Picture_6.jpeg)

![](_page_50_Picture_8.jpeg)

![](_page_50_Picture_9.jpeg)

#### **Off-chip interface for the Micron part ...**

A clocked bus: 200 MHz clock, data transfers on both edges (DDR). Note! This example is **best-case!** To access a new row, a slow ACTIVE command must run before the READ.
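The best-case transfer rate follows directly from the clock and the double-data-rate transfers. In this sketch the 200 MHz clock and two transfers per clock are from the slide, while the 16 data pins assume an x16 part; the function name is illustrative.

```python
def peak_bandwidth_bytes_per_s(clock_hz, data_pins, transfers_per_clock=2):
    """Best-case transfer rate of a clocked bus (2 transfers per clock for DDR)."""
    return clock_hz * transfers_per_clock * data_pins / 8

# 16 data pins is an assumption (x16 organization).
bw = peak_bandwidth_bytes_per_s(200e6, 16)
print(f"{bw / 1e6:.0f} MB/s peak")
```

This is a ceiling, not a typical figure: it ignores ACTIVE, precharge, and refresh time, all of which reduce sustained bandwidth.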

DRAM is controlled via commands (READ, WRITE, REFRESH, ...)

![](_page_51_Picture_4.jpeg)

Synchronous data output.

#### **Opening a row before reading**

Auto-Precharge READ

![](_page_52_Figure_1.jpeg)

### However, we can read columns quickly

![](_page_53_Figure_1.jpeg)

Note: This is a "normal read" (not Auto-Precharge). Both READs are to the same bank, but different columns.

![](_page_53_Picture_3.jpeg)

#### Why can we read columns quickly?

![](_page_54_Figure_1.jpeg)

#### Interleave: Access all 4 banks in parallel

![](_page_55_Figure_1.jpeg)

Interleaving: Design the right interface to the 4 memory banks on the chip, so several row requests run in parallel.

Can also do other commands on banks concurrently.

![](_page_55_Picture_4.jpeg)

Figure labels: Bank a, Bank b, Bank c, Bank d.

#### Only part of a bigger story ...

![](_page_56_Figure_1.jpeg)

![](_page_56_Picture_2.jpeg)

### Only part of a bigger story ...

![](_page_57_Figure_1.jpeg)

#### **DRAM controllers: reorder requests**

#### (A) Without access scheduling (56 DRAM Cycles)

![](_page_58_Figure_2.jpeg)

#### (B) With access scheduling (19 DRAM Cycles)

![](_page_58_Figure_4.jpeg)

**DRAM** Operations:

**P**: bank precharge (3 cycle occupancy)

A: row activation (3 cycle occupancy)

C: column access (1 cycle occupancy)
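Using the occupancies listed above, a simple in-order cost model shows why request order matters. This is a sketch under stated assumptions: the eight-reference stream is hypothetical, every operation runs strictly one at a time, and no attempt is made to reproduce the 19-cycle schedule in the figure, which presumably also overlaps operations across banks.

```python
PRECHARGE, ACTIVATE, COLUMN = 3, 3, 1   # cycle occupancies from the slide

def in_order_cycles(references):
    """Cycles to serve (bank, row, column) references strictly one at a time.

    A reference to the row already open in its bank costs only a column
    access; any other reference pays precharge + activate + column access.
    """
    open_row = {}   # bank -> currently open row
    cycles = 0
    for bank, row, _column in references:
        if open_row.get(bank) != row:
            cycles += PRECHARGE + ACTIVATE
            open_row[bank] = row
        cycles += COLUMN
    return cycles

# Hypothetical request stream: eight references that ping-pong between rows.
refs = [(0, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 1),
        (1, 0, 0), (1, 1, 0), (1, 0, 1), (1, 1, 1)]
unscheduled = in_order_cycles(refs)
# Grouping references by (bank, row) lets row hits skip precharge and activate.
grouped = in_order_cycles(sorted(refs, key=lambda r: (r[0], r[1])))
print(unscheduled, grouped)
```

With this made-up stream every access misses in arrival order and costs 7 cycles, for 56 cycles in total; grouping by row alone brings that down to 32 without any overlap between banks.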

#### From: Memory Access Scheduling

Scott Rixner, William J. Dally, Ujval J. Kapasi, Peter Mattson, and John D. Owens

### **Present and Future ...**

![](_page_59_Picture_1.jpeg)

#### MacBook Air ... too thin to use DIMMs

![](_page_60_Picture_1.jpeg)

#### Mainboard: fills about 25% of the laptop

![](_page_61_Picture_1.jpeg)

35 W-h battery: 63% of 2006 MacBook's 55 W-h

#### Core i5: CPU + DRAM controller

![](_page_62_Picture_1.jpeg)

Main board

intel

#### 4GB DRAM soldered to the main board

![](_page_62_Picture_3.jpeg)

#### 3-D memory stack

| Parameter                | Value                          |
|--------------------------|--------------------------------|
| DRAM die size            | 10.7 mm × 13.3 mm              |
| DRAM die thickness       | 50 µm                          |
| TSV count in DRAM        | 1,560                          |
| DRAM capacity            | 512 Mbit/die × 2 strata        |
| CMOS logic die size      | 17.5 mm × 17.5 mm              |
| CMOS logic die thickness | 200 µm                         |
| CMOS logic bump count    | 3,497                          |
| CMOS logic process       | 0.18 µm CMOS                   |
| DRAM-logic FTI via pitch | 50 µm                          |
| Package size             | 33 mm × 33 mm                  |
| BGA terminal             | 520 pin / 1 mm pitch           |

![](_page_63_Figure_2.jpeg)

![](_page_63_Figure_3.jpeg)

#### A 3D Stacked Memory Integrated on a Logic Device Using SMAFTI Technology

Yoichiro Kurita<sup>1</sup>, Satoshi Matsui<sup>1</sup>, Nobuaki Takahashi<sup>1</sup>, Koji Soejima<sup>1</sup>, Masahiro Komuro<sup>1</sup>, Makoto Itou<sup>1</sup>, Chika Kakegawa<sup>1</sup>, Masaya Kawano<sup>1</sup>, Yoshimi Egawa<sup>2</sup>, Yoshihiro Saeki<sup>2</sup>, Hidekazu Kikuchi<sup>2</sup>, Osamu Kato<sup>2</sup>, Azusa Yanagisawa<sup>2</sup>, Toshiro Mitsuhashi<sup>2</sup>, Masakazu Ishino<sup>3</sup>, Kayoko Shibata<sup>3</sup>, Shiro Uchiyama<sup>3</sup>, Junji Yamada<sup>3</sup>, and Hiroaki Ikeda<sup>3</sup> <sup>1</sup>NEC Electronics, <sup>2</sup>Oki Electric Industry, and <sup>3</sup>Elpida Memory 1120 Shimokuzawa, Sagamihara, Kanagawa 229-1198, Japan y.kurita@necel.com

#### Next week's lectures: Timing ...

![](_page_64_Figure_1.jpeg)