# CS 150 Digital Design

**Lecture 13 – DRAM** 

2011-3-1

John Wawrzynek

today's lecturer: John Lazzaro

TAs: Michael Eastham and Austin Doupnik

www-inst.eecs.berkeley.edu/~cs150/



#### **Today's Lecture: DRAM**





DRAM: Bottom-up



DRAM: Top-down





#### **DDR2 SO-DIMM on ML505 Board**

DDR2: Double-Data Rate, 2nd generation



SO-DIMM: Small-Outline, Dual Inline Memory Module

**CS 150 L13: DRAM** 



#### **DDR2 SO-DIMM Module**



#### Project controller: Xilinx-supplied IP

Your project's Verilog code sees a FIFO R/W interface.

Xilinx IP translates FIFO requests to DRAM commands.

DDR2 SO-DIMM



#### **Today's Lecture: DRAM**





DRAM: Top-down

To understand the DRAM controller, you need to understand how a DRAM chip works. Otherwise, it just seems like magic.



## Capacitance



#### Recall: Building a capacitor



Conducts electricity well. (metal, doped polysilicon)

An insulator. Does not conduct electricity at all. (air, glass (silicon dioxide))

Conducts electricity well (metal, doped polysilicon)



#### **Recall: Capacitors in action**

Because the dielectric is an insulator, and does not conduct.



After circuit "settles" ...

Q = C V = C \* 1.5 Volts (D cell)

Q: Charge stored on capacitor

C: The capacitance of the device: function of device shape and type of dielectric.

After battery is removed:

Still, Q = C \* 1.5 Volts



Capacitor "remembers" charge
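As a rough worked example of Q = C V, the sketch below computes the stored charge and the equivalent number of electrons. The 1 fF capacitance is an assumed, DRAM-cell-scale value chosen only for illustration; the 1.5 V is the D cell voltage from the slide.

```python
# Worked example of Q = C * V for a small capacitor.
ELECTRON_CHARGE = 1.602176634e-19  # coulombs per electron (SI defined value)


def stored_charge(capacitance_f, voltage_v):
    """Return the charge in coulombs held by a capacitor at a given voltage."""
    return capacitance_f * voltage_v


C_ASSUMED = 1e-15  # 1 fF: an assumed value, not taken from the slide
V_BATTERY = 1.5    # volts, the D cell from the slide

q = stored_charge(C_ASSUMED, V_BATTERY)
electrons = q / ELECTRON_CHARGE
print(f"Q = {q:.2e} C, about {electrons:.0f} electrons")
```

Under these assumptions the state is held by fewer than ten thousand electrons, so even small disturbances to Q matter.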




#### Storing computational state as charge

State is coded as the amount of energy stored by a device.





State is read by sensing the amount of energy

Problems: noise changes Q (up or down), parasitics leak or source Q. Fortunately, Q cannot change instantaneously, but that only gets us in the ballpark.

UC Regents Spring 2011 © UCB

### **MOS Transistors**

Two diodes and a capacitor in an interesting arrangement. So, we begin with a diode review ...



#### **Diodes in action ...**



Light emitting diode (LED)

Light on?



+

Yes!





Light on?

No!




#### **Diodes: Current vs Voltage**



## I = Io [exp(V/Vo) - 1]



Vo range: 25mV to 60 mV
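A minimal sketch of the diode equation helps show the two regimes. The saturation current Io = 1 pA is an assumed value for illustration only; Vo = 25 mV is the low end of the range quoted above.

```python
import math


def diode_current(v, i0=1e-12, v0=0.025):
    """Diode equation I = Io * (exp(V/Vo) - 1).

    i0 (1 pA) is an assumed leakage value for illustration;
    v0 = 25 mV is the low end of the 25 mV to 60 mV range.
    """
    return i0 * (math.exp(v / v0) - 1.0)


# Sweep from reverse bias to forward bias.
for v in (-1.0, 0.0, 0.3, 0.6):
    print(f"V = {v:+.1f} V -> I = {diode_current(v):.3e} A")
```

With reverse bias the exponential term vanishes and the current settles at -Io, the "leakage" value; with forward bias the current grows exponentially.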





#### Note: IC Diodes are biased "off"!



V1, V2 > 0V. Diodes "off", only current is Io "leakage". I = Io [exp(V/Vo) - 1]

Anodes of all diodes on wafer connected to ground.



### **MOS Transistors**

Two diodes and a capacitor in an interesting arrangement ...



#### What we want: the perfect switch.



#### **An n-channel MOS transistor (nFET)**



Polysilicon gate, dielectric, and substrate form a capacitor.

nFet is off (I is "leakage")



Vg = 1V, small region near the surface turns from p-type to n-type.

nFet is on.

#### Mask set for an n-Fet (circa 1986)



#### Masks

#1: n+ diffusion

#2: poly (gate)

#3: diff contact

#4: metal

Layers to do p-Fet not shown. Modern processes have 6 to 10 metal layers (or more) (in 1986: 2).

## **Dynamic Memory Cells**



#### **Recall: Capacitors in action**

Because the dielectric is an insulator, and does not conduct.



After circuit "settles" ...

Q = C V = C \* 1.5 Volts (D cell)

Q: Charge stored on capacitor

C: The capacitance of the device: function of device shape and type of dielectric.

After battery is removed:

Still, Q = C \* 1.5 Volts



Capacitor "remembers" charge




#### **DRAM cell: 1 transistor, 1 capacitor**








#### Invented after SRAM, by Robert Dennard

#### United States Patent Office

3,387,286
Patented June 4, 1968


3,387,286
FIELD-EFFECT TRANSISTOR MEMORY
Robert H. Dennard, Croton-on-Hudson, N.Y., assignor to
International Business Machines Corporation, Armonk,
N.Y., a corporation of New York
Filed July 14, 1967, Ser. No. 653,415
21 Claims. (Cl. 340—173)


tinent in disclosing various concepts and structures which have been developed in the application of field-effect transistors to different types of memory applications, the primary thrust up to this time in conventional read-write random access memories has been to connect a plurality of field-effect transistors in each cell in a latch configuration. Memories of this type require a large number of active devices in each cell and therefore each cell re-





#### **DRAM Circuit Challenge #1: Writing**



Vdd - Vth. Bad, we store less charge. Why do we not get Vdd?

Ids = $[(\mu \epsilon W)/(2LD)]$ [Vgs - Vth]^2, but "turns off" when Vgs <= Vth!

Vgs = Vdd - Vc. When Vdd - Vc == Vth, charging effectively stops!
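The saturation at Vdd - Vth can be seen in a small numerical sketch. It integrates dVc/dt = Ids / Ccell with the square-law current above, folding the device constants (μ ε W)/(2LD) into a single factor `k`. The values of Vdd, Vth, `k`, Ccell and the time step are all illustrative assumptions, not device data.

```python
def charge_cell(vdd=1.8, vth=0.5, k=1e-4, c_cell=1e-15, dt=1e-12, steps=10_000):
    """Forward-Euler simulation of writing a '1' through an nFET.

    Gate and bit line are both held at vdd; the cell capacitor starts at 0 V.
    k lumps together (mu * eps * W) / (2 * L * D) in A/V^2 (assumed value).
    Returns the cell voltage after steps * dt seconds.
    """
    vc = 0.0
    for _ in range(steps):
        vgs = vdd - vc  # the source of the nFET is the cell node
        ids = k * (vgs - vth) ** 2 if vgs > vth else 0.0
        vc += ids * dt / c_cell  # dVc = I * dt / C
    return vc


final_vc = charge_cell()
print(f"Cell voltage after 10 ns: {final_vc:.4f} V (Vdd - Vth = {1.8 - 0.5:.4f} V)")
```

The cell voltage creeps toward Vdd - Vth but never passes it, because the charging current shrinks quadratically as Vgs approaches Vth.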


#### **DRAM Challenge #2: Destructive Reads**





### **DRAM Circuit Challenge #3a: Sensing**



Assume Ccell = 1 fF

Bit line may have 2000 nFet drains, assume bit line C of 100 fF, or 100\*Ccell.

Ccell holds Q = Ccell\*(Vdd-Vth)


When we dump this charge onto the bit line, what voltage do we see?

dV = [Ccell\*(Vdd-Vth)] / [100\*Ccell]

dV = (Vdd-Vth) / 100 ≈ tens of millivolts!



In practice, scale array to get a 60mV signal.
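The charge-sharing estimate can be checked numerically. In this sketch Vdd = 1.8 V and Vth = 0.5 V are assumed values for illustration; the 1 fF cell and 100 fF bit line are the slide's numbers. The second result also counts Ccell in the total capacitance and assumes the bit line starts with no charge, which is a simplification.

```python
def sense_signal(vdd, vth, c_cell=1e-15, c_bitline=100e-15):
    """Return (approx, shared) bit-line voltage swings in volts.

    approx: the slide's estimate, Q / Cbitline.
    shared: the same charge spread over Cbitline + Ccell,
            assuming the bit line initially holds no charge.
    """
    q = c_cell * (vdd - vth)  # charge stored in the cell
    approx = q / c_bitline
    shared = q / (c_bitline + c_cell)
    return approx, shared


approx_dv, shared_dv = sense_signal(vdd=1.8, vth=0.5)  # assumed voltages
print(f"slide estimate: {approx_dv * 1e3:.2f} mV, with Ccell included: {shared_dv * 1e3:.2f} mV")
```

Either way the signal is only on the order of ten millivolts for these assumed voltages, which is why sensing is hard.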










#### **Recall: Process Scaling**



Recall process scaling ("Moore's Law")



Due to reducing V and C (length and width of Cs decrease, but plate distance gets smaller).

Recent slope more shallow because V is being scaled less aggressively.

From: "Facing the Hot Chips Challenge Again", Bill Holt, Intel, presented at Hot Chips 17, 2005.


### **DRAM Challenge 7: Scaling**





If Ccell and drain capacitances scale together, number of bits per bit line stays constant.

dV ≈ 60 mV = [Ccell\*(Vdd-Vth)] / [100\*Ccell]

Problem 1: Number of arrays per chip grows!

Problem 2: Vdd may need to scale down too!



Solution: Constant Innovation of Cell Capacitors!
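To see why a falling Vdd is a problem, the sketch below rearranges the charge-sharing estimate dV = (Vdd-Vth) / ratio to find the largest bit-line-to-cell capacitance ratio that still yields a target signal. The Vdd and Vth values are illustrative assumptions; only the 60 mV target comes from the slide.

```python
def max_capacitance_ratio(vdd, vth, dv_target=0.060):
    """Largest Cbitline / Ccell ratio that still gives dv_target volts of signal,
    using the simplified estimate dV = (Vdd - Vth) / ratio."""
    return (vdd - vth) / dv_target


VTH_ASSUMED = 0.5  # volts, assumed for illustration

for vdd in (3.3, 2.5, 1.8, 1.2):  # assumed supply voltages
    ratio = max_capacitance_ratio(vdd, VTH_ASSUMED)
    print(f"Vdd = {vdd} V -> Cbitline/Ccell <= {ratio:.1f}")
```

As Vdd drops, the allowed ratio shrinks, so Ccell has to stay large relative to the bit line even as the cell footprint gets smaller.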


#### **Poly-diffusion Ccell is ancient history**







Word Line and Vdd run on "z-axis"


## Early replacement: "Trench" capacitors



#### Figure 4

SEM photomicrograph of 0.25-μm trench DRAM cell suitable for scaling to 0.15 μm and below. Reprinted with permission from [17]; © 1995 IEEE.



#### Final generation of trench capacitors



The companies that kept scaling trench capacitors for commodity DRAM chips went out of business.



#### Modern cells: "stacked" capacitors





Micron 1-Gbit DDR2 50-nm SDRAM


#### In the labs: Vertical cell transistors ...





IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 45, NO. 4, APRIL 2010, p. 880

# A 31 ns Random Cycle VCAT-Based 4F<sup>2</sup> DRAM With Manufacturability and Enhanced Cell Efficiency

Ki-Whan Song, Jin-Young Kim, Jae-Man Yoon, Sua Kim, Huijung Kim, Hyun-Woo Chung, Hyungi Kim, Kanguk Kim, Hwan-Wook Park, Hyun Chul Kang, Nam-Kyun Tak, Dukha Park, Woo-Seop Kim, *Member, IEEE*, Yeong-Taek Lee, Yong Chul Oh, Gyo-Young Jin, Jeihwan Yoo, Donggun Park, *Senior Member, IEEE*, Kyungseok Oh, Changhyun Kim, *Senior Member, IEEE*, and Young-Hyun Jun


# Micron 50nm 1-Gbit DDR2 die photo





#### **Today's Lecture: DRAM**



DRAM: Bottom-up

DRAM: Top-down





# **Memory Arrays**



512Mb: x4, x8, x16 DDR2 SDRAM Features





# A "bank" of 128 Mb (512Mb chip -> 4 banks)



In reality, 16384 columns are divided into 64 smaller arrays.
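The bank geometry can be checked with a few lines of arithmetic. This sketch uses the 13-bit row address and 16384 columns quoted in these slides; the even split of columns across the 64 smaller arrays is an assumption.

```python
ROW_ADDRESS_BITS = 13    # row address width quoted in the slides
COLUMNS_PER_ROW = 16384  # bits delivered by the sense amps for one row
BANKS_PER_CHIP = 4
SUBARRAYS = 64           # "64 smaller arrays"

MBIT = 2 ** 20

rows = 2 ** ROW_ADDRESS_BITS
bank_bits = rows * COLUMNS_PER_ROW
chip_bits = bank_bits * BANKS_PER_CHIP
# Assumes the columns are split evenly across the smaller arrays.
columns_per_subarray = COLUMNS_PER_ROW // SUBARRAYS

print(f"rows per bank: {rows}")
print(f"bank size: {bank_bits // MBIT} Mb, chip size: {chip_bits // MBIT} Mb")
print(f"columns per smaller array: {columns_per_subarray}")
```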







## "Sensing" is row read into sense amps

13-bit row address input to a 1-of-8192 decoder.

Slow! This 2.5ns period DRAM (400 MT/s) can do row reads at only 55 ns (18 MHz).

DRAM has high latency to first bit out. A fact of life.
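The gap between the two rates is easy to quantify. The sketch below uses only the 55 ns and 2.5 ns figures quoted above; the variable names are illustrative.

```python
ROW_CYCLE_NS = 55.0       # time between row reads, from the slide
TRANSFER_PERIOD_NS = 2.5  # one data transfer every 2.5 ns (400 MT/s)

row_rate_mhz = 1e3 / ROW_CYCLE_NS             # row reads per microsecond
transfer_rate_mts = 1e3 / TRANSFER_PERIOD_NS  # transfers per microsecond
transfers_per_row_cycle = ROW_CYCLE_NS / TRANSFER_PERIOD_NS

print(f"row read rate: {row_rate_mhz:.1f} MHz")
print(f"transfer rate: {transfer_rate_mts:.0f} MT/s")
print(f"transfer slots per row cycle: {transfers_per_row_cycle:.0f}")
```

One row read takes as long as 22 back-to-back transfers, so the interface is only kept busy if each opened row supplies many transfers.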



Select requested bits, send off the chip





### Latency is not the same as bandwidth!



### Sadly, it's rarely this good ...

What if we want all of the 16384 bits?

The "we" for a CPU would be the program running on the CPU.

Recall Amdahl's law: If 20% of the memory accesses need a new row access ... not good.
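A back-of-the-envelope version of this argument: treat each access as either hitting the open row or needing a new row. The cost model is a simplifying assumption (one 2.5 ns transfer period for an open-row access, the full 55 ns for a new row), not a datasheet calculation.

```python
def average_access_ns(new_row_fraction, row_ns=55.0, hit_ns=2.5):
    """Average time per access when a fraction of accesses need a new row.

    row_ns and hit_ns come from the 55 ns and 2.5 ns figures in the slides;
    treating them as the only two access costs is an assumption.
    """
    return new_row_fraction * row_ns + (1.0 - new_row_fraction) * hit_ns


avg_ns = average_access_ns(0.20)
slowdown = avg_ns / 2.5  # relative to the best case of every access hitting

print(f"average access time: {avg_ns:.1f} ns ({slowdown:.1f}x the best case)")
```

With 20% new-row accesses the average comes to about 13 ns, roughly five times the best case under this model: the slow fraction dominates.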






#### **DRAM latency/bandwidth chip features**



Columns: Design the right interface for a CPU to request the subset of a column of data it wants:

16384 bits delivered by sense amps



Select requested bits, send off the chip



Interleaving: Design the right interface to the 4 memory banks on the chip, so several row requests run in parallel.

Bank 1

Bank 2

Bank 3

Bank 4



### Off-chip interface for the Micron part ...

A clocked bus: 200 MHz clock, data transfers on both edges (DDR).

Note! This example is best-case!
To access a new row, a slow ACTIVE command must run before the READ.
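As a rough illustration of the best case, the sketch below turns the 200 MHz DDR clock into a peak data rate for the x4, x8 and x16 widths listed in the part's feature summary. It ignores ACTIVE, precharge and refresh time, so real throughput is lower.

```python
CLOCK_MHZ = 200          # bus clock from the slide
TRANSFERS_PER_CLOCK = 2  # DDR: data moves on both clock edges


def peak_bandwidth_mbytes_per_s(data_width_bits):
    """Best-case data rate for one chip, ignoring all row-access overhead."""
    transfers_per_s = CLOCK_MHZ * 1e6 * TRANSFERS_PER_CLOCK
    return transfers_per_s * data_width_bits / 8 / 1e6


for width in (4, 8, 16):
    print(f"x{width}: {peak_bandwidth_mbytes_per_s(width):.0f} MB/s peak")
```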



DRAM is controlled via commands (READ, WRITE, REFRESH, ...)

Synchronous data output.




#### Opening a row before reading ...

Auto-Precharge

[Timing diagram: READ with auto-precharge. Signals: CK#, CKE, Command (NOP, READ), Address (Col n), Bank address (Bank x), DQS/DQS#. Timing parameters: tCK, tCH, tRTP, tRP, tRAS, tRC, tLZ (MIN), tAC (MIN), tHZ (MIN).]

55 ns between row opens.


### However, we can read columns quickly



Note: This is a "normal read" (not Auto-Precharge). Both READs are to the same bank, but different columns.



### Why? Reading "delivered bits" is fast.



Column reads select from the 16384 bits here



16384 bits delivered by sense amps

Select requested bits, send off the chip



#### Interleave: Access all 4 banks in parallel





Interleaving: Design the right interface to the 4 memory banks on the chip, so several row requests run in parallel.











Can also do other commands on banks concurrently.
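A toy timing model shows what interleaving buys. It assumes each bank can start a new row open only every 55 ns (the figure from the slides) and that commands to different banks can be issued `issue_gap_ns` apart; the 10 ns gap is a hypothetical value, not a datasheet number.

```python
def total_time_ns(banks, t_rc_ns=55.0, issue_gap_ns=10.0):
    """Time to finish a sequence of row opens, one per entry in `banks`.

    banks: list of bank numbers, in issue order.
    t_rc_ns: minimum time between row opens in the same bank.
    issue_gap_ns: assumed spacing between commands on the shared bus.
    """
    bank_free = {}  # earliest time each bank can accept another row open
    bus_free = 0.0  # earliest time the command bus is available
    finish = 0.0
    for bank in banks:
        start = max(bus_free, bank_free.get(bank, 0.0))
        bank_free[bank] = start + t_rc_ns
        bus_free = start + issue_gap_ns
        finish = max(finish, start + t_rc_ns)
    return finish


same_bank = total_time_ns([0, 0, 0, 0])
interleaved = total_time_ns([0, 1, 2, 3])
print(f"four row opens, one bank:   {same_bank:.0f} ns")
print(f"four row opens, four banks: {interleaved:.0f} ns")
```

Under these assumptions four row opens to one bank take 220 ns, while spreading them over four banks takes 85 ns, because the slow row accesses overlap.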


# Only part of a bigger story ...





### Only part of a bigger story ...



#### **DRAM** controllers: reorder requests

#### (A) Without access scheduling (56 DRAM Cycles)



#### (B) With access scheduling (19 DRAM Cycles)



#### **DRAM Operations:**

**P**: bank precharge (3 cycle occupancy)

**A**: row activation (3 cycle occupancy)

C: column access (1 cycle occupancy)
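The benefit of reordering can be sketched with a deliberately simple model that uses the occupancies listed above. Operations run strictly one after another with no overlap between banks, a bank pays P + A whenever the requested row is not already open, and the request list is hypothetical. Because overlap is not modeled, this sketch does not reproduce the 19-cycle schedule in the figure; it only shows the row-locality part of the saving.

```python
P, A, C = 3, 3, 1  # cycle occupancies: precharge, activate, column access


def serial_cycles(requests):
    """Cycles to serve (bank, row, column) requests in the given order.

    Assumes no overlap between operations, and that a bank must be
    precharged and activated whenever the wanted row is not open.
    """
    open_row = {}  # bank -> currently open row
    cycles = 0
    for bank, row, _column in requests:
        if open_row.get(bank) != row:
            cycles += P + A
            open_row[bank] = row
        cycles += C
    return cycles


# Hypothetical request stream that alternates rows within each bank.
requests = [(0, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 3),
            (1, 0, 0), (1, 1, 1), (1, 0, 1), (1, 1, 2)]

in_order = serial_cycles(requests)
# Group requests to the same bank and row together before issuing them.
reordered = serial_cycles(sorted(requests, key=lambda r: (r[0], r[1])))

print(f"in order:  {in_order} cycles")
print(f"reordered: {reordered} cycles")
```

For this stream every in-order access misses its row, giving 8 × 7 = 56 cycles, while grouping by row cuts that to 32 even without any overlap. A real controller also overlaps operations across banks to do better still.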





#### **Memory Access Scheduling**

# **Present and Future ...**



#### Intel Sandy Bridge: IDF 2010

All on chip:

4 x86 cores

**GPU** 

North Bridge

DRAM controller



On chip ring network



#### The end of DIMM

| Parameter                | Value                   |
|--------------------------|-------------------------|
| DRAM die size            | 10.7 mm × 13.3 mm       |
| DRAM die thickness       | 50 μm                   |
| TSV count in DRAM        | 1,560                   |
| DRAM capacity            | 512 Mbit/die × 2 strata |
| CMOS logic die size      | 17.5 mm × 17.5 mm       |
| CMOS logic die thickness | 200 μm                  |
| CMOS logic bump count    | 3,497                   |
| CMOS logic process       | 0.18 μm CMOS            |
| DRAM-logic FTI via pitch | 50 μm                   |
| Package size             | 33 mm × 33 mm           |
| BGA terminal             | 520 pin / 1mm pitch     |



# 1 Gbit stacked DRAM with TSV (512 Mbit × 2 strata)





#### A 3D Stacked Memory Integrated on a Logic Device Using SMAFTI Technology

Yoichiro Kurita<sup>1</sup>, Satoshi Matsui<sup>1</sup>, Nobuaki Takahashi<sup>1</sup>, Koji Soejima<sup>1</sup>, Masahiro Komuro<sup>1</sup>, Makoto Itou<sup>1</sup>, Chika Kakegawa<sup>1</sup>, Masaya Kawano<sup>1</sup>, Yoshimi Egawa<sup>2</sup>, Yoshihiro Saeki<sup>2</sup>, Hidekazu Kikuchi<sup>2</sup>, Osamu Kato<sup>2</sup>, Azusa Yanagisawa<sup>2</sup>, Toshiro Mitsuhashi<sup>2</sup>, Masakazu Ishino<sup>3</sup>, Kayoko Shibata<sup>3</sup>, Shiro Uchiyama<sup>3</sup>, Junji Yamada<sup>3</sup>, and Hiroaki Ikeda<sup>3</sup>

<sup>1</sup>NEC Electronics, <sup>2</sup>Oki Electric Industry, and <sup>3</sup>Elpida Memory

1120 Shimokuzawa, Sagamihara, Kanagawa 229-1198, Japan

y.kurita@necel.com

# Thursday: JohnW returns ...

Thu 3/3 Lec #14: Video interface and framebuffers

