## inst.eecs.berkeley.edu/~cs61c/su06 CS61C : Machine Structures

Lecture \#24: VM II


2006-08-09
Andy Carle
CS 61C L24 VM II (1)

## Address Mapping: Page Table

## Virtual Address:



Page Table located in physical memory

## Page Table

- A page table: mapping function
- There are several different ways, all up to the operating system, to keep this data around.
- Each process running in the operating system has its own page table
- Historically, OS changes page tables by changing contents of Page Table Base Register


## Requirements revisited

-Remember the motivation for VM:

- Sharing memory with protection
- Different physical pages can be allocated to different processes (sharing)
- A process can only touch pages in its own page table (protection)
- Separate address spaces
- Since programs work only with virtual addresses, different programs can have different data/code at the same address!


## Page Table Entry (PTE) Format

- Contains either Physical Page Number or indication not in Main Memory
- OS maps to disk if Not Valid ( $\mathrm{V}=0$ )

Page Table

| ... |  |  |
| :---: | :---: | :---: |
| V | A.R. | P. P.N. |
| Val | Access | Physical |
| -id | Rights | Page <br> Number |
| V | A.R. | P. P. N. |

- If valid, also check if have permission to use page: Access Rights (A.R.) may be Read Only, Read/Write, Executable


## Paging/Virtual Memory Multiple Processes

User A:
Virtual Memory


CS 61C L24 VM II (6)

User B:
Virtual Memory $\infty$ Stack Memory

## Comparing the 2 levels of hierarchy

Cache Version
Block or Line
Miss
Block Size: 32-64B
Placement:
Direct Mapped,
N-way Set Associative

Replacement:
LRU or Random

Virtual Memory vers. Page

Page Fault
Page Size: 4K-8KB
Fully Associative

## WYrite Thru or Back Write Back

## Notes on Page Table

- OS must reserve "Swap Space" on disk for each process
- To grow a process, ask Operating System
- If unused pages, OS uses them first
- If not, OS swaps some old pages to disk
- (Least Recently Used to pick pages to swap)
- Will add details, but Page Table is essence of Virtual Memory


## VM Problems and Solutions

- TLB
- Paged Page Tables


## Virtual Memory Problem \#1

- Map every address $\Rightarrow 1$ indirection via Page Table in memory per virtual address $\Rightarrow 1$ virtual memory accesses = 2 physical memory accesses $\Rightarrow$ SLOW!
- Observation: since locality in pages of data, there must be locality in virtual address translations of those pages
- Since small is fast, why not use a small cache of virtual to physical address translations to make translation fast?
- For historical reasons, cache is called a Translation Lookaside Buffer, or TLB


## Translation Look-Aside Buffers (TLBs)

-TLBs usually small, typically 32-256 entries

- Like any other cache, the TLB can be direct mapped, set associative, or fully associative


On TLB miss, get page table entry from main memory

## Typical TLB Format



- TLB just a cache on the page table mappings
- TLB access time comparable to cache (much less than main memory access time)
- Dirty: since use write back, need to know whether or not to write page to disk when replaced -Ref: Used to help calculate LRU on replacement - Cleared by OS periodically, then checked to see if page was referenced


## What if not in TLB?

- Option 1: Hardware checks page table and loads new Page Table Entry into TLB
- Option 2: Hardware traps to OS, up to OS to decide what to do
- MIPS follows Option 2: Hardware knows nothing about page table


## What if the data is on disk?

- We load the page off the disk into a free block of memory, using a DMA (Direct Memory Access - very fast!) transfer
- Meantime we switch to some other process waiting to be run
- When the DMA is complete, we get an interrupt and update the process's page table
- So when we switch back to the task, the desired data will be in memory


## What if we don't have enough memory?

- We choose some other page belonging to a program and transfer it onto the disk if it is dirty
- If clean (disk copy is up-to-date), just overwrite that data in memory
- We chose the page to evict based on replacement policy (e.g., LRU)
- And update that program's page table to reflect the fact that its memory moved somewhere else
- If continuously swap between disk and memory, called Thrashing


## Question

-Why is the TLB so small yet so effective?

- Because each entry corresponds to pagesize \# of addresses
- Why does the TLB typically have high associativity? What is the "associativity" of VA $\rightarrow$ PA mappings?
- Because the miss penalty dominates the AMAT for VM.
- High associativity $\rightarrow$ lower miss rates.
- VPN $\rightarrow$ PPN mappings are fully associative


## Virtual Memory Problem \#1 Recap

- Slow:
- Every memory access requires:
- 1 access to PT to get VPN->PPN translation
- 1 access to MEM to get data at PA
-Solution:
- Cache the Page Table
- Make common case fast
- PT cache called "TLB"
- "block size" is just 1 VPN->PN mapping
- TLB associativity


## Virtual Memory Problem \#2

- Page Table too big!
- 4GB Virtual Memory $\div 1$ KB page $\Rightarrow \sim 4$ million Page Table Entries $\Rightarrow 16$ MB just for Page Table for 1 process, 8 processes $\Rightarrow 256$ MB for Page Tables!
- Spatial Locality to the rescue
- Each page is 4 KB , lots of nearby references
- But large page size wastes resources
-Pages in program's working set will exhibit temporal and spatial locality.
-So ...


## Solutions

- Page the Page Table itself!
- Works, but must be careful with neverending page faults
- Pin some PT pages to memory
-2-level page table
- Solutions tradeoff in-memory PT size for slower TLB miss
- Make TLB large enough, highly associative so rarely miss on address translation
- CS 162 will go over more options and in greater depth


## Page Table Shrink :

- Single Page Table

| Page Number | Offset |
| :---: | :---: |
| 20 bits 12 bits |  |

- Multilevel Page Table

| Super Page No. | Page Number | Offset |
| :---: | :---: | :---: |
| 10 bits | 10 bits | 12 bits |

- Only have second level page table for valid entries of super level page table
- Book Exercises explore exact space savings


## Administrivia

- Proj 4 Out, Due next week
-HW 78, soon
- Final, next Friday


## 2-level Page Table



## Three Advantages of Virtual Memory

1) Translation:

- Program can be given consistent view of memory, even though physical memory is scrambled (illusion of contiguous memory)
- All programs starting at same set address
- Illusion of ~ infinite memory ( $2^{32}$ or $2^{64}$ bytes)
- Makes multiple processes reasonable
- Only the most important part of program ("Working Set") must be in physical memory
- Contiguous structures (like stacks) use only as much physical memory as necessary yet still grow later


## Cache, Proc and VM in IF (A Fine Slide)

Fetch PC


## Cache, Proc and VM in IF (A Fine Slide)

Fetch PC


## \$\&VM Review: 4 Qs for any Mem. Hierarchy

- Q1: Where can a block be placed in the upper level? (Block placement)
- Q2: How is a block found if it is in the upper level? (Block identification)
- Q3: Which block should be replaced on a miss? (Block replacement)
- Q4: What happens on a write? (Write strategy)


## Q1: Where block placed in upper level?

- Block 12 placed in 8 block cache:
- Fully associative, direct mapped, 2-way set associative
-S.A. Mapping = Block Number Mod Number Sets

Block

no.


Fully associative: block 12 can go anywhere


Direct mapped: block 12 can go only into block 4 (12 mod 8)
Block-frame address

Block 01234567
no.


Set Set Set Set $\begin{array}{llll}0 & 1 & 2\end{array}$ Set associative: block 12 can go anywhere in set 0 $(12 \bmod 4)$


1111111111222222222233
no.

$$
01234567890123456789012345678901
$$

## Q2: How is a block found in upper level?



- Direct indexing (using index and block offset), tag compares, or combination
- Increasing associativity shrinks index, expands tag


## Q3: Which block replaced on a miss?

-Easy for Direct Mapped
-Set Associative or Fully Associative:

- Random
- LRU (Least Recently Used)

Miss Rates
Associativity:2-way 4-way 8-way
Size LRU Ran LRU Ran LRU Ran

64 KB $\quad 1.9 \% \quad 2.0 \% \quad 1.5 \% \quad 1.7 \% \quad 1.4 \% \quad 1.5 \%$
25 KB 1.15\% 1.17\% 1.13\% 1.13\% 1.12\% 1.12\%
CS 61C L24 VM II (29)

## Q4: What to do on a write hit?

- Write-through
- update the word in cache block and corresponding word in memory
- Write-back
- update word in cache block
- allow memory word to be "stale"
=> add 'dirty' bit to each line indicating that memory be updated when block is replaced
=> OS flushes cache before I/O !!!
-Performance trade-offs?
-WT: read misses cannot result in writes

Ces

## Peer Instruction (1/3)

-40-bit virtual address, 16 KB page
Virtual Page Number (? bits) $\quad$ Page Offset (? bits)
-36-bit physical address
Physical Page Number (? bits) Page Offset (? bits)

- Number of bits in Virtual Page Numberl Page offset, Physical Page Number/Page offset?

```
1: 22/18 (VPN/PO), 22/14 (PPN/PO)
2: 24/16, \(20 / 16\)
3: 26/14, 22/14
4: 26/14, 26/10
5: 28/12, 24/12
```


## Peer Instruction (1/3) Answer

-40- bit virtual address, 16 KB ( $2^{14} \mathrm{~B}$ )
Virtual Page Number (26 bits) $\quad$ Page Offset (14 bits)
-36- bit virtual address, 16 KB ( $2^{14} \mathrm{~B}$ )
Physical Page Number (22 bits) $\quad$ Page Offset (14 bits)

- Number of bits in Virtual Page Numberl Page offset, Physical Page Number/Page offset?

```
1: 22/18 (VPN/PO), 22/14 (PPN/PO)
2: 24/16, 20/16
3: 26/14, 22/14
4: 26/14, 26/10
5: 28/12, 24/12
```


## Peer Instruction (2/3): 40b VA, 36b PA

-2-way set-assoc. TLB, 256 "slots", 40b VA: | TLB Tag (? bits) | TLB Index (? bits) | Page Offset (14 bits) |
| :--- | :--- | :--- |

-TLB Entry: Valid bit, Dirty bit, Access Control (say 2 bits), Virtual Page Number, Physical Page Number

\section*{| V | D | Access (2 bits) | TLB Tag (? bits) | Physical Page No. (? bits) |
| :--- | :--- | :--- | :--- | :--- |}

- Number of bits in TLB Tag / Index / Entry?

$$
\begin{aligned}
& \text { 1: } 12 \mid 14 \text { | } 38 \text { (TLB Tag / Index / Entry) } \\
& \text { 2: } 14 \text { | } 12 \text { | } 40 \\
& \text { 3: } 18 \text { | } 84 \\
& \text { 4: } 18 \text { | } 8 \text { | } 58
\end{aligned}
$$

## Peer Instruction (2/3) Answer

-2-way set-assoc data cache, $256\left(2^{8}\right)$ "slots", 2 TLB entries per slot => 8 bit index

\section*{| TLB Tag (18 bits) | TLB Index (8 bits) | Page Offset (14 bits) |
| :--- | :--- | :--- |}

Virtual Page Number (26 bits)
-TLB Entry: Valid bit, Dirty bit, Access Control (2 bits), Virtual Page Number, Physical Page Number

| V | D | Access (2 bits) | TLB Tag (18 bits) |
| :--- | :--- | :--- | :--- |

1: 12 | 14 / 38 (TLB Tag / Index / Entry)

| 2: $14 / 12 / 40$ |
| :---: | :---: |
| 3: $18 / 8 / 44$ |
| $4: 18 / 8 / 58$ |

## Peer Instruction (3/3)

- 2-way set-assoc, 64KB data cache, 64B block Cache Tag (? bits) Cache Index (? bits) Block Offset (? bits)

Physical Page Address (36 bits)

- Data Cache Entry: Valid bit, Dirty bit, Cache tag + ? bits of Data

| V | D | Cache Tag (? bits) |
| :--- | :--- | :--- |

Cache Data (? bits)

- Number of bits in Data cache Tag / Index I Offset / Entry?
1: 12 | 9 | 14 | 87 (Tag/Index/Offset/Entry)
2: $20 / 10 / 6 / 86$
3: 20/10/ 6/534
4: 21| 9| 6/87
5: 21 / 9/ 6/535


## Peer Instruction (3/3) Answer

- 2-way set-assoc data cache, $64 \mathrm{~K} / 1 \mathrm{~K}\left(2^{10}\right)$ "slots", 2 entries per slot => 9 bit index

Cache Tag (21 bits) Cache Index (9 bits) $\quad$ Block Offset (6 bits)
Physical Page Address ( 36 bits)

- Data Cache Entry: Valid bit, Dirty bit, Cache tag + 64 Bytes of Data


```
1: 12 | 9 | 14 | 87 (Tag/Index/Offset/Entry)
2: 20 | 10 | 6 | 86
3: 20 | 10 | 6 | 534
4: 21/ 9/ 6/87
5: 21 / 9 | 6 / 535```

