### inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures

**Lecture 35 – Virtual Memory II** 

2007-04-16



#### Lecturer SOE Dan Garcia

www.cs.berkeley.edu/~ddgarcia

Hardware repair?! ⇒ This technology allows you to "patch" your hardware after it has been installed via "Phoenix" - FPGA (field programmable gate array). The bad news: hardware folks can be sloppy & fix later!



CS61C L35 Virtual Memory II (1) technologyreview.com/Infotech/18513 Garcia, Spring 2007 © UCB

#### **Review**

- Manage memory to disk? Treat as cache
  - Included protection as bonus, now critical
  - Use Page Table of mappings for each process vs. tag/data in cache
- Virtual Memory allows protected sharing of memory between processes
- Spatial Locality means Working Set of Pages is all that must be in memory for process to run fairly well




#### Comparing the 2 levels of hierarchy

| Cache version | Virtual Memory vers. |
|---|---|
| Block or Line | Page |
| Miss | Page Fault |
| Block Size: 32-64B | Page Size: 4K-8KB |
| Placement: Direct Mapped, N-way Set Associative | Placement: Fully Associative |
| Replacement: LRU or Random | Replacement: Least Recently Used (LRU) |
| Write Thru or Back | Write Back |


#### **Notes on Page Table**

- Solves Fragmentation problem: all chunks same size, so all holes can be used
- OS must reserve "Swap Space" on disk for each process
- To grow a process, ask Operating System
  - If unused pages, OS uses them first
  - If not, OS swaps some old pages to disk (Least Recently Used to pick pages to swap)
- Each process has own Page Table
- Will add details, but Page Table is essence of Virtual Memory
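The mapping a page table provides can be sketched in a few lines. This is only an illustration, assuming 4 KB pages (within the 4K-8KB range given earlier) and a per-process table stored as a dict from virtual page number (VPN) to physical page number (PPN). The names `PAGE_SIZE`, `translate`, `table_a`, and `table_b` are hypothetical and not from the lecture.

```python
PAGE_SIZE = 4096   # assumed 4 KB pages
OFFSET_BITS = 12   # log2(PAGE_SIZE)

def translate(page_table, vaddr):
    """Return the physical address for vaddr, or None on a page fault."""
    vpn = vaddr >> OFFSET_BITS          # upper bits select the page
    offset = vaddr & (PAGE_SIZE - 1)    # lower bits pass through unchanged
    ppn = page_table.get(vpn)
    if ppn is None:
        return None                     # page not resident: page fault
    return (ppn << OFFSET_BITS) | offset

# Each process has its own table, so the same virtual address
# can land in different physical pages.
table_a = {0: 7, 1: 3}   # hypothetical process A
table_b = {0: 2}         # hypothetical process B
```

With these tables, virtual address `0x0010` maps to a different physical page in each process, and process B faults on any address in virtual page 1.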




#### Why would a process need to "grow"?

- A program's address space contains 4 regions:
  - stack: local variables, grows downward
  - heap: space requested for pointers via malloc(); resizes dynamically, grows upward
  - static data: variables declared outside main, does not grow or shrink
  - code: loaded when program starts, does not change



For now, OS somehow prevents accesses between stack and heap (gray hash lines).

#### **Virtual Memory Problem #1**

- Map every address ⇒ 1 indirection via Page Table in memory per virtual address ⇒ 1 virtual memory access = 2 physical memory accesses ⇒ SLOW!
- Observation: since locality in pages of data, there must be locality in virtual address translations of those pages
- Since small is fast, why not use a small cache of virtual to physical address translations to make translation fast?
- For historical reasons, cache is called a Translation Lookaside Buffer, or TLB
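To make the cost argument concrete, here is a rough sketch that counts physical memory accesses for a short address trace, with and without a translation cache. It assumes 4 KB pages, a tiny fully associative TLB with LRU replacement, and that every page is already resident. The class `TLB`, the function `count_memory_accesses`, and the trace are illustrative assumptions, not the lecture's design.

```python
from collections import OrderedDict

OFFSET_BITS = 12  # assumed 4 KB pages

class TLB:
    """Tiny fully associative translation cache with LRU replacement."""
    def __init__(self, entries):
        self.entries = entries
        self.map = OrderedDict()          # VPN -> PPN, least recent first

    def lookup(self, vpn):
        if vpn in self.map:
            self.map.move_to_end(vpn)     # mark as most recently used
            return self.map[vpn]
        return None

    def insert(self, vpn, ppn):
        if len(self.map) >= self.entries:
            self.map.popitem(last=False)  # evict least recently used entry
        self.map[vpn] = ppn

def count_memory_accesses(addresses, page_table, tlb=None):
    """Count physical memory accesses needed to serve the address trace."""
    accesses = 0
    for vaddr in addresses:
        vpn = vaddr >> OFFSET_BITS
        ppn = tlb.lookup(vpn) if tlb is not None else None
        if ppn is None:
            accesses += 1                 # read the page table entry in memory
            ppn = page_table[vpn]
            if tlb is not None:
                tlb.insert(vpn, ppn)
        accesses += 1                     # the data access itself
    return accesses

page_table = {0: 5, 1: 9}
trace = [0x0000, 0x0004, 0x0008, 0x1000, 0x1004, 0x000C]
```

For this trace, translating through the page table every time costs 12 accesses, while a 2-entry TLB cuts that to 8, because only the first touch of each page needs the page table.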










#### What if not in TLB?

- Option 1: Hardware checks page table and loads new Page Table Entry into TLB
- Option 2: Hardware traps to OS, up to OS to decide what to do
  - MIPS follows Option 2: Hardware knows nothing about page table



#### What if the data is on disk?

- We load the page off the disk into a free block of memory, using a DMA transfer (Direct Memory Access – special hardware support to avoid processor)
  - Meantime we switch to some other process waiting to be run
- When the DMA is complete, we get an interrupt and update the process's page table
  - So when we switch back to the task, the desired data will be in memory


#### What if we don't have enough memory?

- We choose some other page belonging to a program and transfer it onto the disk if it is dirty
  - If clean (disk copy is up-to-date), just overwrite that data in memory
  - We choose the page to evict based on replacement policy (e.g., LRU)
- And update that program's page table to reflect the fact that its memory moved somewhere else
- If continuously swap between disk and memory, called Thrashing
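The eviction steps above can be sketched as a small simulation. It assumes LRU replacement and a single dirty flag per resident page. The class `PhysicalMemory` and its counters are hypothetical names used only for illustration.

```python
from collections import OrderedDict

class PhysicalMemory:
    """Hypothetical model of a few page frames with LRU replacement."""
    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.resident = OrderedDict()   # page -> dirty flag, least recent first
        self.disk_writes = 0
        self.page_faults = 0

    def access(self, page, write=False):
        if page in self.resident:
            self.resident.move_to_end(page)      # mark as most recently used
        else:
            self.page_faults += 1
            if len(self.resident) >= self.num_frames:
                victim, dirty = self.resident.popitem(last=False)
                if dirty:
                    self.disk_writes += 1        # dirty victim is written to disk
                # a clean victim is simply overwritten
            self.resident[page] = False          # freshly loaded page is clean
        if write:
            self.resident[page] = True           # memory copy now differs from disk

mem = PhysicalMemory(num_frames=2)
mem.access(1, write=True)
mem.access(2)
mem.access(3)   # evicts page 1 (dirty): one disk write
mem.access(4)   # evicts page 2 (clean): no disk write

# Thrashing: cycling through 3 pages with only 2 frames faults every time.
thrash = PhysicalMemory(num_frames=2)
for page in [1, 2, 3] * 4:
    thrash.access(page)
```

In the second run, every one of the 12 accesses is a page fault, which is the thrashing behavior described above.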

We're done with new material. Let's now review w/Questions.

#### **Peer Instruction**

- A. Locality is important yet different for cache and virtual memory (VM): temporal locality for caches but spatial locality for VM
- B. Cache management is done by hardware (HW), page table management by the operating system (OS), but TLB management is either by HW or OS
- C. VM helps both with security and cost

| Answer | ABC |
|---|---|
| 0 | FFF |
| 1 | FFT |
| 2 | FTF |
| 3 | FTT |
| 4 | TFF |
| 5 | TFT |
| 6 | TTF |
| 7 | TTT |














#### And in Conclusion...

- Virtual memory to Physical Memory Translation too slow?
  - Add a cache of Virtual to Physical Address Translations, called a TLB
- Spatial Locality means Working Set of Pages is all that must be in memory for process to run fairly well
- Virtual Memory allows protected sharing of memory between processes with less swapping to disk




#### **Bonus slides**

- These are extra slides that used to be included in lecture notes, but have been moved to this, the "bonus" area to serve as a supplement.
- The slides will appear in the order they would have in the normal presentation



#### 4 Qs for any Memory Hierarchy

- Q1: Where can a block be placed?
  - One place (direct mapped)
  - A few places (set associative)
  - Any place (fully associative)
- Q2: How is a block found?
  - Indexing (as in a direct-mapped cache)
  - Limited search (as in a set-associative cache)
  - Full search (as in a fully associative cache)
  - Separate lookup table (as in a page table)
- Q3: Which block is replaced on a miss?
  - Least recently used (LRU)
  - Random
- Q4: How are writes handled?
  - Write through (Level never inconsistent w/lower)
  - Write back (Could be "dirty", must have dirty bit)




#### Q1: Where block placed in upper level?

- Block #12 placed in 8 block cache:
  - Fully associative
  - Direct mapped
  - 2-way set associative
- Set Associative Mapping = Block # Mod # of Sets
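The placement rule for Block #12 can be worked through with a short sketch. It assumes the sets are laid out contiguously in the cache, so set `s` of a `ways`-way cache occupies slots `s*ways` through `s*ways + ways - 1`. The function name `candidate_slots` is hypothetical.

```python
def candidate_slots(block_number, num_blocks, ways):
    """Return the cache slots where a memory block may be placed.

    ways=1 is direct mapped; ways=num_blocks is fully associative.
    """
    num_sets = num_blocks // ways
    set_index = block_number % num_sets     # Block # mod # of sets
    first = set_index * ways                # assumed contiguous set layout
    return list(range(first, first + ways))

direct = candidate_slots(12, 8, ways=1)     # 12 mod 8 = 4, so slot 4 only
two_way = candidate_slots(12, 8, ways=2)    # 12 mod 4 = 0, so set 0 (slots 0 and 1)
fully = candidate_slots(12, 8, ways=8)      # one set, so any of the 8 slots
```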










#### Q2: How is a block found in upper level?

(Figure: address fields, labeled "Set Select" and "Data Select".)

- Direct indexing (using index and block offset), tag compares, or combination
- Increasing associativity shrinks index, expands tag
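A small sketch can show how the address fields change with associativity. It assumes power-of-two block sizes and set counts, and the example parameters (32-bit addresses, 32-byte blocks, 8 blocks) are chosen only for illustration. `field_widths` and `split_address` are hypothetical helper names.

```python
def field_widths(addr_bits, block_size, num_blocks, ways):
    """Return (tag_bits, index_bits, offset_bits) for a cache geometry."""
    offset_bits = block_size.bit_length() - 1    # log2(block_size)
    num_sets = num_blocks // ways
    index_bits = num_sets.bit_length() - 1       # log2(num_sets)
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits

def split_address(addr, block_size, num_blocks, ways):
    """Split an address into (tag, index, offset) fields."""
    _, index_bits, offset_bits = field_widths(0, block_size, num_blocks, ways)
    num_sets = num_blocks // ways
    offset = addr & (block_size - 1)                  # selects data within the block
    index = (addr >> offset_bits) & (num_sets - 1)    # selects the set
    tag = addr >> (offset_bits + index_bits)          # compared against stored tags
    return tag, index, offset

# Same cache size, increasing associativity: the index shrinks, the tag grows.
direct_mapped = field_widths(32, 32, 8, ways=1)
two_way = field_widths(32, 32, 8, ways=2)
fully_assoc = field_widths(32, 32, 8, ways=8)
```

In the fully associative case the index field disappears entirely, so finding a block relies on tag compares alone.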

#### Q3: Which block replaced on a miss?

- Easy for Direct Mapped
- Set Associative or Fully Associative:
  - Random
  - LRU (Least Recently Used)

#### **Miss Rates**

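| Size | 2-way LRU | 2-way Random | 4-way LRU | 4-way Random | 8-way LRU | 8-way Random |
|---|---|---|---|---|---|---|
| 16 KB | 5.2% | 5.7% | 4.7% | 5.3% | 4.4% | 5.0% |
| 64 KB | 1.9% | 2.0% | 1.5% | 1.7% | 1.4% | 1.5% |
| 250 KB | 1.15% | 1.17% | 1.13% | 1.13% | 1.12% | 1.12% |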

#### Q4: What to do on a write hit?

- Write-through
  - update the word in cache block and corresponding word in memory
- Write-back
  - update word in cache block
  - allow memory word to be "stale"
  - => add 'dirty' bit to each line indicating that memory must be updated when block is replaced
  - => OS flushes cache before I/O !!!
- Performance trade-offs?
  - WT: read misses cannot result in writes
  - WB: no writes of repeated writes
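The "no writes of repeated writes" trade-off can be illustrated by counting memory writes under each policy. This sketch assumes every store hits in the cache and that each dirty block is eventually replaced exactly once. The function `memory_writes` and the store trace are hypothetical.

```python
def memory_writes(stores, policy):
    """Count writes to memory for a sequence of store hits.

    stores: list of block numbers being written (all assumed to hit).
    policy: "write-through" or "write-back".
    """
    if policy == "write-through":
        return len(stores)          # every store also updates memory
    if policy == "write-back":
        dirty = set()
        for block in stores:
            dirty.add(block)        # set the block's dirty bit
        return len(dirty)           # one memory write per dirty block on replacement
    raise ValueError("unknown policy: " + policy)

stores = [3, 3, 3, 7, 3]            # repeated writes to block 3
wt = memory_writes(stores, "write-through")
wb = memory_writes(stores, "write-back")
```

Here write-through sends all five stores to memory, while write-back needs only two writes, one per dirty block.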


#### **Three Advantages of Virtual Memory**

#### 1) Translation:

- Program can be given consistent view of memory, even though physical memory is scrambled
- Makes multiple processes reasonable
- Only the most important part of program ("Working Set") must be in physical memory
- Contiguous structures (like stacks) use only as much physical memory as necessary yet still grow later




#### **Three Advantages of Virtual Memory**

#### 2) Protection:

- Different processes protected from each other
- Different pages can be given special behavior (Read Only, Invisible to user programs, etc.)
- Kernel data protected from User programs
- Very important for protection from malicious programs ⇒ Far more "viruses" under Microsoft Windows
- Special Mode in processor ("Kernel mode") allows processor to change page table/TLB

#### 3) Sharing:

- Can map same physical page to multiple users ("Shared memory")




#### Why Translation Lookaside Buffer (TLB)?

- Paging is most popular implementation of virtual memory (vs. base/bounds)
- Every paged virtual memory access must be checked against Entry of Page Table in memory to provide protection / indirection
- Cache of Page Table Entries (TLB) makes address translation possible without a memory access in the common case, which makes translation fast




#### **Bonus slide: Virtual Memory Overview (1/4)**

- User program view of memory:
  - Contiguous
  - Start from some set address
  - Infinitely large
  - Is the only running program

#### Reality:

- Non-contiguous
- Start wherever available memory is
- Finite size
- Many programs running at a time


#### **Bonus slide: Virtual Memory Overview (2/4)**

- Virtual memory provides:
  - illusion of contiguous memory
  - all programs starting at same set address
  - illusion of ~ infinite memory (2<sup>32</sup> or 2<sup>64</sup> bytes)
  - protection




#### **Bonus slide: Virtual Memory Overview (3/4)**

- Implementation:
  - Divide memory into "chunks" (pages)
  - Operating system controls page table that maps virtual addresses into physical addresses
  - Think of memory as a cache for disk
  - TLB is a cache for the page table




#### **Bonus slide: Virtual Memory Overview (4/4)**

- Let's say we're fetching some data:
  - Check TLB (input: VPN, output: PPN)
    - hit: fetch translation
    - miss: check page table (in memory)
      - Page table hit: fetch translation
      - Page table miss: page fault, fetch page from disk to memory, return translation to TLB
  - Check cache (input: PPN, output: data)
    - hit: return value
    - miss: fetch value from memory
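The fetch sequence above can be traced with a toy model. This is a simplified sketch, not the lecture's implementation: it assumes 4 KB pages, an unbounded TLB and cache (so nothing is ever evicted), a free physical frame always being available on a page fault, and a cache keyed by full physical address. The class `Machine` and its fields are hypothetical.

```python
OFFSET_BITS = 12  # assumed 4 KB pages

class Machine:
    """Toy model of the TLB -> page table -> disk -> cache lookup order."""
    def __init__(self, disk_pages):
        self.tlb = {}             # VPN -> PPN (cache of the page table)
        self.page_table = {}      # VPN -> PPN for resident pages
        self.cache = {}           # physical address -> value
        self.memory = {}          # physical address -> value
        self.disk = disk_pages    # VPN -> {offset: value}
        self.next_free_ppn = 0
        self.events = []          # log of what happened, in order

    def translate(self, vpn):
        if vpn in self.tlb:
            self.events.append("tlb hit")
            return self.tlb[vpn]
        self.events.append("tlb miss")
        if vpn in self.page_table:
            self.events.append("page table hit")
        else:
            self.events.append("page fault")
            ppn = self.next_free_ppn              # assume a free frame exists
            self.next_free_ppn += 1
            for offset, value in self.disk[vpn].items():
                self.memory[(ppn << OFFSET_BITS) | offset] = value
            self.page_table[vpn] = ppn
        self.tlb[vpn] = self.page_table[vpn]      # return translation to TLB
        return self.tlb[vpn]

    def load(self, vaddr):
        ppn = self.translate(vaddr >> OFFSET_BITS)
        paddr = (ppn << OFFSET_BITS) | (vaddr & ((1 << OFFSET_BITS) - 1))
        if paddr in self.cache:
            self.events.append("cache hit")
        else:
            self.events.append("cache miss")
            self.cache[paddr] = self.memory[paddr]   # fetch value from memory
        return self.cache[paddr]

m = Machine({2: {0x10: 99}})
first = m.load(0x2010)    # TLB miss, page fault, cache miss
second = m.load(0x2010)   # TLB hit, cache hit
m.tlb.clear()             # pretend the TLB entry was replaced
third = m.load(0x2010)    # TLB miss, page table hit, cache hit
```

Inspecting `m.events` after these three loads shows each path through the bullet list: the cold access, the fully cached access, and the case where only the TLB entry is missing.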




