### CS 61C: Great Ideas in Computer Architecture Virtual Memory

### Instructors:

Krste Asanovic & Vladimir Stojanovic http://inst.eecs.berkeley.edu/~cs61c/

### Review

- Programmed I/O
- Polling versus Interrupts
- Asynchronous interrupts versus synchronous traps
- Precise interrupt looks like execution stopped at exactly one instruction, every instruction before finished, no instruction after started.
  - Simplify software view of interrupted state

You Are Here! Software Hardware Parallel Requests Assigned to computer e.g., Search "Katz" Harness Parallel Threads Parallelism & Today Assigned to core Achieve High e.g., Lookup, Ads Performance **Parallel Instructions** Core >1 instruction @ one time e.g., 5 pipelined instructions Input/Outpu Core Parallel Data struction Unit(s) >1 data item @ one time Unit(s) e.g., Add of 4 pairs of words A<sub>0</sub>+B<sub>9</sub>/A<sub>1</sub>+B<sub>3</sub>/A<sub>2</sub>+B<sub>3</sub>/A<sub>3</sub>+B<sub>3</sub> Hardware descriptions Main Memory All gates @ one time **Programming Languages** 

## Traps/Interrupts/Execeptions: altering the normal flow of control $I_{i-1}$ program $I_{i}$ $I_{i+1}$ An external or internal event that needs to be processed by another (system) program. The event is usually unexpected or rare from program's point of view.

### **Terminology**

In CS61C (you'll see other definitions in use elsewhere):

- Interrupt caused by an event external to current running program (e.g. key press, mouse activity)
  - Asynchronous to current program, can handle interrupt on any convenient instruction
- Exception caused by some event during execution of one instruction of current running program (e.g., page fault, illegal instruction)
  - Synchronous, must handle exception on instruction that causes exception
- Trap action of servicing interrupt or exception by hardware jump to "trap handler" code

### **Precise Traps**

- Trap handler's view of machine state is that every instruction prior to the trapped one has completed, and no instruction after the trap has executed.
- Implies that handler can return from an interrupt by restoring user registers and jumping to EPC
  - Interrupt handler software doesn't need to understand the pipeline of the machine, or what program was doing!
- More complex to handle trap caused by an exception
- Providing precise traps is tricky in a pipelined superscalar out-of-order processor!
  - But handling imprecise interrupts in software is even worse.





### Handling Traps in In-Order Pipeline

- Hold exception flags in pipeline until commit point (M stage)
- Exceptions in earlier pipe stages override later exceptions for a given instruction
- Inject external interrupts at commit point (override others)
- If exception/interrupt at commit: update Cause and EPC registers, kill all stages, inject handler PC into fetch stage

**Trap Pipeline Diagram** 

### **Virtual Memory**

















### Where Should Page Tables Reside?

- Space required by the page tables (PT) is proportional to the address space, number of users, ...
  - ⇒ Too large to keep in registers inside CPU
- Idea: Keep PTs in the main memory
  - needs one reference to retrieve the page base address and another to access the data word
    - $\Rightarrow$  doubles the number of memory references!

20



### Administrivia

- Midterm 2 scores up:
  - Regrade request deadline is 23:59:59 on Sunday April 26<sup>th</sup>
- · Clobber Policy:
  - Final composed of MT1, MT2, post-MT2 sections
  - z-scores on MT1/MT2 sections of Final compared to MT1/MT2 grades, will replace if better
- Proj4-1 due date extended to Wed, April 29









### **Atlas Demand Paging Scheme**

- On a page fault:
  - Input transfer into a free page is initiated
  - The Page Address Register (PAR) is updated
  - If no free page is left, a page is selected to be replaced (based on usage)
  - $\boldsymbol{\mathsf{-}}$  The replaced page is written on the drum
    - to minimize drum latency effect, the first empty page on the drum was selected
  - The page table is updated to point to the new location of the page on the drum

27



### Size of Linear Page Table

With 32-bit addresses, 4-KB pages & 4-byte PTEs:

- $\Rightarrow$  2<sup>20</sup> PTEs, i.e, 4 MB page table per user
- ⇒ 4 GB of swap needed to back up full virtual address space

### Larger pages?

- Internal fragmentation (Not all memory in page is used)
- Larger page fault penalty (more time to read from disk)

What about 64-bit virtual address space???

• Even 1MB pages would require 2<sup>44</sup> 8-byte PTEs (35 TB!)

What is the "saving grace"?







Translation Lookaside Buffers (TLB)

Address translation is very expensive!

In a two-level page table, each reference becomes several memory accesses

Solution: Cache translations in TLB

TLB hit ⇒ Single-Cycle Translation

TLB miss ⇒ Page-Table Walk to refill

virtual address VPN offset

VR WD tag PPN (VPN = virtual page number)

(PPN = physical page number)

hit? physical address PPN offset

# TLB Designs Tupically 32-128 entries, usually fully associative Each entry maps a large page, hence less spatial locality across pages more likely that two entries conflict Sometimes larger TLBs (256-512 entries) are 4-8 way setassociative Larger systems sometimes have multi-level (L1 and L2) TLBs Random or FIFO replacement policy No process information in TLB? TLB Reach: Size of largest virtual address space that can be simultaneously mapped by TLB Example: 64 TLB entries, 4KB pages, one page per entry TLB Reach = \_\_\_\_\_?









## VM features track historical uses: Bare machine, only physical addresses One program owned entire machine Batch-style multiprogramming Several programs sharing CPU while waiting for I/O Base & bound: translation and protection between programs (not virtual memory) Problem with external fragmentation (holes in memory), needed occasional memory defragmentation as new jobs arrived Time sharing More interactive programs, waiting for user. Also, more jobs/second. Motivated move to fixed-size page translation and protection, no external fragmentation (but now internal fragmentation, wasted bytes in page) Motivated adoption of virtual memory to allow more jobs to share limited physical memory resources while holding working set in memory Virtual Machine Monitors Run multiple operating systems on one machine Idea from 1970s IBM mainframes, now common on laptops • e.g., run Windows on top of Mac OS X Hardware support for two levels of translation/protection • Guest OS virtual -> Guest OS physical -> Host machine physical Also basis of Cloud Computing • Virtual machine instances for Project 4