# CS162 Operating Systems and Systems Programming Lecture 13

# Address Translation (cont'd), Caches and TLBs

October 13, 2010 Prof. John Kubiatowicz http://inst.eecs.berkeley.edu/~cs162

## Review: Example of segment translation

| Virtual address | Label   | Instruction | Operands        |
|-----------------|---------|-------------|-----------------|
| 0x240           | main:   | la          | \$a0, varx      |
| 0x244           |         | jal         | strlen          |
| 0x360           | strlen: | li          | \$v0, 0 ;count  |
| 0x364           | loop:   | lb          | \$t0, (\$a0)    |
| 0x368           |         | beq         | \$r0,\$t1, done |
| 0x4050          | varx    | dw          | 0x314159        |

| Seg ID #   | Base   | Limit  |
|------------|--------|--------|
| 0 (code)   | 0x4000 | 0x0800 |
| 1 (data)   | 0x4800 | 0x1400 |
| 2 (shared) | 0xF000 | 0x1000 |
| 3 (stack)  | 0x0000 | 0x3000 |

Let's simulate a bit of this code to see what happens (PC=0x240):

- Fetch 0x240. Virtual segment #? 0; Offset? 0x240. Physical address? Base=0x4000, so physical addr=0x4240. Fetch instruction at 0x4240. Get "la \$a0, varx". Move 0x4050 → \$a0, Move PC+4→PC.
- Fetch 0x244. Translated to Physical=0x4244. Get "jal strlen". Move 0x0248 → \$ra (return address!), Move 0x0360 → PC.
- Fetch 0x360. Translated to Physical=0x4360. Get "li \$v0,0". Move 0x0000 → \$v0, Move PC+4→PC.
- Fetch 0x364. Translated to Physical=0x4364. Get "lb \$t0,(\$a0)". Since \$a0 is 0x4050, try to load byte from 0x4050. Translate 0x4050. Virtual segment #? 1; Offset? 0x50. Physical address? Base=0x4800, Physical addr = 0x4850. Load Byte from 0x4850→\$t0, Move PC+4→PC.
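The translations in this walkthrough can be reproduced with a few lines of Python. This is a minimal sketch, not the hardware's actual logic: it assumes a 16-bit virtual address whose top 2 bits select the segment (which matches 0x4050 landing in segment 1 with offset 0x50), and the names `SEGMENTS`, `translate`, and `SegmentationFault` are made up for illustration.

```python
# Segment table from the slide: segment ID -> (base, limit).
SEGMENTS = {
    0: (0x4000, 0x0800),  # code
    1: (0x4800, 0x1400),  # data
    2: (0xF000, 0x1000),  # shared
    3: (0x0000, 0x3000),  # stack
}

# Assumption: 16-bit virtual addresses, top 2 bits = segment, low 14 = offset.
OFFSET_BITS = 14


class SegmentationFault(Exception):
    """Raised when the offset falls outside the segment's limit."""


def translate(vaddr):
    """Return the physical address for a virtual address, or fault."""
    seg = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    base, limit = SEGMENTS[seg]
    if offset >= limit:
        raise SegmentationFault(hex(vaddr))
    return base + offset


for va in (0x240, 0x244, 0x360, 0x364, 0x4050):
    print(f"{va:#06x} -> {translate(va):#06x}")
```

The last line of output corresponds to the `lb` step: virtual 0x4050 maps to physical 0x4850.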

Kubiatowicz CS162 ©UCB Fall 2010

Lec 13.2

# Review: Multi-level Translation

- What about a tree of tables?
  - Lowest level page table⇒memory still allocated with bitmap
  - Higher levels often segmented
- Could have any number of levels. Example (top segment):



- Pointer to top-level table (page table)




## Goals for Today

- Finish discussion of both Address Translation and Protection
- Caching and TLBs

Note: Some slides and/or pictures in the following are adapted from slides ©2005 Silberschatz, Galvin, and Gagne

## What is in a PTE?

- What is in a Page Table Entry (or PTE)?
  - Pointer to next-level page table or to actual page
  - Permission bits: valid, read-only, read-write, write-only
- Example: Intel x86 architecture PTE:
  - Address same format as previous slide (10, 10, 12-bit offset)
  - Intermediate page tables called "Directories"
  - Layout: bits 31-12 hold the Page Frame Number (Physical Page Number), bits 11-9 are Free (OS), and bits 8-0 hold the flag bits listed here:
    - » P: Present (same as "valid" bit in other architectures)
    - » W: Writeable
    - » U: User accessible
    - » PWT: Page write transparent: external cache write-through
    - » PCD: Page cache disabled (page cannot be cached)
    - » A: Accessed: page has been accessed recently
    - » D: Dirty (PTE only): page has been modified recently
    - » L: L=1⇒4MB page (directory only). Bottom 22 bits of virtual address serve as offset

## Examples of how to use a PTE

- How do we use the PTE?
  - Invalid PTE can imply different things:
    - » Region of address space is actually invalid or
    - » Page/directory is just somewhere else than memory
  - Validity checked first
    - » OS can use other (say) 31 bits for location info
- Usage Example: Demand Paging
  - Keep only active pages in memory
  - Place others on disk and mark their PTEs invalid
- Usage Example: Copy on Write
  - UNIX fork gives copy of parent address space to child
    - » Address spaces disconnected after child created
  - How to do this cheaply?
    - » Make copy of parent's page tables (point at same memory)
    - » Mark entries in both sets of page tables as read-only
    - » Page fault on write creates two copies
- Usage Example: Zero Fill On Demand
  - New data pages must carry no information (say be zeroed)
  - Mark PTEs as invalid; page fault on use gets zeroed page
  - Often, OS creates zeroed pages in background

## How is the translation accomplished?

[Figure: the CPU issues virtual addresses to the MMU, which produces physical addresses]

- What, exactly happens inside MMU?
- One possibility: Hardware Tree Traversal
  - For each virtual address, takes page table base pointer and traverses the page table in hardware
  - Generates a "Page Fault" if it encounters invalid PTE
    - » Fault handler will decide what to do
    - » More on this next lecture
  - Pros: Relatively fast (but still many memory accesses!)
  - Cons: Inflexible, Complex hardware
- Another possibility: Software
  - Each traversal done in software
  - Pros: Very flexible
  - Cons: Every translation must invoke Fault!
- In fact, need way to cache translations for either case!
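The tree traversal described above, whether done by the MMU or by a software handler, can be sketched for the (10, 10, 12-bit offset) format. This is a simplified illustration under stated assumptions: flag bit positions follow the standard 32-bit x86 PTE layout (P is bit 0, W bit 1, U bit 2, A bit 5, D bit 6), `memory` is a hypothetical dictionary standing in for physical frames, and only the leaf PTE's W bit is checked (real x86 also consults the directory entry). The names `walk`, `make_pte`, and `PageFault` are invented for this example.

```python
# Flag bits, assuming the standard 32-bit x86 PTE layout (P is bit 0).
P, W, U = 1 << 0, 1 << 1, 1 << 2
A, D = 1 << 5, 1 << 6
PAGE_SHIFT = 12


class PageFault(Exception):
    """Stands in for the trap raised on an invalid or forbidden access."""


def make_pte(frame, flags):
    """Pack a physical frame number (bits 31-12) with flag bits."""
    return (frame << PAGE_SHIFT) | flags


def walk(memory, root_frame, vaddr, write=False):
    """Translate vaddr through a two-level table (10, 10, 12-bit offset).

    `memory` is a hypothetical dict: frame number -> list of 1024 entries.
    """
    dir_idx = (vaddr >> 22) & 0x3FF
    tbl_idx = (vaddr >> 12) & 0x3FF
    offset = vaddr & 0xFFF

    pde = memory[root_frame][dir_idx]
    if not pde & P:
        raise PageFault(f"directory entry {dir_idx} not present")

    table = memory[pde >> PAGE_SHIFT]
    pte = table[tbl_idx]
    if not pte & P:
        raise PageFault(f"page {vaddr >> 12:#x} not present")
    if write and not pte & W:
        raise PageFault(f"write to read-only page {vaddr >> 12:#x}")

    # Record the access: set Accessed, and Dirty if this was a write.
    table[tbl_idx] = pte | A | (D if write else 0)
    return ((pte >> PAGE_SHIFT) << PAGE_SHIFT) | offset


# Tiny address space: directory in frame 1, one page table in frame 2.
memory = {1: [0] * 1024, 2: [0] * 1024}
memory[1][0] = make_pte(2, P | W | U)
memory[2][3] = make_pte(0x42, P | U)      # read-only page

print(hex(walk(memory, 1, 0x3ABC)))       # virtual page 3 -> frame 0x42
```

A write to the same page raises `PageFault`, which is the hook a kernel would use for copy on write: both parent and child map the page read-only, and the fault handler makes the private copy.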

### **Dual-Mode Operation**

- Can an application modify its own translation tables?
  - If it could, could get access to all of physical memory
  - Has to be restricted somehow
- To assist with protection, hardware provides at least two modes (Dual-Mode Operation):
  - "Kernel" mode (or "supervisor" or "protected")
  - "User" mode (Normal program mode)
  - Mode set with bits in special control register only accessible in kernel-mode
- Intel processor actually has four "rings" of protection:
  - PL (Privilege Level) from 0 - 3
    - » PL0 has full access, PL3 has least
  - Privilege Level set in code segment descriptor (CS)
  - Mirrored "IOPL" bits in condition register give permission to programs to use the I/O instructions
  - Typical OS kernels on Intel processors only use PL0 ("kernel") and PL3 ("user")

## How to get from Kernel→User

- What does the kernel do to create a new user process?
  - Allocate and initialize address-space control block
  - Read program off disk and store in memory
  - Allocate and initialize translation table
    - » Point at code in memory so program can execute
    - » Possibly point at statically initialized data
  - Run Program:
    - » Set machine registers
    - » Set hardware pointer to translation table
    - » Set processor status word for user mode
    - » Jump to start of program
- How does kernel switch between processes?
  - Same saving/restoring of registers as before
  - Save/restore PSL (hardware pointer to translation table)
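To make the mode bit concrete, here is a toy model of the "Run Program" steps and of what dual-mode operation prevents afterwards. It is only a sketch: the `CPU` class, its fields, and `ProtectionFault` are hypothetical stand-ins for hardware state, and general register setup is left out.

```python
class ProtectionFault(Exception):
    """Raised when user-mode code attempts a kernel-only operation."""


class CPU:
    KERNEL, USER = "kernel", "user"

    def __init__(self):
        self.mode = CPU.KERNEL          # the machine starts in kernel mode
        self.page_table_base = None
        self.pc = None

    def set_page_table_base(self, base):
        # Privileged: changing the translation would expose all of memory.
        if self.mode != CPU.KERNEL:
            raise ProtectionFault("page table base is kernel-only")
        self.page_table_base = base

    def enter_user_mode(self, entry_point):
        # Set the status word for user mode and jump to the program.
        self.mode = CPU.USER
        self.pc = entry_point


def run_program(cpu, translation_table, entry_point):
    """Kernel-side steps: point hardware at the table, then drop privilege."""
    cpu.set_page_table_base(translation_table)
    cpu.enter_user_mode(entry_point)


cpu = CPU()
run_program(cpu, translation_table=0x4000, entry_point=0x240)
try:
    cpu.set_page_table_base(0x8000)     # now running as the user program
except ProtectionFault as err:
    print("blocked:", err)
```

The order matters: the translation pointer is set while still in kernel mode, and dropping to user mode is the last step before the jump.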

## User→Kernel (System Call)

- Can't let inmate (user) get out of padded cell on own
  - Would defeat purpose of protection!
  - So, how does the user program get back into kernel?



- System call: Voluntary procedure call into kernel
  - Hardware for controlled User-Kernel transition
  - Can any kernel routine be called?
    - » No! Only specific ones.
  - System call ID encoded into system call instruction
    - » Index forces well-defined interface with kernel

### System Call Continued

- What are some system calls?
  - I/O: open, close, read, write, lseek
  - Files: delete, mkdir, rmdir, truncate, chown, chgrp, ...
  - Process: fork, exit, wait (like join)
  - Network: socket create, set options
- Are system calls constant across operating systems?
  - Not entirely, but there are lots of commonalities
  - Also some standardization attempts (POSIX)
- What happens at beginning of system call?
  - » On entry to kernel, sets system to kernel mode
  - » Handler address fetched from table/Handler started
- System Call argument passing:
  - In registers (not very much can be passed)
  - Write into user memory, kernel copies into kernel mem
    - » User addresses must be translated!
    - » Kernel has different view of memory than user
  - Every Argument must be explicitly checked!
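The two ideas above, a table indexed by system call ID and explicit checking of every argument, can be sketched as follows. Everything here is illustrative: the call numbers, the `USER_BASE`/`USER_LIMIT` bounds, and the helper names are assumptions, not any real kernel's interface, and user memory is modeled as a plain `bytearray`.

```python
class BadSyscall(Exception):
    """Reported to the user program as an error instead of crashing the kernel."""


# Hypothetical bounds of the calling process's user-accessible memory.
USER_BASE, USER_LIMIT = 0x1000, 0x2000


def copy_from_user(user_mem, addr, length):
    """Check a user-supplied pointer, then copy the bytes into kernel memory."""
    if length < 0 or addr < USER_BASE or addr + length > USER_LIMIT:
        raise BadSyscall(f"bad user buffer {addr:#x}+{length}")
    return bytes(user_mem[addr:addr + length])


def sys_getpid(user_mem):
    return 42                                # made-up process ID


def sys_write(user_mem, fd, buf, count):
    if fd not in (1, 2):                     # only stdout/stderr in this toy
        raise BadSyscall(f"bad file descriptor {fd}")
    data = copy_from_user(user_mem, buf, count)
    return len(data)                         # pretend the bytes were written


# The system call ID indexes this table; no other kernel routine is reachable.
SYSCALL_TABLE = [sys_getpid, sys_write]


def syscall(user_mem, number, *args):
    if not 0 <= number < len(SYSCALL_TABLE):
        raise BadSyscall(f"unknown system call {number}")
    return SYSCALL_TABLE[number](user_mem, *args)


user_mem = bytearray(USER_LIMIT)
user_mem[0x1800:0x1805] = b"hello"
print(syscall(user_mem, 1, 1, 0x1800, 5))    # valid write of 5 bytes
```

A pointer below `USER_BASE`, a buffer running past `USER_LIMIT`, or a call number outside the table is rejected before any handler touches the data.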

## User→Kernel (Exceptions: Traps and Interrupts)

- A system call instruction causes a synchronous exception (or "trap")
  - In fact, often called a software "trap" instruction
- Other sources of Synchronous Exceptions ("Trap"):
  - Divide by zero, Illegal instruction, Bus error (bad address, e.g. unaligned access)
  - Segmentation Fault (address out of range)
  - Page Fault (for illusion of infinite-sized memory)
- Interrupts are Asynchronous Exceptions
  - Examples: timer, disk ready, network, etc
  - Interrupts can be disabled, traps cannot!
- On system call, exception, or interrupt:
  - Hardware enters kernel mode with interrupts disabled
  - Saves PC, then jumps to appropriate handler in kernel
  - For some processors (x86), processor also saves registers, changes stack, etc.
- Actual handler typically saves registers, other CPU state, and switches to kernel stack

## Additions to MIPS ISA to support Exceptions?

- Exception state is kept in "Coprocessor 0"
  - Use mfc0 to read contents of these registers:
    - » BadVAddr (register 8): contains memory address at which memory reference error occurred
    - » Status (register 12): interrupt mask and enable bits
    - » Cause (register 13): the cause of the exception
    - » EPC (register 14): address of the affected instruction

| Status bits | 15-8 | 5       | 4       | 3        | 2        | 1       | 0       |
|-------------|------|---------|---------|----------|----------|---------|---------|
| Field       | Mask | k (old) | e (old) | k (prev) | e (prev) | k (cur) | e (cur) |

- Status Register fields:
  - Mask: Interrupt enable
    - » 1 bit for each of 5 hardware and 3 software interrupts
  - k = kernel/user: 0⇒kernel mode
  - e = interrupt enable: 0⇒interrupts disabled
  - Exception⇒6 LSB shifted left 2 bits, setting 2 LSB to 0
    - » run in kernel mode with interrupts disabled

## Closing thought: Protection without Hardware

- Does protection require hardware support for translation and dual-mode behavior?
  - No: Normally use hardware, but anything you can do in hardware you can also do in software (possibly expensive)
- Protection via Strong Typing
  - Restrict programming language so that you can't express program that would trash another program
  - Loader needs to make sure that program produced by valid compiler or all bets are off
  - Example languages: LISP, Ada, Modula-3 and Java
- Protection via software fault isolation:
  - Language independent approach: have compiler generate object code that provably can't step out of bounds
    - » Compiler puts in checks for every "dangerous" operation (loads, stores, etc). Again, need special loader.
    - » Alternative, compiler generates "proof" that code cannot do certain things (Proof Carrying Code)
  - Or: use virtual machine to guarantee safe behavior (loads and stores recompiled on fly to check bounds)

## Administrivia

- Midterm next Monday:
  - Monday, 10/19, 6:00-9:00pm, 155 Dwinelle
  - Should be 2 hour exam with extra time
  - Closed book, one page of hand-written notes (both sides)
- No class on day of Midterm
  - Different Office Hours for me: Mon 11:00-12:30.
  - Office hours during Class time: 4:00-5:30
  - Unfortunately, 2:30-3:30 may be taken. Stay tuned.
- Midterm Topics
  - Topics: Everything up to Today
  - History, Concurrency, Multithreading, Synchronization, Protection/Address Spaces, TLBs
- Make sure to fill out Group Evaluations!
- Project 2
  - Initial Design Document due Friday 10/15
  - Look at the lecture schedule to keep up with due dates!
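Returning to the MIPS Status register from "Additions to MIPS ISA to support Exceptions?": the rule "6 LSB shifted left 2 bits, setting 2 LSB to 0" acts as a three-entry stack of (k, e) pairs. The sketch below models only that rule on a plain integer. The function names are invented, and the bit positions assume the old/prev/cur pairs occupy bits 5-0 with the current pair in bits 1-0.

```python
LOW6 = 0x3F                     # the three (k, e) pairs: old, prev, cur
K_CUR, E_CUR = 1 << 1, 1 << 0   # current kernel/user and interrupt-enable bits


def enter_exception(status):
    """Push the (k, e) stack: cur -> prev -> old, and cur becomes (0, 0)."""
    shifted = ((status & LOW6) << 2) & LOW6
    return (status & ~LOW6) | shifted


def describe(status):
    mode = "user" if status & K_CUR else "kernel"        # k = 0 => kernel
    ints = "enabled" if status & E_CUR else "disabled"   # e = 0 => disabled
    return f"{mode} mode, interrupts {ints}"


status = 0xFF03                 # all 8 mask bits set; cur = user, interrupts on
print(describe(status))
status = enter_exception(status)
print(f"{status:#06x}", describe(status))
```

After the shift the interrupted program's mode sits in the "prev" pair, so the handler starts in kernel mode with interrupts disabled while the previous state is kept for the return path.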

## Why Bother with Caching?



## Caching Concept

## Memory Hierarchy of a Modern Computer System

- Take advantage of the principle of locality to:
  - Present as much memory as in the cheapest technology
  - Provide access at speed offered by the fastest technology



# How is a Block found in a Cache?




- Index Used to Lookup Candidates in Cache
  - Index identifies the set
- Tag used to identify actual copy
  - If no candidates match, then declare cache miss
- Block is minimum quantum of caching
  - Data select field used to select data within block
  - Many caching applications don't have data select field

# A Summary on Sources of Cache Misses

- Compulsory (cold start or process migration, first reference): first access to a block
  - "Cold" fact of life: not a whole lot you can do about it
  - Note: If you are going to run "billions" of instructions, Compulsory Misses are insignificant
- Capacity:
  - Cache cannot contain all blocks accessed by the program
  - Solution: increase cache size
- Conflict (collision):
  - Multiple memory locations mapped to the same cache location
  - Solution 1: increase cache size
  - Solution 2: increase associativity
- Coherence (Invalidation): other process (e.g., I/O) updates memory

### **Review: Direct Mapped Cache**

- Direct Mapped 2<sup>N</sup> byte cache:
  - The uppermost (32 - N) bits are always the Cache Tag
  - The lowest M bits are the Byte Select (Block Size = 2<sup>M</sup>)
- Example: 1 KB Direct Mapped Cache with 32 B Blocks
  - Index chooses potential block
  - Tag checked to verify block
  - Byte select chooses byte within block

[Figure: 32-bit address (bits 31-0) split into Cache Tag (Ex: 0x50), Cache Index (Ex: 0x01), and Byte Select (Ex: 0x00); cache array with Valid Bit, Cache Tag, and Cache Data columns, holding Byte 0 through Byte 1023 in 32-byte blocks]
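For the 1 KB, 32 B-block example (N = 10, M = 5), the address splits into a 22-bit tag, a 5-bit index, and a 5-bit byte select. The following is a small sketch of that split and of a tag-only lookup; the class and function names are made up, and no data is stored, only valid bits and tags.

```python
CACHE_BYTES = 1024                      # 2**N with N = 10
BLOCK_BYTES = 32                        # 2**M with M = 5
NUM_LINES = CACHE_BYTES // BLOCK_BYTES  # 32 lines -> 5 index bits


def split(addr):
    """Return (tag, index, byte_select) for a 32-bit address."""
    byte_select = addr % BLOCK_BYTES
    index = (addr // BLOCK_BYTES) % NUM_LINES
    tag = addr // CACHE_BYTES
    return tag, index, byte_select


class DirectMappedCache:
    def __init__(self):
        self.valid = [False] * NUM_LINES
        self.tags = [0] * NUM_LINES

    def access(self, addr):
        """Return True on a hit; on a miss, install the block and return False."""
        tag, index, _ = split(addr)
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            self.valid[index] = True
            self.tags[index] = tag
        return hit


cache = DirectMappedCache()
for addr in (0x14020, 0x14024, 0x14420, 0x14020):
    tag, index, byte = split(addr)
    hit = cache.access(addr)
    print(f"{addr:#x}: tag={tag:#x} index={index} byte={byte} hit={hit}")
```

Address 0x14020 decodes to the slide's example fields (tag 0x50, index 0x01, byte select 0x00). The third access has the same index but a different tag, so it evicts the block, and the final access misses again: a conflict miss.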


### **Review: Set Associative Cache**

- N-way set associative: N entries per Cache Index
  - N direct mapped caches operate in parallel
- Example: Two-way set associative cache
  - Cache Index selects a "set" from the cache
  - Two tags in the set are compared to input in parallel
  - Data is selected based on the tag result



## Where does a Block Get Placed in a Cache?



### **Review: Fully Associative Cache**

- Fully Associative: Every block can hold any line
  - Address does not include a cache index
  - Compare Cache Tags of all Cache Entries in Parallel
- Example: Block Size = 32 B
  - We need N 27-bit comparators
  - Still have byte select to choose from within block




## Review: Which block should be replaced on a miss?

- Easy for Direct Mapped: Only one possibility
- Set Associative or Fully Associative:
  - Random
  - LRU (Least Recently Used)

| Size   | 2-way LRU | 2-way Random | 4-way LRU | 4-way Random | 8-way LRU | 8-way Random |
|--------|-----------|--------------|-----------|--------------|-----------|--------------|
| 16 KB  | 5.2%      | 5.7%         | 4.7%      | 5.3%         | 4.4%      | 5.0%         |
| 64 KB  | 1.9%      | 2.0%         | 1.5%      | 1.7%         | 1.4%      | 1.5%         |
| 256 KB | 1.15%     | 1.17%        | 1.13%     | 1.13%        | 1.12%     | 1.12%        |
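A set associative lookup with LRU replacement can be sketched in a few lines. This is an illustrative model only, tracking tags and not data: `SetAssociativeCache` is a made-up name, each set is an `OrderedDict` kept in recency order, and Random replacement is not shown. Setting `ways=1` gives a direct mapped cache, which makes the effect of associativity on conflict misses easy to see.

```python
from collections import OrderedDict


class SetAssociativeCache:
    def __init__(self, num_sets, ways, block_bytes=32):
        self.num_sets = num_sets
        self.ways = ways
        self.block_bytes = block_bytes
        # One ordered map per set: least recently used tag comes first.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a hit; on a miss, install the block (evicting LRU)."""
        block = addr // self.block_bytes
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)          # mark as most recently used
            return True
        if len(lines) == self.ways:
            lines.popitem(last=False)       # evict the least recently used
        lines[tag] = True
        return False


# Two 1 KB caches with 32 B blocks: direct mapped versus two-way.
direct = SetAssociativeCache(num_sets=32, ways=1)
two_way = SetAssociativeCache(num_sets=16, ways=2)

trace = [0x0000, 0x0400, 0x0000, 0x0400]    # two blocks that collide
print("direct mapped hits:", sum(direct.access(a) for a in trace))
print("two-way hits:      ", sum(two_way.access(a) for a in trace))
```

On this ping-pong trace the direct mapped cache never hits, while the two-way cache keeps both blocks in the same set and hits on every reuse.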


### Review: What happens on a write?

- Write through: The information is written to both the block in the cache and to the block in the lower-level memory
- Write back: The information is written only to the block in the cache.
  - Modified cache block is written to main memory only when it is replaced
  - Question: is the block clean or dirty?
- Pros and Cons of each?
  - WT:
    - » PRO: read misses cannot result in writes
    - » CON: Processor held up on writes unless writes buffered
  - WB:
    - » PRO: repeated writes not sent to DRAM; processor not held up on writes
    - » CON: More complex; read miss may require writeback of dirty data
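The difference in memory traffic can be illustrated by counting writes to lower-level memory for a cache that holds a single block. This is a minimal sketch under stated assumptions: the function `memory_writes` is hypothetical, the cache allocates a block on every miss, and write buffering is ignored.

```python
def memory_writes(trace, policy):
    """Count writes to lower-level memory for a one-block cache.

    trace: list of (op, block) with op "r" or "w"; policy: "wt" or "wb".
    """
    cached, dirty, writes = None, False, 0
    for op, block in trace:
        if block != cached:                  # miss: the cached block is replaced
            if policy == "wb" and dirty:
                writes += 1                  # write back the dirty victim
            cached, dirty = block, False
        if op == "w":
            if policy == "wt":
                writes += 1                  # every write also goes to memory
            else:
                dirty = True                 # write back: only mark the block dirty
    return writes


trace = [("w", 0)] * 10 + [("r", 1)]
print(memory_writes(trace, "wt"))  # 10
print(memory_writes(trace, "wb"))  # 1
```

Under write back, the ten repeated writes cost a single memory write, but that write happens during the read miss on block 1, which is the case where a read miss requires a writeback of dirty data.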


# Caching Applied to Address Translation



- Data accesses have less page locality, but still some...
- Can we have a TLB hierarchy?
  - Sure: multiple levels at different sizes/speeds


# What Actually Happens on a TLB Miss?

- Hardware traversed page tables:
  - On TLB miss, hardware in MMU looks at current page table to fill TLB (may walk multiple levels)
    - » If PTE valid, hardware fills TLB and processor never knows
    - » If PTE marked as invalid, causes Page Fault, after which kernel decides what to do
- Software traversed Page tables (like MIPS)
  - On TLB miss, processor receives TLB fault
  - Kernel traverses page table to find PTE
    - » If PTE valid, fills TLB and returns from fault
    - » If PTE marked as invalid, internally calls Page Fault handler
- Most chip sets provide hardware traversal
  - Modern operating systems tend to have more TLB faults since they use translation for many things
  - Examples:
    - » shared segments
    - » user-level portions of an operating system
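The software-traversed path can be sketched as a translation routine that refills the TLB on a miss. Everything here is an illustrative assumption: the TLB is a plain dict, the page table is a single-level dict of PTEs, and the `PageFault` exception stands in for the kernel's page fault handler.

```python
class PageFault(Exception):
    """Raised when the PTE for a virtual page is missing or marked invalid."""


def translate(vaddr, tlb, page_table, page_size=4096):
    """Translate vaddr, refilling the TLB from the page table on a miss."""
    vpn, offset = divmod(vaddr, page_size)
    if vpn not in tlb:                           # TLB miss: traverse the page table
        pte = page_table.get(vpn)
        if pte is None or not pte["valid"]:
            raise PageFault(vpn)                 # invalid PTE: hand off to fault handler
        tlb[vpn] = pte["frame"]                  # valid PTE: fill TLB and retry
    return tlb[vpn] * page_size + offset


page_table = {1: {"valid": True, "frame": 7}, 2: {"valid": False, "frame": 0}}
tlb = {}
print(hex(translate(0x1234, tlb, page_table)))   # 0x7234
print(tlb)                                       # {1: 7}
```

A second access to the same page hits in `tlb` and never touches `page_table`, which is the point of caching the translation.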

## What happens on a Context Switch?

- Need to do something, since TLBs map virtual addresses to physical addresses
  - Address Space just changed, so TLB entries no longer valid!
- Options?
  - Invalidate TLB: simple but might be expensive
    - » What if switching frequently between processes?
  - Include ProcessID in TLB
    - » This is an architectural solution: needs hardware
- What if translation tables change?
  - For example, to move page from memory to disk or vice versa...
  - Must invalidate TLB entry!
    - » Otherwise, might think that page is still in memory!
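The two context-switch options can be contrasted with a small model. This is a sketch, not any particular architecture: the class `Tlb`, its method names, and the use of a dict keyed by (ASID, virtual page) are assumptions for illustration.

```python
class Tlb:
    """Toy TLB that either flushes on context switch or tags entries with an ASID."""

    def __init__(self, use_asid):
        self.use_asid = use_asid
        self.current_asid = 0
        self.entries = {}                        # (asid, vpn) -> physical frame

    def _key(self, vpn):
        return (self.current_asid if self.use_asid else 0, vpn)

    def context_switch(self, new_asid):
        if not self.use_asid:
            self.entries.clear()                 # invalidate everything: simple but costly
        self.current_asid = new_asid

    def fill(self, vpn, frame):
        self.entries[self._key(vpn)] = frame

    def lookup(self, vpn):
        return self.entries.get(self._key(vpn))  # None means TLB miss

    def invalidate(self, vpn):
        self.entries.pop(self._key(vpn), None)   # e.g. page moved from memory to disk


tagged = Tlb(use_asid=True)
tagged.context_switch(1)
tagged.fill(5, 100)
tagged.context_switch(2)
print(tagged.lookup(5))    # None: process 2 cannot see process 1's entry
tagged.context_switch(1)
print(tagged.lookup(5))    # 100: entry survived the switches
```

With `use_asid=False`, the same sequence ends in a miss, because the entry was discarded at the first switch and has to be refilled.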




# TLB organization: include protection

- How big does TLB actually have to be?
  - Usually small: 128-512 entries
  - Not very big, can support higher associativity
- TLB usually organized as fully-associative cache
  - Lookup is by Virtual Address
  - Returns Physical Address + other info
- What happens when fully-associative is too slow?
  - Put a small (4-16 entry) direct-mapped cache in front
  - Called a "TLB Slice"
- Example for MIPS R3000:

| Virtual Address | Physical Address | Dirty | Ref | Valid | Access | ASID |
|-----------------|------------------|-------|-----|-------|--------|------|
| 0xFA00          | 0x0003           | Y     | N   | Y     | R/W    | 34   |
| 0x0040          | 0x0010           | N     | Y   | Y     | R      | 0    |
| 0x0041          | 0x0011           | N     | Y   | Y     | R      | 0    |
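A lookup over the three example entries, including the protection check, might look like the sketch below. The field names, the function `tlb_lookup`, and the choice to set the Ref and Dirty bits on access are assumptions for illustration; the values are taken as they appear in the table.

```python
TLB = [
    {"virt": 0xFA00, "phys": 0x0003, "dirty": True,  "ref": False, "valid": True, "access": "RW", "asid": 34},
    {"virt": 0x0040, "phys": 0x0010, "dirty": False, "ref": True,  "valid": True, "access": "R",  "asid": 0},
    {"virt": 0x0041, "phys": 0x0011, "dirty": False, "ref": True,  "valid": True, "access": "R",  "asid": 0},
]


def tlb_lookup(virt, asid, is_write):
    """Return the physical value for (virt, asid), or None on a TLB miss."""
    for entry in TLB:                            # fully associative: check every entry
        if entry["valid"] and entry["virt"] == virt and entry["asid"] == asid:
            if is_write and "W" not in entry["access"]:
                raise PermissionError(f"write to read-only page {virt:#06x}")
            entry["ref"] = True                  # assumed: record that the page was used
            if is_write:
                entry["dirty"] = True            # assumed: record that the page was modified
            return entry["phys"]
    return None


print(hex(tlb_lookup(0xFA00, 34, is_write=True)))   # 0x3
print(hex(tlb_lookup(0x0040, 0, is_write=False)))   # 0x10
print(tlb_lookup(0xFA00, 0, is_write=False))        # None: ASID does not match
```

A write to `0x0040` raises `PermissionError`, since that entry grants read access only.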


# Example: R3000 pipeline includes TLB "stages"

#### MIPS R3000 Pipeline

| Inst Fetch   | Dcd / Reg | ALU / E.A. | Memory  | Write Reg |
|--------------|-----------|------------|---------|-----------|
| TLB, I-Cache | RF        | Operation  |         | WB        |
|              |           | E.A., TLB  | D-Cache |           |

#### TLB

64 entry, on-chip, fully associative, software TLB fault handler

#### Virtual Address Space



## Reducing translation time further

- As described, TLB lookup is in serial with cache lookup:

[Figure: TLB lookup translating a virtual address into the Physical Address used for the cache lookup]

- Machines with TLBs go one step further: they overlap TLB lookup with cache access.
  - Works because offset available early
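The overlap works when the cache index and byte select are taken entirely from the page offset, which translation leaves unchanged. A quick check of that condition can be sketched as follows; the function `can_overlap` and the example sizes are assumptions for illustration.

```python
def can_overlap(cache_bytes, ways, page_bytes):
    """True if the cache index and byte select fit within the page offset.

    The bits that select a location within one way span cache_bytes / ways,
    so they are untranslated only if that span is no larger than a page.
    """
    bytes_per_way = cache_bytes // ways
    return bytes_per_way <= page_bytes


print(can_overlap(4096, 1, 4096))   # True:  4 KB direct mapped, 4 KB pages
print(can_overlap(8192, 1, 4096))   # False: index needs one translated bit
print(can_overlap(8192, 2, 4096))   # True:  associativity shrinks the index
```

When the check passes, the cache can read out the indexed set while the TLB translates the page number, and the resulting physical page number is only needed for the final tag comparison.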




- On TLB miss, page table must be traversed
  - If located PTE is invalid, cause Page Fault
- On context switch/change in page table
  - TLB entries must be invalidated somehow
- TLB is logically in front of cache
  - Thus, needs to be overlapped with cache access to be really fast