# CS162 Operating Systems and Systems Programming Lecture 10

#### **Caches and TLBs**

February 28, 2013
Anthony D. Joseph
http://inst.eecs.berkeley.edu/~cs162

## **Goals for Today's Lecture**

- Caching
  - Misses
  - Organization
- Translation Lookaside Buffers (TLBs)
- How caching and TLBs fit into the virtual memory architecture

Note: Some slides and/or pictures in the following are adapted from slides ©2005 Silberschatz, Galvin, and Gagne. Slides courtesy of Anthony D. Joseph, John Kubiatowicz, AJ Shankar, George Necula, Alex Aiken, Eric Brewer, Ras Bodik, Ion Stoica, Doug Tygar, and David Wagner.

2/28/13

Anthony D. Joseph CS162 ©UCB Spring 2013

Lec 10.2

## **Caching Concept**



- Cache: a repository for copies that can be accessed more quickly than the original
  - Make frequent case fast and infrequent case less dominant
- Caching at different levels
  - Can cache: memory locations, address translations, pages, file blocks, file names, network routes, etc...
- Only good if:
  - Frequent case frequent enough and
  - Infrequent case not too expensive
- Important measure: Average Access time =
   (Hit Rate x Hit Time) + (Miss Rate x Miss Time)


#### **Example**

- Data in memory, no cache:
  - Processor → Main Memory (DRAM), 100ns
  - Access time = 100ns
- Data in memory, 10ns cache:
  - Processor → Second Level Cache (SRAM), 10ns → Main Memory (DRAM), 100ns
  - Average Access time = (Hit Rate x Hit Time) + (Miss Rate x Miss Time)
  - HitRate + MissRate = 1
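The average access time formula can be evaluated directly for the 10ns cache and 100ns DRAM in this example. The sketch below is illustrative only: the function name `average_access_time` is hypothetical, and it assumes the miss time is a single 100ns DRAM access (the slide text does not state what a miss costs).

```python
def average_access_time(hit_rate, hit_time_ns, miss_time_ns):
    """Average access time = (hit rate x hit time) + (miss rate x miss time)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate  # HitRate + MissRate = 1
    return hit_rate * hit_time_ns + miss_rate * miss_time_ns


if __name__ == "__main__":
    CACHE_NS = 10.0   # SRAM cache hit time from the example
    DRAM_NS = 100.0   # assumed miss time: one DRAM access
    for hit_rate in (0.0, 0.90, 0.99):
        t = average_access_time(hit_rate, CACHE_NS, DRAM_NS)
        print(f"hit rate {hit_rate:.0%}: {t:.1f} ns")
```

Under these assumptions a 90% hit rate gives roughly 19ns and a 99% hit rate roughly 10.9ns, so the last few percent of hit rate matter a great deal when the miss time is ten times the hit time.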





























## Which block should be replaced on a miss?

- Easy for Direct Mapped: Only one possibility
- Set Associative or Fully Associative:
  - Random
  - LRU (Least Recently Used)

| Size   | 2-way LRU | 2-way Random | 4-way LRU | 4-way Random | 8-way LRU | 8-way Random |
|--------|-----------|--------------|-----------|--------------|-----------|--------------|
| 16 KB  | 5.2%      | 5.7%         | 4.7%      | 5.3%         | 4.4%      | 5.0%         |
| 64 KB  | 1.9%      | 2.0%         | 1.5%      | 1.7%         | 1.4%      | 1.5%         |
| 256 KB | 1.15%     | 1.17%        | 1.13%     | 1.13%        | 1.12%     | 1.12%        |
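To make the two replacement policies concrete, here is a minimal sketch of a toy set-associative cache that tracks only which block numbers are resident. The class `SetAssociativeCache` and its parameters are hypothetical names for illustration; it is not the simulator behind the figures above and will not reproduce them.

```python
import random
from collections import OrderedDict


class SetAssociativeCache:
    """Toy cache that tracks only which block numbers are resident."""

    def __init__(self, num_sets, ways, policy="lru", seed=0):
        if policy not in ("lru", "random"):
            raise ValueError("policy must be 'lru' or 'random'")
        self.num_sets = num_sets
        self.ways = ways
        self.policy = policy
        self.rng = random.Random(seed)  # seeded so runs are repeatable
        # One OrderedDict per set; key order runs from least to most recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.misses = 0

    def access(self, block):
        """Touch a block number; return True on a hit, False on a miss."""
        lines = self.sets[block % self.num_sets]
        if block in lines:
            lines.move_to_end(block)  # mark as most recently used
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            if self.policy == "lru":
                lines.popitem(last=False)  # evict least recently used
            else:
                victim = self.rng.choice(list(lines))
                del lines[victim]
        lines[block] = True
        return False


if __name__ == "__main__":
    trace = [0, 1, 0, 2, 0, 3, 0]  # block 0 is reused between other blocks
    for policy in ("lru", "random"):
        cache = SetAssociativeCache(num_sets=1, ways=2, policy=policy)
        for block in trace:
            cache.access(block)
        print(policy, "misses:", cache.misses)
```

On this trace LRU always keeps the frequently reused block 0 resident, while random replacement may evict it. With `ways=1` (direct mapped) the policy makes no difference, since there is only one candidate victim.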

## What happens on a write?

- Write through: The information is written both to the block in the cache and to the block in the lower-level memory
- Write back: The information is written only to the block in the cache.
  - Modified cache block is written to main memory only when it is replaced
  - Question: is block clean or dirty?
- Pros and Cons of each?
  - WT:
    - » PRO: read misses cannot result in writes
    - » CON: processor held up on writes unless writes buffered
  - WB:
    - » PRO: repeated writes not sent to DRAM; processor not held up on writes
    - » CON: More complex
      - Read miss may require writeback of dirty data
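The trade-off above can be seen by counting how many writes reach main memory under each policy. This is a minimal sketch of a toy direct-mapped cache; the class `WritePolicyCache` is a hypothetical name, and it assumes write-allocate on a write miss and no write buffer.

```python
class WritePolicyCache:
    """Toy direct-mapped cache that counts writes reaching main memory."""

    def __init__(self, num_lines, write_back):
        self.num_lines = num_lines
        self.write_back = write_back
        self.tags = [None] * num_lines    # block number held by each line
        self.dirty = [False] * num_lines  # only meaningful for write back
        self.memory_writes = 0

    def _fill(self, block):
        """Bring a block into its line, writing back a dirty victim first."""
        line = block % self.num_lines
        if self.tags[line] != block:
            if self.write_back and self.dirty[line]:
                self.memory_writes += 1  # dirty victim goes to memory
            self.tags[line] = block
            self.dirty[line] = False
        return line

    def read(self, block):
        self._fill(block)

    def write(self, block):
        line = self._fill(block)  # assumes write-allocate on a write miss
        if self.write_back:
            self.dirty[line] = True   # defer the memory update
        else:
            self.memory_writes += 1   # write through: update memory now


if __name__ == "__main__":
    for write_back in (False, True):
        cache = WritePolicyCache(num_lines=4, write_back=write_back)
        for _ in range(10):
            cache.write(0)   # repeated writes to the same block
        cache.read(4)        # read miss that evicts block 0
        name = "write back" if write_back else "write through"
        print(name, "memory writes:", cache.memory_writes)
```

Ten writes to one block cost ten memory writes with write through but only one with write back, and that one write happens during the later read miss, when the dirty block is evicted.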


#### **Administrivia**

- Midterm exam is Wednesday 3/13 4-5:30pm in two rooms
  - 145 Dwinelle for last names beginning with A-H
  - 245 Li Ka Shing for last names beginning with I-Z
- Midterm is closed book, no calculators
  - Covers up to Lecture #12 (Kernel/User and I/O)
  - One double-sided handwritten page of notes allowed
- Midterm review session TBA
- Project 1 design doc (submit proj1-final-design) and group evals (Google Docs form) due today by 11:59PM
  - Group evals are anonymous to your group
- Class feedback is always welcome!







## **What Actually Happens on a TLB Miss?**

- Hardware-traversed page tables:
  - On TLB miss, hardware in MMU looks at current page table to fill TLB (may walk multiple levels)
    - » If PTE valid, hardware fills TLB and processor never knows
    - » If PTE marked as invalid, causes Page Fault, after which kernel decides what to do
- Software-traversed page tables:
  - On TLB miss, processor receives TLB fault
  - Kernel traverses page table to find PTE
    - » If PTE valid, fills TLB and returns from fault
    - » If PTE marked as invalid, internally calls Page Fault handler
- Most chip sets provide hardware traversal
  - Modern operating systems tend to have more TLB faults since they use translation for many things
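The software-traversed case can be sketched as a lookup that falls back to the page table on a TLB miss. This is a simplified illustration, not any real kernel's handler: it assumes a single-level page table held in a dict, 4 KB pages, and an unbounded TLB, and the names `translate` and `PageFault` are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size


class PageFault(Exception):
    """Raised when the page table entry is missing or marked invalid."""


def translate(vaddr, tlb, page_table):
    """Translate a virtual address, refilling the TLB from the page table on a miss.

    tlb maps virtual page number -> physical frame number.
    page_table maps virtual page number -> (valid, frame).
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in tlb:                          # TLB miss
        valid, frame = page_table.get(vpn, (False, None))
        if not valid:
            raise PageFault(f"invalid PTE for page {vpn:#x}")
        tlb[vpn] = frame                        # fill TLB, then use the entry
    return tlb[vpn] * PAGE_SIZE + offset
```

A first access to a page takes the miss path and fills the TLB; later accesses to the same page are served from `tlb` without touching `page_table`.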


### What happens on a Context Switch?

- Need to do something, since TLBs map virtual addresses to physical addresses
  - Address Space just changed, so TLB entries no longer valid!
- Options?
  - Invalidate TLB: simple but might be expensive
    - » What if switching frequently between processes?
  - Include ProcessID in TLB
    - » This is an architectural solution: needs hardware
- What if translation tables change?
  - For example, to move page from memory to disk or vice versa...
  - Must invalidate TLB entry!
    - » Otherwise, might think that page is still in memory!
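Both context-switch options, plus single-entry invalidation, can be sketched with a TLB whose entries are tagged by process ID. The class `TaggedTLB` is a hypothetical model for illustration; real hardware does this with tag bits in each entry rather than a dictionary.

```python
class TaggedTLB:
    """Toy TLB whose entries are tagged with a process ID."""

    def __init__(self):
        self.entries = {}  # (pid, vpn) -> physical frame number

    def insert(self, pid, vpn, frame):
        self.entries[(pid, vpn)] = frame

    def lookup(self, pid, vpn):
        """Return the frame, or None on a TLB miss."""
        return self.entries.get((pid, vpn))

    def invalidate(self, pid, vpn):
        """Drop one entry, e.g. after its page moves to disk."""
        self.entries.pop((pid, vpn), None)

    def flush(self):
        """Drop everything: the 'invalidate TLB' option on a context switch."""
        self.entries.clear()
```

With the process ID in the key, two processes can map the same virtual page to different frames without a flush on every switch; `flush` models the simpler option, after which every process starts with misses.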


## What TLB organization makes sense?



- Needs to be really fast
  - Critical path of memory access
  - Seems to argue for Direct Mapped or Low Associativity
- However, needs to have very few conflicts!
  - With TLB, the Miss Time is extremely high!
  - This argues that cost of Conflict (Miss Time) is much higher than slightly increased cost of access (Hit Time)
- Thrashing: continuous conflicts between accesses
  - What if we use low-order bits of page number as index into TLB?
    - » First page of code, data, stack may map to same entry
    - » Need 3-way associativity at least?
  - What if we use high-order bits as index?
    - » TLB mostly unused for small programs
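The two indexing choices can be compared with a little bit arithmetic. The sketch below assumes 32-bit virtual addresses, 4 KB pages, and a 64-entry direct-mapped TLB; the segment base addresses are hypothetical values chosen only to show the effect.

```python
PAGE_BITS = 12   # assumed 4 KB pages
VPN_BITS = 20    # assumed 32-bit virtual addresses
INDEX_BITS = 6   # assumed 64-entry direct-mapped TLB


def low_index(vaddr):
    """Index with the low-order bits of the virtual page number."""
    vpn = vaddr >> PAGE_BITS
    return vpn & ((1 << INDEX_BITS) - 1)


def high_index(vaddr):
    """Index with the high-order bits of the virtual page number."""
    vpn = vaddr >> PAGE_BITS
    return vpn >> (VPN_BITS - INDEX_BITS)


# Hypothetical segment bases, chosen only for illustration.
SEGMENTS = {"code": 0x00000000, "data": 0x40000000, "stack": 0x80000000}

if __name__ == "__main__":
    for name, base in SEGMENTS.items():
        print(f"{name:5s} low={low_index(base):2d} high={high_index(base):2d}")
    # A small program touching 16 consecutive code pages.
    pages = [SEGMENTS["code"] + i * (1 << PAGE_BITS) for i in range(16)]
    print("entries used, low-order index: ", len({low_index(a) for a in pages}))
    print("entries used, high-order index:", len({high_index(a) for a in pages}))
```

With these bases the first page of each segment lands on the same entry under low-order indexing, while under high-order indexing the 16 consecutive code pages all compete for a single entry and the rest of the TLB sits idle.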


## **TLB organization: include protection**

- How big does TLB actually have to be?
  - Usually small: 128-512 entries
  - Not very big, can support higher associativity
- TLB usually organized as fully-associative cache
  - Lookup is by Virtual Address
  - Returns Physical Address + other info
- What happens when fully-associative is too slow?
  - Put a small (4-16 entry) direct-mapped cache in front
  - Called a "TLB Slice"
- When does TLB lookup occur relative to memory cache access?
  - Before memory cache lookup?
  - In parallel with memory cache lookup?














## **Summary (1/2)**

- The Principle of Locality:
  - Program likely to access a relatively small portion of the address space at any instant of time.
    - » Temporal Locality: Locality in Time
    - » Spatial Locality: Locality in Space
- Three (+1) Major Categories of Cache Misses:
  - Compulsory Misses: sad facts of life. Example: cold start misses.
  - Conflict Misses: increase cache size and/or associativity
  - Capacity Misses: increase cache size
  - Coherence Misses: Caused by external processors or I/O devices


## **Summary (2/2)**

- Cache Organizations:
  - Direct Mapped: single block per set
  - Set associative: more than one block per set
  - Fully associative: all entries equivalent
- TLB is cache on address translations
  - Fully associative to reduce conflicts
  - Can be overlapped with cache access
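The three cache organizations differ only in how many sets the blocks are divided into, which shows up in how an address splits into tag, set index, and block offset. The helper below is a sketch with a hypothetical name, `split_address`, and it assumes all sizes are powers of two.

```python
def split_address(addr, cache_bytes, block_bytes, ways):
    """Split an address into (tag, set index, block offset).

    Sizes are assumed to be powers of two. ways=1 is direct mapped;
    ways equal to the number of blocks is fully associative.
    """
    num_blocks = cache_bytes // block_bytes
    num_sets = num_blocks // ways
    offset = addr % block_bytes
    block_number = addr // block_bytes
    index = block_number % num_sets      # always 0 when fully associative
    tag = block_number // num_sets
    return tag, index, offset


if __name__ == "__main__":
    # 1 KB cache with 32-byte blocks, i.e. 32 blocks in total.
    for ways in (1, 2, 32):
        print(ways, "way(s):", split_address(0x12345, 1024, 32, ways))
```

As associativity grows, index bits move into the tag; in the fully associative case there is a single set, so every entry must be compared against the tag.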
