





#### Administrivia (1/2)

- Happy $\pi$ Day!!!
  - 40 digits are sufficient to calculate circumference of visible universe to atomic dimensions:
    See: https://www.jpl.nasa.gov/edu/news/2016/3/16/how-many-decimals-of-pi-do-we-really-need/
  - Here are 40 decimal places: 3.1415926535897932384626433832795028841971
- Best formula for PI is from Ramanujan:
  $\frac{1}{\pi} = \frac{2\sqrt{2}}{9801} \sum_{k=0}^{\infty} \frac{(4k)!(1103+26390k)}{(k!)^4 396^{4k}}$
- Google announced back in 2019 (3/14/19) that Emma Haruka Iwao had just calculated pi to 31,415,926,535,897 digits (new record...)

3/14/23 Kubiatowicz CS162 © UCB Spring 2023

#### Review: Fully Associative Cache

- Fully Associative: Every block can hold any line
  - Address does not include a cache index
  - Compare Cache Tags of all Cache Entries in Parallel
- Example: Block Size = 32B
  - We need N 27-bit comparators
  - Still have byte select to choose from within block

[Figure: address split into Cache Tag (27 bits long) and Byte Select (Ex: 0x01); each cache entry holds a Cache Tag, a Valid Bit, and Cache Data (Byte 0 ... Byte 31, Byte 32 ... Byte 63, ...), with one comparator (=) per entry]
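A minimal sketch of the fully associative lookup, assuming 32-bit addresses (which is what makes the tag 27 bits with 32B blocks). The class and function names are illustrative, not from the lecture, and the loop only models what N hardware comparators do in parallel.

```python
BLOCK_SIZE = 32               # bytes per block, as in the example
OFFSET_BITS = 5               # log2(32): width of the byte-select field
TAG_BITS = 32 - OFFSET_BITS   # 27-bit tag, assuming 32-bit addresses


def split_address(addr):
    """Return (tag, byte_select). There is no cache index field."""
    return addr >> OFFSET_BITS, addr & (BLOCK_SIZE - 1)


class FullyAssociativeCache:
    def __init__(self, num_entries):
        # Each entry is [valid, tag, data]; any block may live in any entry.
        self.entries = [[False, 0, bytes(BLOCK_SIZE)] for _ in range(num_entries)]

    def fill(self, slot, addr, block):
        """Place a 32-byte block for addr into the chosen entry."""
        tag, _ = split_address(addr)
        self.entries[slot] = [True, tag, block]

    def read_byte(self, addr):
        """Return the cached byte on a hit, or None on a miss."""
        tag, byte_select = split_address(addr)
        # Hardware checks every tag at once; this loop stands in for that.
        for valid, entry_tag, data in self.entries:
            if valid and entry_tag == tag:
                return data[byte_select]
        return None
```

For example, after `cache.fill(2, 0x1000, bytes(range(32)))`, a read of `0x1001` hits in entry 2 and selects byte 1 of the block.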

## Administrivia (2/2)

- Midterm 2: TOMORROW!
  - 8pm-10pm, 150 Wheeler Hall
  - You are responsible for material up to and including today's lecture (specifically, caching and basic idea of TLBs).
  - Two sheets of notes: handwritten, double-sided
- Busy working on Project 2 and Homework 4!
- Make sure to fill out the survey!
  - We want to know how we are doing
  - Also, get to consider topics for optional lecture at end of the term...

### Where does a Block Get Placed in a Cache?




### Which block should be replaced on a miss?

- Easy for Direct Mapped: Only one possibility
- Set Associative or Fully Associative:
  - Random
  - LRU (Least Recently Used)
- Miss rates for a workload:

| Size   | 2-way LRU | 2-way Random | 4-way LRU | 4-way Random | 8-way LRU | 8-way Random |
|--------|-----------|--------------|-----------|--------------|-----------|--------------|
| 16 KB  | 5.2%      | 5.7%         | 4.7%      | 5.3%         | 4.4%      | 5.0%         |
| 64 KB  | 1.9%      | 2.0%         | 1.5%      | 1.7%         | 1.4%      | 1.5%         |
| 256 KB | 1.15%     | 1.17%        | 1.13%     | 1.13%        | 1.12%     | 1.12%        |

### Review: What happens on a write?

- Write through: The information is written to both the block in the cache and to the block in the lower-level memory
- Write back: The information is written only to the block in the cache
  - Modified cache block is written to main memory only when it is replaced
  - Question is block clean or dirty?
- Pros and Cons of each?
  - WT:
    - PRO: read misses cannot result in writes
    - CON: Processor held up on writes unless writes buffered
  - WB:
    - PRO: repeated writes not sent to DRAM
    - PRO: processor not held up on writes
    - CON: More complex
    - CON: Read miss may require writeback of dirty data
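The write-policy tradeoff can be sketched by counting how many writes reach lower-level memory. This is a toy model, not a hardware description: it assumes a direct-mapped cache of one-block lines, allocates a line on every miss under both policies, and does not count dirty blocks still resident when the trace ends. The function name and trace format are made up for illustration.

```python
def memory_writes(trace, policy, num_lines=4):
    """Count writes to lower-level memory for a trace of (op, block) pairs.

    op is 'r' or 'w'; policy is 'write-through' or 'write-back'.
    """
    lines = [None] * num_lines    # block number held by each cache line
    dirty = [False] * num_lines   # dirty bits, used only by write-back
    writes = 0
    for op, block in trace:
        idx = block % num_lines
        if lines[idx] != block:   # miss: the resident block is replaced
            if policy == "write-back" and dirty[idx]:
                writes += 1       # dirty victim must be written back first
            lines[idx] = block
            dirty[idx] = False
        if op == "w":
            if policy == "write-through":
                writes += 1       # every write also goes to memory
            else:
                dirty[idx] = True # defer the memory write until replacement
    return writes
```

With the trace `[("w", 0), ("w", 0), ("w", 0), ("r", 4)]`, write-through sends all three writes to memory, while write-back sends one, and that one is triggered by the read miss on block 4 evicting dirty block 0.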

### A Summary on Sources of Cache Misses

- Compulsory (cold start or process migration, first reference): first access to a block
  - "Cold" fact of life: not a whole lot you can do about it unless you prefetch
  - Solution: Prefetch values before use
  - Note: If you run "billions" of instructions, Compulsory Misses are insignificant
- Capacity:
  - Cache cannot contain all blocks accessed by the program
  - Solution 1: increase cache size
  - "Solution 2": Change (e.g. reduce) associativity to focus misses in a few places?!
    - » Consider fully-associative cache of size n: access pattern 0, 1, ... n-1, n, 0, 1, ...
    - » Contrast with direct mapped of size n
- Conflict (collision):
  - Multiple memory locations mapped to the same cache location
  - Solution 1: increase cache size
  - Solution 2: increase associativity
- Coherence (Invalidation): other process (e.g., I/O) updates memory
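The "Solution 2" access pattern above can be checked with a small simulation. This sketch assumes LRU replacement for the fully associative cache and a modulo mapping for the direct-mapped one; both function names are illustrative.

```python
from collections import OrderedDict


def misses_fully_associative_lru(trace, n):
    """Misses for an n-block fully associative cache with LRU replacement."""
    cache = OrderedDict()              # keys ordered from least to most recent
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)   # mark as most recently used
        else:
            misses += 1
            if len(cache) == n:
                cache.popitem(last=False)  # evict the least recently used
            cache[block] = True
    return misses


def misses_direct_mapped(trace, n):
    """Misses for an n-line direct-mapped cache (line = block mod n)."""
    lines = [None] * n
    misses = 0
    for block in trace:
        idx = block % n
        if lines[idx] != block:
            misses += 1
            lines[idx] = block
    return misses
```

For n = 4 and the pattern 0, 1, 2, 3, 4 repeated 10 times (50 accesses), the LRU cache misses on all 50, because each block is evicted just before it is needed again. The direct-mapped cache misses only 23 times: blocks 1, 2, 3 miss once each, and only blocks 0 and 4 keep colliding in line 0.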

# How do we make Address Translation Fast?

- Cache results of recent translations!
  - Different from a traditional cache
  - Cache Page Table Entries using Virtual Page # as the key
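A minimal sketch of that idea, assuming 4 KB pages, LRU replacement, and a page table simplified to a dict from virtual page number to physical page number (a real PTE also carries access rights and valid bits). The `TLB` class here is illustrative only.

```python
from collections import OrderedDict

PAGE_SIZE = 4096    # assumed 4 KB pages
OFFSET_BITS = 12    # log2(PAGE_SIZE)


class TLB:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # virtual page # -> physical page #
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr, page_table):
        """Translate vaddr, consulting page_table only on a TLB miss."""
        vpn = vaddr >> OFFSET_BITS
        offset = vaddr & (PAGE_SIZE - 1)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)      # most recently used
        else:
            self.misses += 1
            ppn = page_table[vpn]              # stands in for the table walk
            if len(self.entries) == self.capacity:
                self.entries.popitem(last=False)   # evict LRU entry
            self.entries[vpn] = ppn
        return (self.entries[vpn] << OFFSET_BITS) | offset
```

Two accesses to the same page cost one table walk: the first call misses and fills the entry, the second hits. In this sketch an unmapped page raises `KeyError`, which stands in for a page fault.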








- Machines with TLBs go one step further: overlap TLB lookup with cache access
  - Works because offset available early


- Offset in virtual address exactly covers the "cache index" and "byte select"
- Thus can select the cached byte(s) in parallel with performing address translation



[Figure: overlapping TLB lookup with cache access. Recoverable labels: Physical Address (physical page number, offset), PA, access rights.]

- What if cache size is increased to 8KB?
  - Overlap not complete
  - Need to do something else. See CS152/252
- As discussed earlier, Virtual Caches would make this faster
  - Tags in cache are virtual addresses
  - Translation only happens on cache misses
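The 8KB question above comes down to a bit count: the overlap is complete only when the cache index plus byte select fit inside the page offset. A small sketch, assuming 4 KB pages and power-of-two sizes; the function name is hypothetical, and raising associativity is shown only as one known way to keep the index within the offset, not necessarily the approach the lecture has in mind.

```python
def overlap_is_complete(cache_bytes, ways=1, page_bytes=4096):
    """True if cache index + byte select lie entirely within the page offset.

    Assumes cache_bytes, ways, and page_bytes are powers of two.
    """
    # Index bits + byte-select bits together address one way of the cache.
    index_plus_select_bits = (cache_bytes // ways).bit_length() - 1
    offset_bits = page_bytes.bit_length() - 1
    return index_plus_select_bits <= offset_bits


print(overlap_is_complete(4 * 1024))           # 4KB direct mapped
print(overlap_is_complete(8 * 1024))           # 8KB direct mapped
print(overlap_is_complete(8 * 1024, ways=2))   # 8KB, 2-way set associative
```

Under these assumptions the 4KB direct-mapped cache needs 12 bits, matching the 12-bit offset. The 8KB direct-mapped cache needs 13 bits, so one index bit would come from the virtual page number. Making the 8KB cache 2-way brings it back to 12.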





### What happens on a Context Switch?

- Need to do something, since TLBs map virtual addresses to physical addresses
  - Address Space just changed, so TLB entries no longer valid!
- Options?
  - Invalidate ("Flush") TLB: simple but might be expensive
    - » What if switching frequently between processes?
  - Include ProcessID in TLB
    - » This is an architectural solution: needs hardware
- What if translation tables change?
  - For example, to move page from memory to disk or vice versa...
  - Must invalidate TLB entry!
    - » Otherwise, might think that page is still in memory!
  - Called "TLB Consistency"
- Aside: with Virtually-Indexed, Virtually-Tagged cache, need to flush cache!
  - Everyone has their own version of the address "0" and can't distinguish them
  - This is one advantage of Virtually-Indexed, Physically-Tagged caches.
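The two context-switch options can be contrasted in a short sketch. Both classes are illustrative models with made-up names. The tagged version keys each entry by (process ID, virtual page #), which is one way to picture "Include ProcessID in TLB".

```python
class FlushingTLB:
    """Option 1: invalidate every entry on each context switch."""

    def __init__(self):
        self.entries = {}                  # virtual page # -> physical page #

    def context_switch(self, pid):
        self.entries.clear()               # simple, but all entries are lost

    def insert(self, vpn, ppn):
        self.entries[vpn] = ppn

    def lookup(self, vpn):
        return self.entries.get(vpn)       # None means TLB miss


class TaggedTLB:
    """Option 2: tag each entry with a process ID, so no flush is needed."""

    def __init__(self):
        self.entries = {}                  # (pid, vpn) -> ppn
        self.current_pid = None

    def context_switch(self, pid):
        self.current_pid = pid             # old entries stay but cannot match

    def insert(self, vpn, ppn):
        self.entries[(self.current_pid, vpn)] = ppn

    def lookup(self, vpn):
        return self.entries.get((self.current_pid, vpn))

    def invalidate(self, pid, vpn):
        """TLB consistency: drop one entry when its translation changes."""
        self.entries.pop((pid, vpn), None)
```

With the tagged TLB, process 1's entry for page 0 survives a switch to process 2 and back, and process 2 never sees it. With the flushing TLB, process 1 takes a fresh miss after every switch.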

# Putting Everything Together: Address Translation








## **Page Fault Handling**

- The Virtual-to-Physical Translation fails
  - PTE marked invalid, Privilege Level Violation, Access violation, or does not exist
  - Causes a Fault / Trap
    - » Not an interrupt because synchronous to instruction execution
  - May occur on instruction fetch or data access
  - Protection violations typically terminate the process
- Other Page Faults engage operating system to fix the situation and retry the instruction
  - Allocate an additional stack page, or
  - Make the page accessible (Copy on Write),
  - Bring page in from secondary storage to memory (demand paging)
- Fundamental inversion of the hardware / software boundary
  - Need to execute software to allow hardware to proceed!
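The cases above can be sketched as a dispatch function. This is a simplified illustration, not how any particular kernel structures its handler: the `Fault` fields, the `Action` names, and the order of the checks are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    TERMINATE = auto()       # protection violation: kill the process
    GROW_STACK = auto()      # allocate an additional stack page, then retry
    COPY_ON_WRITE = auto()   # copy the page, make it writable, then retry
    PAGE_IN = auto()         # bring page in from secondary storage, then retry


@dataclass
class Fault:
    is_write: bool          # was the faulting access a write?
    mapped: bool            # does the address space cover this page at all?
    near_stack: bool        # hypothetical test for a legal stack extension
    present: bool           # is the page resident in physical memory?
    writable: bool          # does the PTE currently allow writes?
    copy_on_write: bool     # is the page marked copy-on-write?


def handle_page_fault(fault):
    """Pick what the OS does before the faulting instruction is retried."""
    if not fault.mapped:
        return Action.GROW_STACK if fault.near_stack else Action.TERMINATE
    if not fault.present:
        return Action.PAGE_IN
    if fault.is_write and not fault.writable:
        return Action.COPY_ON_WRITE if fault.copy_on_write else Action.TERMINATE
    return Action.TERMINATE  # any other protection violation
```

Every action except `TERMINATE` ends with the hardware retrying the instruction, which is the "software runs so hardware can proceed" inversion noted above.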

# **Demand Paging**

- Modern programs require a lot of physical memory
  - Memory per system growing faster than 25%-30%/year
- But they don't use all their memory all of the time
  - 90-10 rule: programs spend 90% of their time in 10% of their code
  - Wasteful to require all of user's code to be in memory
- Solution: use main memory as "cache" for disk





