## Quick Review: Caches

Your Uber Nintendo console has 64 KiB of physical memory and a 512B direct mapped cache with 16B blocks. How big are the $\mathrm{T}, \mathrm{I}$, and O fields?
7/5/4

The other way now, suppose you know a physical address splits into the following fields:
Find the parameters listed above assuming a direct-mapped cache. $128 \mathrm{KiB}, 4 \mathrm{KiB}$ cache, 32 B blocks.

| TTTTT | IIIIIII | 00000 |
| :---: | :---: | :---: |
| 5 bits | 7 bits | 5 bits |

## Average Memory Access Time

AMAT $=$ Hit Time + Miss Rate * Miss Penalty

## The Three Cs

Compulsory Miss -
First time accessing data (e.g. startup)
Conflict Miss -
Two blocks map to the same cache line Capacity Miss -


Just not enough room in the cache

## Cache Knobs

| Knob | Values | Effects |
| :--- | :--- | :--- |
| Block Size | Powers of 2 | +More hits due to spacial <br> locality. <br> -More expensive memory <br> access (transferring more at <br> once). <br> -For fixed cache size, fewer <br> indices, higher miss rate. |
| Associativity | Direct Mapped <br> N-Way <br> Fully Associative | More associativity - Fewer <br> conflict misses, but more <br> hardware in comparators. |
| Cache Size | Powers of 2 | More hardware, but can crank <br> up other parameters to reduce <br> miss rate. |
| Write Policy | Write Back <br> Write Through | WB - Faster overall, don't need <br> to write memory every time. <br> WT - More consistent <br> performance. Won't have to <br> suddenly update memory with <br> many dirty blocks. Simpler to <br> use in complicated <br> architectures. |
| Replacement <br> Policy | LRU <br> Random <br> Other | LRU - Great performance, hard <br> to keep track of. <br> Random - Simple, worse <br> performance. |
| RAM Size | Powers of 2 | More hardware, fewer page <br> faults, less thrashing. |

RAM

| 0x00c | $0 \times 008$ | $0 \times 004$ | $0 \times 000$ |
| :---: | :---: | :---: | :---: |
| $0 \times 01 c$ | $0 \times 018$ | $0 \times 014$ | $0 \times 010$ |
| $0 \times 02 c$ | $0 \times 028$ | $0 \times 024$ | $0 \times 020$ |
| $0 \times 03 c$ | $0 \times 038$ | $0 \times 034$ | $0 \times 030$ |
| $\ldots$ | $\ldots$ | $\ldots$ | $\ldots$ |
|  |  |  |  |

## Cache Exercises

1) Assuming 32 bits of physical memory, how many total bits of storage are required for an 8 -way set associative 4 KiB cache, if blocks are 16 B and it uses write back and LRU replacement?
$T=23, I=5, O=4.2^{\wedge} 8$ rows, each row has Tag, Data, Valid, Dirty, log_2(Associativity) (for LRU) $=23+128$ $+1+1+3=156$ bits. $156 * 2^{\wedge} 8=39936$ bits.
2) You want your AMAT to be $<=2$ cycles. You have two levels of cache.
L1 hit time is 1 cycle.
L1 miss rate is $10 \%$
L2 hit time is 3 cycles
L2 miss penalty is 100 cycles

What does your L 2 miss rate need to be?
AMAT $=$ Hit rate + L1 Miss rate* $($ L2 Hit rate + L2 Miss rate*(L2 Miss penalty $))$
$=1+.1(3+100 x)$.
$2>=1.3+100 x$.
$\mathrm{x}<=.7 \%$
3) Look at final questions F2 from Fa05 and F2 from Fa06. (on your own time)

## Virtual Memory

What are three specific benefits of using virtual memory? [there are many]
Bridges memory and disk in memory hierarchy.
Simulates full address space for each process.
Enforces protection between processes.

## Virtual to Physical Address Translation

## Page Table

Split memory into a bunch of equal sized chunks called pages.
Better still, split a section of disk into these chunks as well. Map
 any virtual page into a physical page via table look up -- the page table. Each process gets its own page table.

## Translation Lookaside Buffer (TLB)

A cache of page table entries. Each block is a single page table entry. If an entry (corresponding to a virtual page) is not in the TLB, it's a TLB miss. If that entry is also not in memory (it's been paged out), it's a page fault. Who gets kicked out on a page fault? What happens if there just isn't enough physical memory? Whoever is selected according to the replacement policy. Assuming the OS knows where to find the page table on disk, we can get by with surprisingly few pages of physical memory, albeit slowly.

The TLB and the Page Table together make up a translation unit that maps from virtual addresses to physical addresses.


Figure 1: Logical Flow 'Big Picture"


Figure 2: Physical Implementation 'Big Picture"

## Virtual Memory Exercises

1) Your ISA supports a 36 b address space and your 2 GiB RAM is broken into 2 KiB pages. How many bits are in a VPN? PPN? If all the flags (valid, dirty, etc) take up 12 bits per entry, how many bits does an entire page table take up? 31 bits physical memory, 20 bit PPN, 25 bit VPN. $2^{\wedge} 25$ rows in page table, each row has 12 (flags) $+20(\mathrm{PPN})$ bits, $2^{\wedge} 30$ bits $=2^{\wedge} 27$ bytes total. Note that this is quite a lot of space due to the large VPN for systems like this, more clever page table schemes are used (see cs162).
2) Fill out this table!

| Virtual Ad- <br> dress Bits | Physical Ad- <br> Dress Bits | Page Size | VPN Bits | PPN Bits | Bits per row of <br> PT (4 extra bits) |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 32 | 32 | 16 KB | 18 | 18 | 22 |
| 32 | 26 | 8 KB | 19 | 13 | 17 |
| 36 | 32 | 32 KB | 21 | 17 | 21 |
| 40 | 36 | 32 KB | 25 | 21 | 25 |
| 64 | 40 | 64 KB | 48 | 24 | 28 |

3) A processor has 16-bit addresses, 256 byte pages, and an 8-entry fully associative TLB with LRU replacement (the LRU field is 3 bits and encodes the order in which pages were accessed, 0 being the most recent). At some time instant, the TLB for the current process is in the initial state given in the table below. Assume that all current page table entries are in the initial TLB. Fill in the final state of the TLB according to the access pattern below.

Free physical pages: $0 \times 17,0 \times 18,0 \times 19$
Access pattern: 0x11f0 (Read), 0x1301 (Write), 0x20ae (Write), 0x2332 (Write), 0x20ff (Read), 0x3415 (Write)

| VPN | PPN | Valid | Dirty | LRU | Access |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 0x01 | 0x11 | 1 | 1 | 0 | RW |
| 0x00 | 0x00 | 0 | 0 | 7 | RW |
| 0x10 | 0x13 | 1 | 1 | 1 | RW |
| 0x20 | 0x12 | 1 | 0 | 5 | RW |
| 0x00 | $0 \times 00$ | 0 | 0 | 7 | RW |
| 0x11 | 0x14 | 1 | 0 | 4 | RW |
| 0xae | 0x15 | 1 | 1 | 2 | RW |
| 0xff | 0x16 | 1 | 0 | 3 | RW |

For each access:
hit, LRUs: 1,7,2,5,7,0,3,4
miss w/o replacement, map VPN 0x13 to PPN 0x17, Valid/Dirty bits to 1. LRUs: 2,0,3,6,7,1,4,5
hit, LRUs: 3,1,4,0,7,2,5,6
miss w/o replacement, map VPN 0x23 to PPN 0x18, V/D bits to 1, LRUs: 4,2,5,1,0,3,6,7
hit, LRUs: 4,2,5,0,1,3,6,7
miss w/replacement, replace last element, map VPN 0x34 to PPN 0x19, D bit to 1, LRUs: 5,3,6,2,1,4,7,0

CS 61C Spring 2010
Section 114/8

TA: Michael cs61c-tf@inst.eecs.berkeley.edu

Final

| VPN | PPN | Valid | Dirty | LRU | Access |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 0x01 | 0x11 | $\mathbf{1}$ | $\mathbf{1}$ | $\mathbf{5}$ | RW |
| 0x13 | 0x17 | $\mathbf{1}$ | $\mathbf{1}$ | $\mathbf{3}$ | RW |
| 0x10 | 0x13 | $\mathbf{1}$ | $\mathbf{1}$ | $\mathbf{6}$ | RW |
| 0x20 | 0x12 | $\mathbf{1}$ | $\mathbf{1}$ | $\mathbf{2}$ | RW |
| 0x23 | 0x18 | $\mathbf{1}$ | $\mathbf{1}$ | $\mathbf{1}$ | RW |
| 0x11 | 0x14 | $\mathbf{1}$ | $\mathbf{0}$ | $\mathbf{4}$ | RW |
| 0xae | 0x15 | $\mathbf{1}$ | $\mathbf{1}$ | $\mathbf{7}$ | RW |
| 0x34 | 0x19 | $\mathbf{1}$ | $\mathbf{1}$ | $\mathbf{0}$ | RW |

