

Univ. of NSW says that "working memory", the part of the brain providing temporary storage, is limited (3-4 things for 20 sec unless rehearsed), and that saying what is on the slides splits attention, which is bad.




slashdot.org/article.pl?sid=07/04/04/1319247


# **Review: Pipelining**

- Pipeline challenge is hazards
  - Forwarding helps w/many data hazards
  - Delayed branch helps with control hazard in our 5 stage pipeline
  - Data hazards w/Loads ⇒ Load Delay Slot
- More aggressive performance (discussed in section next week)
  - Superscalar (parallelism)
  - Out-of-order execution



CS61C L31 Caches I, Garcia, Spring 2007 © UCB





# **Memory Hierarchy**

Storage in computer systems:

- Processor
  - holds data in register file (~100 Bytes)
  - Registers accessed on nanosecond timescale
- Memory (we'll call "main memory")
  - More capacity than registers (~Gbytes)
  - Access time ~50-100 ns
  - Hundreds of clock cycles per memory access?!
- Disk
  - HUGE capacity (virtually limitless)
  - VERY slow: access takes ~milliseconds


# **Motivation: Why We Use Caches (written \$)**

[Figure: Processor-Memory Performance Gap (grows 50% / year); DRAM improves 7% / year.]

- 1989: first Intel CPU with cache on chip
- 1998: Pentium III has two levels of cache on chip


# **Memory Caching**

- Mismatch between processor and memory speeds leads us to add a new level: a memory cache
- Implemented with same IC processing technology as the CPU (usually integrated on same chip): faster but more expensive than DRAM memory.
- Cache is a copy of a subset of main memory.
- Most processors have separate caches for instructions and data.






# **Memory Hierarchy**

- If level closer to Processor, it is:
  - smaller
  - faster
  - subset of lower levels (contains most recently used data)
- Lowest Level (usually disk) contains all available data (or does it go beyond the disk?)
- Memory Hierarchy presents the processor with the illusion of a very large very fast memory.




# **Memory Hierarchy Analogy: Library (1/2)**

- You're writing a term paper (Processor) at a table in Doe
- Doe Library is equivalent to disk
  - essentially limitless capacity
  - very slow to retrieve a book
- Table is main memory
  - smaller capacity: means you must return book when table fills up
  - easier and faster to find a book there once you've already retrieved it




# **Memory Hierarchy Analogy: Library (2/2)**

- Open books on table are cache
  - smaller capacity: can have very few open books fit on table; again, when table fills up, you must close a book
  - much, much faster to retrieve data
- Illusion created: whole library open on the tabletop
  - Keep as many recently used books open on table as possible since likely to use again
  - Also keep as many books on table as possible, since faster than going to library




# **Memory Hierarchy Basis**

- Cache contains copies of data in memory that are being used.
- Memory contains copies of data on disk that are being used.
- Caches work on the principles of temporal and spatial locality.
  - Temporal Locality: if we use it now, chances are we'll want to use it again soon.
  - Spatial Locality: if we use a piece of memory, chances are we'll use the neighboring pieces soon.
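
The two kinds of locality can be made concrete with a toy model. The sketch below is not a real cache design: it assumes a hypothetical cache that holds 4 blocks of 16 bytes and evicts the least recently used block (the replacement policy, the sizes, and the name `count_misses` are all assumptions made for illustration). It only counts how many accesses in a trace have to go to the next level down.

```python
from collections import OrderedDict

def count_misses(addresses, block_bytes=16, num_blocks=4):
    """Count misses for a byte-address trace in a tiny cache of whole blocks.

    Assumption: least-recently-used eviction, chosen only to keep the model small.
    """
    resident = OrderedDict()  # block number -> None, ordered oldest to newest use
    misses = 0
    for addr in addresses:
        block = addr // block_bytes          # neighboring bytes share a block
        if block in resident:
            resident.move_to_end(block)      # hit: mark as most recently used
        else:
            misses += 1                      # miss: fetch the whole block
            resident[block] = None
            if len(resident) > num_blocks:
                resident.popitem(last=False) # evict the least recently used block
    return misses

sequential = list(range(64))               # 64 neighboring bytes (spatial locality)
repeated = [0, 4, 8] * 10                  # same few words reused (temporal locality)
scattered = [i * 1024 for i in range(64)]  # every access lands in a different block

print(count_misses(sequential))  # 4
print(count_misses(repeated))    # 1
print(count_misses(scattered))   # 64
```

The sequential and repeated traces miss only a handful of times, while the scattered trace misses on every access, which is the behavior the locality principles predict.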


# **Cache Design**

- How do we organize cache?
- Where does each memory address map to? (Remember that cache is subset of memory, so multiple memory addresses map to the same cache location.)
- How do we know which elements are in cache?
- How do we quickly locate them?



# **Direct-Mapped Cache (1/4)**

- In a *direct-mapped cache*, each memory address is associated with one possible block within the cache
  - Therefore, we only need to look in a single location in the cache for the data if it exists in the cache
  - Block is the unit of transfer between cache and memory
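
As a rough sketch of "one possible block", the function below assumes the usual placement rule, in which memory blocks wrap around the cache modulo the number of cache blocks. The slide text does not spell the rule out, so treat the rule, the name `cache_block_for`, and the tiny cache size as assumptions for illustration.

```python
def cache_block_for(addr, block_bytes, num_blocks):
    """Return the one cache block a byte address can occupy (direct-mapped)."""
    block_number = addr // block_bytes   # which memory block the byte belongs to
    return block_number % num_blocks     # memory blocks wrap around the cache

# Hypothetical tiny cache: 4 blocks of 16 bytes each.
for addr in (0, 15, 16, 64, 80):
    print(addr, "->", cache_block_for(addr, block_bytes=16, num_blocks=4))
```

Addresses 0, 15, and 64 all land in cache block 0, and 16 and 80 both land in block 1, so several memory addresses compete for each cache location.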









# **Issues with Direct-Mapped**

- Since multiple memory addresses map to same cache index, how do we tell which one is in there?
- What if we have a block size > 1 byte?
- Answer: divide memory address into three fields
  - `tttttttttttttt`: tag, to check if have correct block
  - `iiiiiiiii`: index, to select block
  - `oooo`: byte offset within block

# **Direct-Mapped Cache Terminology**

- All fields are read as unsigned integers.
- Index: specifies the cache index (which "row"/block of the cache we should look in)
- Offset: once we've found correct block, specifies which byte within the block we want
- Tag: the remaining bits after offset and index are determined; these are used to distinguish between all the memory addresses that map to the same location
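
The three fields can be pulled out of an address with shifts and masks. This is a minimal sketch: the helper name `split_address`, the sample address, and the field widths (10 index bits, 4 offset bits) are picked only for illustration.

```python
def split_address(addr, index_bits, offset_bits):
    """Split an address into (tag, index, offset), each an unsigned integer."""
    offset = addr & ((1 << offset_bits) - 1)                 # lowest bits: byte within block
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)  # middle bits: which cache row
    tag = addr >> (offset_bits + index_bits)                 # everything that remains
    return tag, index, offset

tag, index, offset = split_address(0x12345678, index_bits=10, offset_bits=4)
print(hex(tag), hex(index), hex(offset))  # 0x48d1 0x167 0x8
```

Two addresses with the same index but different tags would compete for the same cache row, and the stored tag is what tells them apart.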




# **Direct-Mapped Cache Example (1/3)**

- Suppose we have 16 KB of data in a direct-mapped cache with 4-word blocks
- Determine the size of the tag, index and offset fields if we're using a 32-bit architecture
- Offset
  - need to specify correct byte within a block
  - block contains 4 words = 16 bytes = 2<sup>4</sup> bytes
  - need 4 bits to specify correct byte


# **Direct-Mapped Cache Example (2/3)**

- Index: (~index into an "array of blocks")
  - need to specify correct block in cache
  - cache contains 16 KB = 2<sup>14</sup> bytes
  - block contains 2<sup>4</sup> bytes (4 words)
  - \# blocks/cache = (bytes/cache) / (bytes/block) = 2<sup>14</sup> / 2<sup>4</sup> = 2<sup>10</sup> blocks/cache
  - need 10 bits to specify this many blocks


# **Direct-Mapped Cache Example (3/3)**

- Tag: use remaining bits as tag
  - tag length = addr length - offset - index = 32 - 4 - 10 bits = 18 bits
  - so tag is leftmost 18 bits of memory address
- Why not full 32 bit address as tag?
  - All bytes within a block share the same block address, so the 4 offset bits can be left out of the tag
  - Index must be same for every address within a block, so it's redundant in tag check, thus can leave off to save memory (here 10 bits)
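
The arithmetic of this example can be checked with a few lines of code. The sketch assumes that the cache size and block size are both powers of two; the function name `field_widths` is made up for illustration.

```python
def field_widths(cache_bytes, block_bytes, addr_bits=32):
    """Return (tag_bits, index_bits, offset_bits) for a direct-mapped cache.

    Assumes cache_bytes and block_bytes are powers of two.
    """
    offset_bits = block_bytes.bit_length() - 1       # log2(bytes per block)
    num_blocks = cache_bytes // block_bytes          # blocks per cache
    index_bits = num_blocks.bit_length() - 1         # log2(blocks per cache)
    tag_bits = addr_bits - index_bits - offset_bits  # whatever is left over
    return tag_bits, index_bits, offset_bits

# 16 KB of data, 4-word (16-byte) blocks, 32-bit addresses.
print(field_widths(cache_bytes=16 * 1024, block_bytes=4 * 4))  # (18, 10, 4)
```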




# **And in Conclusion...**

- We would like to have the capacity of disk at the speed of the processor: unfortunately this is not feasible.
- So we create a memory hierarchy:
  - each successively higher level contains "most used" data from next lower level
  - exploits temporal & spatial locality
  - do the common case fast, worry less about the exceptions (design principle of MIPS)
- Locality of reference is a Big Idea


