************************************
cs162 Lecture Notes for Feb 28, 2005
by Jesse Davidson, David Lee
************************************

** QUICK REFERENCE **
    Translation Lookaside Buffer
    Principle of Locality
    Set Associativity
    Demand Paging, Thrashing, Working Sets
*********************

----------------------------
TRANSLATION LOOKASIDE BUFFER
----------------------------

PROBLEM with segmentation and paging: extra memory references
to access translation tables can slow programs down by a
factor of two or three. There are obviously too many
translations required to keep them all in special processor
registers.

But for small machines (e.g. PDP-11), can have one register
for every page in memory, since they can only address
64 Kbytes.

SOLUTION: Translation Lookaside Buffer (TLB), also called:
    Translation Buffer (TB) (DEC), or
    Directory Lookaside Table (DLAT) (IBM), or
    Address Translation Cache (ATC) (Motorola).
    [ As Prof Smith pointed out, ATC is really the correct name. ]

               -----------
               |         |
      VA --->  |   TLB   | -----> RA
               |         |
               -----------
                 |     ^
                 |     |
                 V     |
              ------------
              |TRANSLATOR|
              ------------

         TLB
    -----------
    | VA | RA |
    -----------
    |    |    |
    -----------
    |    |    |
    -----------
    |    |    |
    -----------

A TLB is used to store a few of the translation table entries.
It's very fast, but only remembers a small number of entries.

On each memory reference:
    First ask TLB if it knows about the page. If so, the
    reference proceeds fast.
    If TLB has no info for the page, the translator must go
    through the page and segment tables to get the info. The
    reference takes a long time, but give the info for this
    page to the TLB so it will know it for the next reference
    (TLB must forget one of its current entries in order to
    record the new one).

So what the TLB does is:
    Accept virtual address
    See if virtual address matches entry in TLB
        If so, return real address
        If not, ask translator to provide real address.
            Translator loads new translation into TLB,
            replacing an old one. (Usually one not used
            recently.) (Must replace entry in same set.)
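The hit/miss/replace cycle above can be modeled in a few lines of
Python. This is only a toy software sketch - a real TLB is hardware,
and the class name and structure here are invented for illustration:

```python
# Toy TLB: caches a few VA->RA translations; on a miss it consults the
# "translator" (a plain dict standing in for the page/segment tables)
# and evicts an arbitrary entry to make room. All names are invented.
class TinyTLB:
    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table      # virtual page -> real frame
        self.entries = {}                 # the cached translations
        self.hits = self.misses = 0

    def translate(self, vpage):
        if vpage in self.entries:         # TLB hit: fast path
            self.hits += 1
            return self.entries[vpage]
        self.misses += 1                  # TLB miss: ask the translator
        frame = self.page_table[vpage]
        if len(self.entries) >= self.capacity:
            # forget one current entry to record the new one
            self.entries.pop(next(iter(self.entries)))
        self.entries[vpage] = frame
        return frame
```

Running a loop that touches the same three pages through a 4-entry
TinyTLB gives 3 misses and then nothing but hits - the locality effect
that makes real TLBs work.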
********
NOTE that the TLB is basically a cache. It simply associates a
VA with a RA, and if it is missing an entry, it passes the
request along to whatever paging/segmentation scheme we happen
to be using.
********

Will the TLB work well if it holds only a few entries, and the
program is very big?

YES - due to the Principle of Locality. (Peter Denning)

---------------------
PRINCIPLE OF LOCALITY
---------------------

TEMPORAL locality (Recently Used Information)
    Information that has been used recently is likely to
    continue to be used.
    [Alternate formulation] Information in use now consists
    mostly of the same information as was used recently.

SPATIAL locality (Information in same vicinity)
    Information near the current locus of reference is also
    likely to be used in the near future.

Example - top of desk is cache for file cabinet. If desk is
messy, stuff on top is likely to be what you need.

Explanation - code is either sequential or loops. Data used
together is often clustered together (array elements, stack,
etc.)

In practice, TLBs work quite well. Typically 96% to 99.9% of
the translations are found in the TLB.

-----------------
SET ASSOCIATIVITY
-----------------

The TLB is just a memory with some comparators. Typical size
of the memory: 16-512 entries. Each entry holds a virtual page
number and the corresponding physical page number.

How can the memory be organized to find an entry quickly?

    One possibility: search the whole table associatively on
    every reference. Hard to do for more than 32 or 64
    entries.

    A better possibility: restrict the info for any given
    virtual page to fall into a subset of entries in the TLB.
    Then only need to search that set. Called set associative.
    E.g. use the low-order bits of the virtual page number as
    the index to select the set.

Real TLBs are either fully associative or set associative. If
the size of the set is one, it is called direct mapped.
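Set selection from the low-order page-number bits can be sketched
concretely. The sizes here are illustrative assumptions (4 KB pages
and 4 sets), not anything mandated by the notes:

```python
# Split a virtual address into page#, set#, and byte offset.
PAGE_BITS = 12          # assumed 4 KB pages -> low 12 bits = byte offset
SET_BITS = 2            # assumed 4 sets -> low 2 bits of the page number

def split_va(va):
    byte = va & ((1 << PAGE_BITS) - 1)
    vpage = va >> PAGE_BITS
    tlb_set = vpage & ((1 << SET_BITS) - 1)   # low page# bits pick the set
    return vpage, tlb_set, byte

# Contiguous pages land in different sets, so they don't evict
# each other:
print([split_va(p << PAGE_BITS)[1] for p in range(6)])   # -> [0, 1, 2, 3, 0, 1]
```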
             <--- 4 sets --->
    --------------------------------
    |       |       |       |      |
    --------------------------------
    |       |       |       |      |    4 elements / set
    --------------------------------
    |       |       |       |      |
    --------------------------------
    |       |       |       |      |
    --------------------------------

    virtual address
    -----------------------
    | page# | set#| byte# |
    -----------------------

NOTE that the set # is actually the low-order bits of the
page #. The low-order bits are used as the index because they
vary the most, so nearby pages spread across different sets.

Replacement must be within the same set.

The translator is a piece of hardware that knows how to
translate virtual to real addresses. It uses the PTBR to find
the page table(s), and reads the page table to find the page.

TLBs are a lot like hash tables, except simpler (they must be,
to be implemented in hardware). Some hash functions are better
than others. Is it better to use low page-number bits than
high ones to select the set? Low ones are best: if a large
contiguous chunk of memory is being used, all of its pages
will fall in different sets.

Must be careful to flush the TLB during each context switch.
Why? Otherwise, when we switch processes, we'll still be using
the old translations from virtual to real, and will be
addressing the wrong part of memory.

    [Alternative] - can make the process identifier (PID) part
    of the virtual address. Have a Process Identifier Register
    (PIDR) which supplies that part of the address.

When we modify the page table, we must either flush the TLB or
flush the entry that was modified.

---------------------------------------------
Topic: DEMAND PAGING, THRASHING, WORKING SETS
---------------------------------------------

Paging gives you the ability to run multiple processes in
memory. So far we have disentangled the programmer's view of
memory from the system's view using a mapping mechanism. Each
sees a different organization. This makes it easier for the OS
to shuffle users around and simplifies memory sharing between
users.
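The "each sees a different organization" point can be sketched with
two hypothetical per-process page tables. All the frame numbers here
are invented; the point is only that the same virtual address can map
to different real addresses, while a shared page maps to one frame:

```python
# Two processes: vpage 0 is private (different frames), vpage 1 is a
# shared page (same frame in both tables). Numbers are made up.
page_table_A = {0: 10, 1: 42}   # vpage -> physical frame
page_table_B = {0: 11, 1: 42}   # vpage 1 is shared

def real_addr(page_table, vpage, offset, page_size=4096):
    return page_table[vpage] * page_size + offset

# Same virtual address, different real addresses for the private page:
print(real_addr(page_table_A, 0, 8), real_addr(page_table_B, 0, 8))
# ...and the same real address for the shared page:
print(real_addr(page_table_A, 1, 8) == real_addr(page_table_B, 1, 8))
```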
However, until now a user process had to be completely loaded
into memory before it could run. (Sort of - we mentioned page
faults and segment faults, but...) This is wasteful, since a
process may only need a small amount of its total memory at
any one time (locality).

Virtual memory permits a process to run with only some of its
virtual address space loaded into physical memory. The virtual
address space is translated to either
    a) physical memory (small, fast), or
    b) disk (backing store), which is large but slow.
    [ Backing storage is typically disk. ]

*************************************************************
The idea is to produce the illusion that the entire virtual
address space is in main memory, when in fact it isn't.

More generally, we have a multi-level (2 level in this case)
memory hierarchy. We want to have the cost of the slower and
larger level, and the performance of the smaller and faster
level.
*************************************************************

            ---------
            |  CPU  |
            ---------
            | CACHE |
            ---------
              ^ |
              | V
            ---------
            |MEMORY |
            ---------
              ^ |
              | V
            ---------
            | DISK  |
            ---------
              ^ |
              | V
            ---------
            | other |    [ TAPE -- Significantly slower, as those
            ---------      unable to access inst accounts
                           March 5-7 found out ]

The reason that this works is that most programs spend most of
their time in only a small piece of the code.

Principle of Locality:
    Temporal Locality - the same information is likely to be
    reused.
    Spatial Locality - nearby information is also likely to be
    used in the near future.

If not all of a process is loaded when it is running, what
happens when it references a byte that is only in the backing
store? Hardware and software cooperate to make things work
anyway.

First, extend the page tables with an extra bit: ``present''
(or ``valid''). If the present bit isn't set, then a reference
to the page results in a trap. This trap is given a special
name: page fault.

Page fault - an attempt to reference a page which is not in
memory.
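A condensed, hypothetical Python model of the valid-bit trap and a
minimal fault service. Every name and structure here is invented for
illustration; a real handler also flushes the victim's TLB entry and
updates the core map, which this sketch omits:

```python
# A page table entry is modeled as a dict with "valid", "frame",
# and "dirty" fields; backing_store stands in for the disk.
class PageFault(Exception):
    pass

def reference(vpage, page_table):
    """The hardware side: trap if the page is not present."""
    pte = page_table.get(vpage)
    if pte is None:
        raise MemoryError("abend: invalid page")   # not a legal page at all
    if not pte["valid"]:
        raise PageFault(vpage)                     # legal, but not in memory
    return pte["frame"]

def handle_fault(vpage, page_table, resident, free_frames, backing_store):
    """The OS side: find a frame (evicting if needed), bring the page in."""
    if free_frames:
        frame = free_frames.pop()
    else:
        victim = next(iter(resident))              # pick some resident page
        frame = resident.pop(victim)
        if page_table[victim]["dirty"]:
            backing_store[victim] = "written back" # reuse its old disk slot
        page_table[victim]["valid"] = False
    _ = backing_store[vpage]                       # transfer from disk
    page_table[vpage].update(valid=True, frame=frame, dirty=False)
    resident[vpage] = frame
    return frame
```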
Page Table Entry
----------------------------------------------------------------
| RA | Protection bits | valid bit | dirty bit | reference bit |
----------------------------------------------------------------

Any page not in main memory right now has the ``present/valid''
bit cleared in its page table entry.

_______________________
When a page fault occurs:

    Trap to OS - don't trust user / abnormal ends (abends).

    Verify that the reference is to a valid page; if not,
    abend.

    Find a page frame to put the page in:
        Find a page to replace, if there is no empty frame.
        If it is dirty, find a place to put the replaced page
        on secondary storage. (Can reuse its previous
        location.)
        Remove the page (either copy it back or overwrite it).
        Update the page table.
        Update the map of secondary storage if necessary (to
        show where we put the page).
        Update the memory (core) map.
        Flush the TLB entry for the page that has been
        removed.

    Operating system brings the page into memory:
        Find the page on secondary storage.
        Transfer it.
        Update the page table (set valid bit and real
        address).
        Update the map of the file system/disk to show that
        the page is now in memory. (e.g. update the cache of
        inodes)
        Update the core map (memory map).

    The process resumes execution. (i.e. it goes on the ready
    list; maybe it resumes)

Note that all of these steps take time. We may switch to
another process while the I/O is taking place.
Multiprogramming is supposed to overlap the fetch of a page
(or other I/O) for one process with the execution of another.
If no process is available to run (all doing I/O or page
faults), this is called multiprogramming idle or page fetch
idle.

Page out - to remove a page.
Page out a process - remove it from memory.
Page in a process - load its pages into memory.

__________________
RESUMING A PROCESS

Continuing (resuming) the process is very tricky, since the
page fault may have occurred in the middle of an instruction.
We don't want the user process to be aware that the page fault
even happened.

    Can the instruction just be skipped?

    Suppose the instruction is restarted from the beginning?
    How is the ``beginning'' located?
    Even if the beginning is found, what about instructions
    with side effects?

Without additional information from the hardware, it may be
impossible to restart a process after a page fault. Machines
that permit restarting must have hardware support to keep
track of all the side effects so that they can be undone
before restarting.

    Early Apollo approach for the 68000 (two processors, one
    just for handling page faults).

    IBM 370 solution (execute long instructions twice).
    [ 'practice' execution to detect page faults ]

If you think about this when designing the instruction set, it
isn't too hard to make a machine support virtual memory. It's
much harder to do after the fact.
    [ Note that RISC instruction sets solve this nicely ]

How many page faults can occur in one instruction?
    E.g. the instruction spans a page boundary, and each of
    its two operands spans two pages: 2 + 2 + 2 pages touched.
    Could also have a 2-level page table, with one page of
    page table needed to point to each instruction & data
    page. [ = SIX ]

Once the hardware has provided basic capabilities for virtual
memory, the OS must implement 3 algorithms:

    Page fetch algorithm: when to bring pages into memory.

    Page replacement algorithm: which page(s) should be thrown
    out, and when.

    Page placement algorithm: where to put the page in memory.

Note that the page placement algorithm for main memory is
irrelevant - memory is uniform. (But the CRAY has non-uniform
memory access time. Placement is also not irrelevant for other
parts of the memory hierarchy.)

______________________
Page Fetch Algorithms:

    Demand paging: start up the process with no pages loaded,
    and load a page when a page fault for it occurs, i.e. wait
    until it absolutely MUST be in memory. Almost all paging
    systems are like this.

    Request paging: let the user say which pages are needed.
    What's wrong with this?
        Users don't always know best, and aren't always
        impartial. They will overestimate their needs.
        Still need demand paging, in case the user doesn't
        remember to bring in the right page.
    Prefetching, or prepaging: bring a page into memory before
    it is referenced (e.g. when one page is referenced, bring
    in the next one, just in case). The reasons for prepaging
    are:
        (a) bring in several pages at once - cuts the per-page
        overhead;
        (b) eliminate the real-time delay in waiting for the
        page - overlap computation and fetch.
    The idea is to guess at which page will be needed. Hard to
    do effectively without a prophet; may spend a lot of time
    doing wasted work. If used at all, it is typically
    one-block lookahead - i.e. the next page. Seldom works.

    Can also do "swapping", whereby when you start a process,
    you swap in most or all of its pages, or at least all of
    the pages it was using the last time it was running. When
    it stops, you swap out its pages in a bunch, on contiguous
    tracks on disk.
    [ Also called working set restoration. ]

    Overlays - a technique by which the user divides his
    program into segments. The user issues commands to load
    and unload the segments from memory; these commands
    specify the location in memory where the segments are
    placed. Used when there is no virtual memory, and the user
    is given a partition of real memory to work with.
    [ PAINFUL ]

____________________________
Page Replacement Algorithms:

    Random (RAND): pick any page at random.

    FIFO: throw out the page that has been in memory the
    longest. The ideas are: (a) it's simple, and (b) the first
    page that was fetched is believed to be no longer needed.

    LRU (least recently used): use the past to predict the
    future. Throw out the page that hasn't been used in the
    longest time. If there is locality, then this is
    presumably the best you can do.

    **********************************************************
    NOTE that (almost) no one actually implements true LRU -
    too much overhead. Instead, an LRU bit is added to each
    entry, cleared every ***, and when an entry is used, the
    bit is set.
    Searching for an entry proceeds modulus style (i.e.
    circularly), turning bits off as the search proceeds, so
    that in the worst case we run through all entries before
    finding a cleared bit.
    **********************************************************

    MIN (or OPT): as always, the best algorithm arises if we
    can predict the future. Throw out the page that won't be
    used for the longest time into the future. This requires a
    prophet, so it isn't practical, but it is good for
    comparison.

_____________________
Real and Virtual Time

    Virtual time is time as measured by a running process - it
    doesn't include time that the process is blocked (e.g. for
    a page fault or any other reason). Often in units of
    memory references.

    Real time - time as measured by the wall clock. Includes
    time that the process is blocked (including page faults).

----------------------------------------------------------------------
EOF
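As a closing example, the replacement policies above - FIFO, LRU, and
the reference-bit approximation of LRU (usually called the "clock"
algorithm) - can be compared with a small simulation. The 3-frame
memory and the reference string are arbitrary choices, not from the
notes:

```python
from collections import OrderedDict

def count_faults(refs, nframes, policy):
    """Count page faults under FIFO, LRU, or CLOCK replacement."""
    faults = 0
    if policy in ("FIFO", "LRU"):
        mem = OrderedDict()                  # resident pages, in policy order
        for p in refs:
            if p in mem:
                if policy == "LRU":
                    mem.move_to_end(p)       # refresh recency on a hit
            else:
                faults += 1
                if len(mem) >= nframes:
                    mem.popitem(last=False)  # oldest (FIFO) / least recent (LRU)
                mem[p] = None
    else:  # CLOCK: sweep circularly, clearing use bits, evict a clear one
        frames = [None] * nframes
        refbit = [0] * nframes
        hand = 0
        for p in refs:
            if p in frames:
                refbit[frames.index(p)] = 1  # mark recently used
            else:
                faults += 1
                while refbit[hand]:          # turn bits off as search proceeds
                    refbit[hand] = 0
                    hand = (hand + 1) % nframes
                frames[hand] = p
                refbit[hand] = 1
                hand = (hand + 1) % nframes
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print([count_faults(refs, 3, pol) for pol in ("FIFO", "LRU", "CLOCK")])
# -> [9, 10, 9]
```

On this particular string FIFO happens to beat LRU; with a reference
string that has real locality, LRU and its clock approximation
generally come out ahead.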