Lecture Notes prepared by: Jianrui Zhang and Wai Leung William Wong
Lecture on Monday, 02/28/05

Announcement:
-------------
There is a midterm today from 7:00 to 8:30 pm at 10 Evans. The exam
will be closed book.

Main topics discussed in lecture today:
--------------------------------------
1.  Inverted Page Table (continued from last lecture)
2.  Problem With Segmentation and Paging
3.  Translation Lookaside Buffer (TLB)
4.  Principle of Locality
5.  Virtual Memory
6.  Page Fault
7.  Page Fetch Algorithm
8.  Page Replacement Algorithm
9.  Real and Virtual Time
10. Evaluation of Paging Algorithms (just got started)

Notes:
------

1. Inverted Page Table
   -------------------
   The idea is to organize the page table as a hash table. Hash from the
   virtual address into a table whose number of entries is larger than
   the physical memory size. (The page table is shared by all
   processes.)
   (Note: this was discussed in greater detail in the last lecture.)

2. Problem with Segmentation and Paging
   ------------------------------------
   The extra memory references needed to access the translation tables
   can slow programs down by a factor of two or three. There are
   obviously too many translations required to keep them all in special
   processor registers.
   a) But for small machines (e.g. the PDP-11), one register for every
      page in memory is feasible, since only 64 Kbytes can be
      addressed.
   b) Solution: use a Translation Lookaside Buffer (TLB).
      (See section #3 below for an explanation of the TLB.)

3. Translation Lookaside Buffer (TLB)
   ----------------------------------
   a) Also called:
      1. Translation Buffer (TB) (DEC), or
      2. Directory Lookaside Table (DLAT) (IBM), or
      3. Address Translation Cache (ATC) (Motorola).

   Basic Ideas of the TLB
   ----------------------
   b) A TLB is used to store a few of the translation table entries.
      It is very fast, but it only remembers a small number of entries.
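   The basic idea can be sketched in software (a toy sketch only: the
   dictionaries below stand in for hardware structures, and all names
   are illustrative, not from any real system):

```python
from collections import OrderedDict

# A tiny software model of a TLB: a small, fast map of recent
# virtual-page -> real-page translations sitting in front of the
# full (slow) page table.

TLB_SIZE = 4
tlb = OrderedDict()        # holds only a few entries, oldest first
page_table = {}            # the full translation table

def translate(vpage):
    """Return the real page for vpage, consulting the TLB first."""
    if vpage in tlb:                  # hit: fast path
        tlb.move_to_end(vpage)        # note the recent use
        return tlb[vpage]
    real = page_table[vpage]          # miss: slow table lookup
    if len(tlb) >= TLB_SIZE:
        tlb.popitem(last=False)       # forget the least recently used
    tlb[vpage] = real                 # remember it for next time
    return real
```

   With `page_table` filled in, the first `translate(v)` for a page
   pays the slow lookup; later references to the same page hit the
   TLB, until the entry is eventually evicted to make room.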
   On each memory reference:

                             +-------+
       Virtual Address --->  |  TLB  |  ---> Real Address
            (VA)             +-------+        (RA)
                               ^   |
                               |   v
                           +------------+
                           | Translator |
                           +------------+

       * The translator understands the page table.

      i)  First ask the TLB if it knows about the page. If so, the
          reference proceeds fast.
      ii) If the TLB has no information for a page, the translator
          must go through the entire page and segment tables to find
          the information. The reference takes a long time, but the
          translator gives the information for this page to the TLB,
          so the TLB will know it for the next reference, allowing a
          faster lookup next time.

       * Note: if the TLB is already full, it must forget one of its
         current entries in order to record the new one. To do this, a
         replacement algorithm is needed to determine which entry to
         evict.

   c) TLB organization: the TLB works like a black box - a virtual
      page number goes in, and a physical page location comes out.
      This is similar to a cache.

      Structure of the TLB:

                   VA       RA
               +--------+--------+
               |        |        |
               +--------+--------+
               |        |        |
               +--------+--------+
       VA ---> |        |        | ---> RA
               +--------+--------+
               |        |        |
               +--------+--------+
               |        |        |
               +--------+--------+

   d) Summary of what the TLB does (and how it works):
      i)   Accept a virtual address.
      ii)  See if the virtual address matches an entry in the TLB.
      iii) If so, return the real address.
      iv)  If not, ask the translator to provide the real address.
      v)   The translator loads the new translation into the TLB,
           replacing an old one.
           - The entry replaced is usually one that has not been used
             recently; if we kept evicting recently loaded entries, we
             would pay the miss overhead again and again.
           - The replacement must be an entry in the same set.
   e) Will the TLB work well if it holds only a few entries and the
      program is very big?
      Answer: Yes - due to the Principle of Locality. (See section #4
      below for an explanation of the Principle of Locality.)
   f) How well (how efficiently) does the TLB work?
      - In practice, TLBs work quite well. Typically 96% to 99.9% of
        the translations can be found in the TLB.

   Greater Details of the TLB
   --------------------------
   g) The TLB is just a memory with some comparators.
      i)  Typical size of the memory: 16-512 entries.
      ii) Each entry holds a virtual page number and the corresponding
          physical page number.
   h) How can the memory be organized to find an entry quickly?
      i)  One possibility: search the whole table associatively on
          every reference. However, this is hard to do for more than
          32 or 64 entries.
      ii) A better possibility: restrict the information for any given
          virtual page to fall into a subset of the entries in the
          TLB. Then only that set needs to be searched. This idea is
          called set associative. (E.g. use the low-order bits of the
          virtual page number as the index to select the set.)
          Real TLBs are either fully associative or set associative.
          If the size of each set is one, the TLB is called direct
          mapped.

      - Diagram of a set-associative TLB:

            |<-------- 4 sets -------->|
        -   +------+------+------+------+
        ^   |      |      |      |      |
        |   +------+------+------+------+
        |   |      |      |      |      |
        |   +------+------+------+------+
        |   |      |      |      |      |
        8   +------+------+------+------+
     elements      |      |      |      |
        |   +------+------+------+------+
        |   |      |      |      |      |
        |   +------+------+------+------+
        |   |      |      |      |      |
        |   +------+------+------+------+
        v   |      |      |      |      |
        -   +------+------+------+------+

      - IMPORTANT: Replacement must be within the same set.

   i) The translator is a piece of hardware that knows how to
      translate virtual to real addresses. It uses the Page Table Base
      Register (PTBR) to find the page table(s), then reads the page
      table to find the page.
   j) TLBs are a lot like hash tables, except simpler.
      * Note: a TLB, unlike a hash table, must be implementable in
        hardware.
      - Some hash functions are better than others.
        Question: Is it better to use the low page number bits than
        the high ones to select the set?
        Answer: Yes, the low ones are best, because if a large
        contiguous chunk of memory is being used, all of its pages
        will fall in different sets.
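   As a rough sketch (illustrative names; FIFO-within-a-set stands in
   for whatever replacement policy the hardware actually uses), a
   set-associative lookup only ever touches one set, chosen by the
   low-order bits of the virtual page number:

```python
# Sketch of a set-associative TLB: 4 sets, 8 entries ("elements")
# per set. The low-order bits of the virtual page number select the
# set, so contiguous pages spread across different sets.

NUM_SETS = 4
SET_SIZE = 8
sets = [[] for _ in range(NUM_SETS)]   # each entry: (vpage, rpage)

def set_index(vpage):
    return vpage % NUM_SETS            # low-order bits pick the set

def lookup(vpage):
    """Search only the one set this page can live in."""
    for v, r in sets[set_index(vpage)]:
        if v == vpage:
            return r
    return None                        # miss: the translator must run

def install(vpage, rpage):
    """Record a translation; replacement must stay within the set."""
    s = sets[set_index(vpage)]
    if len(s) >= SET_SIZE:
        s.pop(0)                       # evict within the same set only
    s.append((vpage, rpage))
```

   Because `set_index` uses the low-order bits, four contiguous pages
   land in four different sets, which is why low bits beat high bits
   for selecting the set.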
   k) We must be careful to flush the TLB on each context switch.
      Question: Why?
      Answer:
      1. Otherwise, when we switch processes, we would still be using
         the old process's virtual-to-real translations, and would be
         addressing the wrong part of memory.
      2. Alternatively, we can make the process identifier (PID) part
         of the virtual address, with a Process Identifier Register
         (PIDR) supplying that part of the address.
   l) When we modify the page table, we must either flush the TLB or
      flush the entry that was modified, to keep the TLB entries
      accurate.

4. Principle of Locality
   ---------------------
   a) Invented by Peter Denning.
   b) There are two parts to this principle:
      i)  Temporal Locality - information that has been used recently
          is likely to continue to be used.
          - An alternate formulation: the information in use now
            consists mostly of the same information as was used
            recently.
      ii) Spatial Locality - information near the current locus of
          reference is also likely to be used in the near future.
          - Example: the top of a desk is a cache for the file
            cabinet. If the desk is messy, the stuff on top is likely
            to be what you need.
   c) Why does the principle of locality hold?
      Answer: Because code is either sequential or loops, and data
      used together is often clustered together (e.g. array elements,
      the stack, etc.).

5. Virtual Memory
   --------------
   a) Why use virtual memory (VM)?
      Answer: In all the discussion in lecture until now, a user
      process had to be completely loaded into memory before it could
      run. This is wasteful, since a process may need only a small
      part of its total memory at any one time (due to the principle
      of locality). Virtual memory permits a process to run with only
      some of its virtual address space loaded into physical memory.
   b) Virtual address space: the set of all virtual addresses that
      could be generated. Each address is translated to either:
      i)  physical memory, usually small but fast, or
      ii) backing store (typically disk), usually large but slow.
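   That two-level split can be sketched as follows (a toy model only:
   the dictionaries stand in for physical memory and the disk, all
   names are illustrative, and pulling a page in on first touch stands
   in for the page-fault path described in section 6):

```python
# Toy model of a virtual address space only partly resident in
# physical memory: each page lives either in (small, fast) memory
# or on (large, slow) backing store.

memory = {}                                   # resident pages
backing_store = {v: "page-%d" % v for v in range(100)}  # the rest

def read_page(vpage):
    """Read a page, fetching it from backing store if not resident."""
    if vpage not in memory:                   # not resident: a fault
        memory[vpage] = backing_store.pop(vpage)
    return memory[vpage]
```

   The process sees one uniform address space; whether a given page
   costs a fast memory access or a slow disk transfer is hidden
   inside `read_page`.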
   c) The idea is to produce the illusion that the entire virtual
      address space is in main memory when, in fact, it is not (some
      of it may be on disk).
   d) More generally, we have a multi-level memory hierarchy. We want
      the cost of the slower and larger level, and the performance of
      the smaller and faster one.
      - Diagram of a memory hierarchy, faster on top:

          +--------------+
          |     CPU      |
          |    Cache     |
          +--------------+
                 ^
                 |
                 v
          +--------------+
          |     Main     |
          |    memory    |
          +--------------+
                 ^
                 |
                 v
          +--------------+
          |     disk     |
          +--------------+
                 ^
                 |
                 v
          +--------------+
          |    other     |
          |   storage    |
          |    media     |
          +--------------+

   e) The reason this works is that most programs spend most of their
      time in only a small piece of their code.
   f) If not all of a process is loaded when it is running, what
      happens when it references a byte that is only in backing
      storage?
      - Hardware and software cooperate to make things work.
      i)  First, extend the page tables with an extra bit, 'present'
          (or 'valid'). If the present bit is not set, then a
          reference to the page results in a trap. This trap is given
          a special name: page fault. (See how a page fault is handled
          in section #6 below.)
      ii) Diagram of a page table entry (showing the real address,
          protection bits, valid/present bit, dirty bit, and
          reference bit):

          +--------------+---+---+---+---+
          | real address | P | V | D | R |
          +--------------+---+---+---+---+

          P = Protection bit - controls read/write access
          V = Valid bit      - set if the page is in main memory
          D = Dirty bit      - set if the page has been modified
          R = Reference bit  - set if the page has been referenced

6. Page Faults
   -----------
   a) Definition: a page fault is an attempt to reference a page that
      is not in memory.
   b) Process of handling a page fault - when a page fault occurs:
      1) Signal a trap to the OS so that the OS can regain control
         over the system.
      2) Verify that the reference is to a valid page; if not, abend
         the process.
      3) Find a page frame to put the page in.
         i)   Find a page to replace, if there is no empty frame.
         ii)  If the replaced page is dirty, find a place to put it on
              secondary storage (its previous location can be reused).
         iii) Remove the page (either copy it back or overwrite it).
         iv)  Update the page table.
         v)   Update the map of secondary storage if necessary (to
              show where we put the page).
         vi)  Update the memory (core) map.
         vii) Flush the TLB entry for the page that has been removed.
      4) The operating system brings the page into memory:
         i)   Find the page on secondary storage.
         ii)  Transfer it.
         iii) Update the page table (set the valid bit and the real
              address).
         iv)  Update the map of the file system/disk to show that the
              page is now in memory (e.g. update the cache of inodes).
         v)   Update the core map (memory map).
      5) The process resumes execution (i.e. it goes on the ready
         list; maybe it resumes immediately).
      * Note that all of these steps take time. We may switch to
        another process while the I/O is taking place.
   c) Multiprogramming is supposed to overlap the fetch of a page (or
      other I/O) for one process with the execution of another.
      - If no process is available to run (all are doing I/O or
        handling page faults), this is called multiprogramming idle
        or page fetch idle.
   d) Definitions of a few terms:
      1) Page out - to remove a page.
      2) Page out a process - remove it from memory.
      3) Page in a process - load its pages into memory.
   e) Continuing (resuming) the process is very tricky, since the page
      fault may have occurred in the middle of an instruction. We
      don't want the user process to be aware that the page fault even
      happened.
      - Things to worry about when handling a page fault:
      1) Can the instruction just be skipped?
      2) Suppose the instruction is restarted from the beginning?
         i)  How is the "beginning" located?
         ii) Even if the beginning is found, what about instructions
             with side effects, like MOV (SP)+, 10?
      3) Without additional information from the hardware, it may be
         impossible to restart a process after a page fault. Machines
         that permit restarting must have hardware support to keep
         track of all the side effects so that they can be undone
         before restarting.
   f) How many page faults can occur in one instruction? E.g. the
      instruction spans a page boundary, and each of its two operands
      spans two pages.
      There could also be a two-level page table, with one page of the
      page table needed to point to each instruction and data page -
      each of those page-table pages can fault as well.

7. Page Fetch Algorithm
   --------------------
   a) Demand paging: start the process up with no pages loaded, and
      load a page when a page fault for it occurs, i.e. wait until it
      absolutely MUST be in memory. Almost all paging systems are
      like this.
   b) Request paging: let the user say which pages are needed.
      - What's wrong with this?
      1) Users don't always know best, and aren't always impartial;
         they will overestimate their needs. (Overlays are worth
         mentioning here, although overlays are even more draconian
         than request paging.)
      2) Demand paging is still needed, in case the user doesn't
         remember to bring in the right page.
   c) Prefetching, or prepaging:
      1) Definition: bring a page into memory before it is referenced
         (e.g. when one page is referenced, bring in the next one,
         just in case).
      2) The reasons for prepaging are:
         i)  bringing in several pages at once cuts the per-page
             overhead, and
         ii) it eliminates the real-time delay of waiting for the
             page - computation and fetch overlap.
      3) The idea is to guess which page will be needed. This is hard
         to do effectively without a prophet, and may spend a lot of
         time doing wasted work. If used at all, it is typically
         one-block lookahead - i.e. fetch the next page.
      4) This algorithm seldom works well.
      5) One can also do "swapping" (also called "working set
         restoration"), whereby when a process starts, most or all of
         its pages are swapped in - or at least all of the pages it
         was using the last time it was running. When it stops, its
         pages are swapped out in a bunch onto contiguous tracks on
         disk.
   d) Overlays - a technique by which the user divides his program
      into segments. The user issues commands to load and unload the
      segments from memory; these commands specify the location in
      memory where the segments are placed. Used when there is no
      virtual memory, and the user is given a partition of real
      memory to work with.

8.
Page Replacement Algorithms
   ---------------------------
   a) FIFO: throw out the page that has been in memory the longest.
      The ideas are:
      i)  it is simple, and
      ii) the first page that was fetched is believed to be no longer
          needed.
      - Comment: this is the easiest algorithm. It is even easier than
        Random (discussed next), because we don't even have to
        generate a random number, which requires some work.
   b) Random: pick any page at random.
   c) LRU (least recently used): use the past to predict the future.
      Throw out the page that hasn't been used for the longest time.
      If there is locality, then this is presumably the best one can
      do.
   d) MIN (or OPT): as always, the best algorithm arises if we can
      predict the future - just throw out the page that won't be used
      for the longest time into the future. This requires a prophet,
      so it isn't practical, but it is good for comparison.

9. Real and Virtual Time
   ---------------------
   a) Virtual time is time as measured by a running process; it does
      not include time during which the process is blocked (e.g. for
      a page fault or any other reason). It is often measured in
      units of memory references.
   b) Real time is time as measured by a wall clock. It includes time
      during which the process is blocked (including page faults).

10. Evaluation of Paging Algorithms
    -------------------------------
    a) Costs of page faults:
       i)   CPU overhead for the page fault - the handler, the
            dispatcher, and the I/O routines (e.g. 3000 instructions).
       ii)  Possible CPU (multiprogramming) idle time while the page
            arrives.
       iii) I/O busy time while the page is transferred.
       iv)  Main memory (or cache) interference while the page is
            transferred.
       v)   Real-time delay to handle the page fault.

    * Note: we had just gotten started on this topic when time ran
      out, so we will continue with it in the next lecture.
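   The deterministic policies from section 8 (FIFO, LRU, and MIN;
   Random is omitted here for reproducibility) can be compared on a
   small reference string. This is a sketch for intuition, not a
   benchmark, and all names are illustrative; note that MIN peeks at
   the future, which a real system cannot do:

```python
def count_faults(refs, frames, choose_victim):
    """Run a reference string through `frames` page frames, calling
    choose_victim(resident, past, future) to pick an eviction."""
    resident, faults = [], 0
    for i, page in enumerate(refs):
        if page in resident:
            continue                      # hit: no fault
        faults += 1
        if len(resident) >= frames:       # no empty frame: evict one
            resident.remove(choose_victim(resident, refs[:i], refs[i:]))
        resident.append(page)
    return faults

# FIFO: the page resident longest is first in the resident list.
fifo = lambda resident, past, future: resident[0]

# LRU: the resident page whose last use lies furthest in the past.
lru = lambda resident, past, future: min(
    resident, key=lambda p: max(j for j, q in enumerate(past) if q == p))

# MIN/OPT: the page next used furthest in the future (or never again).
opt = lambda resident, past, future: max(
    resident, key=lambda p: future.index(p) if p in future else len(future))
```

   On the classic reference string 1,2,3,4,1,2,5,1,2,3,4,5 with three
   frames, MIN incurs the fewest faults of the three, as expected -
   it is the yardstick the practical algorithms are measured against.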