CS162 Lecture 10: Wednesday, Feb 23, 2005

Kada Situ

--------
Announcements: 
  - First midterm on Monday, Feb 28, from 7:10-8:30pm: closed book, no electronic 
    devices (except for a simple calculator). See the introductory handout for more 
    information. There will still be a lecture on the day of the midterm.
  - There will be a review session for the first midterm on Thursday, Feb 24, from 
    7:00-8:00pm, in 10 Evans.
  - It would be nice if you read the introductory handout first, which is available on 
    the cs162 web page, before asking any administrative questions in lecture.

--------
Major topic: Sharing Main Memory -- Segmentation and Paging 

Sub-topics:
  - Paging continued (refer to p.16 - p.21 of the lecture handout, which is called 
    "smith.4.txt" in the student directory)
    + Disadvantages (the issue of page table size is where Friday's lecture left
      off)
    + Location of page tables
    + "wired down"
  - Paging and Segmentation Combined
    + Advantages
    + Disadvantages
  - Paging VS. Segmentation
  - Sharing
  - Inverted page tables
  
-------- Lecture begins

  Table space: (review from last lecture)
    Problem: consider a 32-bit address with 4K pages; there are roughly a million (2^20)
             entries in each page table. If each entry is 4 bytes, each page table takes
             4MB. Since page tables are this big, keeping a full page table for every
             process is not a good idea. There are 4 solutions:
             
      1. Partial solution: require each process to use a contiguous portion of its address 
         space so that we can use base and bounds on the page table. Then only the few
         large processes need large page tables. But base and bounds registers have to be
         built into the architecture.
         
      2. Common approach: two-level table (see figure 6). The PTBR (page table base 
         register) points to the base of the first level page table. Instead of building
         one table with 2^20 entries, we build a "2-level tree": a first level table with
         2^10 entries, each pointing at a second level table that has 2^10 entries.
        * The first level is called the first level page table, segment table, or page
          directory. The second level is typically called the page table or the second 
          level page table.
        * For a virtual address with 32 bits, the top 10 bits give the entry in the first 
          level, the next 10 bits give the entry in the second level, and the remaining
          12 bits give the byte or word address within a page. We can also implement 
          n-level tables (n>=2) if we have long virtual addresses (e.g. 64 bits).
  
+----+      
|PTBR|----+
+----+    |      
          |     1st level page table/segment table/page directory
          |      ____________
          +---->|------------|----------+     page table/2nd level page table
          +---->|------------|--------+ |     ________
          |    -|------------|        | +--->|--------|
          |    ||------------|        |      |--------|
          |    ||------------|        |      |--------|
          |    ||------------|        |      |--------|
          |    ||------------|        |      |________|
          |    ||------------|        |
          |    ||------------|        |       ________
          |    ||------------|        +----->|--------|
          |    ||------------|------+        |--------|
          |    V|____________|      |        |--------|
          |                         |        |--------|
          |                         |        |________|  
          |                         |         .
          |                         |         .
          |                         |         .
          |                         |         .
          |                         |         . 
          |                         |         ________
          |                         +------->|--------|
          |                         +------->|--------|---+
          |                         |       -|--------|   |
          |                         |       ||--------|   |     physical page
          |                         |       V|________|   |     __________
          |    +--------------------+                     +--->|----------|
          |    |                                               |----------|
      +-----+-----+------+                                     |----------|
      | 10  | 10  | 12   | virtual address                     |----------|
      +-----+-----+------+                                     |----------|
                    base                                       |__________|
                 
                 Figure 6: 2-level page table implementation
                 
                 
      3. Put the page tables in the OS's virtual memory. In that case the page tables just
         get paged like anything else, and the OS has its own virtual address space, which
         is typically 4GB. So, unneeded pages are not allocated.
        Note: this solution is no faster than the 2nd solution, since in order
              to get to a page of the user's page table, we need to go through the OS's
              page table - we essentially have two levels of mapping.
              
      4. Make the page table a hash table: use the hashed virtual address as the index 
         into the page table. We also need a collision resolution scheme, such as linear
         probing, pointer chaining, or an overflow area, to deal with collisions.
        Note: In lecture, Prof. Smith uses the term "inverted page table" generically
              to refer to any kind of hashed page table; the textbook has a 
              different definition of inverted page table.
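
    The table-space arithmetic above can be checked with a short sketch. The constants
    come from the notes (32-bit address, 4K pages, 4-byte entries, 10/10/12 split); the
    function names are made up for illustration only.

```python
ADDRESS_BITS = 32
PAGE_SIZE = 4 * 1024      # 4K pages
ENTRY_SIZE = 4            # bytes per page table entry


def flat_table_bytes(address_bits=ADDRESS_BITS, page_size=PAGE_SIZE,
                     entry_size=ENTRY_SIZE):
    """Size of a single-level page table covering the whole address space."""
    entries = (1 << address_bits) // page_size
    return entries * entry_size


def two_level_table_bytes(second_level_tables_in_use, entry_size=ENTRY_SIZE):
    """Size of a 10/10/12 two-level table when only some second-level tables exist."""
    first_level = (1 << 10) * entry_size
    second_level = second_level_tables_in_use * (1 << 10) * entry_size
    return first_level + second_level


print(flat_table_bytes())         # 4194304 bytes = 4MB
print(two_level_table_bytes(3))   # 16384 bytes
```

    A small process that touches only three 4MB regions (say code, data, stack) needs
    16KB of tables instead of 4MB, which is the point of solution 2.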
    
    Efficiency of access: even small page tables are generally too large to keep in the 
                          relocation box in fast memory. Instead, page tables are kept in
                          main memory and the relocation box only keeps the page table's
                          base address. In this case, every time we want to access 
                          data, we have to go through a register access and three 
                          memory reads (base register -> first level page table -> second
                          level page table -> the physical page), whereas we need only
                          one memory read with real (untranslated) addresses.
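
    A minimal sketch of that access path, counting memory reads. This is a toy model, not
    any real hardware: the Memory class and the frame names ("dir", "pt_a", "frame_7") are
    hypothetical, and the 10/10/12 split is the one from the notes.

```python
PAGE_BITS = 12
INDEX_BITS = 10


class Memory:
    """Toy 'real memory': tables and pages keyed by made-up frame names."""

    def __init__(self, cells):
        self.cells = cells
        self.reads = 0

    def read(self, frame, index):
        self.reads += 1
        return self.cells[frame][index]


def split(vaddr):
    """Split a 32-bit virtual address into (first-level index, second-level index, offset)."""
    mask = (1 << INDEX_BITS) - 1
    top = (vaddr >> (PAGE_BITS + INDEX_BITS)) & mask
    mid = (vaddr >> PAGE_BITS) & mask
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    return top, mid, offset


def load(memory, ptbr, vaddr):
    top, mid, offset = split(vaddr)
    second_level = memory.read(ptbr, top)     # read 1: first level page table
    frame = memory.read(second_level, mid)    # read 2: second level page table
    return memory.read(frame, offset)         # read 3: the data itself


memory = Memory({
    "dir": {1: "pt_a"},
    "pt_a": {2: "frame_7"},
    "frame_7": {0x34: 99},
})
vaddr = (1 << 22) | (2 << 12) | 0x34
value = load(memory, "dir", vaddr)
print(value, memory.reads)   # 99 3
```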
    
  Where are the page tables?
    Page tables are referred to either with real addresses or with the OS's virtual
    addresses. They cannot be put where users can get to them; otherwise, users could
    change their values and bypass the protection.
    
    The entries of the page tables are real addresses (i.e. the addresses they point to
    in real memory). The page table base register (PTBR) would contain an OS virtual 
    address if the page table is in the OS's virtual address space, which means that
    another level of translation is needed.
    
    Is the OS paged?
      - Yes, otherwise we would have to allocate a lot of memory for the OS. 
      
    Can page tables be paged out?
      - Yes. 
      If the page tables are in the OS's virtual address space, the translation of a user 
      virtual address also requires OS address translation. In that case, page faults
      will be taken in the OS when a paged-out part of a page table is looked up. (Notice
      here that at least parts of the OS's page tables have to be in main memory.)
        * Recursive page faults could occur in this case.
        * This means the OS page tables must be in real memory and use real addresses.
      Alternatively, we can put the page tables in "real memory" and use real addresses
      (i.e. have the virtual address be the real address).
      Otherwise, we have to look up a table that tells where the missing page is on disk
      in order to reach the paged-out page.
            
  What can't be paged out (called "wired down")?
    * The code that brings in pages.
    * Pages for critical parts of the OS, which includes:
      - page fault handlers (handling a page fault takes time)
      - interrupt and trap handlers that have to be fast
      - OS page table (i.e. parts of the OS's page table that point to the code to handle
        page faults)
    * Sensitive real-time routines, which deal with something happening in real 
      time. For example, page faults take on the order of milliseconds, and milliseconds
      are visible in a real-time system.
    * Pages that are currently undergoing I/O, such as I/O buffers. I/O transfers go 
      directly to buffer memory at real addresses, and I/O devices don't know about 
      page tables. Random results will occur if the page mapped to the buffer memory
      is paged out or remapped while the transfer is going on.
    
  Paging is great for protection - you can only reference the parts of memory which 
    appear in your page table. A virtual address is generated, the hardware maps it
    through the page table to a real address, and you can't get to any real address that
    your page tables don't point to, for example the OS's code, the I/O devices, or other 
    processes' page tables, since those addresses are not mapped by your page tables.
  
**************************************
***Paging and Segmentation combined***
**************************************

  Review: Segmentation is a great way to break a process' address space into functional
          portions such as code, data, stack, and subroutines. But since segments are
          variable-sized objects that need variable-sized storage allocations, we can't
          make a segment bigger than real main memory. One way to improve segmentation is
          to combine paging and segmentation (i.e. segments that are paged).
           
  * Diagram of segment table/page table mapping (Figure 7). A segment table entry can
    hold additional information such as protection bits (read, write, execute) and a
    valid bit. It doesn't need a base, but needs a bound to tell how big the segment is.
  * Each segment broken into one or more pages.
  * Segments correspond to logical units: code, data, stack. Segments vary in size and
    are often large. Protection can be associated with segments.
  * Pages are for the use of the OS; they are fixed-size to make it easy to manage 
    memory.
  * Going from paging to paging and segmentation combined is like going from a single
    segment to multiple segments, except at a higher level. Instead of having a single
    page table, we have many page tables, with a bound (no base needed) for each.
    Call the stuff associated with each page table a segment.
  
+----+      
|STBR|----+                      protection bits and some other bits
+----+    |                      |
          |      segment table   V
          |      ___________________
          +---->|1|(code) |bound|p|V|----------+     page table
                |-------------------|          |     ________ (code)
                |2|(data) |bound|p|V|--------+ +--->|--------|
                |-------------------|        |      |--------|
          +---->|3|(stack)|bound|p|V|------+ |      |--------|
          |     |___________________|      | |      |--------|
          |                                | |      |________|
          |                                | |
          |                                | |       ________  (data)
          |                                | +----->|--------|
          |                                |        |--------|
          |                                |        |--------|
          |                                |        |--------|
          |                                |        |________|  
          |                                |
          |                                |         ________  (stack)
          |                                +------->|--------|
          |                                +------->|--------|---+
          |                                |       -|--------|   |
          |                                |       ||--------|   |     physical page
          |                                |       V|________|   |     __________
          |  +-----------------------------+                     +--->|----------|
          |  |                                                        |----------|
         +-+----+------------+                                        |----------|
         | |    |            | virtual address                        |----------|
         +-+----+------------+                                        |----------|
                                                                      |__________|
                 
                 Figure 7: paging and segmentation combined
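
  The mapping in Figure 7 can be sketched as follows. The field widths (2-bit segment
  number, 18-bit page number, 12-bit offset) and all names are assumptions made for the
  example; the notes do not fix them. The bound is counted in pages.

```python
from dataclasses import dataclass

SEG_BITS, PAGE_BITS, OFFSET_BITS = 2, 18, 12   # assumed field widths


class Fault(Exception):
    """Stands in for the trap the hardware would raise."""


@dataclass
class SegmentEntry:
    bound: int         # number of pages in the segment
    protection: str    # subset of "rwx"
    valid: bool
    page_table: list   # page number -> frame number, or None if not in memory


def translate(segment_table, vaddr, access):
    seg = vaddr >> (PAGE_BITS + OFFSET_BITS)
    page = (vaddr >> OFFSET_BITS) & ((1 << PAGE_BITS) - 1)
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    if seg >= len(segment_table) or not segment_table[seg].valid:
        raise Fault("invalid segment")
    entry = segment_table[seg]
    if access not in entry.protection:
        raise Fault("protection violation")
    if page >= entry.bound:
        raise Fault("beyond segment bound")
    frame = entry.page_table[page]
    if frame is None:
        raise Fault("page fault")
    return (frame << OFFSET_BITS) | offset


segment_table = [
    SegmentEntry(bound=2, protection="rx", valid=True, page_table=[5, 9]),  # code
    SegmentEntry(bound=1, protection="rw", valid=True, page_table=[3]),     # data
]
code_addr = (0 << 30) | (1 << 12) | 0x10
print(hex(translate(segment_table, code_addr, "r")))   # 0x9010
```

  Note there is no base in the segment entry: the page table itself says where each
  page lives, so only the bound and the protection bits are checked per segment.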
                 
  Advantages:
    * Provides two levels of mapping, as did 2-level paging, making page table size
      manageable.
    * Has the advantages of paging - everything is fixed size, which provides a
      physical unit of management.  
    * Has the advantages of segmentation - allows us to break up address spaces into 
      logical regions and assign protection and sharing attributes to each segment. 
    * Effectively gives two-dimensional addressing: segment number, word/byte number
      (i.e. address within segment).
    * Expanding or contracting segments won't interfere with other segments since they
      are paged. To make a segment bigger, we just add a page (which can be
      anywhere in memory).
    * No compaction or external fragmentation problems because everything is
      fixed size, but there is potentially internal fragmentation because the segment
      uses only a portion of its last page.
    * Bounds checks on segments can be handled by marking the pages past the end invalid
      (quantized to page size). Then we don't have to do bounds checks in hardware
      anymore, since page faults are handled in software, which makes life fairly easy.
    * Saves space - no need to build page tables for segments that don't exist; second
      level page tables are needed only where the first level has valid entries
      pointing to them. 
    * Processes can share a whole segment or just an individual page.
    * Protection can be put in either the second level page tables or the segment table.
      Typically, if the system is segmented and paged, protection is in the segment
      table entries; if the system is just paged, protection is in the page
      table entries. Some systems implement both, which provides two levels of
      protection, but that depends on the hardware support.
  
  Disadvantages:
    * Still needs two levels of mapping, so the access cost is not resolved, and it is
      more complicated than either segmentation or paging alone.
    * Has 2 kinds of software and hardware overhead, one for segmentation and one for
      paging.
    * Still has the internal fragmentation problem - the last page of the segment is only 
      partially used. But if page size is small compared to most segments, then internal 
      fragmentation is not too bad. 
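
  To put a number on the internal fragmentation point, here is a small sketch. The
  100,000-byte segment and the three page sizes are arbitrary values picked for the
  example.

```python
def last_page_waste(segment_bytes, page_size):
    """Bytes left unused in the final page of a segment."""
    remainder = segment_bytes % page_size
    return 0 if remainder == 0 else page_size - remainder


for page_size in (512, 4096, 65536):
    waste = last_page_waste(100_000, page_size)
    print(page_size, waste, f"{100 * waste / 100_000:.1f}% of the segment")
```

  The waste is always less than one page, so it stays small relative to the segment as
  long as pages are small compared to segments.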

*****************************    
***Paging VS. Segmentation***
*****************************

  Segmentation is supposed to be an aid to the user in organizing his/her computation, and
    paging is an aid to the system that gives users the illusion that they have a
    large virtual memory.
  * Page is fixed size
    - physical unit of information
    - used only for memory management
    - should not be visible to the programmer (but a programmer can actually write a 
      routine to detect page boundaries from the timing difference between an ordinary
      reference and a reference that takes a page fault).
  * Segment is usually a logical unit
    - should be visible to the user
    - should be of arbitrary size
  Note: the user may see or be aware of segmentation, but the user should not be aware
        of paging. 

*************    
***Sharing***
*************

  Processes can share memory at two levels: 
    1. Have a single page shared between processes (Figure 8):
       Data sharing is accomplished as follows: there are two users (assuming they use
       segmentation with paging). User 1 has its segment table point to a page table
       which points to the shared page. User 2 has its segment table point to its own
       page table, which also points to the shared page. 
       
    User 1                                                                    User 2
    +---------------+                                                         +---------------+
    |Segment table 1|-+                                                     +-|Segment table 2|
    +---------------+ |                                                     | +---------------+
                      |                                                     |
                      |  +------------+                     +------------+  |
                      +->|page table 1|-+                 +-|page table 2|<-+
                         +------------+ |                 | +------------+
                                        |  +-----------+  |
                                        +->|shared page|<-+
                                           +-----------+
     
    Figure 8. A page shared between two processes
     
    2. Have a single segment shared, with all the pages in it (Figure 9).
       Data sharing is accomplished as follows: there are two different users, each of
       whom has a segment table, and each segment table has an entry that points to a
       shared page table, and that table points to a number of shared pages. In this
       case, the users are sharing the entire segment and all the pages in it. This is
       the more usual way, because segments are logical pieces of information, and
       sharing a logical segment is more reasonable than sharing a physical page.
    
    User 1                            User 2
    +---------------+                 +---------------+
    |segment table 1|-+    shared   +-|segment table 2|
    +---------------+ |    page     | +---------------+
                      |    table    |
                      |  +-------+  |
                      +->|       |<-+
                         |       |  
                         |=======|--+
                         |       |  |  shared paged segment
                         |       |  |  +--------------+
                         +-------+  +->|              |
                                       |              |
                                       |              |
                                       |              |
                                       +--------------+
                                        
    Figure 9. A paged segment shared between two processes
    
  * Does the shared region have to be at the same address in each process?
    - The shared region doesn't necessarily have to be at the same address in every
      process, as long as the region can be found.
  * Can the shared region contain any absolute addresses (i.e. virtual addresses)?
    - Suppose the shared region has a data structure with pointers in it (e.g. a linked 
      list). Pointers are addresses, so if the pointers contain virtual addresses,
      each one is mapped through the page table of whichever process uses it, and two
      different processes may map a given virtual address to different places. A shared 
      region should therefore not contain any absolute addresses, because the
      processes sharing it will each interpret those addresses according to their own
      tables and can map them to different places, which gives odd results. 
    - But a shared region can contain relative addresses. For example, these addresses 
      can be offsets from a register, and each process can have the register loaded with 
      something different.
    - For a data structure, you can make sure all the pointers in it are relative to a
      base register that points to the beginning of the data structure.
    - If the addresses are relative to the beginning of the segment, the entire segment
      can then be shared.
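
  A minimal sketch of the relative-address idea: a linked list stored in a shared
  region whose "next" pointers are offsets from the start of the region. The bytearray
  stands in for the shared pages; the node layout, the NIL marker, and the two base
  addresses are assumptions made for the example.

```python
import struct

NIL = 0xFFFFFFFF            # hypothetical end-of-list marker
region = bytearray(64)      # the shared region: the same bytes seen by both processes


def write_node(offset, value, next_offset):
    """Store a node as (value, next), where next is an offset from the region start."""
    struct.pack_into("<II", region, offset, value, next_offset)


def walk(base, head_offset):
    """Walk the list as a process that has the region mapped at virtual address base."""
    values, addresses = [], []
    offset = head_offset
    while offset != NIL:
        value, next_offset = struct.unpack_from("<II", region, offset)
        values.append(value)
        addresses.append(base + offset)   # this process's virtual address of the node
        offset = next_offset
    return values, addresses


write_node(0, 10, 16)
write_node(16, 20, 40)
write_node(40, 30, NIL)

print(walk(0x10000, 0))   # process 1 maps the region at 0x10000
print(walk(0x7F000, 0))   # process 2 maps it somewhere else
```

  Both processes read the same values even though the nodes sit at different virtual
  addresses in each, because each adds its own base (its "register") to the stored
  offsets. Had the nodes stored process 1's absolute addresses, process 2 would follow
  them to the wrong place.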
  
  Problems: 
  * Copy on write
      Problem statement: Suppose there are many processes sharing the same very large data
        (e.g. 1GB - 1TB). If one process modifies the shared data, it should
        modify only its own copy, and no other process should see the modification. 
        However, duplicating the huge data for every process just because one process 
        modifies a few words is certainly not feasible. How can we implement 
        this situation efficiently?
        
    The huge shared data is write-protected. As soon as a process tries to write to the 
    shared data (to a particular page), it traps. The trap routine notices that this is
    a write to shared data, and creates a private copy of the page for the writing process.
    
        PT 1                         PT 2
        +---+    +--------------+    +---+
        |---|--->|--------------|<---|---|
        |---|--->|--------------|<---|---|
        |---|--->|--------------|<---|---|
try to  |---|--->|--------------|<---|---|
write to|---|--->|--------------|<---|---|
shared  |---|--->|--------------|<---|---|
data    |---|--->|--------------|<---|---|
    +---|---|    |--------------|<---|---|
    |   +---+    |--------------|    +---+
    |            |--------------|
    V            |--------------|
    +----+       |--------------|
    |new |       |--------------|
    |page|       +--------------+  
    +----+
    (a new private copy of a 
    page of the shared data is 
    created for this process)
    
    Figure 9. copy on write
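
    The copy-on-write mechanism can be sketched as a toy model. Everything here is
    hypothetical (the Frame and PageTable classes, the reference count used to decide
    whether a page is still shared); a real OS does this in the trap handler with
    hardware protection bits.

```python
class Frame:
    """A real page plus a count of how many page tables point at it."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.refs = 0


class PageTable:
    """Per-process table: page number -> [frame, writable]."""

    def __init__(self):
        self.entries = {}

    def map_shared(self, page, frame):
        frame.refs += 1
        self.entries[page] = [frame, False]    # write-protected while shared

    def read(self, page, offset):
        return self.entries[page][0].data[offset]

    def write(self, page, offset, value):
        entry = self.entries[page]
        frame, writable = entry
        if not writable:                       # the "trap" on a protected write
            if frame.refs > 1:                 # still shared: copy just this page
                frame.refs -= 1
                frame = Frame(frame.data)
                frame.refs = 1
                entry[0] = frame
            entry[1] = True                    # now private, so writes are allowed
        frame.data[offset] = value


shared = [Frame(b"\x00" * 8) for _ in range(4)]
pt1, pt2 = PageTable(), PageTable()
for n, frame in enumerate(shared):
    pt1.map_shared(n, frame)
    pt2.map_shared(n, frame)

pt1.write(2, 0, 42)
print(pt1.read(2, 0), pt2.read(2, 0))            # 42 0
print(pt1.entries[0][0] is pt2.entries[0][0])    # True
```

    Only the one page that was written gets duplicated; the other pages stay shared.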
    
  * Another problem
    Problem statement: the OS and the users have their own address spaces; how do we move
      information between the OS and user memory? Notice registers are too small to 
      hold every piece of information. As an example, consider a process that makes an
      I/O call and traps to the OS. The OS then gets the user virtual address 
      that points to the portion of memory that the I/O is supposed to write to.
       
    Solution #1: Translate to real addresses and use them - the OS just runs unmapped. The
                 user passes the virtual address to the OS, then the OS looks into the 
                 user's page table and translates it to a real address (this is done in 
                 software).
      Note: addresses that are contiguous in the virtual address space may not be 
            contiguous in physical memory. Suppose a process requests an I/O transfer
            into a contiguous piece of virtual memory. Since the I/O system doesn't
            usually know about paging, it will transfer data into consecutive locations
            of real memory. In that case, the I/O operation may have to be split up into
            separate transfers, one per page.
      
  
  <Transfer I/O to the contiguous VA>    
  
  VA space:       PA space:
      +---+         +----+
      |   |---+ +-->|====|<==+
      |---|   +-|-->|====|<==+
      |   |-----+   |====|   |
      |---|         |====|   +=== I/O
      |   |---+     |====|   |
      |---|   |     |====|   |
      |   |---|---->|====|<==+
      +---+   |     |====|   |
              |     |====|   |
              |     |====|   |
              +---->|====|<==+
                    +----+
                    
     Figure 10. problem for solution #1
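
     A sketch of how the OS might split such a transfer under solution #1. The 4K page
     size, the dict-based page table, and the function name are assumptions for the
     example; the merging of physically adjacent chunks is an optional extra, not
     something the lecture requires.

```python
PAGE_SIZE = 4096


def split_transfer(page_table, vaddr, length):
    """Return (physical_address, byte_count) chunks for a virtually contiguous buffer."""
    chunks = []
    while length > 0:
        page, offset = divmod(vaddr, PAGE_SIZE)
        count = min(length, PAGE_SIZE - offset)      # stay within the current page
        paddr = page_table[page] * PAGE_SIZE + offset
        if chunks and chunks[-1][0] + chunks[-1][1] == paddr:
            # Physically adjacent to the previous chunk: extend it.
            chunks[-1] = (chunks[-1][0], chunks[-1][1] + count)
        else:
            chunks.append((paddr, count))
        vaddr += count
        length -= count
    return chunks


page_table = {0: 7, 1: 2, 2: 3}    # virtual page -> real page frame
print(split_transfer(page_table, 0x0F00, 0x1400))   # [(32512, 256), (8192, 4864)]
```

     The buffer spans three virtual pages but ends up as two transfers, because frames
     2 and 3 happen to be adjacent in real memory.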
     
    Solution #2: Use the user page table to do the translation dynamically, but this 
                 requires support from the hardware. Also note that we need to have 
                 two address spaces active at the same time: OS virtual addresses must
                 be translated through the OS's page table and user virtual addresses
                 through the user's page table at the same time, which requires both
                 the user PTBR and the system PTBR to be active at the same time.
    
    Solution #3: Have the OS's page table entries point to the user pages. The user has a
                 page table that points to the real memory where it wants to do the I/O.
                 Since the OS can read the user page table, it knows where to find that
                 location in real memory. The OS can then address the user's pages by
                 putting a pointer to them in its own page table.
    
   OS PT:                 User PT:
    +---+                     +---+
    |---|                    |---|
    |---|                    |---|
    |---|                    |---|
    |---|       Real         |---|
    |---|       Memory       |---|
    |---|--+    +----+    +--|---|
    +---+  |    |    |    |  +---+
           |    |    |    |
           |    |    |    |
           +--->|====|<---+
                |    |
                |    |
                +----+
                
    Figure 11. Illustration for solution #3
    
    Solution #4: The 32-bit virtual address space is divided into two parts: the OS is in
                 the upper part, the user lives in the bottom part. The two parts have 
                 different page tables, and thus require two PTBRs for the different
                 mappings. The OS can generate an address anywhere in the upper virtual
                 address space. The user gives the OS a virtual address, and the OS can
                 address it. On the other hand, the user is not allowed to address the
                 OS's area. The high order bit of the virtual address tells whether it
                 is a user address or an OS address. So, the OS can generate addresses
                 from 0 up to 2^32 - 1, so that it can read and write the active user's 
                 address space.
    
    
    32-bit VA space [0 - (2^32 - 1)]
      |    -+-----------+___
      |   / |           | ^
      |   | | OS        | |    the OS can address 2^31 up to 2^32 - 1
      |   | |           | |
      |   | |           |PT 1 
      |   | |-----------| |
      |   | |           | |
      |   | |OS area    | |
      |   | |           | V
      |___/ |___________|___
          \ | | stack   | ^   
          | | V         | |   the user can address 0 up to 2^31 - 1
          | |           | |
          | |   data    | |
          | |-----------|PT 2
          | |           | |
          | |           | |
          | |           | |
          \ |   code    | V
           -+-----------+---
            
            
    Figure 12. Illustration for solution #4
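
    A sketch of solution #4, assuming 4K pages and dict-based page tables (both are
    assumptions for the example, as are all names). The high-order bit picks which page
    table to use, and user-mode references to the upper half are refused.

```python
PAGE_BITS = 12
OS_BIT = 1 << 31     # high-order bit of a 32-bit virtual address


class ProtectionError(Exception):
    pass


def translate(vaddr, user_mode, os_table, user_table):
    """Pick the page table from the high-order bit, then map page -> frame."""
    in_os_half = bool(vaddr & OS_BIT)
    if in_os_half and user_mode:
        raise ProtectionError("user code may not address the OS half")
    table = os_table if in_os_half else user_table
    page = (vaddr & (OS_BIT - 1)) >> PAGE_BITS     # page number within its half
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    return (table[page] << PAGE_BITS) | offset


os_table = {0: 1}
user_table = {4: 9}
user_buffer = (4 << PAGE_BITS) | 0x20

# The OS (not in user mode) can use the user's virtual address directly.
print(hex(translate(user_buffer, False, os_table, user_table)))   # 0x9020
```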

   The VAX system uses solution #4:
   In VAX: (This part comes straight from the lecture notes) 
     * Address is 32 bits; the top two bits select the segment. Four base-bound pairs
       define page tables (system, P0, P1, unused).
     * Pages are 512 bytes long.
     * Read-write protection information is contained in the page table entries, not in
       the segment table.
     * One segment contains operating system stuff, two contain stuff of current user 
       process.
     * Potential problem: page tables can get big. Don't want to have to allocate them
       contiguously, especially for large user processes. Solution is to map the user page 
       tables so the user page tables can be scattered:
      - System base-bounds pairs are physical addresses, system tables must be
        contiguous.
      - User base-bounds pairs are virtual addresses in the system space.  This allows
        the user page tables to be scattered in non-contiguous pages of physical memory.
      - The result is a two-level scheme.
      - This is an alternative to the normal two-level scheme. If the normal two-level
        scheme were used, and if page tables were paged, it would actually be a
        four-level scheme.
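
   A sketch of decoding a VAX-style address as described above: two bits of segment
   select, then a page number and an offset within a 512-byte page. The particular
   bit patterns assigned to each segment in REGIONS are an assumption of this example;
   the notes only list the four segments.

```python
PAGE_SIZE = 512
# Assumed encoding of the top two address bits.
REGIONS = {0b00: "P0", 0b01: "P1", 0b10: "system", 0b11: "unused"}


def decode(vaddr):
    """Split a 32-bit address into (segment name, page number, byte offset)."""
    region = REGIONS[(vaddr >> 30) & 0b11]
    within = vaddr & ((1 << 30) - 1)          # the remaining 30 bits
    page, offset = divmod(within, PAGE_SIZE)  # 21-bit page number, 9-bit offset
    return region, page, offset


print(decode(0x80000204))   # ('system', 1, 4)
```

   With 512-byte pages, each segment can have up to 2^21 pages, which is why the user
   page tables can get big enough that they need to be scattered.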

**************************      
***Inverted Page Table:***
**************************
    An inverted page table has one entry for each real page of memory. It is
    organized as a hash table that is indexed by hashing the virtual address, with a
    number of entries larger than the number of real pages. The inverted page table only
    has to be big enough to hold every valid page. The inverted page table is shared by
    all processes in the system. It is much more space-efficient because the system has
    only one page table instead of one page table per process, but the process ID has to
    be stored in each table entry in order to identify the correct process.
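
    A minimal sketch of a hashed page table of this kind, shared by all processes.
    Choices such as linear probing, twice as many slots as frames, and the class and
    method names are assumptions for the example. Removal of entries is left out to
    keep it short.

```python
class HashedPageTable:
    """One system-wide table keyed by (process ID, virtual page number)."""

    def __init__(self, num_frames):
        # Assumed sizing: twice as many slots as real pages, to keep probing short.
        self.slots = [None] * (2 * num_frames)

    def _probe(self, pid, vpn):
        size = len(self.slots)
        start = hash((pid, vpn)) % size
        return ((start + step) % size for step in range(size))   # linear probing

    def insert(self, pid, vpn, frame):
        for i in self._probe(pid, vpn):
            slot = self.slots[i]
            if slot is None or slot[:2] == (pid, vpn):
                self.slots[i] = (pid, vpn, frame)
                return
        raise MemoryError("table full")

    def lookup(self, pid, vpn):
        for i in self._probe(pid, vpn):
            slot = self.slots[i]
            if slot is None:
                return None               # not mapped: this would be a page fault
            if slot[:2] == (pid, vpn):
                return slot[2]
        return None


table = HashedPageTable(num_frames=4)
table.insert(pid=1, vpn=0x10, frame=0)
table.insert(pid=2, vpn=0x10, frame=3)    # same virtual page, different process
print(table.lookup(1, 0x10), table.lookup(2, 0x10), table.lookup(1, 0x11))   # 0 3 None
```

    The process ID in the key is what keeps two processes using the same virtual page
    number from being confused with each other.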
    
----- End of lecture