Lecture 2/12/07

The lecture includes the following topics:
I. Monitor examples
II. Introduction to Storage Allocation; Linkers
III. Dynamic Storage Allocation
IV. Sharing Main Memory -- Segmentation and Paging

Topic: Monitor Examples
========================

Monitors were discussed last week, and this week we are going to look at a couple more examples. Wait and signal are used in process synchronization. When you enter a monitor procedure, you either implicitly or explicitly acquire a lock. Monitors are thus a higher-level synchronization primitive than semaphores; they are simple and easy to use. Monitors combine the data with the methods that operate on the data.

We have the producer/consumer problem from the Hoare paper. This is the bounded buffer problem. We have two conditions: nonempty and nonfull. In the append procedure, if you enter append when the count equals the size of the buffer, then the buffer is full and you have to wait until the buffer is not full. Once the buffer is not full, you set your pointer to the last element of the buffer, store the item, and signal the condition nonempty, so anyone waiting for an item can use it. The signal lets any waiting consumer know that the producer is done appending.

The remove procedure is like append, but in reverse. You can only remove if there is an item in the buffer, so if the count is 0 the buffer is empty and you have to wait on the condition nonempty. Then you pull an item from the buffer, decrement the count, and signal any waiting thread that the buffer is no longer full. The count is initialized to 0. In this example we assume that the lock is implicitly acquired when you enter either append or remove, and implicitly released when you leave the procedure.

Let's look at a slightly more complicated example: the disk head scheduler from the Hoare paper. We have two methods: request and release. Request is called before issuing a command to move the disk head to the destination cylinder.
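Before moving on to the scheduler, the bounded-buffer monitor described above can be sketched in Python using threading primitives. This is a loose sketch, not Hoare's code: Python's condition variables have Mesa semantics rather than Hoare's (a signalled waiter re-runs later instead of immediately), so the waits sit in while loops. The method and field names (append, remove, count) follow the lecture; the class itself and its pointer names are our own scaffolding.

```python
import threading

class BoundedBuffer:
    """Sketch of the bounded-buffer monitor. The Lock plays the role of the
    implicit monitor lock; nonfull/nonempty are the two conditions."""
    def __init__(self, size):
        self.buffer = [None] * size
        self.size = size
        self.count = 0           # number of items currently in the buffer
        self.lastpointer = 0     # next slot to fill
        self.firstpointer = 0    # next slot to empty
        self.lock = threading.Lock()
        self.nonfull = threading.Condition(self.lock)
        self.nonempty = threading.Condition(self.lock)

    def append(self, item):
        with self.lock:                      # implicit monitor lock
            while self.count == self.size:   # buffer full: wait until not full
                self.nonfull.wait()
            self.buffer[self.lastpointer] = item
            self.lastpointer = (self.lastpointer + 1) % self.size
            self.count += 1
            self.nonempty.notify()           # signal: an item is available

    def remove(self):
        with self.lock:
            while self.count == 0:           # buffer empty: wait until not empty
                self.nonempty.wait()
            item = self.buffer[self.firstpointer]
            self.firstpointer = (self.firstpointer + 1) % self.size
            self.count -= 1
            self.nonfull.notify()            # signal: buffer no longer full
            return item
```

In use, producer threads call append and consumer threads call remove; neither ever touches count or the pointers directly, which is the point of packaging data and methods together.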
We have headpos, the current position of the head, and the variable sweep, which tells you the direction the head is moving: up or down. We want to schedule the disk arm so that it moves a minimum distance. If you just use first-come first-served, the arm will be sweeping all over the disk. But if you only service the requests that are closest to where you currently are, this introduces the problem of starvation, because the cylinders at the beginning and the end of the disk may never get serviced. So this uses the elevator algorithm. (Look at the code in the paper.)

We keep track of the head position and direction (up or down), and we keep track of whether the disk head is currently busy. We have two conditions, upcondition and downcondition. We wait on the upcondition if we want to be woken on the next upsweep of the arm. We wait on the downcondition if the arm is currently moving down and our position is below the arm.

In the request procedure, the parameter you pass is the cylinder you want to be on. If you call request and the disk is currently busy, you have to wait.

Cylinder numbers
1. -------------------
2. -------------------
3. -------------------
4. -------------------
5. -------------------
6. -------------------
7. -------------------

Say you request cylinder 4 while the head position is currently 2, the head is sweeping down, and it is busy. Since it is busy, we have to do some checking: if the head position is less than the destination we want (as it is in this case), we wait for the upsweep. If the head position is equal to the position where we want to read, we also wait for the upsweep. The wait is thus paired with the requested position. When we call release, we set busy to false, since we are done reading.

Topic: Introduction to Storage Allocation; Linkers
==================================================

Object code in a system such as Unix is divided into three parts: Code ("text" in Unix terminology), Data, and Stack.
We distinguish between different segments of a program because we will sometimes want two or more processes running at the same time to run the same code, i.e. share the code, which is possible since code isn't modified. Data and stack segments must be private to each process, since those are modified.

Code is generated by the compiler from source code. This code contains addresses, some of which have values not known at compilation or assembly time: the address of the base of the text segment is not known at compile time, and the address of the base of the code is not known at assembly time. Addresses are not known because we want the ability to combine code that was compiled at different times. If we compile everything at the same time, we don't have a problem.

The compiler generates one object file for each source code file, containing information for that file. The information is incomplete, since each source file generally references some things defined in other source files. The compiler provides the symbol table and relocation table.

The linker (or linkage editor) combines all of the object files for one or more programs into a single object file, which is complete and self-sufficient. (The linker can go from many object files to just one.) The loader takes a single object file and adjusts all addresses to reflect the correct load address for the object file. The linker often includes the function of the loader. The operating system places object files into memory, allows several different processes to share memory at once, and provides facilities for processes to get more memory after they've started running. The run-time library works together with the OS to provide dynamic allocation routines.

There are absolute addresses within a code segment, whose values need to be determined. When code is compiled or assembled, these addresses are usually set as if the segment were loaded at zero. There are relative addresses (e.g. JMP * 28) - no problem. There are external addresses - e.g. CALL SUBPROG.
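To make the linker's job concrete, here is a toy Python sketch of the whole pipeline: place each segment at a final base address, compute final symbol values, then patch every relocated address. The data layout here (dicts with 'code', 'symbols', 'relocs' fields, one word per address) is invented purely for illustration; real object file formats are far richer.

```python
def link(segments):
    """Toy linker. segments: list of dicts with 'name', 'code' (list of
    words, assembled as if loaded at 0), 'symbols' (label -> offset), and
    'relocs' (list of (offset, label) pairs to patch)."""
    # Step 1: assign each segment a final location (build the segment table).
    base, segtable = 0, {}
    for seg in segments:
        segtable[seg['name']] = base
        base += len(seg['code'])
    # Step 2: calculate final symbol values (update the symbol table).
    symtab = {}
    for seg in segments:
        for label, off in seg['symbols'].items():
            symtab[label] = segtable[seg['name']] + off
    # Step 3: scan each relocation table and relocate addresses in the code.
    image = []
    for seg in segments:
        code = list(seg['code'])
        for off, label in seg['relocs']:
            code[off] = symtab[label]       # replace with new absolute value
        image.extend(code)
    return image, symtab
```

For example, a segment whose word 1 holds a CALL target defined in another segment gets that word overwritten with the callee's final absolute address.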
External addresses can't be resolved until the separately compiled pieces are combined. There are also addresses of data in the data segment.

When compiling, we create:
1. Segment table - for each segment, we need the segment name, segment size, and the base address at which it was assumed to be loaded.
2. Symbol table - contains global definitions - a table of labels that are needed in other segments. Usually, these have been declared in some way; internal labels are not externally visible. (Internal labels were known by the compiler anyway.)
3. Relocation table - a table of addresses within this segment that need to be fixed, i.e. relocated. Contains internals - references to locations within this segment - and external references - references that are believed to be external.

The compiler provides these tables along with the object code for each segment.

Effectively, there are 3 steps in a linker/loader:
1. Determine the location of each segment.
2. Calculate the values of symbols and update the symbol table.
3. Scan the relocation table and relocate addresses.

Operation of a linker in more detail:
1. Collect all the pieces of a program - this may involve finding some segments from the file system and libraries.
2. Assign each segment a final location. Build the segment table.
3. Resolve (i.e. fix) all of the addresses that can be fixed. The result is a new object file. All addresses may be resolved, or there may be a new object file with some addresses still unresolved. This is done by:
   a) taking the symbol table for each segment and assigning to each symbol its new address;
   b) scanning the relocation table and, for every entry, replacing the value in the code with the new absolute value.

Topic: Dynamic Storage Allocation
=================================

Static allocation is not sufficient because of unpredictability: we can't predict ahead of time how much memory, or in what form, will be needed. Examples:
a) Recursive procedures. Even regular procedures are hard to predict (data dependencies).
b) Complex data structures, e.g.
a linker symbol table. If all storage must be reserved in advance (statically), then it will be used inefficiently (enough will be reserved to handle the worst possible case).

There are two basic operations in dynamic storage management: allocate and free. Dynamic allocation can be handled in one of two general ways:
1. Stack allocation (hierarchical): restricted, but simple and efficient.
2. "Heap" allocation: more general, but less efficient and more difficult to implement. (I.e. uses a free storage area.)

Stack organization: memory allocation and freeing are partially predictable (as usual, we do better when we can predict the future). Allocation is hierarchical: memory is freed in the opposite order from allocation.

Heap organization: allocation and release are unpredictable. Heaps are used for arbitrary list structures and complex data organizations. Example: a payroll system. We don't know when employees will join and leave the company, and we must be able to keep track of all of them using the least possible amount of storage.

Memory consists of allocated areas and free areas (or holes). We inevitably end up with lots of holes. Goal: reuse the space in holes so as to keep the number of holes small and their size large. Fragmentation: inefficient use of memory due to holes that are too small to be useful. In stack allocation, all the holes are together in one big chunk. Fragmentation can be:
Internal - space is wasted within blocks.
External - space is wasted between blocks.

Typically, heap allocation schemes use a free list to keep track of the storage that is not in use. Algorithms differ in how they manage the free list.

Best fit: keep a linked list of free blocks; search the whole list on each allocation; choose the block that comes closest to matching the needs of the allocation; save the excess for later. During release operations, merge adjacent free blocks.

First fit: just scan the list for the first hole that is large enough, and free the excess. Also merge on releases.
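The two policies just described can be sketched over a free list of (base, length) holes. The list representation and function names here are ours, and the release/merge path is omitted; this only shows the allocation side.

```python
def first_fit(free_list, size):
    """First fit: take the first hole that is large enough; the excess of a
    split hole stays on the list. Returns (base, new_free_list); base is
    None if no hole fits."""
    for i, (base, length) in enumerate(free_list):
        if length >= size:
            rest = free_list[:i] + free_list[i + 1:]
            if length > size:                    # save the excess for later
                rest.insert(i, (base + size, length - size))
            return base, rest
    return None, free_list

def best_fit(free_list, size):
    """Best fit: scan the whole list and pick the smallest adequate hole."""
    candidates = [(length, i) for i, (base, length) in enumerate(free_list)
                  if length >= size]
    if not candidates:
        return None, free_list
    _, i = min(candidates)                       # smallest hole that fits
    base, length = free_list[i]
    rest = free_list[:i] + free_list[i + 1:]
    if length > size:
        rest.insert(i, (base + size, length - size))
    return base, rest
```

Running the lecture's example (free blocks of 20 and 15, allocations of 10 then 20) shows best fit succeeding where first fit gets stuck, and the 8/12/12 sequence shows the reverse.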
Most first fit implementations are rotating first fit (next fit). Next fit: like first fit, but start the scan where you left off last time.

Best fit is not necessarily better than first fit. Suppose memory contains 2 free blocks, of sizes 20 and 15.
Suppose the allocation ops are 10 then 20: which approach wins? Best fit: it first allocates the 10 in the 15-block and then the 20 in the 20-block. First fit would have allocated the 10 out of the 20-block and then have no place to put the 20.
Suppose the ops are 8, 12, then 12: which one wins? First fit.
First fit tends to leave "average" size holes, while best fit tends to leave some very large ones and some very small ones. The very small ones can't be used very easily.

Bit map: used for allocation of storage that comes in fixed-size chunks (e.g. disk blocks, or 32-byte chunks). Keep a large array of bits, one for each chunk. If a bit is 0, the chunk is in use; if the bit is 1, the chunk is free. This will be discussed more when talking about file systems.

Pools: keep a separate allocation pool for each popular size. Allocation is fast, with no fragmentation.

Reclamation methods: how do we know when dynamically allocated memory can be freed? It's easy when a chunk is only used in one place. Reclamation is hard when information is shared: it can't be recycled until all of the sharers are finished. Sharing is indicated by the presence of pointers to the data. Without a pointer, you can't access the data (you can't find it).

Two problems in reclamation:
1. Dangling pointers: better not recycle storage while it's still being used.
2. Core leaks: better not "lose" storage by forgetting to free it even when it can't ever be used again.

Reference counts: keep track of the number of outstanding pointers to each chunk of memory. When this goes to zero, free the memory. Examples: Smalltalk, file descriptors in Unix. This works fine for hierarchical structures. The reference counts must be managed carefully (by the system) so no mistakes are made in incrementing and decrementing them.
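Reference counting can be sketched in a few lines. The class below and its freed list are invented stand-ins for real storage management: "freeing" here just records the chunk's name, and the increments/decrements that a real system would do on every pointer assignment are shown as explicit calls.

```python
class RefCounted:
    """Toy reference-counted chunk: tracks the number of outstanding
    pointers; when the count drops to zero, the chunk is 'freed'."""
    freed = []                                   # stand-in for the free pool

    def __init__(self, name):
        self.name = name
        self.count = 0

    def add_ref(self):                           # a new pointer now refers here
        self.count += 1

    def release(self):                           # a pointer was deleted
        self.count -= 1
        if self.count == 0:
            RefCounted.freed.append(self.name)   # no sharers left: reclaim
```

A chunk with two outstanding pointers survives the loss of one pointer and is reclaimed only when the second is released, which is exactly why the counts must be maintained without mistakes.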
What happens when there are circular structures?

     ---        ---
    |   | ---> |   |
    |   | <--- |   |
     ---        ---

The memory can never be freed, since there is always a reference to it.

Garbage collection: storage isn't freed explicitly (using a free operation), but rather implicitly: you just delete pointers. When the system needs storage, it searches through all of the pointers (it must be able to find them all!) and collects things that aren't used. (Marking algorithms.) If structures are circular, then this is the only way to reclaim the space. It makes life easier on the application programmer, but garbage collectors are incredibly difficult to program and debug, especially if compaction is also done. Examples: Lisp, capability systems.

How does garbage collection work?
Storage is released implicitly.
Must be able to find all objects.
Must be able to find all pointers to objects.

Mark and sweep:
Pass 1: mark. Go through all statically-allocated and procedure-local variables, looking for pointers. Mark each object pointed to, and recursively mark all objects it points to. The compiler has to cooperate by saving information about where the pointers are within structures.
Pass 2: sweep. Go through all objects, and free up those that aren't marked.

Garbage collection is often expensive: 20% or more of all CPU time in systems that use it.

Buddy system: memory is divided into chunks whose sizes are powers of 2. If a small chunk is needed, memory is split repeatedly until we get a power-of-2 chunk that can hold the data. Once memory is freed, the chunk is merged back with its buddy chunk, since every split of a chunk creates a pair of buddies.

 __________________
|        |        |
|        |        |
 ------------------
     ^        ^
     |________|
      buddies

Topic: Sharing Main Memory -- Segmentation and Paging
======================================================

How do we allocate memory to processes?

1. Simple uniprogramming with a single segment per process: one program in memory at a time. (We can actually multiprogram by swapping programs.) Highest memory holds the OS.
The process is allocated memory starting at 0 (or J), up to (or from) the OS area. The process is always loaded at 0.
Advantages:
1. Low overhead.
2. Simple.
3. No need to do relocation - always loaded at zero.
Disadvantages:
1. No protection - the process can overwrite the OS, which means it can get complete control of the system.
2. Multiprogramming requires swapping entire processes in and out:
   a) overhead for swapping;
   b) idle time while swapping.
3. Process limited to the size of memory.
4. No good way to share - only one process at a time (we can't even overlap CPU and I/O, since only one process is in memory).

2. Relocation - load the program anywhere in memory. The idea is to use the loader or linker to load the program at an arbitrary memory address. Note that the program can't be moved (relocated) once loaded. This scheme (#2) is essentially the same as #1, but the ability to load at any address will be used in #3 below.

3. Simple multiprogramming with static software relocation, no protection, one segment per process: highest or lowest memory holds the OS. Processes are allocated memory starting at 0 (or N), up to the OS area. When a process is initially loaded, link it so that it can run in its allocated memory area. We can have several programs in memory at once, each loaded at a different (non-overlapping) address.
Advantages:
1. Allows multiprogramming without swapping processes in and out.
2. Makes better use of memory.
3. Higher CPU utilization due to more efficient multiprogramming.
Disadvantages:
1. No protection - jobs can read or write others.
2. External fragmentation.
3. Overhead for variable-size memory allocation.
4. Still limited to the size of physical memory.
5. Hard to increase the amount of memory allocated.
6. Programs are statically loaded - tied to fixed locations in memory. They can't be moved or expanded; if swapped out, they must be swapped back to the same location.

4. Dynamic memory relocation: instead of changing the addresses of a program before it's loaded, change the address dynamically during every reference.
Figure: a processor and a memory box, with a memory relocation box in between.

 ---     ---     ---
|CPU|-->|MMU|-->|MEM|
 ---     ---     ---

There are many types of relocation - to be discussed. Under dynamic relocation, each program-generated address (called a logical or virtual address) is translated in hardware to a physical, or real, address. This happens as part of each memory reference.

The virtual (logical) address is what the program generates. The virtual address space is the set of (legal) virtual addresses the program can generate. Physical (real) addresses are the set of addresses in physical memory. The physical address space of a program is the set of physical addresses it can get to; the physical address space of the machine is the set of addresses in physical memory.

Dynamic relocation leads to two views of memory, called address spaces: the virtual address space and the real address space. Each process has its own virtual address space. With static relocation we force the two views to coincide. In some systems, there are several levels of mapping.

Several types of dynamic relocation:

Base & bounds relocation: two hardware registers: a base register for the process, and a bounds register that indicates the last valid address the process may generate.

    real address = base + virtual address,  IF 0 <= virtual address <= bounds

On each memory reference, the virtual address is compared against the bounds register; in parallel, the real address is generated by adding the virtual address to the base register.

Advantages:
Each process appears to have a completely private memory of size equal to the bounds register plus 1.
Processes are protected from each other.
No address relocation is necessary when a process is loaded.
Task switching is very cheap when done between processes in memory - just reload the processor registers. (There is higher overhead to load a process from disk.)
Compaction is possible.
Disadvantages:
Still limited to the size of main memory.
External fragmentation (between processes).
Overhead for allocating variable-size spaces in memory.
Sharing is difficult - only possible if the base & bounds pairs overlap.
Only one "segment" - i.e. one region of memory.
New, special hardware is needed for relocation, and the relocation takes time (it isn't free).

The OS must be able to change the values of the relocation registers: the OS loads a new process and sets its base and bounds registers, and when it schedules a process it sets the base and bounds registers. When tasks are switched, it must be able to swap the base, bounds, and PC registers simultaneously. These imply that the OS must run with base and bounds relocation turned off - otherwise, it would affect itself when running. (Or it would need its own set of base and bounds registers.) Use of base and bounds is controlled by a status bit, usually in the PSW or SSW, or a similar control register.

Users must not be able to change the values of the base and bounds registers; otherwise, there is no protection between users - a user could trash others or the OS.

Problem: how does the OS regain control once it has given it up? The OS is entered on a trap (including SVC) or interrupt. When the OS is entered, use of base and bounds must be disabled (i.e. the bit in the PSW is reset). Typically, the trap handler loads new control register values.

Base & bounds is cheap - only 2 registers - and fast - the add and compare can be done in parallel.

We can consider three types of systems using base and bounds registers:
1. Uniprogramming - a single user region. Bring a user in, and run him.
2. Multiprogramming with fixed partitions (OS/MFT) - partition memory into fixed regions (which may be different sizes). A user goes into a region of the given size.
3. Multiprogramming with variable partitions (OS/MVT) - partitions are dynamically variable.
Note that we can do any of the three above schemes without base and bounds registers - just load programs into a region at the appropriate base address.

Task switching: we can now switch between processes very cheaply - we don't have to reload memory, just change the contents of the process control block (which now holds the values of the base and bounds registers).
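The base-and-bounds translation the hardware performs on every reference can be sketched in a few lines. Here bounds is taken to be the last valid virtual address (so a process sees bounds + 1 addresses), and MemoryError stands in for the hardware trap; the function name is ours.

```python
def translate(virtual_address, base, bounds):
    """Base & bounds relocation: compare the virtual address against the
    bounds register and, in parallel in real hardware, add the base.
    Out-of-range references trap to the OS."""
    if not (0 <= virtual_address <= bounds):
        raise MemoryError("bounds violation: trap to OS")
    return base + virtual_address
```

A process loaded with base 1000 and bounds 99 sees a private memory of 100 addresses, 0 through 99, mapped onto physical addresses 1000 through 1099; anything outside that range traps.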
We can also run processes which are not in memory:
Find an empty area of memory in which to place the process.
Remove one or more processes from memory, if necessary, in order to find space (i.e. copy the removed processes to space on disk).
Copy the new process (from disk) into memory.
If only one process fits in memory, we have to wait for the swap to take place. If several processes fit in memory, we can swap one while executing another.

5. Multiple segments - segmentation. Divide the virtual address space into several "segments". (These are not the same as the "segments" of linkers and loaders.) Use a separate base and bound for each segment, and also add protection bits (read, write, execute) and a valid bit. (We will also want a dirty bit.)

Each address is now a segment and an offset. Each memory reference indicates the segment and offset in one or more of three ways:
The top bits of the address select the segment, the low bits the offset. This is the most common, and the best.
Or, the segment is selected implicitly by the instruction (e.g. code vs. data, stack vs. data, which base register is used, or 8086 prefixes).
Or, the instruction specifies directly or indirectly a base register for the segment.

Subprograms (procedures, functions, etc.) can be separate segments. Segments are typically associated with logical partitions of your process address space - e.g. code, data, stack. Or, each module or procedure can be a separate segment.

We need either a segment table or segment registers to hold the base and bounds for each segment.

Picture of a segment table, with segment table entries:

     ------------------------
     | base | bound | flags |
     ------------------------
   0 |      |       |       |
     ------------------------
   1 |      |       |       |
     ------------------------

The memory mapping procedure consists of a table lookup, an add, and a compare.

Address translation for segmentation: we have a segment table, which maps the segment number to [segment base address, segment length (limit), protection bits, valid bit, reference bit, dirty bit]. This information is in a Segment Table Entry (STE). Diagram of the segment table:
Segment descriptor.

We need some hardware to automatically map a virtual (segment number, word number) pair to a real address:

    real address = segment_table[segment #].base + word number

The reference is invalid if word_number > limit. (Note that we do the test without adding the base to both sides.) The valid bit must also be on, and the permission bits must permit the access. We need more hardware to make this go fast (discussed later). We have a Segment Table Base Register (STBR) that points to the base of the segment table (for the hardware to use).

An alternate approach: if there is a small number of segments, we can have segment registers - one register per segment. We can also multiplex a small number of segment registers among a large number of segments (as with the x86 architecture).

Advantages:
Each process has its own virtual address space.
Protection between address spaces.
Separate protection between segments (R/W/E).
The virtual space can be larger than physical memory: unused segments don't need to be loaded, and segments can be loaded as needed. An attempt to reference a missing segment is called a segment fault.
One or more segments can be shared (sharing is tricky - we'll talk about this later).
Segments can be placed anywhere in memory that they fit; memory compaction is easy.
Segment sizes can be changed independently.
Disadvantages:
Each segment must be allocated contiguously, and segment size is limited by memory size.
External fragmentation.
Overhead of allocating memory.
Hardware is needed for address translation, and there is overhead (time/hardware) in doing the translation.
More complicated.
Space for the segment table.

Note that segment tables are usually 1-1 with processes: a segment table defines a process's address space. Protection is a problem: we have the same problem as before - now we have to allocate shared virtual instead of shared physical memory. When we switch processes, we reload the STBR (segment table base register), which changes the address space.
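Segmented translation with the "top bits select the segment" scheme can be sketched as follows. The 12-bit offset field, the dict layout of the segment table, and the field names are assumptions for illustration; protection, reference, and dirty bits are omitted, and MemoryError stands in for the hardware trap.

```python
def seg_translate(segtable, vaddr, off_bits=12):
    """Segmentation: top bits of the virtual address select the segment,
    low bits give the word number (offset) within it. segtable maps
    segment number -> {'base': ..., 'limit': ..., 'valid': ...}."""
    seg = vaddr >> off_bits                   # top bits: segment number
    word = vaddr & ((1 << off_bits) - 1)      # low bits: word number
    entry = segtable.get(seg)
    if entry is None or not entry['valid']:
        raise MemoryError("segment fault")    # missing segment
    if word > entry['limit']:                 # test done before adding base
        raise MemoryError("bounds violation: trap")
    return entry['base'] + word
```

Switching processes amounts to handing the hardware a different segtable (by reloading the STBR), which changes the entire address space in one step.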