CS 164 Lecture notes for 14 February 2005
by Keith Rarick, based on notes posted by Alan Smith

Topic: Linkers and Loaders

In more detail: Operation of a Linker:
- Collect all the pieces of a program.
  - This may involve finding some segments from the file system and from libraries. If a program calls a function that is not defined in the current file, it must be defined in some other file.
- Assign each segment a final location, and build the segment table. This is just a list of segments and their addresses. The final locations are not known until all code is brought together.
- Resolve (i.e. fix) all of the addresses that can be fixed. The result is a new object file. All addresses may be resolved, or there may be a new object file with some addresses still unresolved.
  - This is done by taking the symbol table for each segment and assigning to each symbol its new address.
    - If the linker has been given an absolute address for loading, then the absolute address is calculated.
    - If the linker is to produce a new relocatable object module, then the address relative to zero is calculated, and a new symbol table is output.
  - The relocation table is scanned, and for every entry, the value in the code is replaced with the new absolute value. The relocation table is a list of all places in the code where an address exists whose value depends on the ultimate location of the code itself. Thus, if the code is moved to a different base location, these addresses need to be changed accordingly.
    - If the linker has been given an absolute address for loading, then the absolute address is calculated.
    - If the linker is to produce a new relocatable object module, then the address relative to zero is calculated, and a new relocation table is output.

Topic: Dynamic Storage Allocation
- Dynamic storage allocation was covered in previous courses.
- Why isn't static allocation sufficient for everything? Unpredictability: we can't predict ahead of time how much memory, or in what form, will be needed.
  - Recursive procedures. Even regular procedures are hard to predict (data dependencies). A recursive procedure may call itself an arbitrary number of times, and this number cannot, in principle, be determined statically (doing so is equivalent to solving the halting problem). Non-recursive procedures may call other non-recursive procedures in different ways depending on the input the program gets at runtime (aka data dependencies). Here it is possible to calculate an upper bound on the memory required, but it is tricky and possibly wasteful.
  - Complex data structures, e.g. the linker symbol table. If all storage must be reserved in advance (statically), then it will be used inefficiently: enough must be reserved to handle the worst possible case.
  - For example, the OS doesn't know how many jobs there will be or which programs will be run.
- We need dynamic memory allocation both for main memory and for file space on disk. The same reasoning applies to both cases.
- Two basic operations in dynamic storage management:
  - Allocate
  - Free
  These are pretty self-explanatory.
- Dynamic allocation can be handled in one of two general ways:
  - Stack allocation (hierarchical): restricted, but simple and efficient.
  - "Heap" allocation: more general, but less efficient and more difficult to implement (i.e., it uses a free storage area).
- Stack organization: memory allocation and freeing are partially predictable (as usual, we do better when we can predict the future). Allocation is hierarchical: memory is freed in the opposite order from allocation. If alloc(A), then alloc(B), then alloc(C), the frees must be free(C), then free(B), then free(A) -- a LIFO discipline.
  - Example: procedure call. X calls Y, which calls Y again. Space for local variables and return addresses is allocated on a stack. For the primary stack used by the program to organize procedure calls, all objects are allocated contiguously in memory. When we need to allocate a new object, we always know where the free space is -- right after the last allocated object. (See below.)
  - Stacks are also useful for lots of other things: tree traversal, expression evaluation, top-down recursive descent parsers, etc., as is well known from 61B.
  - A stack-based organization keeps all the free space together in one place. Practically speaking, the only bookkeeping we need to allocate memory is a pointer to the free space. Everything on one side of this pointer is allocated and everything on the other side is free. To allocate more memory, the program just moves this pointer. To free an object, the program must remember the size of the last allocated object (that hasn't already been freed); then it can simply move the pointer back. (A sketch follows.)
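    A minimal sketch of that bump-pointer bookkeeping, in C. The arena size and the names stack_alloc/stack_free are invented for this example; a real runtime would also worry about alignment.

      #include <stddef.h>
      #include <assert.h>

      /* Hypothetical bump-pointer ("stack") allocator over a fixed arena.
         All names and sizes here are made up for illustration.            */
      static char   arena[4096];     /* the region being managed            */
      static size_t top = 0;         /* everything below `top` is allocated */

      /* Allocate n bytes from the free side of the pointer. */
      void *stack_alloc(size_t n)
      {
          if (top + n > sizeof arena)
              return NULL;           /* out of space */
          void *p = &arena[top];
          top += n;                  /* bump the free-space pointer */
          return p;
      }

      /* Free the most recently allocated, still-live object of size n.
         Frees must happen in strictly LIFO order for this to be correct. */
      void stack_free(size_t n)
      {
          assert(n <= top);
          top -= n;                  /* just move the pointer back */
      }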
- Heap organization: allocation and release are unpredictable. Heaps are used for arbitrary list structures and complex data organizations. Example: a payroll system. We don't know when employees will join and leave the company, and we must be able to keep track of all of them using the least possible amount of storage.
  - Memory consists of allocated areas and free areas (or holes). We inevitably end up with lots of holes. Goal: reuse the space in holes so as to keep the number of holes small and their size large.
    - Why do we end up with holes? Example: we allocate ten 100-byte arrays, then free the fifth and the seventh. Now we have holes. If we only ever freed things from the end, we could make do with a stack.
    - We want to keep the number of holes small to make it faster to find free space when we need it. Every time we want to allocate memory, we have to search through a list of free areas; if the list is smaller, the search will be faster.
    - We want to keep the size of holes large to make it more likely that any particular hole will be sufficient to serve an allocation request.
  - Fragmentation: inefficient use of memory due to holes that are too small to be useful. In stack allocation, all the free space is together in one big chunk, so stack allocation is the ideal limiting case for this aspect of memory allocation.
    - Internal - space is wasted within blocks. You have asked for (and only intend to use) n bytes, but the system has reserved m bytes where m > n. This can happen if the system is only able to allocate memory in multiples of a fixed number k, and n is not a multiple of k.
    - External - space is wasted between blocks. This happens when unallocated regions are too small. E.g. the program wants 8KB, but there are only lots of 1KB holes.
  - Typically, heap allocation schemes use a free list to keep track of the storage that is not in use. Algorithms differ in how they manage the free list. (See the sketch below.)
    - Best fit: keep a linked list of free blocks, search the whole list on each allocation, choose the block that comes closest to matching the needs of the allocation, and save the excess for later. During release operations, merge adjacent free blocks.
    - First fit: just scan the list for the first hole that is large enough, and free the excess. Also merge on releases. Most first fit implementations are actually rotating first fit (aka next fit).
    - Next fit: like first fit, but start each search where the previous one left off.
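    The free-list idea can be sketched roughly as follows. This is only an illustration under simplifying assumptions: a fixed heap array, no coalescing of adjacent holes on free, and alignment is ignored; the names hole_t, ff_alloc, and ff_free are invented here. Best fit would differ only in the scan loop: instead of stopping at the first hole that is large enough, it would remember the smallest such hole and use that.

      #include <stddef.h>

      /* Each free area ("hole") starts with a header linking it into the
         free list and recording how many usable bytes follow the header. */
      typedef struct hole {
          size_t       size;
          struct hole *next;
      } hole_t;

      static char    heap[65536];
      static hole_t *free_list = NULL;

      void heap_init(void)
      {
          free_list = (hole_t *)heap;
          free_list->size = sizeof heap - sizeof(hole_t);
          free_list->next = NULL;
      }

      /* First fit: scan for the first hole that is large enough,
         splitting off the excess so it stays on the free list.    */
      void *ff_alloc(size_t n)
      {
          hole_t **prev = &free_list;
          for (hole_t *h = free_list; h != NULL; prev = &h->next, h = h->next) {
              if (h->size < n)
                  continue;                        /* too small, keep looking */
              if (h->size >= n + sizeof(hole_t) + 1) {
                  /* Split: the request comes off the front, the rest stays free. */
                  hole_t *rest = (hole_t *)((char *)(h + 1) + n);
                  rest->size = h->size - n - sizeof(hole_t);
                  rest->next = h->next;
                  h->size = n;
                  *prev = rest;
              } else {
                  *prev = h->next;                 /* hand out the whole hole */
              }
              return h + 1;                        /* memory follows the header */
          }
          return NULL;                             /* no hole is big enough */
      }

      /* Release: put the block back on the free list.  A real allocator would
         also merge it with neighbouring holes to keep hole sizes large.       */
      void ff_free(void *p)
      {
          hole_t *h = (hole_t *)p - 1;
          h->next = free_list;
          free_list = h;
      }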
- Best fit is not necessarily better than first fit. Suppose memory contains two free blocks, of sizes 20 and 15.
  - Suppose the allocation requests are 10 and then 20: which approach wins? In first fit, we allocate 10 from the first block, leaving two free blocks of sizes 10 and 15; the second allocation then fails. In best fit, we allocate 10 from the second (smaller) block, leaving two free blocks of sizes 20 and 5; the second allocation is satisfied from the first block. Best fit wins.
  - Suppose the requests are 8, 12, and then 12: which one wins? In first fit, we allocate 8 from the first block, leaving sizes 12 and 15; then we allocate 12 from the first block, leaving only one block of size 15; finally we can allocate 12 from the remaining block. In best fit, we allocate 8 from the second block, leaving sizes 20 and 7; then we allocate 12 from the first block, leaving sizes 8 and 7; the third allocation fails. First fit wins.
  - First fit tends to leave "average" size holes, while best fit tends to leave some very large ones and some very small ones. The very small ones can't be used very easily.
  - Knuth claims that if storage is close to running out, it will run out regardless of which scheme is used, so pick the easiest or most efficient scheme (first fit). The industry agrees: most actual OSes do not use best fit.
- Bit map: used for allocation of storage that comes in fixed-size chunks (e.g. disk blocks, or 32-byte chunks). Keep a large array of bits, one for each chunk. If a bit is 0, the chunk is in use; if it is 1, the chunk is free. This will be discussed more when we talk about file systems.
- Pools: keep a separate allocation pool for each popular size. Allocation is fast and there is no fragmentation. This only works if there is a small number of popular sizes.
- Reclamation methods: how do we know when dynamically-allocated memory can be freed?
  - It's easy when a chunk is only used in one place. In that case, you allocate it at the spot where it is used and deallocate it as soon as that place is done with it.
  - Reclamation is hard when information is shared: it can't be recycled until all of the sharers are finished. Sharing is indicated by the presence of pointers to the data. Without a pointer, you can't access the data (you can't even find it). In other words, this is how you can decide whether a piece of data is still "in use": it is in use if anything still has a pointer to it. If nothing points to the data, it is inaccessible and hence no longer in use.
  - Two problems in reclamation:
    - Dangling pointers: better not recycle storage while it's still being used. If a piece of memory is freed while a pointer to it still exists, anything could happen if code later tries to use that pointer. The program could crash, give incorrect results, or even continue to work for some time if the memory in that area has not yet been cleared or reused.
    - Core leaks: better not "lose" storage by forgetting to free it even when it can't ever be used again. For long-running programs this can cause the process to run out of memory. In effect, the amount of usable memory decreases every time an object becomes inaccessible but unfreed, because that memory region is neither available nor in use.
  - Reference counts: keep track of the number of outstanding pointers to each chunk of memory. When this count goes to zero, free the memory. Examples: Smalltalk, file descriptors in Unix. This works fine for hierarchical structures. The reference counts must be managed carefully (by the system) so that no mistakes are made in incrementing and decrementing them. (A sketch follows.)
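    A rough sketch of that bookkeeping in C (obj_t, retain, and release are invented names for this example, not taken from any particular system):

      #include <stdlib.h>

      /* Illustrative reference-counted object. */
      typedef struct obj {
          int   refcount;           /* number of outstanding pointers */
          void *payload;            /* the data being shared          */
      } obj_t;

      obj_t *obj_new(void *payload)
      {
          obj_t *o = malloc(sizeof *o);
          if (o) {
              o->refcount = 1;      /* the creator holds one reference */
              o->payload  = payload;
          }
          return o;
      }

      void retain(obj_t *o)         /* a new pointer to o was stored */
      {
          o->refcount++;
      }

      void release(obj_t *o)        /* a pointer to o was dropped */
      {
          if (--o->refcount == 0) { /* nothing can reach it any more */
              free(o->payload);
              free(o);
          }
      }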
  - What happens when there are circular structures? This is the fatal problem with reference counting: it won't free a circular structure. If initially A -> B, B -> C and C -> B, then B's reference count is 2 and C's is 1. If A's pointer is then cleared, both B and C are unreachable, but their reference counts are both 1. Their counts will never reach zero, so they will never be freed.
  - Garbage collection: storage isn't freed explicitly (using a free operation), but rather implicitly: just delete pointers. When the system needs storage, it searches through all of the pointers (it must be able to find them all!) and collects the things that aren't used. (Marking algorithms.) If structures are circular, then this is the only way to reclaim the space. It makes life easier on the application programmer, but garbage collectors are incredibly difficult to program and debug, especially if compaction is also done. Examples: Lisp, capability systems. Many consider reference counting to be a form of garbage collection; what is called "garbage collection" here is mark-and-sweep garbage collection.
    - How does garbage collection work?
      - Must be able to find all objects.
      - Must be able to find all pointers to objects.
      - Pass 1: mark. Go through all statically-allocated and procedure-local variables, looking for pointers. Mark each object pointed to, and recursively mark all objects it points to. The compiler has to cooperate by saving information about where the pointers are within structures.
      - Pass 2: sweep. Go through all objects and free up those that aren't marked.
    - Garbage collection is often expensive: 20% or more of all CPU time in systems that use it.
- Buddy system

Topic: Sharing Main Memory -- Segmentation and Paging
- How do we allocate memory to processes?
- 1. Simple uniprogramming with a single segment per process:
  - One program in memory at a time. (We can actually multiprogram by swapping programs.)
  - Highest memory holds the OS: the OS lives in the region of memory with the greatest addresses.
  - The process is allocated memory starting at 0 (or J), up to (or from) the OS area. J is the beginning of the OS region.
  - The process is always loaded at 0.
  - Examples: early batch monitors, where only one job ran at a time and all it could do was wreck the OS, which would then be rebooted by an operator. Many of today's personal computers also operate in a similar fashion. (This remark is slightly out of date: most PCs today have more modern virtual memory systems; it refers to DOS systems.)
  - Advantages:
    - Low overhead.
    - Simple.
    - No need to do relocation: the program is always loaded at zero.
  - Disadvantages:
    - No protection - the process can overwrite the OS, which means it can get complete control of the system.
    - Multiprogramming requires swapping the entire process in and out.
      - Overhead for swapping.
      - Idle time while swapping.
    - The process is limited to the size of memory.
    - CTSS (the "compatible" time-sharing system) worked this way, and the system swapped users completely.
    - No good way to share - only one process runs at a time (we can't even overlap CPU and I/O, since only one process is in memory).
- 2. Relocation - load the program anywhere in memory.
  - The idea is to use the loader or linker to load the program at an arbitrary memory address.
  - Note that the program can't be moved (relocated) once it is loaded. (WHY??) Two reasons. First, the addresses embedded within the program have already been rewritten to their final values and the relocation table has been thrown away, so there is no way to find out what needs to be updated. Second, a process that is executing in that code would not function properly if the code were suddenly moved. (A sketch of the load-time fix-up step appears below.)
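    A rough sketch of that one-time fix-up, assuming a toy object format in which the relocation table is just a list of byte offsets where 32-bit absolute addresses (initially relative to zero) are stored. Real formats such as ELF are more involved, and endianness and alignment are ignored here; the names reloc_t and apply_relocations are invented for the example.

      #include <stdint.h>
      #include <stddef.h>

      /* Hypothetical relocation entry: an offset into the code image
         where a 32-bit absolute address is stored.                    */
      typedef struct {
          uint32_t offset;
      } reloc_t;

      /* Rewrite every relocated word by adding the final load address. */
      void apply_relocations(uint8_t *code, const reloc_t *table, size_t n,
                             uint32_t load_base)
      {
          for (size_t i = 0; i < n; i++) {
              uint32_t *slot = (uint32_t *)(code + table[i].offset);
              *slot += load_base;   /* address was relative to zero */
          }
      }

    Once this has run and the table is discarded, there is no record of which words were patched, which is exactly why the program cannot simply be moved later.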
  - This scheme (#2) is essentially the same as #1, but the ability to load at any address will be used in #3 below.
- 3. Simple multiprogramming with static software relocation, no protection, one segment per process:
  - Highest or lowest memory holds the OS.
  - Processes are allocated memory starting at 0 (or N), up to the OS area.
  - When a process is initially loaded, link it so that it can run in its allocated memory area.
  - There can be several programs in memory at once, each loaded at a different (non-overlapping) address.
  - Advantages:
    - Allows multiprogramming without swapping processes in and out.
    - Makes better use of memory.
    - Higher CPU utilization due to more efficient multiprogramming.
  - Disadvantages:
    - No protection - jobs can read or write each other's memory.
    - External fragmentation.
    - Overhead for variable-size memory allocation.
    - Still limited to the size of physical memory.
    - Hard to increase the amount of memory allocated to a process.
    - Programs are statically loaded - they are tied to fixed locations in memory and can't be moved or expanded. If swapped out, a process must be swapped back to the same location.