CS162 Lecture, Monday 02/14/05
By Tom Gaston

*********************

ANNOUNCEMENTS
=============

It's almost midterm time. The first one is on Monday 2/28. When it
says "tentative," that means it will definitely be on that day unless
something unexpected makes that impossible, so be prepared for the
test to be on that day.

*********************

LINKERS AND LOADERS
===================

As we learned in the previous lecture, it is the job of the loader to
adjust all the addresses in the code to reflect the correct load
address of the object file. It does this with the help of several
tables:

- Segment Table: for each segment, keeps track of its name, its size,
  and the base address at which it was assumed to be loaded.

- Symbol Table: contains global definitions, such as labels and
  global variables. For each symbol, the table keeps track of the
  symbol's name, the segment it is in, and its offset from the base
  of that segment.

- Relocation Table: keeps track of the addresses within this segment
  that need to be adjusted. Specifically, it stores the address
  location, the symbol name, the offset from the symbol, and the
  length of the address field.

*********************

Summary of the linker/loader process: It starts with the individual
tables for each segment, which were created by the compiler. The
first thing the loader does is determine the location of each
segment. Next it calculates the addresses of the symbols: take the
new base address of the segment containing the symbol and add the
symbol's offset within that segment (the offset alone would be the
correct address if the segment were loaded at address 0). Finally, it
scans the relocation tables and updates the addresses with the newly
calculated symbol locations.

*********************

Let's see an example of this in action. Start with three segments:
A, B, and C.

  A
  ===========
  | ...      |
  | call x   |  <- R1
  | ...      |
  | load y   |  <- R2
  | ...      |
  ===========

  B
  ===========
  | ...      |
  | jump z+5 |  <- R4
  | ...      |
  | call x   |  <- R5
  | ...      |
  | label y: |  (offset Oy from base of B)
  | ...      |
  | label z: |  (offset Oz from base of B)
  ===========

  C
  ===========
  | ...      |
  | label x: |  (offset Ox from base of C)
  | ...      |
  | load y   |  <- R3
  | ...      |
  ===========

KEY:
  Ox = the offset of x in C
  Oy = the offset of y in B
  Oz = the offset of z in B
  R1 = the location of a particular address in A that needs to be
       relocated
  R2, R3, ... = likewise
  lA, lB, lC = the lengths of segments A, B, and C

The compiler gives us these tables:

SEGMENT TABLE:
name / size / nominal base
--------------------------
A    | lA   | 0
B    | lB   | 0
C    | lC   | 0

SYMBOL TABLE:
symbol / segment name / offset from base of segment
---------------------------------------------------
x      | C            | Ox
y      | B            | Oy
z      | B            | Oz

RELOCATION TABLE:
address location / symbol / offset from symbol / length of address
------------------------------------------------------------------
A's R1           | x      | 0                  | 4
A's R2           | y      | 0                  | 4
C's R3           | y      | 0                  | 4
B's R4           | z      | 5                  | 4
B's R5           | x      | 0                  | 4

The first thing the loader does is lay the segments out in memory.
Say it chooses to put A first, B next, and C last. It updates the
segment table to reflect this:

SEGMENT TABLE:
name / size / base
--------------------------
A    | lA   | 0
B    | lB   | lA
C    | lC   | lA + lB

Next it updates the symbol table to reflect the fact that not all the
segments are based at address 0:

SYMBOL TABLE:
symbol / segment name / location
---------------------------------------------------
x      | C            | lA + lB + Ox
y      | B            | lA + Oy
z      | B            | lA + Oz

Finally it updates the address values in the instructions indicated
by the relocation table (note that R4 picks up the stored offset of 5
from z):

instruction / new address it uses
---------------------------------
A's R1      | lA + lB + Ox
A's R2      | lA + Oy
C's R3      | lA + Oy
B's R4      | lA + Oz + 5
B's R5      | lA + lB + Ox
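To make these three steps concrete, here is a minimal C sketch of the
whole process. The struct layouts and the names (place_segments,
symbol_addr, apply_relocs) are illustrative assumptions, not any real
object-file format:

    /* A minimal sketch of the linker/loader's fixup passes. The struct
     * layouts and names are illustrative assumptions only. */
    #include <stdint.h>
    #include <string.h>

    struct segment { const char *name; uint32_t size; uint32_t base; };
    struct symbol  { const char *name; struct segment *seg; uint32_t offset; };
    struct reloc {
        uint8_t       *patch_loc;  /* the address field to rewrite (R1, R2, ...) */
        struct symbol *sym;        /* the symbol that field refers to            */
        uint32_t       addend;     /* offset from the symbol (e.g. z+5 -> 5)     */
        uint32_t       length;     /* width of the address field, in bytes       */
    };

    /* Step 1: lay the segments out end to end and record each base. */
    void place_segments(struct segment *segs, int n) {
        uint32_t next = 0;                /* A gets 0, B gets lA, C gets lA+lB */
        for (int i = 0; i < n; i++) {
            segs[i].base = next;
            next += segs[i].size;
        }
    }

    /* Step 2: a symbol's final address is its segment's new base plus
     * the symbol's offset within that segment. */
    uint32_t symbol_addr(const struct symbol *s) {
        return s->seg->base + s->offset;  /* e.g. x -> lA + lB + Ox */
    }

    /* Step 3: rewrite every address field named in the relocation table. */
    void apply_relocs(struct reloc *rels, int n) {
        for (int i = 0; i < n; i++) {
            uint32_t v = symbol_addr(rels[i].sym) + rels[i].addend;
            memcpy(rels[i].patch_loc, &v, rels[i].length);  /* little-endian assumed */
        }
    }

Run on the example above, place_segments produces the updated segment
table, symbol_addr on x yields lA + lB + Ox, and apply_relocs rewrites
the address fields at R1 through R5 with the values in the final table.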
*********************

For the midterm you should be able to construct these tables, given
some segments. This material is not in the book, but it is in the
reader, in gory detail.

*********************

DYNAMIC STORAGE ALLOCATION
==========================

Some things cannot be allocated statically, because we can't predict
exactly how much memory will be needed. Recursive procedures, for
instance, could recurse arbitrarily deep, using more and more memory,
and there is no way to predict whether this will happen. If we tried
to prepare for the worst case, we could end up allocating far too
much memory for the normal case, which would be wasteful.

There are two basic operations in dynamic storage management:
Allocate and Free.

Dynamic allocation can be handled in two general ways:
Stack allocation or Heap allocation.

*********************

STACK ALLOCATION:

In stack allocation, memory is freed in the opposite order of
allocation (like the stack data structure). If you alloc(A),
alloc(B), then alloc(C), you must free(C), free(B), then free(A).

Stack allocation keeps all the data in one place and therefore all
the free space in one place. This prevents fragmentation.

*********************

HEAP ALLOCATION:

Heap allocation is more complicated, but less restrictive. Allocation
and release can happen in any order. Heaps are used for arbitrary
list structures and complex data organizations. Memory consists of
allocated areas separated by "holes", or free areas.

A problem with heap allocation is that memory can become fragmented:
lots of little holes, each too small to satisfy any allocation, so
that memory is wasted. "Internal" fragmentation refers to memory
wasted within a block; this occurs when you allocate a bigger block
than you needed. "External" fragmentation refers to memory wasted
between blocks.

Typically, heap allocation schemes use a free list to keep track of
storage that is not in use. There are various algorithms for managing
the free list:

- Best fit: scan the whole free list and find the free block that
  most closely matches the size of this allocation. This may sound
  good, but in practice it can create a lot of tiny, unusable
  fragments. It is also slow, because you have to scan the whole free
  list on each allocation.

- First fit: scan the list for the first hole that is large enough
  (see the sketch after this list).

- Next fit: like first fit, but start the scan where the previous
  one left off.
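Here is a minimal sketch of first fit in C. The node layout and names
are my own illustrative assumptions; a real allocator would also put
a size header on allocated blocks so free() knows how much is being
returned, round requests up for alignment, and coalesce adjacent
holes when blocks are freed:

    #include <stddef.h>

    struct hole {
        size_t       size;           /* bytes available in this hole */
        struct hole *next;
    };

    static struct hole *free_list;   /* head of the free list */

    /* Scan for the first hole that is large enough. */
    void *first_fit_alloc(size_t want) {
        struct hole **prev = &free_list;
        for (struct hole *h = free_list; h != NULL; prev = &h->next, h = h->next) {
            if (h->size < want)
                continue;                  /* too small; keep scanning */
            if (h->size - want >= sizeof(struct hole)) {
                /* Split: hand out the front, keep the remainder as a hole. */
                struct hole *rest = (struct hole *)((char *)h + want);
                rest->size = h->size - want;
                rest->next = h->next;
                *prev = rest;
            } else {
                /* The hole barely fits: hand out all of it. The unused
                 * tail is internal fragmentation. */
                *prev = h->next;
            }
            return h;
        }
        return NULL;                       /* no hole is large enough */
    }

Best fit would instead remember the smallest hole that still fits,
which is why it must scan the entire list; next fit would keep a
roving pointer and begin each scan where the last one stopped.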
*********************

Knuth says that if you're running out of storage, you will run out no
matter what scheme is picked.

TANGENT ON KNUTH:
Knuth is a faculty member at Stanford. He started a series of books
in the late '60s called The Art of Computer Programming. The first
three volumes, on Fundamental Algorithms, Seminumerical Algorithms,
and Sorting and Searching, were all published by the mid-'70s. The
fourth volume, on Combinatorial Algorithms, did not begin to appear
until recently, because new progress was being made faster than Knuth
could read and write about it.

*********************

- Bit map: If storage comes in fixed-size chunks, you can use a bit
  map to keep track of which chunks are free and which are in use:
  one bit for each chunk, 0 for in-use, 1 for free.

- Pools: In real life, programs often ask for blocks in a few
  different, specific sizes. You can keep a separate allocation pool
  for blocks of each of the sizes used by your programs. This results
  in fast allocation without fragmentation.

*********************

How do we know when dynamically allocated memory can be freed? You
need to know of all the pointers to a piece of memory to tell whether
it is still in use. If any pointers still reference a piece of
memory, it should not be freed; freeing it would leave them as
"dangling pointers." On the other hand, you don't want to forget to
free a piece of memory, or else you will have a memory leak.

*********************

REFERENCE COUNTS:

One way to tell whether a chunk of memory can be freed is to keep a
count of the number of pointers that point to it. When this count
goes to zero, free that chunk.

This count must be managed carefully by the system. If you change a
pointer using the "=" operator in C, the counter won't automatically
change; every pointer change would have to go through a system call
for this scheme to work. Therefore, this scheme is not very common in
practice.

Another problem is circular references: if A points to B, and B
points to A, but nothing else points to either of them, they ought to
be freed, yet their circular pointers prevent their counters from
reaching zero, so they never are.

*********************

GARBAGE COLLECTION:

When the system needs more memory, it follows all the pointers it can
find and keeps only those chunks it can reach. Everything else is
freed. This makes life easy for the application programmer, who
doesn't have to worry about explicitly freeing blocks.

How does it work?
- Go through all global and local variables looking for pointers.
  Mark each object pointed to, and recursively mark all objects they
  point to.
- Next, go through all of memory and free the objects that are not
  marked.

For this to work, you must be able to find all the objects and be
able to tell what is and is not a pointer. Also, every object must
have a spare bit for marking.

Garbage collection is bad for real-time systems, because it can cause
computation to pause for an extended period of time. It often takes
20% or more of all CPU time in systems that use it.

*********************

BUDDY SYSTEM:

This is another allocation scheme. All block sizes are a power of
two. Large blocks are split into halves to form "buddy blocks." When
a block is freed, if its buddy is also free, they are rejoined into a
single, larger free block. Finding a block's buddy is only a matter
of flipping the appropriate bit in the block's address.
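That bit flip is just an XOR with the block size. A tiny sketch,
assuming power-of-two block sizes and offsets measured from the start
of the managed region:

    #include <stdint.h>
    #include <stdio.h>

    /* A block's buddy differs from it only in the bit that
     * corresponds to the block's (power-of-two) size. */
    uint32_t buddy_of(uint32_t offset, uint32_t size) {
        return offset ^ size;
    }

    int main(void) {
        /* The 64-byte blocks at offsets 128 and 192 are buddies: when
         * both are free, they rejoin into a 128-byte block at 128. */
        printf("%u\n", (unsigned)buddy_of(128, 64));   /* prints 192 */
        printf("%u\n", (unsigned)buddy_of(192, 64));   /* prints 128 */
        return 0;
    }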
*********************

SHARING MAIN MEMORY: Segmentation and Paging
============================================

1. World's Simplest Memory System

- Uniprogramming with a single segment per process
- The process is always loaded at address 0 and grows upward
- The OS is held in the highest memory

Advantages:
- Low overhead
- Simple to implement
- No need to do relocation, because processes are always loaded at
  address 0

Disadvantages:
- No protection. Any process can gain complete control of the system
  by overwriting OS memory.
- To implement multiprogramming, the entire process must be switched
  out of memory, which would create huge overhead and idle time.
- Multiprogramming wouldn't even be that advantageous, because with
  only one process in memory at a time you can't overlap one
  process's CPU time with another's I/O time.

Diagram:
 _________
|   OS    |
|---------|
|         |
|         |
|---------|
| Process |
 ---------

*********************

2. Another Memory System

- Same as above, except processes can be loaded anywhere in memory by
  having their addresses resolved by a linker/loader.
- The program can't be moved once loaded, though. This is because the
  linker/loader hard-codes the addresses into the instructions. Also,
  dynamically allocated data may be in arbitrary places.

*********************

3. Simple Multiprogramming
(with static software relocation, no protection, one segment per
process)

- The OS can be stored in the highest or lowest region of memory
- When a process is initially loaded, it can be put in any unused
  area of memory, because a linker/loader is used
- Several programs can be in memory at once

Advantages:
- Allows for multiprogramming while keeping more than one process in
  memory at once, so no swapping is required.
- Makes better use of memory: typically more than one process can fit
  in memory, so you might as well put more than one there.
- Results in higher CPU utilization, because one process can use the
  CPU while another is waiting for an I/O device (like the disk).

Disadvantages:
- No protection. Jobs can interfere with one another or with the OS.
- Potential for external fragmentation between processes.
- Limited to the size of physical memory (i.e., we have not
  introduced virtual memory yet).
- Each program is tied to the physical location at which it was
  loaded. If a process is swapped to disk, for example, it must
  return to the exact same address when it is swapped back in,
  because all the addresses were hard-coded by the loader.

*********************

THE LECTURE IS OVER! BYE!