CS 164 Lecture notes for 14 February 2005
by Keith Rarick, based on notes posted by Alan Smith

Topic: Linkers and Loaders

In more detail: Operation of a Linker:
- Collect all the pieces of a program.
  - This may involve finding some segments from the file system and from libraries. If a program calls a function that is not defined in the current file, it must be defined in some other file.
- Assign each segment a final location, and build the segment table. This is just a list of segments and their addresses. The final locations are not known until all code is brought together.
- Resolve (i.e. fix) all of the addresses that can be fixed. The result is a new object file. All addresses may be resolved, or there may be a new object file with some addresses still unresolved.
  - This is done by taking the symbol table for each segment and assigning to each symbol its new address.
    - If the linker has been given an absolute address for loading, then the absolute address is calculated.
    - If the linker is to produce a new relocatable object module, then the address relative to zero is calculated, and a new symbol table is output.
  - The relocation table is scanned, and for every entry, the value in the code is replaced with the new absolute value. The relocation table is a list of all places in the code where an address exists whose value depends on the ultimate location of the code itself. Thus, if the code is moved to a different base location, these addresses need to be changed accordingly.
    - If the linker has been given an absolute address for loading, then the absolute address is calculated.
    - If the linker is to produce a new relocatable object module, then the address relative to zero is calculated, and a new relocation table is output.

Topic: Dynamic Storage Allocation
- Dynamic storage allocation was covered in previous courses.
- Why isn't static allocation sufficient for everything? Unpredictability: we can't predict ahead of time how much memory, or in what form, will be needed.
  - Recursive procedures. Even regular procedures are hard to predict (data dependencies). A recursive procedure may call itself an arbitrary number of times, and this number cannot, in principle, be determined statically (doing so is equivalent to solving the halting problem). Non-recursive procedures may call other non-recursive procedures in different ways depending on the input the program gets at runtime (aka data dependencies). Here it is possible to calculate an upper bound on the memory required, but it is tricky and possibly wasteful.
  - Complex data structures, e.g. the linker symbol table. If all storage must be reserved in advance (statically), then it will be used inefficiently: enough must be reserved to handle the worst possible case.
  - For example, the OS doesn't know how many jobs there will be or which programs will be run.
- We need dynamic memory allocation both for main memory and for file space on disk. The same reasoning applies to both cases.
- Two basic operations in dynamic storage management:
  - Allocate
  - Free
  These are pretty self-explanatory.
- Dynamic allocation can be handled in one of two general ways:
  - Stack allocation (hierarchical): restricted, but simple and efficient.
  - "Heap" allocation: more general, but less efficient and more difficult to implement (i.e., it uses a free storage area).
- Stack organization: memory allocation and freeing are partially predictable (as usual, we do better when we can predict the future). Allocation is hierarchical: memory is freed in the opposite order from allocation. If alloc(A), then alloc(B), then alloc(C), the frees must be free(C), then free(B), then free(A) -- a LIFO discipline.
  - Example: procedure call. X calls Y, which calls Y again. Space for local variables and return addresses is allocated on a stack. For the primary stack used by the program to organize procedure calls, all objects are allocated contiguously in memory. When we need to allocate a new object, we always know where the free space is -- right after the last allocated object. (See below.)
  - Stacks are also useful for lots of other things: tree traversal, expression evaluation, top-down recursive descent parsers, etc., as is well known from 61B.
  - A stack-based organization keeps all the free space together in one place. Practically speaking, the only bookkeeping we need to allocate memory is a pointer to the free space. Everything on one side of this pointer is allocated and everything on the other side is free. To allocate more memory, the program just moves this pointer. To free an object, the program must remember the size of the last allocated object (that hasn't already been freed); then it can simply move the pointer back. (A sketch follows.)
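    A minimal sketch of that bump-pointer bookkeeping, in C. The arena size and the names stack_alloc/stack_free are invented for this example; a real runtime would also worry about alignment.

      #include <stddef.h>
      #include <assert.h>

      /* Hypothetical bump-pointer ("stack") allocator over a fixed arena.
         All names and sizes here are made up for illustration.            */
      static char   arena[4096];     /* the region being managed            */
      static size_t top = 0;         /* everything below `top` is allocated */

      /* Allocate n bytes from the free side of the pointer. */
      void *stack_alloc(size_t n)
      {
          if (top + n > sizeof arena)
              return NULL;           /* out of space */
          void *p = &arena[top];
          top += n;                  /* bump the free-space pointer */
          return p;
      }

      /* Free the most recently allocated, still-live object of size n.
         Frees must happen in strictly LIFO order for this to be correct. */
      void stack_free(size_t n)
      {
          assert(n <= top);
          top -= n;                  /* just move the pointer back */
      }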
- Heap organization: allocation and release are unpredictable. Heaps are used for arbitrary list structures and complex data organizations. Example: a payroll system. We don't know when employees will join and leave the company, and we must be able to keep track of all of them using the least possible amount of storage.
  - Memory consists of allocated areas and free areas (or holes). We inevitably end up with lots of holes. Goal: reuse the space in holes so as to keep the number of holes small and their size large.
    - Why do we end up with holes? Example: we allocate ten 100-byte arrays, then free the fifth and the seventh. Now we have holes. If we only ever freed things from the end, we could make do with a stack.
    - We want to keep the number of holes small to make it faster to find free space when we need it. Every time we want to allocate memory, we have to search through a list of free areas; if the list is smaller, the search will be faster.
    - We want to keep the size of holes large to make it more likely that any particular hole will be sufficient to serve an allocation request.
  - Fragmentation: inefficient use of memory due to holes that are too small to be useful. In stack allocation, all the free space is together in one big chunk, so stack allocation is the ideal limiting case for this aspect of memory allocation.
    - Internal - space is wasted within blocks. You have asked for (and only intend to use) n bytes, but the system has reserved m bytes where m > n. This can happen if the system is only able to allocate memory in multiples of a fixed number k, and n is not a multiple of k.
    - External - space is wasted between blocks. This happens when unallocated regions are too small. E.g. the program wants 8KB, but there are only lots of 1KB holes.
  - Typically, heap allocation schemes use a free list to keep track of the storage that is not in use. Algorithms differ in how they manage the free list. (See the sketch below.)
    - Best fit: keep a linked list of free blocks, search the whole list on each allocation, choose the block that comes closest to matching the needs of the allocation, and save the excess for later. During release operations, merge adjacent free blocks.
    - First fit: just scan the list for the first hole that is large enough, and free the excess. Also merge on releases. Most first fit implementations are actually rotating first fit (aka next fit).
    - Next fit: like first fit, but start each search where the previous one left off.
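    The free-list idea can be sketched roughly as follows. This is only an illustration under simplifying assumptions: a fixed heap array, no coalescing of adjacent holes on free, and alignment is ignored; the names hole_t, ff_alloc, and ff_free are invented here. Best fit would differ only in the scan loop: instead of stopping at the first hole that is large enough, it would remember the smallest such hole and use that.

      #include <stddef.h>

      /* Each free area ("hole") starts with a header linking it into the
         free list and recording how many usable bytes follow the header. */
      typedef struct hole {
          size_t       size;
          struct hole *next;
      } hole_t;

      static char    heap[65536];
      static hole_t *free_list = NULL;

      void heap_init(void)
      {
          free_list = (hole_t *)heap;
          free_list->size = sizeof heap - sizeof(hole_t);
          free_list->next = NULL;
      }

      /* First fit: scan for the first hole that is large enough,
         splitting off the excess so it stays on the free list.    */
      void *ff_alloc(size_t n)
      {
          hole_t **prev = &free_list;
          for (hole_t *h = free_list; h != NULL; prev = &h->next, h = h->next) {
              if (h->size < n)
                  continue;                        /* too small, keep looking */
              if (h->size >= n + sizeof(hole_t) + 1) {
                  /* Split: the request comes off the front, the rest stays free. */
                  hole_t *rest = (hole_t *)((char *)(h + 1) + n);
                  rest->size = h->size - n - sizeof(hole_t);
                  rest->next = h->next;
                  h->size = n;
                  *prev = rest;
              } else {
                  *prev = h->next;                 /* hand out the whole hole */
              }
              return h + 1;                        /* memory follows the header */
          }
          return NULL;                             /* no hole is big enough */
      }

      /* Release: put the block back on the free list.  A real allocator would
         also merge it with neighbouring holes to keep hole sizes large.       */
      void ff_free(void *p)
      {
          hole_t *h = (hole_t *)p - 1;
          h->next = free_list;
          free_list = h;
      }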
- Best fit is not necessarily better than first fit. Suppose memory contains two free blocks, of sizes 20 and 15.
  - Suppose the allocation requests are 10 and then 20: which approach wins? In first fit, we allocate 10 from the first block, leaving two free blocks of sizes 10 and 15; the second allocation then fails. In best fit, we allocate 10 from the second (smaller) block, leaving two free blocks of sizes 20 and 5; the second allocation is satisfied from the first block. Best fit wins.
  - Suppose the requests are 8, 12, and then 12: which one wins? In first fit, we allocate 8 from the first block, leaving sizes 12 and 15; then we allocate 12 from the first block, leaving only one block of size 15; finally we can allocate 12 from the remaining block. In best fit, we allocate 8 from the second block, leaving sizes 20 and 7; then we allocate 12 from the first block, leaving sizes 8 and 7; the third allocation fails. First fit wins.
  - First fit tends to leave "average" size holes, while best fit tends to leave some very large ones and some very small ones. The very small ones can't be used very easily.
  - Knuth claims that if storage is close to running out, it will run out regardless of which scheme is used, so pick the easiest or most efficient scheme (first fit). The industry agrees: most actual OSes do not use best fit.
- Bit map: used for allocation of storage that comes in fixed-size chunks (e.g. disk blocks, or 32-byte chunks). Keep a large array of bits, one for each chunk. If a bit is 0, the chunk is in use; if it is 1, the chunk is free. This will be discussed more when we talk about file systems.
- Pools: keep a separate allocation pool for each popular size. Allocation is fast and there is no fragmentation. This only works if there is a small number of popular sizes.
- Reclamation methods: how do we know when dynamically-allocated memory can be freed?
  - It's easy when a chunk is only used in one place. In that case, you allocate it at the spot where it is used and deallocate it as soon as that place is done with it.
  - Reclamation is hard when information is shared: it can't be recycled until all of the sharers are finished. Sharing is indicated by the presence of pointers to the data. Without a pointer, you can't access the data (you can't even find it). In other words, this is how you can decide whether a piece of data is still "in use": it is in use if anything still has a pointer to it. If nothing points to the data, it is inaccessible and hence no longer in use.
  - Two problems in reclamation:
    - Dangling pointers: better not recycle storage while it's still being used. If a piece of memory is freed while a pointer to it still exists, anything could happen if code later tries to use that pointer. The program could crash, give incorrect results, or even continue to work for some time if the memory in that area has not yet been cleared or reused.
    - Core leaks: better not "lose" storage by forgetting to free it even when it can't ever be used again. For long-running programs this can cause the process to run out of memory. In effect, the amount of usable memory decreases every time an object becomes inaccessible but unfreed, because that memory region is neither available nor in use.
  - Reference counts: keep track of the number of outstanding pointers to each chunk of memory. When this count goes to zero, free the memory. Examples: Smalltalk, file descriptors in Unix. This works fine for hierarchical structures. The reference counts must be managed carefully (by the system) so that no mistakes are made in incrementing and decrementing them. (A sketch follows.)
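    A rough sketch of that bookkeeping in C (obj_t, retain, and release are invented names for this example, not taken from any particular system):

      #include <stdlib.h>

      /* Illustrative reference-counted object. */
      typedef struct obj {
          int   refcount;           /* number of outstanding pointers */
          void *payload;            /* the data being shared          */
      } obj_t;

      obj_t *obj_new(void *payload)
      {
          obj_t *o = malloc(sizeof *o);
          if (o) {
              o->refcount = 1;      /* the creator holds one reference */
              o->payload  = payload;
          }
          return o;
      }

      void retain(obj_t *o)         /* a new pointer to o was stored */
      {
          o->refcount++;
      }

      void release(obj_t *o)        /* a pointer to o was dropped */
      {
          if (--o->refcount == 0) { /* nothing can reach it any more */
              free(o->payload);
              free(o);
          }
      }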
  - What happens when there are circular structures? This is the fatal problem with reference counting: it won't free a circular structure. If initially A -> B, B -> C and C -> B, then B's reference count is 2 and C's is 1. If A's pointer is then cleared, both B and C are unreachable, but their reference counts are both 1. Their counts will never reach zero, so they will never be freed.
  - Garbage collection: storage isn't freed explicitly (using a free operation), but rather implicitly: just delete pointers. When the system needs storage, it searches through all of the pointers (it must be able to find them all!) and collects the things that aren't used. (Marking algorithms.) If structures are circular, then this is the only way to reclaim the space. It makes life easier on the application programmer, but garbage collectors are incredibly difficult to program and debug, especially if compaction is also done. Examples: Lisp, capability systems. Many consider reference counting to be a form of garbage collection; what is called "garbage collection" here is mark-and-sweep garbage collection.
    - How does garbage collection work?
      - Must be able to find all objects.
      - Must be able to find all pointers to objects.
      - Pass 1: mark. Go through all statically-allocated and procedure-local variables, looking for pointers. Mark each object pointed to, and recursively mark all objects it points to. The compiler has to cooperate by saving information about where the pointers are within structures.
      - Pass 2: sweep. Go through all objects and free up those that aren't marked.
    - Garbage collection is often expensive: 20% or more of all CPU time in systems that use it.
- Buddy system

Topic: Sharing Main Memory -- Segmentation and Paging
- How do we allocate memory to processes?
- 1. Simple uniprogramming with a single segment per process:
  - One program in memory at a time. (We can actually multiprogram by swapping programs.)
  - Highest memory holds the OS: the OS lives in the region of memory with the greatest addresses.
  - The process is allocated memory starting at 0 (or J), up to (or from) the OS area. J is the beginning of the OS region.
  - The process is always loaded at 0.
  - Examples: early batch monitors, where only one job ran at a time and all it could do was wreck the OS, which would then be rebooted by an operator. Many of today's personal computers also operate in a similar fashion. (This remark is slightly out of date: most PCs today have more modern virtual memory systems; it refers to DOS systems.)
  - Advantages:
    - Low overhead.
    - Simple.
    - No need to do relocation: the program is always loaded at zero.
  - Disadvantages:
    - No protection - the process can overwrite the OS, which means it can get complete control of the system.
    - Multiprogramming requires swapping the entire process in and out.
      - Overhead for swapping.
      - Idle time while swapping.
    - The process is limited to the size of memory.
    - CTSS (the "compatible" time-sharing system) worked this way, and the system swapped users completely.
    - No good way to share - only one process runs at a time (we can't even overlap CPU and I/O, since only one process is in memory).
- 2. Relocation - load the program anywhere in memory.
  - The idea is to use the loader or linker to load the program at an arbitrary memory address.
  - Note that the program can't be moved (relocated) once it is loaded. (WHY??) Two reasons. First, the addresses embedded within the program have already been rewritten to their final values and the relocation table has been thrown away, so there is no way to find out what needs to be updated. Second, a process that is executing in that code would not function properly if the code were suddenly moved. (A sketch of the load-time fix-up step appears below.)
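    A rough sketch of that one-time fix-up, assuming a toy object format in which the relocation table is just a list of byte offsets where 32-bit absolute addresses (initially relative to zero) are stored. Real formats such as ELF are more involved, and endianness and alignment are ignored here; the names reloc_t and apply_relocations are invented for the example.

      #include <stdint.h>
      #include <stddef.h>

      /* Hypothetical relocation entry: an offset into the code image
         where a 32-bit absolute address is stored.                    */
      typedef struct {
          uint32_t offset;
      } reloc_t;

      /* Rewrite every relocated word by adding the final load address. */
      void apply_relocations(uint8_t *code, const reloc_t *table, size_t n,
                             uint32_t load_base)
      {
          for (size_t i = 0; i < n; i++) {
              uint32_t *slot = (uint32_t *)(code + table[i].offset);
              *slot += load_base;   /* address was relative to zero */
          }
      }

    Once this has run and the table is discarded, there is no record of which words were patched, which is exactly why the program cannot simply be moved later.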
  - This scheme (#2) is essentially the same as #1, but the ability to load at any address will be used in #3 below.
- 3. Simple multiprogramming with static software relocation, no protection, one segment per process:
  - Highest or lowest memory holds the OS.
  - Processes are allocated memory starting at 0 (or N), up to the OS area.
  - When a process is initially loaded, link it so that it can run in its allocated memory area.
  - There can be several programs in memory at once, each loaded at a different (non-overlapping) address.
  - Advantages:
    - Allows multiprogramming without swapping processes in and out.
    - Makes better use of memory.
    - Higher CPU utilization due to more efficient multiprogramming.
  - Disadvantages:
    - No protection - jobs can read or write each other's memory.
    - External fragmentation.
    - Overhead for variable-size memory allocation.
    - Still limited to the size of physical memory.
    - Hard to increase the amount of memory allocated to a process.
    - Programs are statically loaded - they are tied to fixed locations in memory and can't be moved or expanded. If swapped out, a process must be swapped back to the same location.