Lecture 2/12/07

The lecture includes the following topics:
I. Monitor examples
II. Introduction to Storage Allocation; Linkers
III. Dynamic Storage Allocation
IV. Sharing Main Memory -- Segmentation and Paging

Topic: Monitor Examples
========================

Monitors were discussed last week, and this week we are going to look at a couple more examples. Wait and signal are used in process synchronization. When you enter a monitor procedure, you either implicitly or explicitly acquire a lock. Monitors are thus a higher-level synchronization primitive than semaphores; they are simple and easy to use. Monitors combine the data with the methods that operate on the data.

We have the producer/consumer problem from the Hoare paper. This is the bounded buffer problem. We have two conditions: nonempty and nonfull. In the append procedure, if you enter append when the count equals the size of the buffer, then the buffer is full and you have to wait until the buffer is not full. Once the buffer is not full, you set your pointer to the last element of the buffer, store the item, and signal the condition nonempty, so anyone waiting for an item can use it. The signal lets any waiting consumer know that the producer is done appending.

The remove procedure is like append, but in reverse. You can only remove if there is an item in the buffer, so if the count is 0 the buffer is empty and you have to wait on the condition nonempty. Then you pull an item from the buffer, decrement the count, and signal any waiting thread that the buffer is no longer full. The count is initialized to 0. In this example we assume that the lock is implicitly acquired when you enter either append or remove, and implicitly released when you leave the procedure.

Let's look at a slightly more complicated example: the disk head scheduler from the Hoare paper. We have two methods: request and release. Request is called before issuing a command to move the disk head to the destination cylinder.
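Before moving on to the scheduler, the bounded-buffer monitor described above can be sketched in Python using threading primitives. This is a loose sketch, not Hoare's code: Python's condition variables have Mesa semantics rather than Hoare's (a signalled waiter re-runs later instead of immediately), so the waits sit in while loops. The method and field names (append, remove, count) follow the lecture; the class itself and its pointer names are our own scaffolding.

```python
import threading

class BoundedBuffer:
    """Sketch of the bounded-buffer monitor. The Lock plays the role of the
    implicit monitor lock; nonfull/nonempty are the two conditions."""
    def __init__(self, size):
        self.buffer = [None] * size
        self.size = size
        self.count = 0           # number of items currently in the buffer
        self.lastpointer = 0     # next slot to fill
        self.firstpointer = 0    # next slot to empty
        self.lock = threading.Lock()
        self.nonfull = threading.Condition(self.lock)
        self.nonempty = threading.Condition(self.lock)

    def append(self, item):
        with self.lock:                      # implicit monitor lock
            while self.count == self.size:   # buffer full: wait until not full
                self.nonfull.wait()
            self.buffer[self.lastpointer] = item
            self.lastpointer = (self.lastpointer + 1) % self.size
            self.count += 1
            self.nonempty.notify()           # signal: an item is available

    def remove(self):
        with self.lock:
            while self.count == 0:           # buffer empty: wait until not empty
                self.nonempty.wait()
            item = self.buffer[self.firstpointer]
            self.firstpointer = (self.firstpointer + 1) % self.size
            self.count -= 1
            self.nonfull.notify()            # signal: buffer no longer full
            return item
```

In use, producer threads call append and consumer threads call remove; neither ever touches count or the pointers directly, which is the point of packaging data and methods together.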
We have headpos, the current position of the head, and the variable sweep, which tells you the direction the head is moving: up or down. We want to schedule the disk arm so that it moves a minimum distance. If you just use first-come first-served, the arm will be sweeping all over the disk. But if you only service the requests that are closest to where you currently are, this introduces the problem of starvation, because the cylinders at the beginning and the end of the disk may never get serviced. So this uses the elevator algorithm. (Look at the code in the paper.)

We keep track of the head position and direction (up or down), and we keep track of whether the disk head is currently busy. We have two conditions, upcondition and downcondition. We wait on the upcondition if we want to be woken on the next upsweep of the arm. We wait on the downcondition if the arm is currently moving down and our position is below the arm.

In the request procedure, the parameter you pass is the cylinder you want to be on. If you call request and the disk is currently busy, you have to wait.

Cylinder numbers
1. -------------------
2. -------------------
3. -------------------
4. -------------------
5. -------------------
6. -------------------
7. -------------------

Say you request cylinder 4 while the head position is currently 2, the head is sweeping down, and it is busy. Since it is busy, we have to do some checking: if the head position is less than the destination we want (as it is in this case), we wait for the upsweep. If the head position is equal to the position where we want to read, we also wait for the upsweep. The wait is thus paired with the requested position. When we call release, we set busy to false, since we are done reading.

Topic: Introduction to Storage Allocation; Linkers
==================================================

Object code in a system such as Unix is divided into three parts: Code ("text" in Unix terminology), Data, and Stack.
We distinguish between different segments of a program because we will sometimes want two or more processes running at the same time to run the same code, i.e. share the code, which is possible since code isn't modified. Data and stack segments must be private to each process, since those are modified.

Code is generated by the compiler from source code. This code contains addresses, some of which have values not known at compilation or assembly time: the address of the base of the text segment is not known at compile time, and the address of the base of the code is not known at assembly time. Addresses are not known because we want the ability to combine code that was compiled at different times. If we compile everything at the same time, we don't have a problem.

The compiler generates one object file for each source code file, containing information for that file. The information is incomplete, since each source file generally references some things defined in other source files. The compiler provides the symbol table and relocation table.

The linker (or linkage editor) combines all of the object files for one or more programs into a single object file, which is complete and self-sufficient. (The linker can go from many object files to just one.) The loader takes a single object file and adjusts all addresses to reflect the correct load address for the object file. The linker often includes the function of the loader. The operating system places object files into memory, allows several different processes to share memory at once, and provides facilities for processes to get more memory after they've started running. The run-time library works together with the OS to provide dynamic allocation routines.

There are absolute addresses within a code segment, whose values need to be determined. When code is compiled or assembled, these addresses are usually set as if the segment were loaded at zero. There are relative addresses (e.g. JMP * 28) - no problem. There are external addresses - e.g. CALL SUBPROG.
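To make the linker's job concrete, here is a toy Python sketch of the whole pipeline: place each segment at a final base address, compute final symbol values, then patch every relocated address. The data layout here (dicts with 'code', 'symbols', 'relocs' fields, one word per address) is invented purely for illustration; real object file formats are far richer.

```python
def link(segments):
    """Toy linker. segments: list of dicts with 'name', 'code' (list of
    words, assembled as if loaded at 0), 'symbols' (label -> offset), and
    'relocs' (list of (offset, label) pairs to patch)."""
    # Step 1: assign each segment a final location (build the segment table).
    base, segtable = 0, {}
    for seg in segments:
        segtable[seg['name']] = base
        base += len(seg['code'])
    # Step 2: calculate final symbol values (update the symbol table).
    symtab = {}
    for seg in segments:
        for label, off in seg['symbols'].items():
            symtab[label] = segtable[seg['name']] + off
    # Step 3: scan each relocation table and relocate addresses in the code.
    image = []
    for seg in segments:
        code = list(seg['code'])
        for off, label in seg['relocs']:
            code[off] = symtab[label]       # replace with new absolute value
        image.extend(code)
    return image, symtab
```

For example, a segment whose word 1 holds a CALL target defined in another segment gets that word overwritten with the callee's final absolute address.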
External addresses can't be resolved until the separately compiled pieces are combined. There are also addresses of data in the data segment.

When compiling, we create:
1. Segment table - for each segment, we need the segment name, segment size, and the base address at which it was assumed to be loaded.
2. Symbol table - contains global definitions - a table of labels that are needed in other segments. Usually, these have been declared in some way; internal labels are not externally visible. (Internal labels were known by the compiler anyway.)
3. Relocation table - a table of addresses within this segment that need to be fixed, i.e. relocated. Contains internals - references to locations within this segment - and external references - references that are believed to be external.

The compiler provides these tables along with the object code for each segment.

Effectively, there are 3 steps in a linker/loader:
1. Determine the location of each segment.
2. Calculate the values of symbols and update the symbol table.
3. Scan the relocation table and relocate addresses.

Operation of a linker in more detail:
1. Collect all the pieces of a program - this may involve finding some segments from the file system and libraries.
2. Assign each segment a final location. Build the segment table.
3. Resolve (i.e. fix) all of the addresses that can be fixed. The result is a new object file. All addresses may be resolved, or there may be a new object file with some addresses still unresolved. This is done by:
   a) taking the symbol table for each segment and assigning to each symbol its new address;
   b) scanning the relocation table and, for every entry, replacing the value in the code with the new absolute value.

Topic: Dynamic Storage Allocation
=================================

Static allocation is not sufficient because of unpredictability: we can't predict ahead of time how much memory, or in what form, will be needed. Examples:
a) Recursive procedures. Even regular procedures are hard to predict (data dependencies).
b) Complex data structures, e.g.
a linker symbol table. If all storage must be reserved in advance (statically), then it will be used inefficiently (enough will be reserved to handle the worst possible case).

There are two basic operations in dynamic storage management: allocate and free. Dynamic allocation can be handled in one of two general ways:
1. Stack allocation (hierarchical): restricted, but simple and efficient.
2. "Heap" allocation: more general, but less efficient and more difficult to implement. (I.e. uses a free storage area.)

Stack organization: memory allocation and freeing are partially predictable (as usual, we do better when we can predict the future). Allocation is hierarchical: memory is freed in the opposite order from allocation.

Heap organization: allocation and release are unpredictable. Heaps are used for arbitrary list structures and complex data organizations. Example: a payroll system. We don't know when employees will join and leave the company, and we must be able to keep track of all of them using the least possible amount of storage.

Memory consists of allocated areas and free areas (or holes). We inevitably end up with lots of holes. Goal: reuse the space in holes so as to keep the number of holes small and their size large. Fragmentation: inefficient use of memory due to holes that are too small to be useful. In stack allocation, all the holes are together in one big chunk. Fragmentation can be:
Internal - space is wasted within blocks.
External - space is wasted between blocks.

Typically, heap allocation schemes use a free list to keep track of the storage that is not in use. Algorithms differ in how they manage the free list.

Best fit: keep a linked list of free blocks; search the whole list on each allocation; choose the block that comes closest to matching the needs of the allocation; save the excess for later. During release operations, merge adjacent free blocks.

First fit: just scan the list for the first hole that is large enough, and free the excess. Also merge on releases.
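The two policies just described can be sketched over a free list of (base, length) holes. The list representation and function names here are ours, and the release/merge path is omitted; this only shows the allocation side.

```python
def first_fit(free_list, size):
    """First fit: take the first hole that is large enough; the excess of a
    split hole stays on the list. Returns (base, new_free_list); base is
    None if no hole fits."""
    for i, (base, length) in enumerate(free_list):
        if length >= size:
            rest = free_list[:i] + free_list[i + 1:]
            if length > size:                    # save the excess for later
                rest.insert(i, (base + size, length - size))
            return base, rest
    return None, free_list

def best_fit(free_list, size):
    """Best fit: scan the whole list and pick the smallest adequate hole."""
    candidates = [(length, i) for i, (base, length) in enumerate(free_list)
                  if length >= size]
    if not candidates:
        return None, free_list
    _, i = min(candidates)                       # smallest hole that fits
    base, length = free_list[i]
    rest = free_list[:i] + free_list[i + 1:]
    if length > size:
        rest.insert(i, (base + size, length - size))
    return base, rest
```

Running the lecture's example (free blocks of 20 and 15, allocations of 10 then 20) shows best fit succeeding where first fit gets stuck, and the 8/12/12 sequence shows the reverse.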
Most first fit implementations are rotating first fit (next fit). Next fit: like first fit, but start the scan where you left off last time.

Best fit is not necessarily better than first fit. Suppose memory contains 2 free blocks, of sizes 20 and 15.
Suppose the allocation ops are 10 then 20: which approach wins? Best fit: it first allocates the 10 in the 15-block and then the 20 in the 20-block. First fit would have allocated the 10 out of the 20-block and then have no place to put the 20.
Suppose the ops are 8, 12, then 12: which one wins? First fit.
First fit tends to leave "average" size holes, while best fit tends to leave some very large ones and some very small ones. The very small ones can't be used very easily.

Bit map: used for allocation of storage that comes in fixed-size chunks (e.g. disk blocks, or 32-byte chunks). Keep a large array of bits, one for each chunk. If a bit is 0, the chunk is in use; if the bit is 1, the chunk is free. This will be discussed more when talking about file systems.

Pools: keep a separate allocation pool for each popular size. Allocation is fast, with no fragmentation.

Reclamation methods: how do we know when dynamically allocated memory can be freed? It's easy when a chunk is only used in one place. Reclamation is hard when information is shared: it can't be recycled until all of the sharers are finished. Sharing is indicated by the presence of pointers to the data. Without a pointer, you can't access the data (you can't find it).

Two problems in reclamation:
1. Dangling pointers: better not recycle storage while it's still being used.
2. Core leaks: better not "lose" storage by forgetting to free it even when it can't ever be used again.

Reference counts: keep track of the number of outstanding pointers to each chunk of memory. When this goes to zero, free the memory. Examples: Smalltalk, file descriptors in Unix. This works fine for hierarchical structures. The reference counts must be managed carefully (by the system) so no mistakes are made in incrementing and decrementing them.
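Reference counting can be sketched in a few lines. The class below and its freed list are invented stand-ins for real storage management: "freeing" here just records the chunk's name, and the increments/decrements that a real system would do on every pointer assignment are shown as explicit calls.

```python
class RefCounted:
    """Toy reference-counted chunk: tracks the number of outstanding
    pointers; when the count drops to zero, the chunk is 'freed'."""
    freed = []                                   # stand-in for the free pool

    def __init__(self, name):
        self.name = name
        self.count = 0

    def add_ref(self):                           # a new pointer now refers here
        self.count += 1

    def release(self):                           # a pointer was deleted
        self.count -= 1
        if self.count == 0:
            RefCounted.freed.append(self.name)   # no sharers left: reclaim
```

A chunk with two outstanding pointers survives the loss of one pointer and is reclaimed only when the second is released, which is exactly why the counts must be maintained without mistakes.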
What happens when there are circular structures?

     ---        ---
    |   | ---> |   |
    |   | <--- |   |
     ---        ---

The memory can never be freed, since there is always a reference to it.

Garbage collection: storage isn't freed explicitly (using a free operation), but rather implicitly: you just delete pointers. When the system needs storage, it searches through all of the pointers (it must be able to find them all!) and collects things that aren't used. (Marking algorithms.) If structures are circular, then this is the only way to reclaim the space. It makes life easier on the application programmer, but garbage collectors are incredibly difficult to program and debug, especially if compaction is also done. Examples: Lisp, capability systems.

How does garbage collection work?
Storage is released implicitly.
Must be able to find all objects.
Must be able to find all pointers to objects.

Mark and sweep:
Pass 1: mark. Go through all statically-allocated and procedure-local variables, looking for pointers. Mark each object pointed to, and recursively mark all objects it points to. The compiler has to cooperate by saving information about where the pointers are within structures.
Pass 2: sweep. Go through all objects, and free up those that aren't marked.

Garbage collection is often expensive: 20% or more of all CPU time in systems that use it.

Buddy system: memory is divided into chunks whose sizes are powers of 2. If a small chunk is needed, memory is split repeatedly until we get a power-of-2 chunk that can hold the data. Once memory is freed, the chunk is merged back with its buddy chunk, since every split of a chunk creates a pair of buddies.

 __________________
|        |        |
|        |        |
 ------------------
     ^        ^
     |________|
      buddies

Topic: Sharing Main Memory -- Segmentation and Paging
======================================================

How do we allocate memory to processes?

1. Simple uniprogramming with a single segment per process: one program in memory at a time. (We can actually multiprogram by swapping programs.) Highest memory holds the OS.
The process is allocated memory starting at 0 (or J), up to (or from) the OS area. The process is always loaded at 0.
Advantages:
1. Low overhead.
2. Simple.
3. No need to do relocation - always loaded at zero.
Disadvantages:
1. No protection - the process can overwrite the OS, which means it can get complete control of the system.
2. Multiprogramming requires swapping entire processes in and out:
   a) overhead for swapping;
   b) idle time while swapping.
3. Process limited to the size of memory.
4. No good way to share - only one process at a time (we can't even overlap CPU and I/O, since only one process is in memory).

2. Relocation - load the program anywhere in memory. The idea is to use the loader or linker to load the program at an arbitrary memory address. Note that the program can't be moved (relocated) once loaded. This scheme (#2) is essentially the same as #1, but the ability to load at any address will be used in #3 below.

3. Simple multiprogramming with static software relocation, no protection, one segment per process: highest or lowest memory holds the OS. Processes are allocated memory starting at 0 (or N), up to the OS area. When a process is initially loaded, link it so that it can run in its allocated memory area. We can have several programs in memory at once, each loaded at a different (non-overlapping) address.
Advantages:
1. Allows multiprogramming without swapping processes in and out.
2. Makes better use of memory.
3. Higher CPU utilization due to more efficient multiprogramming.
Disadvantages:
1. No protection - jobs can read or write others.
2. External fragmentation.
3. Overhead for variable-size memory allocation.
4. Still limited to the size of physical memory.
5. Hard to increase the amount of memory allocated.
6. Programs are statically loaded - tied to fixed locations in memory. They can't be moved or expanded; if swapped out, they must be swapped back to the same location.

4. Dynamic memory relocation: instead of changing the addresses of a program before it's loaded, change the address dynamically during every reference.
Figure: a processor and a memory box, with a memory relocation box in between.

 ---     ---     ---
|CPU|-->|MMU|-->|MEM|
 ---     ---     ---

There are many types of relocation - to be discussed. Under dynamic relocation, each program-generated address (called a logical or virtual address) is translated in hardware to a physical, or real, address. This happens as part of each memory reference.

The virtual (logical) address is what the program generates. The virtual address space is the set of (legal) virtual addresses the program can generate. Physical (real) addresses are the set of addresses in physical memory. The physical address space of a program is the set of physical addresses it can get to; the physical address space of the machine is the set of addresses in physical memory.

Dynamic relocation leads to two views of memory, called address spaces: the virtual address space and the real address space. Each process has its own virtual address space. With static relocation we force the two views to coincide. In some systems, there are several levels of mapping.

Several types of dynamic relocation:

Base & bounds relocation: two hardware registers: a base register for the process, and a bounds register that indicates the last valid address the process may generate.

    real address = base + virtual address,  IF 0 <= virtual address <= bounds

On each memory reference, the virtual address is compared against the bounds register; in parallel, the real address is generated by adding the virtual address to the base register.

Advantages:
Each process appears to have a completely private memory of size equal to the bounds register plus 1.
Processes are protected from each other.
No address relocation is necessary when a process is loaded.
Task switching is very cheap when done between processes in memory - just reload the processor registers. (There is higher overhead to load a process from disk.)
Compaction is possible.
Disadvantages:
Still limited to the size of main memory.
External fragmentation (between processes).
Overhead for allocating variable-size spaces in memory.
Sharing is difficult - only possible if the base & bounds pairs overlap.
Only one "segment" - i.e. one region of memory.
New, special hardware is needed for relocation, and the relocation takes time (it isn't free).

The OS must be able to change the values of the relocation registers: the OS loads a new process and sets its base and bounds registers, and when it schedules a process it sets the base and bounds registers. When tasks are switched, it must be able to swap the base, bounds, and PC registers simultaneously. These imply that the OS must run with base and bounds relocation turned off - otherwise, it would affect itself when running. (Or it would need its own set of base and bounds registers.) Use of base and bounds is controlled by a status bit, usually in the PSW or SSW, or a similar control register.

Users must not be able to change the values of the base and bounds registers; otherwise, there is no protection between users - a user could trash others or the OS.

Problem: how does the OS regain control once it has given it up? The OS is entered on a trap (including SVC) or interrupt. When the OS is entered, use of base and bounds must be disabled (i.e. the bit in the PSW is reset). Typically, the trap handler loads new control register values.

Base & bounds is cheap - only 2 registers - and fast - the add and compare can be done in parallel.

We can consider three types of systems using base and bounds registers:
1. Uniprogramming - a single user region. Bring a user in, and run him.
2. Multiprogramming with fixed partitions (OS/MFT) - partition memory into fixed regions (which may be different sizes). A user goes into a region of the given size.
3. Multiprogramming with variable partitions (OS/MVT) - partitions are dynamically variable.
Note that we can do any of the three above schemes without base and bounds registers - just load programs into a region at the appropriate base address.

Task switching: we can now switch between processes very cheaply - we don't have to reload memory, just change the contents of the process control block (which now holds the values of the base and bounds registers).
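The base-and-bounds translation the hardware performs on every reference can be sketched in a few lines. Here bounds is taken to be the last valid virtual address (so a process sees bounds + 1 addresses), and MemoryError stands in for the hardware trap; the function name is ours.

```python
def translate(virtual_address, base, bounds):
    """Base & bounds relocation: compare the virtual address against the
    bounds register and, in parallel in real hardware, add the base.
    Out-of-range references trap to the OS."""
    if not (0 <= virtual_address <= bounds):
        raise MemoryError("bounds violation: trap to OS")
    return base + virtual_address
```

A process loaded with base 1000 and bounds 99 sees a private memory of 100 addresses, 0 through 99, mapped onto physical addresses 1000 through 1099; anything outside that range traps.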
We can also run processes which are not in memory:
Find an empty area of memory in which to place the process.
Remove one or more processes from memory, if necessary, in order to find space (i.e. copy the removed processes to space on disk).
Copy the new process (from disk) into memory.
If only one process fits in memory, we have to wait for the swap to take place. If several processes fit in memory, we can swap one while executing another.

5. Multiple segments - segmentation. Divide the virtual address space into several "segments". (These are not the same as the "segments" of linkers and loaders.) Use a separate base and bound for each segment, and also add protection bits (read, write, execute) and a valid bit. (We will also want a dirty bit.)

Each address is now a segment and an offset. Each memory reference indicates the segment and offset in one or more of three ways:
The top bits of the address select the segment, the low bits the offset. This is the most common, and the best.
Or, the segment is selected implicitly by the instruction (e.g. code vs. data, stack vs. data, which base register is used, or 8086 prefixes).
Or, the instruction specifies directly or indirectly a base register for the segment.

Subprograms (procedures, functions, etc.) can be separate segments. Segments are typically associated with logical partitions of your process address space - e.g. code, data, stack. Or, each module or procedure can be a separate segment.

We need either a segment table or segment registers to hold the base and bounds for each segment.

Picture of a segment table, with segment table entries:

     ------------------------
     | base | bound | flags |
     ------------------------
   0 |      |       |       |
     ------------------------
   1 |      |       |       |
     ------------------------

The memory mapping procedure consists of a table lookup, an add, and a compare.

Address translation for segmentation: we have a segment table, which maps the segment number to [segment base address, segment length (limit), protection bits, valid bit, reference bit, dirty bit]. This information is in a Segment Table Entry (STE). Diagram of the segment table:
Segment descriptor.

We need some hardware to automatically map a virtual (segment number, word number) pair to a real address:

    real address = segment_table[segment #].base + word number

The reference is invalid if word_number > limit. (Note that we do the test without adding the base to both sides.) The valid bit must also be on, and the permission bits must permit the access. We need more hardware to make this go fast (discussed later). We have a Segment Table Base Register (STBR) that points to the base of the segment table (for the hardware to use).

An alternate approach: if there is a small number of segments, we can have segment registers - one register per segment. We can also multiplex a small number of segment registers among a large number of segments (as with the x86 architecture).

Advantages:
Each process has its own virtual address space.
Protection between address spaces.
Separate protection between segments (R/W/E).
The virtual space can be larger than physical memory: unused segments don't need to be loaded, and segments can be loaded as needed. An attempt to reference a missing segment is called a segment fault.
One or more segments can be shared (sharing is tricky - we'll talk about this later).
Segments can be placed anywhere in memory that they fit; memory compaction is easy.
Segment sizes can be changed independently.
Disadvantages:
Each segment must be allocated contiguously, and segment size is limited by memory size.
External fragmentation.
Overhead of allocating memory.
Hardware is needed for address translation, and there is overhead (time/hardware) in doing the translation.
More complicated.
Space for the segment table.

Note that segment tables are usually 1-1 with processes: a segment table defines a process's address space. Protection is a problem: we have the same problem as before - now we have to allocate shared virtual instead of shared physical memory. When we switch processes, we reload the STBR (segment table base register), which changes the address space.
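Segmented translation with the "top bits select the segment" scheme can be sketched as follows. The 12-bit offset field, the dict layout of the segment table, and the field names are assumptions for illustration; protection, reference, and dirty bits are omitted, and MemoryError stands in for the hardware trap.

```python
def seg_translate(segtable, vaddr, off_bits=12):
    """Segmentation: top bits of the virtual address select the segment,
    low bits give the word number (offset) within it. segtable maps
    segment number -> {'base': ..., 'limit': ..., 'valid': ...}."""
    seg = vaddr >> off_bits                   # top bits: segment number
    word = vaddr & ((1 << off_bits) - 1)      # low bits: word number
    entry = segtable.get(seg)
    if entry is None or not entry['valid']:
        raise MemoryError("segment fault")    # missing segment
    if word > entry['limit']:                 # test done before adding base
        raise MemoryError("bounds violation: trap")
    return entry['base'] + word
```

Switching processes amounts to handing the hardware a different segtable (by reloading the STBR), which changes the entire address space in one step.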