Topic: Semaphore Implementation

+ Unfortunately, no existing hardware implements P&V directly. P&V are complex and implicitly require scheduling. This is
      (a) Too complicated to put in hardware.
      (b) Too inflexible - we don't want scheduling in hardware.
      (c) Too hard to make atomic - it would be dozens or hundreds of instructions.
      (d) Too long - too long to disable interrupts, etc.
+ We need a simple way of doing mutual exclusion in order to implement P's and V's. We could use atomic reads and writes, as in the "too much milk" problem, but these are very clumsy. Still okay, and they can be used in the absence of anything better - note that the too-much-milk solution does produce a critical section, which is what we need. But so far it only works for 2 processes - what about N?
+ Uniprocessor solution: disable interrupts.
    + Remember that the only way the dispatcher regains control is through an interrupt or through an explicit process request.
    + Note that disabling interrupts is a supervisor call.
    + Can we really disable ALL interrupts? There are usually some that can't be disabled.
    + The user can't disable interrupts - he must ask the system to do it. That makes every P and V a system call.
+ Which process should we remove from the queue and wake up? That is determined by the scheduling algorithm.
+ What do we do in a multiprocessor to implement P's and V's? We can't just turn off interrupts to get low-level mutual exclusion.
    + Turn off all other processors?
    + Use atomic read and write, as in "too much milk"? The atomic read/atomic write solution is too complicated.
+ Most machines provide some sort of atomic read-modify-write instruction: read the existing value and store a new value back in one atomic operation.
+ We will use this atomic operation to create a busy-waiting implementation of P and V.
+ E.g. atomic increment of a value in memory (followed by a load) and atomic decrement of a value in memory.
    + The operations are {increment value in memory, load incremented value} and {decrement value in memory}.
    + For busy waiting, can do the following:

          Init:  A = 0
          loop:  {increment A in memory, load A}
                 if A != 1, then {decrement A in memory}, go to loop
                 critical section here
                 {decrement memory location A}

    + Note that this solution is susceptible to indefinite postponement - i.e. with N processes (N large). In the simple case N=3, the value oscillates between 1, 2 and 3. If one process leaves, it can oscillate between 0, 1 and 2. With "after you" alternation, it can oscillate between 1 and 2 and never terminate.
+ E.g. swap.
    + The operation is swap(local(i), lock) - it interchanges the values of the two variables. The lock is locked if it is "true".
    + local(i) is a local variable for process i; lock is shared among all processes.
    + Busy waiting loop:

          init:  lock = false
                 local(i) = true
          repeat swap(local(i), lock) until local(i) == false;
          critical section here
          lock = false;

+ E.g. test and set (IBM solution). Set the value to true, but return the OLD value. Use an ordinary write to set it back to false. The lock is locked if it is "true".
    + I.e. Tset(local(i), lock): {local(i) = lock; lock = true.} (for process i)
    + Busy waiting loop:

          init:  lock = false
          repeat Tset(local(i), lock) until local(i) == false;
          critical section here
          lock = false;

+ Read-modify-writes may be implemented directly in the memory hardware (e.g. IBM S/360), or in the processor by refusing to release the memory bus (PDP-11).
    + It has to be done in some central place, so it's not possible for somebody to sneak around the back.
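+ The test-and-set busy-wait loop above maps directly onto C. Here is a minimal sketch using the C11 atomic_flag type as a stand-in for the hardware Tset instruction; the names spin_lock and spin_unlock are ours, not from any particular system.

      #include <stdatomic.h>

      static atomic_flag lock = ATOMIC_FLAG_INIT;   /* shared; "set" means locked */

      void spin_lock(void)
      {
          /* atomic_flag_test_and_set returns the OLD value and sets the flag
             to true in one atomic step, just like Tset above.  Keep retrying
             until we observe "false", i.e. the lock was free. */
          while (atomic_flag_test_and_set(&lock))
              ;                                     /* busy wait */
      }

      void spin_unlock(void)
      {
          atomic_flag_clear(&lock);                 /* an ordinary clear releases it */
      }

      /* Usage: spin_lock(); ...critical section...; spin_unlock(); */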
+ Using test and set for mutual exclusion:
    + It's like a binary semaphore in reverse, except that it doesn't include waiting. 1 means someone else is already using it; 0 means it's OK to proceed.
    + The definition of test and set prevents two processes from getting a 0->1 transition simultaneously.
    + Test and set is tricky to use, since you can't get at it from HLLs. Typically, you use a routine written in assembler.
+ Using test and set to implement semaphores: for each semaphore, keep a test-and-set integer (the lock) in addition to the semaphore integer and the queue of waiting processes. (This is for a multiprocessor system.)

      P(S):  disable interrupts      -- must disable interrupts for this processor; we
                                        still want an uninterrupted sequence of
                                        operations on this processor (see below)
             local(i) := true
             repeat Tset(local(i), S.Lock) until local(i) == false
                                     -- must spin on the lock, since we are waiting
                                        for another processor (alternate attempt below)
             if S > 0 then {S := S-1; S.Lock := false; enable interrupts; return}
             add process to S.Q
             S.Lock := false
             enable interrupts
             call dispatcher

      V(S):  disable interrupts
             local(i) := true
             repeat Tset(local(i), S.Lock) until local(i) == false
             if S.Q is empty then S := S+1
             else {remove a process from S.Q; wake it up}
             S.Lock := false
             enable interrupts

+ Why do we still have to disable interrupts in addition to using test and set?
    + Suppose that a process A is in the middle of either P(S) or V(S) and is interrupted. The dispatcher then picks process B to run. If B attempts to do either P(S) or V(S), it will find the lock set and spin indefinitely - until process A is rescheduled and is allowed to finish and reset the lock.
+ Note that the "spin lock" for P&V is not a problem - it only takes about 10 instructions - not a long wait, and much less than the time to do a dispatch.
+ Note that we can always get a critical section using the "too much milk" solutions from earlier. That critical section can be used to implement P and V also.
+ Important point: implement some mechanism once, very carefully. Then always write programs that use that mechanism. Layering is very important.

Topic: Deadlock

+ Until now you have heard about processes. From now on you'll hear about resources, the things operated upon (i.e. used) by processes. Resources range from CPU time to disk space to channel I/O time.
+ Resources fall into two classes:
    + Preemptible: processor, I/O channel, main memory. We can take the resource away, use it for something else, then give it back later.
    + Non-preemptible: once given, it can't be reused until the process gives it back. Examples are file space, terminals, printers, semaphores.
    + This distinction is a little arbitrary, since anything is preemptible if it can be saved and restored. It generally measures the difficulty of preemption.
+ The OS makes two related kinds of decisions about resources:
    + Allocation: who gets and keeps what, for non-preemptible resources. Given a set of requests for resources, which processes should be given which resources in order to make the most efficient use of them?
    + Scheduling: for preemptible resources, how long may a process keep them. When more resources are requested than can be granted immediately, in which order should the requests be serviced? Examples are processor scheduling (one processor, many processes) and memory scheduling in virtual memory systems.
+ The system typically has resources.
+ A process loop is typically: (a) request resource, (b) use resource, (c) release resource.
+ Deadlock: analogy to a traffic jam (see figure).
+ Deadlock example: semaphores.
    + Two processes: one does P(x) followed by P(y); the other does the reverse.
    + In this case, the resources are the semaphores x and y.
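    + Here is a minimal sketch of that example in C, using POSIX threads and semaphores in place of the two processes; the names proc_a and proc_b are ours. If A acquires x and B acquires y before either gets its second semaphore, each waits forever for the other.

          #include <pthread.h>
          #include <semaphore.h>

          sem_t x, y;                      /* both initialized to 1, i.e. used as mutexes */

          void *proc_a(void *arg)
          {
              (void)arg;
              sem_wait(&x);                /* P(x) */
              sem_wait(&y);                /* P(y) - blocks forever if B already holds y */
              /* ... use both resources ... */
              sem_post(&y);                /* V(y) */
              sem_post(&x);                /* V(x) */
              return NULL;
          }

          void *proc_b(void *arg)
          {
              (void)arg;
              sem_wait(&y);                /* P(y) */
              sem_wait(&x);                /* P(x) - blocks forever if A already holds x */
              /* ... */
              sem_post(&x);                /* V(x) */
              sem_post(&y);                /* V(y) */
              return NULL;
          }

          int main(void)
          {
              pthread_t a, b;
              sem_init(&x, 0, 1);
              sem_init(&y, 0, 1);
              pthread_create(&a, NULL, proc_a, NULL);
              pthread_create(&b, NULL, proc_b, NULL);
              pthread_join(a, NULL);       /* may never return if the two deadlock */
              pthread_join(b, NULL);
              return 0;
          }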
+ Deadlock is one area where there is a strong theory, but it's almost completely ignored in practice. Reason: the solutions are expensive and/or require predicting the future.
+ Deadlock: a situation where each of a set of processes is waiting for something from other processes in the set. Since all are waiting, none can provide any of the things being waited for.
+ This is relatively simple-minded. We could also have:
      (a) Processes waiting for resources of the same type - e.g. there are 4 tape drives; each process needs 3, holds 2, and waits for another.
      (b) More than 2 processes - e.g. A has x, B has y, C has z; A needs y, B needs z, and C needs x.
      (c) Processes waiting for more pieces of a resource - e.g. each needs a certain amount more memory, and holds memory in the meantime.
+ In general, there are four conditions for deadlock (we consider only situations with a single instance of each resource):
    + Mutual exclusion: two or more resources cannot be shared; the requesting process has to wait. E.g. tape drives, printers.
    + Hold and wait: the process making a request holds the resources that it has and waits for the new ones. This also implies multiple independent requests: processes don't ask for resources all at once.
    + No preemption: once allocated, a resource cannot be taken away.
    + Circular wait: there is a circularity in the graph of who has what and who wants what. Let Ri be a resource. The graph has an arrow from Ri to Rj if there is a process holding Ri and requesting Rj. (We can call this a hold-and-wait graph or resource request graph.)
        + (This condition actually implies the hold and wait condition.)
+ Approaches to the deadlock problem fall into two general categories:
    + Prevention: organize the system so that it is impossible for deadlock ever to occur. This may lead to less efficient resource utilization in order to guarantee no deadlocks.
    + Cure: determine when the system is deadlocked and then take drastic action. This requires termination of one or more processes in order to release their resources. Usually this isn't practical.
+ Deadlock prevention: we must find a way to eliminate one of the four necessary conditions for deadlock:
    + Create enough resources so that there's always plenty for all.
        + It may not be possible to make enough of some resources, e.g. printers, tape drives, etc.
    + Make all resources shareable.
        + Some resources are not shareable - e.g. printers, tape drives.
    + Don't permit mutual exclusion.
        + Not feasible in general.
        + Virtualize non-shared resources (e.g. printer, card reader - spooling).
        + Use only uniprogramming.
    + Don't allow waiting and holding.
        + The process must crash if a resource is not immediately available. The phone company does this - if your call doesn't go through, it is dropped.
        + The process must request everything at once. (This requires sufficient knowledge - is it feasible?)
            + It will probably end up requesting too much, just to be safe - e.g. too much memory, or both input and output devices all at once, even though they are not used at the same time.
            + Starvation is possible - a process may wait indefinitely long until everything it needs is free.
            + It may not be possible - e.g. the system has 2 tape drives, and a job needs 2 for input and 2 for output. It must complete input and then request the output drives; it can't ask for everything at once.
        + The process must release all current resources before requesting any new ones.
          (Some resources may not be releasable and reacquirable without restarting the process - e.g. printer, tape drive. Sometimes the process may not be restartable - e.g. it has modified a database record, such as the balance in a checking account.)
    + Allow preemption.
        + Some things can be preempted - e.g. memory, the CPU (registers can be saved and restored), disk.
    + Make ordered or hierarchical requests. E.g. ask for all R1's, then all R2's, etc. All processes must follow the same ordering scheme. In that case, for there to be an arrow from Ri to Rj we must have j > i, so there can't be a circular wait. We can call this the resource request graph. Of course, for this you have to know in advance what is needed.
+ Illustration of deadlock occurrence - see figure.

Deadlock Avoidance

+ We can "avoid" deadlock if we only make resource allocations that are "safe."
    + A safe state is one in which deadlock is not inevitable, i.e. one in which there exists a sequence of tasks such that there are sufficient resources to complete one task, and after the resources released by that task are reclaimed, the state is still safe. (I.e. there exists a sequence of task completions which leads to all tasks being completed.) Any tasks left over after trying this are deadlocked. After a task completion, a safe state enters another safe state.
    + A safe allocation is one that leads to a safe state.
    + A safe sequence is the sequence of task executions leading from a safe state to another safe state. A complete safe sequence leads to all tasks being completed.
    + If the state is not safe, it is unsafe. Unsafe states lead to deadlocks. (Exception - a task may release some resources temporarily, which may enable other tasks to complete.)
+ Banker's Algorithm
    + The banker's algorithm is the algorithm used to compute a safe sequence.
    + It is called the banker's algorithm because, if the processes are borrowers, the resource is money, and customers come in with partial loan requests, the algorithm can be used to compute whether the bank can make the necessary loans to allow the customers to repay. Consider the case of construction loans.
    + Example:

            process    has    max need
               A        90      100
               B        50      110
               C        30      160

      and the bank has $20 available.
    + For every process j (j = 1...n) we need to know:
        + Max(j,k), the maximum number of units of resource k that will be requested by j.
        + Allocation(j,k), the number currently allocated.
        + Need(j,k), the number still needed (Need(j,k) = Max(j,k) - Allocation(j,k)).
    + For every resource k (k = 1...m), we need to know Available(k), the number of units still available.
    + The algorithm works as follows:
        (a) Given a request, let Alloc*(j,k) and Avail*(k) be the state after the request is granted.
        (b) If all processes can finish with no additional resources, then there is no deadlock, i.e. the state is safe.
        (c) Find a process x for which, for all k, Need*(x,k) <= Avail*(k). If there is no such process, then deadlock, i.e. the state is unsafe. Otherwise, "mark" process x finished.
        (d) If all processes are marked finished, or can finish with no additional resources, then there is no deadlock, i.e. the state is safe.
        (e) For all k, let Avail*(k) = Avail*(k) + Alloc*(x,k). Go to (c).
    + Note that if there is only one resource (money), then the banker's algorithm applies to loans.
    + Feasibility?
        + The banker's algorithm is only possible if we have knowledge of maximum resource needs in advance.
        + The overhead of running the banker's algorithm may be significant if there are lots of processes and resources - not efficient.
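    + A compact C sketch of the safety check at the heart of the banker's algorithm, following steps (c)-(e) above. The array dimensions, the function name is_safe, and the use of the loan example are ours; a real allocator would run such a check before granting each request.

          #include <stdbool.h>
          #include <stdio.h>

          #define NPROC 3                       /* processes A, B, C in the example */
          #define NRES  1                       /* one resource type: money */

          /* Returns true if some order of completions lets every process finish. */
          bool is_safe(int need[NPROC][NRES], int alloc[NPROC][NRES], int avail[NRES])
          {
              int  work[NRES];
              bool finished[NPROC] = { false };

              for (int k = 0; k < NRES; k++)
                  work[k] = avail[k];

              for (;;) {
                  int x = -1;
                  /* (c) find an unfinished process whose remaining need fits in work */
                  for (int j = 0; j < NPROC && x < 0; j++) {
                      if (finished[j]) continue;
                      bool fits = true;
                      for (int k = 0; k < NRES; k++)
                          if (need[j][k] > work[k]) { fits = false; break; }
                      if (fits) x = j;
                  }
                  if (x < 0) break;             /* no candidate process remains */
                  finished[x] = true;           /* let x run to completion ...      */
                  for (int k = 0; k < NRES; k++)
                      work[k] += alloc[x][k];   /* (e) ... and reclaim its resources */
              }
              for (int j = 0; j < NPROC; j++)
                  if (!finished[j]) return false;   /* someone can never finish: unsafe */
              return true;                          /* (d) everyone marked finished: safe */
          }

          /* The loan example above: A/B/C hold 90/50/30 of max 100/110/160; bank has 20.
             need = max - allocation. */
          int main(void)
          {
              int alloc[NPROC][NRES] = { {90}, {50}, {30} };
              int need [NPROC][NRES] = { {10}, {60}, {130} };
              int avail[NRES]        = { 20 };
              printf("state is %s\n", is_safe(need, alloc, avail) ? "safe" : "unsafe");
              return 0;
          }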
+ Recovery From Deadlock
    + We can either prevent deadlocks or recover from them. Given that we often do not know what resources will be needed in the future, it is often impossible to prevent deadlocks.
    + The general idea is to periodically run a deadlock detection algorithm. It looks for circuits in the resource allocation graph. If it finds a circuit, we must break it.
    + Do this periodically,
        + or when the system seems slow,
        + and/or when there are a bunch of processes that seem to have waited a long time,
        + and/or when there are lots of processes and also lots of idle CPU time.
    + Two general approaches:
        + Kill all deadlocked processes.
        + Kill one process at a time until the deadlock cycle is eliminated. (This is preferable - less damage and less lost computation.)
            + We want to select the minimum-cost processes to kill, which depends on whether each process can be restarted at all, and if so, how much computation has been lost.
    + Rollback - causing the process to return to an earlier state. If the process goes back to the beginning (restart), it is a total rollback. In some cases (especially if checkpoints have been kept), it may be possible to do a partial rollback to break a deadlock.
        + Checkpoint - a copy of the state of the process at some past time.
    + In general, prevention of deadlock is expensive and/or inefficient. Detection is also expensive, and recovery is seldom possible (what if the process has things in a weird state?).
+ IBM OS/360 solutions:
    + Data set names are enqueued all at once on a per-job basis.
    + Devices, volumes, and memory are allocated in order on a per-job-step basis.
    + Temporary file space (or requests for additional file extents) can cause deadlocks. The operator intervenes and kills a job.
+ Multics solution: in the main path of a process, resources must be requested in a specific order. In secondary paths (which can be preempted), resources held out of order must be released (i.e. a "wait permit" is needed).
+ MTS (University of Michigan Timesharing System): puts only mild constraints on resource allocation, runs a deadlock detection algorithm periodically, and kills jobs that are found to be deadlocked. This is seldom found to be the case.
+ Unix - synchronization:
    + Synchronized use of a resource is done with a locked flag and a wanted flag.
    + When a kernel process wants a resource, it checks the locked flag. If the resource is not in use, it sets the locked flag and uses the resource. (Obviously the check and the set must be done atomically.)
    + If the locked flag is set, the process sets the wanted flag and calls sleep(), with a wait channel associated with that resource (typically the address of the data structure used to describe the resource).
    + When done, the process clears the locked flag, and if the wanted flag is set, it calls wakeup() to awaken all of the processes that called sleep() to await access to the resource. Those processes then compete to get the lock.
    + Note that this scheme is equivalent to the wakeup-waiting flag scheme described in Watson's book. It fails if the resource is unlocked between the time the lock is checked and the time the wanted flag is set.
    + Some processes cannot be made to sleep (those in the "bottom half" of the Unix kernel). Those processes are prevented from trying to use a locked resource (any resource that is shared between the top and bottom halves of the kernel) by raising the priority of the portion of the kernel that is running, so as to prevent the other portion of the kernel from running.
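    + The locked/wanted protocol can be sketched in C. This is kernel-style pseudocode rather than a runnable program: sleep_on() and wakeup() below are stand-ins for the kernel sleep()/wakeup() primitives described above (real kernels pass extra arguments, such as a priority, to sleep()), and the flag manipulation is assumed to be atomic because the kernel code here is not preempted.

          /* Stand-ins for the kernel primitives described above. */
          extern void sleep_on(void *wait_channel);
          extern void wakeup(void *wait_channel);

          struct resource {
              int locked;                /* 1 while some process is using the resource */
              int wanted;                /* 1 if anyone has gone to sleep waiting for it */
              /* ... the rest of the data structure describing the resource ... */
          };

          void acquire(struct resource *r)
          {
              while (r->locked) {        /* someone else has it */
                  r->wanted = 1;
                  sleep_on(r);           /* wait channel = address of the resource */
              }
              r->locked = 1;             /* free, and we ran first: it is now ours */
          }

          void release(struct resource *r)
          {
              r->locked = 0;
              if (r->wanted) {
                  r->wanted = 0;
                  wakeup(r);             /* wake all sleepers; they re-compete for the lock */
              }
          }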
+ Unix - Deadlock:
    + Normally, a kernel process which is holding resources and finds that it wants a locked resource must release all of the resources it is holding before it goes to sleep.
    + In special cases (e.g. going down a directory tree), resources are accessed in order. In that case, there can be no circuit in the resource allocation graph, and so no deadlock can occur.
+ Unix - File Locking:
    + Only one process may have an advisory exclusive lock on a file, or many processes may have an advisory shared lock. Lock requests block if the lock is not available. Because locks are advisory, a process may ignore them if desired.
    + A conditional lock request is possible - i.e. get an error return if the lock is already set, rather than blocking.
    + Deadlock checking in Unix for file locks is limited. The system checks to make sure a process doesn't deadlock with itself (i.e. it already holds an exclusive lock). Any application complicated enough to require deadlock detection is required to develop its own.

Topic: Introduction to Storage Allocation; Linkers

+ Object code in a system such as Unix is divided into three parts:
    + Code ("text" in Unix terminology)
    + Data
    + Stack
+ Why distinguish between different segments of a program?
    + We will sometimes want two or more processes running at the same time to run the same code - i.e. share the code, which is possible since code isn't modified.
    + Data and stack segments must be private to each process, since those are modified.
    + Dynamic linking.
+ Separate compilation:
    + Code is generated by the compiler from source code (or from user assembly language).
    + Code contains addresses. Some of the addresses have values not known at compilation or assembly time:
        + The address of the base of the text segment is not known at compile time.
        + The address of the base of the code is not known at assembly time.
        + The addresses of other, separately compiled pieces of code are not known.
+ Why are the addresses not known? Because we want the ability to combine code that was compiled at different times. If we compile everything at the same time, we don't have a problem.
+ Division of responsibility between the various portions of the system:
    + Compiler: generates one object file for each source code file, containing the information for that file. The information is incomplete, since each source file generally references some things defined in other source files.
        + The compiler provides the symbol table and relocation table, explained below.
    + Linker or linkage editor: combines all of the object files for one or more programs into a single object file, which is complete and self-sufficient. (I.e. the linker can go from many object files to just one.)
    + Loader: takes a single object file and adjusts all addresses to reflect the correct load address for that object file.
        + Note - the terms "linker" and "loader" are often used interchangeably, and we shall generally do so here. The linker often includes the function of the loader.
    + Operating system: places object files into memory, allows several different processes to share memory at once, and provides facilities for processes to get more memory after they've started running.
    + Run-time library: works together with the OS to provide dynamic allocation routines, such as calloc and free in C.
+ What addresses are there in code?
    + There are absolute addresses within a code segment (e.g. JMP Label). The values of these addresses need to be determined. When code is compiled or assembled, these addresses are usually set as if the segment were loaded at zero.
    + There are relative addresses (e.g. JMP *+28) - no problem.
    + There are external addresses - e.g. CALL SUBPROG. We also need those addresses.
    + There are addresses of data in the data segment.
+ When compiling (we will ignore assemblers, which are similar), we create:
    + Segment table - for each segment, we need the segment name, the segment size, and the base address at which it was assumed to be loaded:
          [segment name, segment size, nominal base]
    + Symbol table - contains global definitions: a table of the labels that are needed in other segments. Usually these have been declared in some way; internal labels are not externally visible. (Internal labels were known by the compiler anyway.)
          [symbol, segment name, offset from base of segment]
    + Relocation table - a table of the addresses within this segment that need to be fixed, i.e. relocated. It contains internal references - references to locations within this segment - and external references - references that are believed to be external (i.e. we didn't find them here).
          [address location, symbol name, offset to symbol (i.e. offset to address), length of address field]
    + The compiler provides these tables along with the object code for each segment.
+ Effectively, there are 3 steps in a linker/loader:
    + Determine the location of each segment.
    + Calculate the values of the symbols and update the symbol table.
    + Scan the relocation table and relocate the addresses.
+ In more detail, the operation of a linker:
    + Collect all the pieces of a program.
        + This may involve finding some segments in the file system and in libraries.
    + Assign each segment a final location. Build the segment table.
    + Resolve (i.e. fix) all of the addresses that can be fixed. The result is a new object file. All addresses may be resolved, or there may be a new object file with some addresses still unresolved.
        + This is done by taking the symbol table for each segment and assigning to each symbol its new address.
            + If the linker has been given an absolute address for loading, then the absolute address is calculated.
            + If the linker is to produce a new relocatable object module, then the address relative to zero is calculated, and a new symbol table is output.
        + The relocation table is scanned, and for every entry in the code, the value is replaced with the new absolute value.
            + If the linker has been given an absolute address for loading, then the absolute address is calculated.
            + If the linker is to produce a new relocatable object module, then the address relative to zero is calculated, and a new relocation table is output.
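+ The three tables and the relocation pass can be sketched in C. The struct layouts, field names, and the assumption of 32-bit little-endian absolute address fields below are ours, not any real object-file format; the point is only to show what a relocation entry refers to and how the final pass patches the code.

      #include <stdint.h>
      #include <string.h>

      /* Segment table entry: [segment name, segment size, nominal base] */
      struct segment {
          char     name[16];
          uint32_t size;
          uint32_t nominal_base;       /* base assumed at compile time, usually 0 */
          uint32_t final_base;         /* assigned by the linker */
          uint8_t *code;               /* the segment's bytes */
      };

      /* Symbol table entry: [symbol, segment, offset from base of segment] */
      struct symbol {
          char     name[16];
          int      segment;            /* index of the defining segment */
          uint32_t offset;
      };

      /* Relocation table entry: an address field in the code that must be fixed */
      struct relocation {
          int      segment;            /* segment whose code contains the field */
          uint32_t where;              /* offset of the address field in that code */
          int      symbol;             /* index of the symbol it refers to */
          uint32_t addend;             /* extra offset to add to the symbol's address */
      };

      /* Final pass: for every relocation entry, compute the symbol's absolute
         address and overwrite the address field in the code with it. */
      void relocate(struct segment *segs, struct symbol *syms,
                    struct relocation *rels, int nrels)
      {
          for (int i = 0; i < nrels; i++) {
              struct relocation *r = &rels[i];
              struct symbol     *s = &syms[r->symbol];
              uint32_t value = segs[s->segment].final_base + s->offset + r->addend;
              memcpy(segs[r->segment].code + r->where, &value, sizeof value);
          }
      }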
Topic: Dynamic Storage Allocation

+ Dynamic storage allocation was covered in previous courses.
+ Why isn't static allocation sufficient for everything? Unpredictability: we can't predict ahead of time how much memory, or in what form, will be needed.
    + Recursive procedures. Even regular procedures are hard to predict (data dependencies).
    + Complex data structures, e.g. the linker symbol table. If all storage must be reserved in advance (statically), then it will be used inefficiently (enough will be reserved to handle the worst possible case).
    + For example, the OS doesn't know how many jobs there will be or which programs will be run.
+ We need dynamic memory allocation both for main memory and for file space on disk.
+ There are two basic operations in dynamic storage management: allocate and free.
+ Dynamic allocation can be handled in one of two general ways:
    + Stack allocation (hierarchical): restricted, but simple and efficient.
    + Heap allocation: more general, but less efficient and more difficult to implement. (I.e. it uses a free storage area.)
+ Stack organization: memory allocation and freeing are partially predictable (as usual, we do better when we can predict the future). Allocation is hierarchical: memory is freed in the opposite order from allocation. If alloc(A), then alloc(B), then alloc(C), it must be free(C), then free(B), then free(A).
    + Example: procedure calls. X calls Y, which calls Y again. Space for local variables and return addresses is allocated on a stack.
    + Stacks are also useful for lots of other things: tree traversal, expression evaluation, top-down recursive descent parsers, etc.
    + A stack-based organization keeps all the free space together in one place.
+ Heap organization: allocation and release are unpredictable. Heaps are used for arbitrary list structures and complex data organizations. Example: a payroll system. We don't know when employees will join and leave the company, and we must be able to keep track of all of them using the least possible amount of storage.
    + Memory consists of allocated areas and free areas (or holes). Inevitably we end up with lots of holes. Goal: reuse the space in holes to keep the number of holes small and their sizes large.
    + Fragmentation: inefficient use of memory due to holes that are too small to be useful. In stack allocation, all the holes are together in one big chunk.
        + Internal - space is wasted within blocks.
        + External - space is wasted between blocks.
+ Typically, heap allocation schemes use a free list to keep track of the storage that is not in use. Algorithms differ in how they manage the free list.
    + Best fit: keep a linked list of free blocks, search the whole list on each allocation, choose the block that comes closest to matching the needs of the allocation, and save the excess for later. During release operations, merge adjacent free blocks.
    + First fit: just scan the list for the first hole that is large enough, and free the excess. Also merge on releases. Most first fit implementations are rotating first fit (next fit).
    + Next fit: like first fit, but start scanning where you left off.
    + Best fit is not necessarily better than first fit. Suppose memory contains 2 free blocks of size 20 and 15.
        + Suppose the allocation ops are 10 then 20: which approach wins?
        + Suppose the ops are 8, 12, then 12: which one wins?
    + First fit tends to leave "average" size holes, while best fit tends to leave some very large ones and some very small ones. The very small ones can't be used very easily.
    + Knuth claims that if storage is close to running out, it will run out regardless of which scheme is used, so pick the easiest or most efficient scheme (first fit).
+ Bit map: used for allocation of storage that comes in fixed-size chunks (e.g. disk blocks, or 32-byte chunks). Keep a large array of bits, one for each chunk. If a bit is 0, its chunk is in use; if it is 1, the chunk is free. This will be discussed more when we talk about file systems.
+ Pools: keep a separate allocation pool for each popular size. Allocation is fast, and there is no fragmentation.
+ Reclamation methods: how do we know when dynamically allocated memory can be freed?
    + It's easy when a chunk is only used in one place.
    + Reclamation is hard when information is shared: it can't be recycled until all of the sharers are finished. Sharing is indicated by the presence of pointers to the data. Without a pointer, the data can't be accessed (can't be found).
    + Two problems in reclamation:
        + Dangling pointers: better not recycle storage while it's still being used.
        + Core leaks: better not "lose" storage by forgetting to free it, even when it can't ever be used again.
+ Reference counts: keep track of the number of outstanding pointers to each chunk of memory. When this count goes to zero, free the memory. Examples: Smalltalk, file descriptors in Unix. Works fine for hierarchical structures. The reference counts must be managed carefully (by the system) so that no mistakes are made in incrementing and decrementing them.
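    + A minimal C sketch of reference counting (the struct and function names are ours, and a real implementation would need atomic increments and decrements if the counts were shared between threads):

          #include <stdlib.h>

          struct refcounted {
              int count;                 /* number of outstanding pointers to this chunk */
              /* ... the shared data itself ... */
          };

          struct refcounted *rc_new(void)
          {
              struct refcounted *p = malloc(sizeof *p);
              if (p) p->count = 1;       /* the creator holds the first reference */
              return p;
          }

          void rc_retain(struct refcounted *p)
          {
              p->count++;                /* another pointer to the chunk now exists */
          }

          void rc_release(struct refcounted *p)
          {
              if (--p->count == 0)       /* last pointer dropped: nothing can reach it */
                  free(p);
          }

          /* If two chunks point at each other, their counts never reach zero even when
             nothing else references them - the circular-structure problem noted below. */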
+ What happens when there are circular structures?
+ Garbage collection: storage isn't freed explicitly (using a free operation), but rather implicitly: you just delete the pointers. When the system needs storage, it searches through all of the pointers (it must be able to find them all!) and collects the things that aren't used (marking algorithms). If structures are circular, then this is the only way to reclaim space. It makes life easier on the application programmer, but garbage collectors are incredibly difficult to program and debug, especially if compaction is also done. Examples: Lisp, capability systems.
+ How does garbage collection work?
    + Must be able to find all objects.
    + Must be able to find all pointers to objects.
    + Pass 1: mark. Go through all statically allocated and procedure-local variables, looking for pointers. Mark each object pointed to, and recursively mark all the objects it points to. The compiler has to cooperate by saving information about where the pointers are within structures.
    + Pass 2: sweep. Go through all objects and free up those that aren't marked.
+ Garbage collection is often expensive: 20% or more of all CPU time in systems that use it.
+ Buddy System