LECTURE 4  2005-01-31

Topics: SCHEDULING INDEPENDENT AND COOPERATING PROCESSES
Plus:   DISCRETE EVENT SIMULATION

ANNOUNCEMENTS
=============

Second part of the reader is now available. It's blue, thick, and about $30.
It's a collection of technical papers that back up material in the reading.

SCHEDULING
==========

FOREGROUND/BACKGROUND SCHEDULING

A foreground/background scheduler has two queues for tasks to be placed in,
not surprisingly called the foreground queue and the background queue. Tasks
in the foreground queue are given priority over tasks in the background
queue.

A simple implementation of a foreground/background scheduler would look
something like this:

        .----------------------.
        |     ___FG___         |
        V    |_|_|_|_|--->(_)--+--->
    ---->                  ^
             |_|_|_|_|-----'
                 BG

We can see that this system consists of two queues, both of which feed into
the CPU. Note also that items can be preempted in the CPU and placed back on
either the foreground or background queue. Any scheduling algorithm can be
used within an individual queue, but generally a time-slice algorithm is
used: jobs that have run for a while are returned to one of the queues.

The assignment to a queue (the priority) can be made in many different ways,
depending on which characteristics we think make a task important (ie
deserving of higher-priority status). Here are several examples of
characteristics we could use to determine which queue a task should be
placed in:

    Interactive / Batch -- We assume that "interactive" tasks (involving
    keyboard input from the user) are higher priority, with the
    understanding that a user sitting at a keyboard will want the tasks
    they are giving input to to respond quickly.

    fg / bg (UNIX) -- fg and bg are UNIX commands that allow a user to
    specify that a task runs in the foreground or background. If UNIX were
    using an FB scheduler, it would make sense to put tasks that the user
    marked as foreground in the foreground queue, and vice versa.
    IO intensive / computation intensive -- We assume that tasks doing a
    lot of IO are higher priority.

    New / Long Running -- We assume that new tasks should be given higher
    priority. New tasks are likely to complete sooner, and are likely to
    reflect what the user is trying to do now, and is most interested in
    having completed.

    Deadline / Non-Deadline -- If we are aware of deadlines associated with
    tasks (eg if we are running a pay-for-CPU-cycles facility) then we
    could consider tasks with a known deadline to be of higher priority.

Q: What's the difference between fg and bg?
A: Jobs in the foreground are given priority over jobs in the background.
   We assign jobs to these queues in a way that will improve performance.
   When the processor becomes available, we first try to give it a job from
   the foreground queue, and only if that is empty do we give it a job from
   the background queue.

MULTI-LEVEL FOREGROUND / BACKGROUND

Multilevel FGBG (MLFB) is a lot like FB, except that it has more than two
queues. Such an implementation would look something like this:

        .------------------.
        |     _____        |
        V----|_|_|_|-->(_)-+--->
        |     _____     ^
        |----|_|_|_|----|
        |     _____     |
        |----|_|_|_|----|
        |     ...       |
        |     _____     |
        '----|_|_|_|----'

We can see that this implementation has several different queues. Items are
placed into a queue based on their priority.

EXPONENTIAL QUEUE

An exponential queue (aka multi-level feedback queue) is a special type of
MLFB. The higher-priority queues are assigned very short time slices. If an
item in a high-priority queue uses up its slice without blocking, we send
it to a lower-priority queue.
As the priority decreases, the length of the time slice for that priority
queue increases exponentially (eg priority 0, the highest, has a time slice
of length q; priority 1, the next highest, has a time slice of length 2q;
priority 2 has a slice of length 4q; and so on, with the length of the time
slice doubling each time we drop a priority level).

This behavior decreases overhead. Longer-running processes are put into
lower-priority queues, which have longer quanta. The idea is that once we
know a task is long-running, we let it run a while each time it is
scheduled, to minimize the overhead incurred by switching tasks.

* EXAMPLE

Consider two systems: one with a constant time-slice length of 1, and
another using an exponential queue with a time slice of length .5 for the
highest-priority tasks, doubling the time-slice length at each drop in
priority level. Consider the following tasks and service times:

                              Task Switches
    Task   Service Time   System 1   System 2
    ---------------------------------------------------------------------
    A           0.5           1          1    (quanta = .5)
    B           2.0           2          3    (quanta = .5, 1, 2)
    C          10.0          10          5    (quanta = .5, 1, 2, 4, 8)
    D         100.0         100          8    (quanta = .5, 1, 2, ..., 32, 64)

    [Note: The last column shows the time-slice lengths of the levels a
    task moves through as it exhausts its current quantum and is moved down
    to the next queue, with one switch per quantum used. B needs three
    quanta since .5 + 1 = 1.5 < 2.0.]

We can see that System 2 would incur the task-switching overhead only 8
times when running a task with service time 100, instead of the 100
switches System 1 would do. We also see that it performs comparably to the
constant-time-slice system for short tasks.

The CTSS system (MIT, early 1960's) was the first to use exponential
queues. It used 4 queues.

EXTENSIONS

You could further adjust these systems by allowing a process to gain merit
if it has waited a long time. Something like this was used in TSO, and in
general this approach helps to avoid starvation, because low-priority tasks
will eventually be given attention.
You could also implement (or at least, you can imagine that someone could
implement) an adaptive system, one which changes the priority of tasks
based on their run-time behavior. This is sometimes called "Fair Share
scheduling":

    1. Keep a history of recent CPU usage for each process.
    2. Give highest priority to the process that has used the least CPU
       time recently.

Highly interactive jobs, like vi, will use very little CPU time and get
high priority. CPU-bound jobs, like compiles, would get lower priority.

This leads to a formula for priority something like this:

    priority = wait_time - K_i * cpu_time

To influence the priority assigned to a task, we could change its K_i; its
use of the CPU would then take more or less time to change its priority. In
general, we promote jobs as they wait, and demote them as they use CPU
time.

BSD UNIX SCHEDULING

BSD uses multilevel feedback queues. There are 128 (!) priority levels, but
priorities are grouped four to a queue, so the system can be implemented
with 32 queues. The system runs the job at the front of the
highest-priority occupied queue. The job is run for a quantum, and when it
exhausts its quantum, it is put back on the same queue. The quantum for 4.3
BSD is .1 seconds, which was found to be the longest quantum that doesn't
produce a "jerky" response. A higher-priority process is only started when
the quantum of the current process ends (except in the case of a sleeping
process that wakes up after a system call and preempts the running
process). DEC UNIX uses a similar scheduler.

In general the priority of a process is found with this sort of formula:

    User priority = PUSER + PCPU + PNICE

    PUSER - Based on the type of user, eg OS processes are higher priority
    PCPU  - Weighted load factor that increases as the process uses CPU
            time (uses a decay filter)
    PNICE - The number used to "nice" (reduce the priority of) a job

Priority levels 0-49 (the highest priorities) are reserved for system
processes. Levels 50-127, the lower priorities, are for user processes.
Q: Do complicated algorithms like these increase overhead?
A: No, they're not that complicated. The real problem with implementing
   these complicated systems is that it is harder to "get it right", and
   it's likely that you (as smart as you are) will make a mistake. In
   general, it is wise to adhere to KISS (Keep It Simple, Stupid).

Q: Does BSD use exponential queues?
A: No. That would be a lot of doubling, with so many queues!

An interesting hypothesis - DEC UNIX uses very large quanta on workstations
to make them unusable for timesharing. Because workstations cost so much
less than supercomputers, it would seem to make sense to buy workstations
and timeshare them. However, the professor noticed that whenever students
logged into his workstation, things would get really slow. This led the
professor to hypothesize that DEC intentionally chose a quantum large
enough to discourage users from timesharing their workstations.

BEATING THE SCHEDULER

Most successful scheduling algorithms assume that there are lots of short
jobs and a few long jobs, and they generally give priority to short jobs.
So ... how do you beat the scheduling algorithm?

A general approach is to make long jobs look like short jobs (eg one big
compile is broken into many small compiles; one big troff becomes many
small troffs).

Or, if you know that the OS promotes jobs that are IO-intensive, you could
do a lot of (spurious) IO, eg writing to /dev/null. This would result in
promotion, even if you aren't _really_ an IO-intensive process. However,
such behavior is considered antisocial - don't do it!

SUMMARY:

In principle, scheduling algorithms can be arbitrary, since the system
should produce the same results in any event.
Generally the best schemes are adaptive and preemptive (ie we prevent
processes from running for a really long time by switching to another
task). To do the absolute best, we'd have to be able to predict the
future.

Algorithms have strong effects on the system's overhead, throughput, and
response time. The best scheduling algorithms tend to give the highest
priority to the processes that need the least.

DISCRETE EVENT SIMULATION
=========================

(Useful for the homework that was already turned in...)

Keep an Event List - a list of things that are scheduled to happen.

The loop:
    Get Event
    Update Stats
    Update State
    Update Event List
    Repeat

Q: How do we sort the events?
A: You don't really need to sort; you just need to be able to get the next
   event.

There are simulation packages that will take care of a lot of the grunt
work, but these can run really slowly.

Q: Do you expect that every simulation will have the same results?
A: Yes, if it weren't for corner cases. The FAQs attempt to answer a
   reasonable number of those, but how you deal with those little cases
   will introduce small changes in your results. However, unless you're
   doing really strange things, you shouldn't be off by more than 1 or 2%.
   Feel free to compare with someone else; if you're radically different,
   one or both of you are wrong.

Q: Can we post our results on the newsgroup to see what other people got?
A: It's ok. I guess.

When you're doing something like a simulation, you may want to write your
own input files, so you can do it by hand on small cases, to make sure that
you can at least get the easy cases right. You normally have to do a few
cases by hand. Obvious things, like adding overhead, should make things
slower. The fact that your program gives you some numbers doesn't mean that
they are correct. This is the same thing the professor does with a research
paper; he'll check a few cases by hand.

Q: What should we read?
A: If I don't assign it, you don't have to read it.
The purpose of the book is to let you look up things that weren't clear in
lecture.

INDEPENDENT AND COOPERATING PROCESSES
=====================================

INDEPENDENT PROCESSES

An independent process is one that can't affect or be affected by the rest
of the universe. (As we'll soon see, few processes are really completely
independent.)

    Its state isn't shared in any way by any other process. (This is the
    major requirement for being independent.)

    It is both deterministic and reproducible: input state alone
    determines the results.

    It can be stopped and restarted with no bad effects (only the timing
    varies).

    Example: a program that sums the integers from 1 to 100.

How often are processes independent?

    If a process shares a file with another process, it isn't independent.
    If one uses input from another, or generates output for another, it
    isn't.
    If they conflict in the resources they want to use (printer, disk,
    tape), they aren't.
    If they're part of the same command, they probably aren't.

Independent processes are interesting and useful from the viewpoint of
scheduling. In a real system, however, there is usually a lack of complete
independence. We are therefore going to concentrate (for a couple of
lectures) on "cooperating processes". Cooperation doesn't mean they are
working together, rather that they are somehow interacting.

COOPERATING PROCESSES

Cooperating processes are those that share some state. They may or may not
be actually "cooperating". We are interested in the results of the
computation, ie if we want the sum of N numbers, the sum should always be
computed correctly. Relative and absolute timing is not part of the
"results" and can change from run to run.

We are therefore interested in reproducible results, despite the fact that
some aspects of machine operation are non-deterministic (eg run times and
the relative interleaving of processes vary).

But suppose that we start with a quiescent machine, and the same starting
conditions - won't the same things always happen??
No. For things to be _exactly_ the same, the following would have to be
true:

    If it does IO, the disk would have to be at the same angular position
    as it rotates - unlikely.
    If it wrote to disk, you'd have to have the exact same free lists.
    The disk layout would have to be the same.
    The system clock might have to have the same value (random numbers).
    All electronic gates would have to switch in the same time; otherwise
    the order of two processors could change.

Essentially, there is a distinction between micro-level and macro-level
behavior. For practical purposes, exact micro behavior is irreproducible.
Macro behavior, on the other hand, should be reproducible - ie the
computation results should be the same.

    Micro level - gates, the sequence of memory references at the CPU, etc.
    Macro level - the results of the computation.

It is important that the results of computations be reproducible, despite
considerable shared state, such as the file system.

Why permit processes to cooperate?

    Want to share resources:
        One computer, many users.
        One file of checking account records, many tellers. (What if there
        were a separate account for each teller? You could withdraw many
        times from different tellers.)
    Want to do things faster:
        Read the next block while processing the current one.
        Divide a job into sub-jobs, execute them in parallel.
    Want to construct systems in a modular fashion (eg eqn | tbl | troff).

EXAMPLE: A counting contest - how we can get in trouble:

    Program A:
        i = 0;
        while (i < 10)
            i = i + 1;
        printf("A is done");

    Program B:
        i = 0;
        while (i > -10)
            i = i - 1;
        printf("B is done");

Variable i is shared. Reference (read) and assignment (write) are each
atomic.

Will process A or B win? We don't know! With true parallelism (eg
hyper-threading), it is possible that neither will ever finish: i keeps
getting incremented and decremented, and never gets all the way to 10 or
-10. Note that even if they run at exactly the same speed, one might still
win - consider the effect of the sequence readA readB writeB writeA.

If one finishes, will the other finish?
(It should.)

Does it help A to get a head start? (It depends on the exact sequence. The
sequence read, read, write, write allows B to win.)

Keep in mind that when we talk about scheduling and synchronization, we
assume the worst case (multiple processors running in arbitrary order).
When discussing concurrent processes, multiprogramming (multiple processes
running on the same processor concurrently) is usually as dangerous as
multiprocessing (multiple processors sharing memory), unless you have tight
control over the multiprogramming. Also, smart IO devices are as bad as
cooperating processes (they share memory).

In general, we can assume that at any instant, if two processes are ready,
the processor can switch from running one to running the other (on an
instruction-by-instruction basis). Only truly indivisible (atomic)
operations cannot be interrupted.

ATOMIC OPERATIONS

In order to synchronize parallel processes, we will need atomic operations.
An atomic operation is an operation that either happens in its entirety
without interruption, or not at all. It cannot be interrupted in the
middle.

Loads and assignments (stores) are atomic in almost all systems. A = B will
always get a good value for B and will always set a good value for A.

As long as you have at least one atomic operation, you can use it to build
up more complex atomic "instructions" that are easier to use.

EXAMPLE: You have some roommates, and everybody likes milk. We want to
design an algorithm to make sure the fridge stays stocked. The fundamental
problem:

    Person A looks, sees no milk, goes to store.
    Person B looks, sees no milk, goes to store.
    Person A arrives, puts milk in fridge.
    Person B arrives, puts milk in fridge. Says "Too much milk!"

Attempt 1:

    A & B:
        if (no milk)
            if (no note)
                (leave note
                 buy milk
                 remove note)

Bug: Person A checks for the note, but before they leave their own note,
person B also checks for the note. Both will buy milk!
We need mutual exclusion: only one person or process does something at a
time (eg only one person goes shopping at a time). The thing that we are
trying to get exclusion for is a critical section (eg shopping). The
critical section is the code that manipulates the shared state.

We usually get this with some locking mechanism. In this case, the note was
an attempt at a locking mechanism. We want to acquire the lock before we go
shopping, release the lock when done, and never ignore the lock when it is
set. In general, synchronization problems are subtle.

Attempt 2:

    A:
        if (no note)
            if (no milk)
                (buy milk
                 leave note)
    B:
        if (note)
            if (no milk)
                (buy milk
                 remove note)

Bug: A and B now take turns buying milk. This works fine as long as A
doesn't go on vacation or something, which would leave B unable to get
milk.

Attempt 3:

    A:
        (leave note a)
        if (no note b)
            if (no milk)
                (buy milk)
        (remove note a)
    B:
        (leave note b)
        if (no note a)
            if (no milk)
                (buy milk)
        (remove note b)

Bug: This would be bad if both A and B left notes at the same time: each
sees the other's note, and both remove their notes without getting milk.
You could possibly get stuck in a loop where both A and B place and remove
notes without ever buying more milk.

Attempt 4:

    A:
        (leave note a)
        while (note b)
            wait
        if (no milk)
            (buy milk)
        (remove note a)
    B:
        (leave note b)
        if (no note a)
            if (no milk)
                (buy milk)
        (remove note b)

This would actually work. However, A is not being efficient - he just sits
in front of the fridge all day. If B goes to the store and stays out (or
dies), then A is stuck there waiting. Finally, if you have a third
roommate, this system falls apart.