LECTURE 4  2005-01-31

Topics: SCHEDULING INDEPENDENT AND COOPERATING PROCESSES
Plus:   DISCRETE EVENT SIMULATION

ANNOUNCEMENTS
=============

Second part of the reader is now available. It's blue, thick, and about $30.
It's a collection of technical papers that back up material in the reading.

SCHEDULING
==========

FOREGROUND/BACKGROUND SCHEDULING

A foreground/background scheduler has two queues for tasks to be placed in,
not surprisingly called the foreground queue and the background queue. Tasks
in the foreground queue are given priority over tasks in the background
queue.

A simple implementation of a foreground/background scheduler would look
something like this:

        .----------------------.
        |     ___FG___         |
        V    |_|_|_|_|--->(_)--+--->
    ---->                  ^
             |_|_|_|_|-----'
                 BG

We can see that this system consists of two queues, both of which feed into
the CPU. Note also that items can be preempted in the CPU and placed back on
either the foreground or background queue. Any scheduling algorithm can be
used within an individual queue, but generally a time-slice algorithm is
used: jobs that have run for a while are returned to one of the queues.

The assignment to a queue (the priority) can be made in many different ways,
depending on which characteristics we think make a task important (ie
deserving of higher-priority status). Here are several examples of
characteristics we could use to determine which queue a task should be
placed in:

    Interactive / Batch -- We assume that "interactive" tasks (involving
    keyboard input from the user) are higher priority, with the
    understanding that a user sitting at a keyboard will want the tasks
    they are giving input to to respond quickly.

    fg / bg (UNIX) -- fg and bg are UNIX commands that allow a user to
    specify that a task runs in the foreground or background. If UNIX were
    using an FB scheduler, it would make sense to put tasks that the user
    marked as foreground in the foreground queue, and vice versa.
    IO intensive / computation intensive -- We assume that tasks doing a
    lot of IO are higher priority.

    New / Long Running -- We assume that new tasks should be given higher
    priority. New tasks are likely to complete sooner, and are likely to
    reflect what the user is trying to do now, and is most interested in
    having completed.

    Deadline / Non-Deadline -- If we are aware of deadlines associated with
    tasks (eg if we are running a pay-for-CPU-cycles facility) then we
    could consider tasks with a known deadline to be of higher priority.

Q: What's the difference between fg and bg?
A: Jobs in the foreground are given priority over jobs in the background.
   We assign jobs to these queues in a way that will improve performance.
   When the processor becomes available, we first try to give it a job from
   the foreground queue, and only if that is empty do we give it a job from
   the background queue.

MULTI-LEVEL FOREGROUND / BACKGROUND

Multilevel FGBG (MLFB) is a lot like FB, except that it has more than two
queues. Such an implementation would look something like this:

        .------------------.
        |     _____        |
        V----|_|_|_|-->(_)-+--->
        |     _____     ^
        |----|_|_|_|----|
        |     _____     |
        |----|_|_|_|----|
        |     ...       |
        |     _____     |
        '----|_|_|_|----'

We can see that this implementation has several different queues. Items are
placed into a queue based on their priority.

EXPONENTIAL QUEUE

An exponential queue (aka multi-level feedback queue) is a special type of
MLFB. The higher-priority queues are assigned very short time slices. If an
item in a high-priority queue uses up its slice without blocking, we send
it to a lower-priority queue.
As the priority decreases, the length of the time slice for that priority
queue increases exponentially (eg priority 0, the highest, has a time slice
of length q; priority 1, the next highest, has a time slice of length 2q;
priority 2 has a slice of length 4q; and so on, with the length of the time
slice doubling each time we drop a priority level).

This behavior decreases overhead. Longer-running processes are put into
lower-priority queues, which have longer quanta. The idea is that once we
know a task is long-running, we let it run a while each time it is
scheduled, to minimize the overhead incurred by switching tasks.

* EXAMPLE

Consider two systems: one with a constant time-slice length of 1, and
another using an exponential queue with a time slice of length .5 for the
highest-priority tasks, doubling the time-slice length at each drop in
priority level. Consider the following tasks and service times:

                              Task Switches
    Task   Service Time   System 1   System 2
    ---------------------------------------------------------------------
    A           0.5           1          1    (quanta = .5)
    B           2.0           2          3    (quanta = .5, 1, 2)
    C          10.0          10          5    (quanta = .5, 1, 2, 4, 8)
    D         100.0         100          8    (quanta = .5, 1, 2, ..., 32, 64)

    [Note: The last column shows the time-slice lengths of the levels a
    task moves through as it exhausts its current quantum and is moved down
    to the next queue, with one switch per quantum used. B needs three
    quanta since .5 + 1 = 1.5 < 2.0.]

We can see that System 2 would incur the task-switching overhead only 8
times when running a task with service time 100, instead of the 100
switches System 1 would do. We also see that it performs comparably to the
constant-time-slice system for short tasks.

The CTSS system (MIT, early 1960's) was the first to use exponential
queues. It used 4 queues.

EXTENSIONS

You could further adjust these systems by allowing a process to gain merit
if it has waited a long time. Something like this was used in TSO, and in
general this approach helps to avoid starvation, because low-priority tasks
will eventually be given attention.
You could also implement (or at least, you can imagine that someone could
implement) an adaptive system, one which changes the priority of tasks
based on their run-time behavior. This is sometimes called "Fair Share
scheduling":

    1. Keep a history of recent CPU usage for each process.
    2. Give highest priority to the process that has used the least CPU
       time recently.

Highly interactive jobs, like vi, will use very little CPU time and get
high priority. CPU-bound jobs, like compiles, would get lower priority.

This leads to a formula for priority something like this:

    priority = wait_time - K_i * cpu_time

To influence the priority assigned to a task, we could change its K_i; its
use of the CPU would then take more or less time to change its priority. In
general, we promote jobs as they wait, and demote them as they use CPU
time.

BSD UNIX SCHEDULING

BSD uses multilevel feedback queues. There are 128 (!) priority levels, but
priorities are grouped four to a queue, so the system can be implemented
with 32 queues. The system runs the job at the front of the
highest-priority occupied queue. The job is run for a quantum, and when it
exhausts its quantum, it is put back on the same queue. The quantum for 4.3
BSD is .1 seconds, which was found to be the longest quantum that doesn't
produce a "jerky" response. A higher-priority process is only started when
the quantum of the current process ends (except in the case of a sleeping
process that wakes up after a system call and preempts the running
process). DEC UNIX uses a similar scheduler.

In general the priority of a process is found with this sort of formula:

    User priority = PUSER + PCPU + PNICE

    PUSER - Based on the type of user, eg OS processes are higher priority
    PCPU  - Weighted load factor that increases as the process uses CPU
            time (uses a decay filter)
    PNICE - The number used to "nice" (reduce the priority of) a job

Priority levels 0-49 (the highest priorities) are reserved for system
processes. Levels 50-127, the lower priorities, are for user processes.
Q: Do complicated algorithms like these increase overhead?
A: No, they're not that complicated. The real problem with implementing
   these complicated systems is that it is harder to "get it right", and
   it's likely that you (as smart as you are) will make a mistake. In
   general, it is wise to adhere to KISS (Keep It Simple, Stupid).

Q: Does BSD use exponential queues?
A: No. That would be a lot of doubling, with so many queues!

An interesting hypothesis - DEC UNIX uses very large quanta on workstations
to make them unusable for timesharing. Because workstations cost so much
less than supercomputers, it would seem to make sense to buy workstations
and timeshare them. However, the professor noticed that whenever students
logged into his workstation, things would get really slow. This led the
professor to hypothesize that DEC intentionally chose a quantum large
enough to discourage users from timesharing their workstations.

BEATING THE SCHEDULER

Most successful scheduling algorithms assume that there are lots of short
jobs and a few long jobs, and they generally give priority to short jobs.
So ... how do you beat the scheduling algorithm?

A general approach is to make long jobs look like short jobs (eg one big
compile is broken into many small compiles; one big troff becomes many
small troffs).

Or, if you know that the OS promotes jobs that are IO-intensive, you could
do a lot of (spurious) IO, eg writing to /dev/null. This would result in
promotion, even if you aren't _really_ an IO-intensive process. However,
such behavior is considered antisocial - don't do it!

SUMMARY:

In principle, scheduling algorithms can be arbitrary, since the system
should produce the same results in any event.
Generally the best schemes are adaptive and preemptive (ie we prevent
processes from running for a really long time by switching to another
task). To do the absolute best, we'd have to be able to predict the
future.

Algorithms have strong effects on the system's overhead, throughput, and
response time. The best scheduling algorithms tend to give the highest
priority to the processes that need the least.

DISCRETE EVENT SIMULATION
=========================

(Useful for the homework that was already turned in...)

Keep an Event List - a list of things that are scheduled to happen.

The loop:
    Get Event
    Update Stats
    Update State
    Update Event List
    Repeat

Q: How do we sort the events?
A: You don't really need to sort; you just need to be able to get the next
   event.

There are simulation packages that will take care of a lot of the grunt
work, but these can run really slowly.

Q: Do you expect that every simulation will have the same results?
A: Yes, if it weren't for corner cases. The FAQs attempt to answer a
   reasonable number of those, but how you deal with those little cases
   will introduce small changes in your results. However, unless you're
   doing really strange things, you shouldn't be off by more than 1 or 2%.
   Feel free to compare with someone else; if you're radically different,
   one or both of you are wrong.

Q: Can we post our results on the newsgroup to see what other people got?
A: It's ok. I guess.

When you're doing something like a simulation, you may want to write your
own input files, so you can do it by hand on small cases, to make sure that
you can at least get the easy cases right. You normally have to do a few
cases by hand. Obvious things, like adding overhead, should make things
slower. The fact that your program gives you some numbers doesn't mean that
they are correct. This is the same thing the professor does with a research
paper; he'll check a few cases by hand.

Q: What should we read?
A: If I don't assign it, you don't have to read it.
The purpose of the book is to let you look up things that weren't clear in
lecture.

INDEPENDENT AND COOPERATING PROCESSES
=====================================

INDEPENDENT PROCESSES

An independent process is one that can't affect or be affected by the rest
of the universe. (As we'll soon see, few processes are really completely
independent.)

    Its state isn't shared in any way by any other process. (This is the
    major requirement for being independent.)

    It is both deterministic and reproducible: input state alone
    determines the results.

    It can be stopped and restarted with no bad effects (only the timing
    varies).

    Example: a program that sums the integers from 1 to 100.

How often are processes independent?

    If a process shares a file with another process, it isn't independent.
    If one uses input from another, or generates output for another, it
    isn't.
    If they conflict in the resources they want to use (printer, disk,
    tape), they aren't.
    If they're part of the same command, they probably aren't.

Independent processes are interesting and useful from the viewpoint of
scheduling. In a real system, however, there is usually a lack of complete
independence. We are therefore going to concentrate (for a couple of
lectures) on "cooperating processes". Cooperation doesn't mean they are
working together, rather that they are somehow interacting.

COOPERATING PROCESSES

Cooperating processes are those that share some state. They may or may not
be actually "cooperating". We are interested in the results of the
computation, ie if we want the sum of N numbers, the sum should always be
computed correctly. Relative and absolute timing is not part of the
"results" and can change from run to run.

We are therefore interested in reproducible results, despite the fact that
some aspects of machine operation are non-deterministic (eg run times and
the relative interleaving of processes vary).

But suppose that we start with a quiescent machine, and the same starting
conditions - won't the same things always happen??
No. For things to be _exactly_ the same, the following would have to be
true:

    If it does IO, the disk would have to be at the same angular position
    as it rotates - unlikely.
    If it wrote to disk, you'd have to have the exact same free lists.
    The disk layout would have to be the same.
    The system clock might have to have the same value (random numbers).
    All electronic gates would have to switch in the same time; otherwise
    the order of two processors could change.

Essentially, there is a distinction between micro-level and macro-level
behavior. For practical purposes, exact micro behavior is irreproducible.
Macro behavior, on the other hand, should be reproducible - ie the
computation results should be the same.

    Micro level - gates, the sequence of memory references at the CPU, etc.
    Macro level - the results of the computation.

It is important that the results of computations be reproducible, despite
considerable shared state, such as the file system.

Why permit processes to cooperate?

    Want to share resources:
        One computer, many users.
        One file of checking account records, many tellers. (What if there
        were a separate account for each teller? You could withdraw many
        times from different tellers.)
    Want to do things faster:
        Read the next block while processing the current one.
        Divide a job into sub-jobs, execute them in parallel.
    Want to construct systems in a modular fashion (eg eqn | tbl | troff).

EXAMPLE: A counting contest - how we can get in trouble:

    Program A:
        i = 0;
        while (i < 10)
            i = i + 1;
        printf("A is done");

    Program B:
        i = 0;
        while (i > -10)
            i = i - 1;
        printf("B is done");

Variable i is shared. Reference (read) and assignment (write) are each
atomic.

Will process A or B win? We don't know! With true parallelism (eg
hyper-threading), it is possible that neither will ever finish: i keeps
getting incremented and decremented, and never gets all the way to 10 or
-10. Note that even if they run at exactly the same speed, one might still
win - consider the effect of the sequence readA readB writeB writeA.

If one finishes, will the other finish?
(It should.)

Does it help A to get a head start? (It depends on the exact sequence. The
sequence read, read, write, write allows B to win.)

Keep in mind that when we talk about scheduling and synchronization, we
assume the worst case (multiple processors running in arbitrary order).
When discussing concurrent processes, multiprogramming (multiple processes
running on the same processor concurrently) is usually as dangerous as
multiprocessing (multiple processors sharing memory), unless you have tight
control over the multiprogramming. Also, smart IO devices are as bad as
cooperating processes (they share memory).

In general, we can assume that at any instant, if two processes are ready,
the processor can switch from running one to running the other (on an
instruction-by-instruction basis). Only truly indivisible (atomic)
operations cannot be interrupted.

ATOMIC OPERATIONS

In order to synchronize parallel processes, we will need atomic operations.
An atomic operation is an operation that either happens in its entirety
without interruption, or not at all. It cannot be interrupted in the
middle.

Loads and assignments (stores) are atomic in almost all systems. A = B will
always get a good value for B and will always set a good value for A.

As long as you have at least one atomic operation, you can use it to build
up more complex atomic "instructions" that are easier to use.

EXAMPLE: You have some roommates, and everybody likes milk. We want to
design an algorithm to make sure the fridge stays stocked. The fundamental
problem:

    Person A looks, sees no milk, goes to store.
    Person B looks, sees no milk, goes to store.
    Person A arrives, puts milk in fridge.
    Person B arrives, puts milk in fridge. Says "Too much milk!"

Attempt 1:

    A & B:
        if (no milk)
            if (no note)
                (leave note
                 buy milk
                 remove note)

Bug: Person A checks for the note, but before they leave their own note,
person B also checks for the note. Both will buy milk!
We need mutual exclusion: only one person or process does something at a
time (eg only one person goes shopping at a time). The thing that we are
trying to get exclusion for is a critical section (eg shopping). The
critical section is the code that manipulates the shared state.

We usually get this with some locking mechanism. In this case, the note was
an attempt at a locking mechanism. We want to acquire the lock before we go
shopping, release the lock when done, and never ignore the lock when it is
set. In general, synchronization problems are subtle.

Attempt 2:

    A:
        if (no note)
            if (no milk)
                (buy milk
                 leave note)
    B:
        if (note)
            if (no milk)
                (buy milk
                 remove note)

Bug: A and B now take turns buying milk. This works fine as long as A
doesn't go on vacation or something, which would leave B unable to get
milk.

Attempt 3:

    A:
        (leave note a)
        if (no note b)
            if (no milk)
                (buy milk)
        (remove note a)
    B:
        (leave note b)
        if (no note a)
            if (no milk)
                (buy milk)
        (remove note b)

Bug: This would be bad if both A and B left notes at the same time: each
sees the other's note, and both remove their notes without getting milk.
You could possibly get stuck in a loop where both A and B place and remove
notes without ever buying more milk.

Attempt 4:

    A:
        (leave note a)
        while (note b)
            wait
        if (no milk)
            (buy milk)
        (remove note a)
    B:
        (leave note b)
        if (no note a)
            if (no milk)
                (buy milk)
        (remove note b)

This would actually work. However, A is not being efficient - he just sits
in front of the fridge all day. If B goes to the store and stays out (or
dies), then A is stuck there waiting. Finally, if you have a third
roommate, this system falls apart.