Lecture Notes for Wednesday, 4/13/2005 by Kun Yo Chiu & Kan Shun Sit

/* Announcement: Exams not ready yet; expected to be handed back on Monday. */

The mechanisms described above form the basis for tying together
distributed systems. So far, though, they've only been used for loose
coupling: each machine is completely autonomous, with separate
accounting, a separate file system, a separate password file, etc.
    Can send mail between machines.
    Can transfer files between machines (but only with special commands).
    Can execute commands remotely.
    Can log in remotely.

Loose coupling like this is OK for a network with only a few machines
spread all over the country, but not for a high-performance LAN where
every user has a private machine.

What would we like in a distributed computer system?
    Unified, transparent file system.
    Unified, transparent computation - from any terminal, you can run
    on any machine, transparently. (You shouldn't have to care which
    machine you're running on.)
    Load balancing, process migration, file migration.
    /* Processes can move to where resources are available, and files
    can move so we get better performance. You can do this on
    experimental LANs, but otherwise it's not reasonable because of
    issues like bandwidth. */

Local area networks can more or less provide that now. Wide area
networks cannot provide this transparently, due to performance
problems. May be possible in the future?

Distributed File Systems

Remote files appear to be local (except for performance). Issues:
    Failures - what happens when the remote system crashes?
    /* If your own machine crashes, there's nothing you can do; but if
    the remote machine crashes, you don't really know why. */
    Performance - remote is not the same as local. Can do some caching.

Sun's NFS (Network File System)

NFS permits the mounting of a remote file system as if it were local.
Therefore, by using mount commands, you can set up a transparent
distributed file system.
/* You've got big caches on both ends; a write is transmitted to the
server from your local copy. */

Caches file blocks and descriptors at both clients and servers.

Write-through caching: when a file is closed, all modified blocks are
sent immediately to the server disk. "Close" doesn't return until all
bytes are stored on disk.

Consistency is weak. Clients poll periodically for changes to a file
and may use an old version until the next poll. Simultaneous
conflicting updates are possible.

The server keeps no state about clients (except hints, for
performance). Each request carries enough information to do the entire
operation - e.g. Read(inode #, position).
/* This is good enough: all operations are given enough info to do
what needs to be done. E.g. "read a block" means read exactly that
block, not the next block - it's unambiguous and repeatable. When a
server crashes, this lets us pick up where we left off: any request
can be repeated with no ill effects, so you can read or write again if
something went wrong the first time. */

When the server crashes and restarts, it can immediately start
processing requests again. All requests are "idempotent" - they can be
repeated with no ill effects. So if a message may have been lost, the
client can resend (and the server possibly redo) the request.

/* Caches need to be consistent. We didn't discuss this in 61B/C, but
it comes up in 150/152 - cache consistency is usually studied in a
hardware context. With caching there is a main copy of something and a
cached copy; that's fine as long as the cache matches the main copy,
but once you write, the local copy differs from the main copy. We'd
like everything to stay consistent. Remember that caching can be local
in your CPU, in a file system, or even in a physical disk controller.
Question: What if you have two simultaneous modifications?
Answer: This is why you have locking. On a multiprocessor, a CPU locks
the bus for the read/write and everyone knows they can't touch it.
With a distributed system this is a little more complicated. (Short
discussion of a directory protocol: every system checks in with a
directory before it reads or writes a block. This would work, but then
you bottleneck on the directory.) */

Cache Consistency

The issue is that in a system with caching, there can be many copies
of a given piece of data. If any of those copies is written, it
becomes inconsistent with the other copies. The problem occurs in
distributed systems as well as in CPU cache memories.

Goal: the distributed system (or shared file system) yields the same
results as a unified, uniprocessor system.

Caching can occur in the disk controller, in a SAN or NAS, in the main
memory disk cache, etc.

Solutions:
    Only one cached copy at a time. Very inefficient if the file is
    read-only or read-mostly.
    Approaches: (a) many read copies, (b) one write copy, (c) update
    many write copies.
    Several solutions:
        i. "Open for write" causes all other cached copies to be
        deleted.
        ii. A write to a block deletes all other cached copies of the
        block ("write invalidate"; with "write through", the backing
        store copy is updated as well). Need to lock the block, delete
        the other copies, and then update, to avoid crossed updates.
        iii. A write to a block is broadcast to all other cached
        copies (and the backing store?) ("write update"). Need to lock
        all copies before updating any.
    Optimistic approaches:
        Don't do the locks - hope it works out.
        Don't do the locks, but keep the old copy. Back out if
        necessary.
        Leave it to the users to worry about. (Unix, database
        systems.)

Note: we need to know where/how to find all of the copies. In a
distributed system there is no way to do "snoopy" coherence - we can't
see all block and file requests. The system "owning" a file (usually
the processor on the system where the file lives) must keep track of
who has copies. If broadcast writes are used, it can give the list of
cached copies to any system writing the data.
/* Snooping works when there is a shared bus: every processor can
watch it, so all of them see every access to memory and know what has
changed in the caches. You can sort of do this with Ethernet too.
There are lots of approaches; we need one of them if we want a
consistent file system. */

"False sharing" - a block is shared, but the data in it isn't shared.
I.e. process A is using subset X of the block, and process B is using
subset Y of the block, but X and Y don't overlap. This problem can
have a big effect on the performance of a cache consistency scheme.

"Write merge" - if there are several writes to a block while it is in
the cache and before it is written back to disk, the writes can be
"merged" so that only the cumulative update is written back.

Process Migration

Want to take a running process and move it to another computer. Why is
this hard?
    Need to save and transfer state.
    Need to maintain all connections (to the OS, the network, other
    systems, I/O devices, the file system). Connections can be
    forwarded, or they can be reconnected.
    Obviously doesn't work if the CPU architecture is different.
    (Object code won't run.)
    Very hard if the OS is different - the systems have to be
    interoperable: all the same system calls, etc.
/* What if the other processor doesn't have the devices we need? You
have to have a unified, transparent file system - e.g. if you have a
CD mounted, the new machine should also be able to read that CD. You
don't have to move all the state immediately, but you need each piece
of state when you need it. Page faults are much worse here, and the
tables get weird, because now you might need to access the disk of
another computer. You could also checkpoint the whole thing, move it
over, and start again. It has to be the same architecture - you can't
go from Windows to a PowerPC chip on a Mac; the code won't run. What
if a network ID is hard-coded? A running process is talking to the
rest of the world, so moving it is like redirecting your email: things
need to be temporarily forwarded. Ideally the system realizes this and
goes to the new location directly, without forwarding. Process
migration sounds great, but it's hard. There is a package called MOSIX
that does some of this in Linux - process migration has been done in
Linux.
Question: Do you do migration for load balancing?
Answer: Yeah, typically. It's sometimes easier to move a process than
a file. Or also for reliability. */

Parallel Programming and Amdahl's Law

N processors will never get you an N-times speedup: part of the
computation is not parallelizable. (At the very least, the code that
distributes and gathers the computation is not parallelizable.)

Amdahl's Law - speedup is limited by the sequential component.

Parallel programming also requires communication and sharing, both of
which impede performance.

/* Digression: 10% of all electricity in Santa Clara goes to run
computers. There are service centers for server farms; the power
density is huge - millions of watts going in. The problem is the
aggregated heat production: 50K machines make a lot of heat, and if
you don't dissipate that heat, the building turns to slag. This is why
some chips are preferred: they use less power but do less per
instruction. Running a server farm is all about getting the heat out
of the building. It'd be nice if we had server-farm migration - move
it to Minnesota in the winter! A company that provides computational
power and moves it around to dissipate heat might work, over
fiber-optic links, maybe. Nowadays it might be better to just get a
bunch of computers than a heater for warmth; excess heat can even be
reused for electricity. Fry an egg or cook a chicken with your
processors?! If you overclock processors, you could immerse the entire
machine in a liquid so the heat dissipates. Systems have strange
bottlenecks - like dissipating heat.
On parallel computation: you can parallelize 90% of a program, but
that doesn't mean the system runs that much faster; nothing is 100%
parallelizable. Parallel processing requires communication, which
impedes performance. Parallel processing is coming because we can't
make transistors much smaller and they're getting hotter, so instead
we put in more processors. If you make what happens in a machine cycle
smaller, you do less per cycle. Systems with multiple CPUs require
code to become parallel. Parallel programming used to be research, but
now it is realistic if we want greater throughput. If you want to know
more about TCP/IP, check the last lecture, or just go to section. */

Topic: Protection and Security

/* About 2 lectures on this stuff, or Karl will give half a lecture.
Should you take the new security course? If you are interested, the
prof will be talking more about this stuff there. If you just wanted
protection and security, you could simply put a computer in a locked
room and limit access to it. */

+ The purpose of a protection system is to prevent accidental or
  intentional misuse of a system while permitting controlled sharing.
  It is relatively easy to provide complete isolation.
+ Accidents:
    + E.g. a program mistakenly overwrites the file containing the
      command interpreter, and nobody else can log in. Problems of
      this kind are easy to solve (we can do things to make the
      likelihood small).
    + You accidentally destroy a file you'd like to keep.
      /* You're lying if you claim you've never accidentally deleted a
      file. Some accidents you can recover from; some you can't. */
+ Malicious abuse:
    + Speed dialing of IP addresses.
    + Script kiddies that try to break into unpatched systems.
    + In class I'll make jokes, but this is really a serious problem.
    /* These are real problems - e.g. someone adjusting your grades,
    an exam that can't be given because the prof's system crashed,
    someone stealing your personal ID. */
+ Three types of effects we are concerned with:
    + Unauthorized information modification.
    + Unauthorized denial of use.
    + Unauthorized information release.
+ The biggest complication in a general-purpose, remotely accessed
  computer system is that the "intruder" in these definitions may be
  an otherwise legitimate user of the system.
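/* Scribe aside, back to the parallel-programming section above: the
Amdahl's Law limit can be checked with a quick calculation. This is a
minimal sketch; the function name amdahl_speedup and the 90% figure
(from the lecture's example) are ours. */

```python
def amdahl_speedup(p, n):
    """Speedup when fraction p of the work is parallelizable across n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 90% of the program parallelizable, speedup never reaches 10x,
# no matter how many processors you add: as n grows, speedup -> 1/(1 - p).
for n in (2, 10, 100, 1000):
    print(n, amdahl_speedup(0.9, n))
```

So parallelizing 90% of a program caps the speedup at 10x - exactly the
"you can parallelize 90%, but that doesn't mean the system runs that
much faster" point from the digression.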
+ Examples of problems - not solved, and not really computer problems:
    + Fake timesheets for paychecks.
    + Hitting the repeat button on the printer to print extra
      paychecks.
    + Rounding off amounts and putting the remainder into a special
      account.
    + Making up deposit slips with your account # on them.
    + Making up checks with your name, but some other account # on
      them (paid out of the other account).
/* Some stories: You can scam people with fake timesheets - that's not
a computer problem. Another story: a guy stood by a printer waiting
for his check and kept hitting repeat, so he got multiple paychecks.
Or write the routine that computes interest so that instead of
rounding, it always rounds down and deposits whatever is left over
into your account - this is what they did in Office Space. You only
get fractions of a penny, but do it enough and you'll be rich. Blank
deposit slips to trick the machine - again, not a computer problem.
Question: How do we prevent problems like this nowadays?
Answer: Who says we do? Society functions reasonably because doing the
right thing is to your advantage. In the US the economy rewards talent
well enough that it's better to earn an honest living than to be a
criminal; in the third world it's the opposite. That's the real reason
we don't do it: smart people are better off doing the right thing than
the bad thing - the risk/reward ratio is obviously not in the
criminal's favor. You can't be too successful at it, either. Just so
you know, Cuba doesn't extradite for certain crimes. */

+ Functional levels of information protection:
    + Unprotected system.
    + All-or-nothing system (i.e. full sharing or complete isolation).
    + The simplest type of protection system is just a user/system
      distinction: someone with "system" privileges can do anything; a
      user can do only what user-level access permits. Obviously this
      is unsatisfactory - we need to differentiate between users and
      give more fine-grained access.
    + Controlled sharing.
    + User-programmed sharing controls - e.g. a user wants to put
      complex restrictions on use, such as time of day, or concurrence
      of another user.

+ Design principles for protection mechanisms:
    + 1) Economy of mechanism: keep the design as simple and small as
      possible. (KISS - keep it simple, stupid.)
    + 2) Fail-safe defaults: base access decisions on permission, not
      exclusion. If the system fails, the default is lack of access.
    + 3) Complete mediation: every access to every object must be
      checked for authority.
    + 4) Open design: the design should not be secret. Its
      effectiveness should not be impaired by knowledge of the
      mechanism.
    + 5) Separation of privilege: where feasible, a protection
      mechanism that requires two keys to unlock it is more robust
      than one that grants access to the presenter of a single key.
    + 6) Least privilege: give no more than the required access.
    + 7) Least common mechanism: minimize the amount of mechanism
      common to more than one user and depended upon by all users.
      (These represent the information paths.)
    + 8) Psychological acceptability: the human interface must be
      convenient!
/* On open design and encryption: if people know the mechanism and
there is a flaw, they can discover it quickly. You concentrate all the
secrecy in the key. People can still break it, but at least you'll
know about it quickly. This didn't work for the Minotaur - in the
Theseus myth, the labyrinth relied on Theseus getting lost. Separation
of privilege is like the two keys needed to activate a missile silo.
Least common mechanism: the stuff that provides security should be
small - fewer moving parts. Psychological acceptability: this stuff is
used by people, and people have to put up with it; if it's too
inconvenient, they won't use it. If you have a system with a weekly
password that isn't easy to remember, people won't remember how to use
it. If you have a lot of security features you don't actually need,
they just become a pain.
*/

+ There are three aspects to a protection mechanism:
    + Authentication: confirm that the user is who he says he is.
    + Authorization determination: figure out what the user is and
      isn't allowed to do. This needs a simple database.
    + Access enforcement: make sure there are no loopholes in the
      system.
  Even the slightest flaw in any of these areas may ruin the whole
  protection mechanism.
/* Authentication: how do you make sure users are who they say they
are? There are 3 Alan Smiths on campus, just so you know - the other
two are in biology, though. Authentication is getting easier now that
laptops have built-in fingerprint readers, but how reliable are they?
Authentication is done with passwords today, but in the future it may
be biometrics: retinal scans, brain scans, fingerprints. */

+ Authentication is most often done with passwords. This is a
  relatively weak form of protection.
    + A password is a secret piece of information used to establish
      the identity of a user.
    + Passwords can be compromised in a number of ways:
        + Can be stolen (you write it down somewhere - wallet, address
          book, front of terminal).
            + Story of someone who looked in the waste basket of the
              guy who added new passwords.
            + Story of a guy (on CTSS) who logged on at midnight (when
              the system administrator was on), expanded his segment,
              and got a copy of the system admin's password segment.
        + The line can be tapped and the password copied.
        + The password can be observed as it is typed in.
        + The password can be guessed (your name, your mother's name,
          your project, your birthday).
        + The password can be repeatedly tried. (This should not be
          permitted.)
        + The password file can be broken.
    + Counter-actions:
        + Passwords should not be stored in a directly readable form;
          one-way transformations should be used.
            + Use a one-way function: given a copy of the password
              file, you can't invert the transformation to get back
              the passwords. (Unix does this. The password file
              (/etc/passwd) is readable.)
            + Unix also uses "salt" to extend the password. The idea
              is that the salt (e.g. the user ID) is concatenated to
              the password before encryption - this prevents efficient
              generation of a file of encrypted common passwords for
              comparison.
            /* In Unix/Linux, the plain text is turned into an
            encrypted form with no inversion back to plain text. It's
            a mapping from a -> b; you can always brute-force it by
            trying every input, but otherwise there is no way to find
            it. The attack salt defeats: copy the password file, make
            a list of, say, 10,000 common passwords, encrypt them, and
            match them against the real entries. With salt, the same
            password encrypts to different things for different users
            (if your login was "moon", you'd hash "moon" plus the
            password, while user "sun" hashes "sun" plus the same
            password), so the brute-force precomputation becomes much
            less efficient. SALT is a condiment - think of it as a
            condiment for passwords. */
        + Password testing should be slow, to discourage machine-based
          tests.
        + Limit the number of tests.
        + Passwords should be relatively long and obscure. Paradox:
          short passwords are easy to crack; long passwords are easily
          forgotten and usually written down.
        + Password-testing programs (using common names, words, etc.)
          can often break more than half of the passwords in a system.
        + Tell the user when he last logged in - he can tell if there
          was an intruder.
        /* At MIT the system generated passwords for you; luckily they
        were all pronounceable in English. Showing the last login time
        is useful because the user can tell if someone hacked them.
        Passwords aren't very hard to break - experimentally, you can
        break about half of them. There's going to be a whole course
        on privacy and protection. */
+ Note: we must protect the authorizer. The program that checks
  whether the password matches (or that encrypts the password) must be
  incorruptible.
    + (If there is physical security, we must be sure the guards are
      incorruptible.)
    /* We need to protect the authorizer: in a computer system, you
    have to make sure no one rewrites the code that checks passwords.
    It's like a guard at a gate - make sure no one kills the guard or
    bribes him. At Livermore, you need a badge and everything; the
    guard is supposed to check both the badge and you. Some guy
    decided this doesn't work too well and came in with a monkey on
    his badge - it worked for a while, but he got fired. It's not easy
    being a whistleblower. */
+ Another form of identification: badge or key.
    + Does not have to be kept secret.
    + Should not be forgeable or copyable.
    + Can be stolen, but the owner should know when it is.
    + Pain to carry.
    + Key paradox: a key must be cheap to make but hard to duplicate.
      This means there must be some trick (i.e. a secret) that has to
      be protected.
/* Question: What about keycards?
Answer: Not sure, but my guess is you can't just manufacture one that
works, because it's too complex - so realistically the question is
what the chances are of getting some blanks and programming them. Or
you could buy them. Forging things from scratch is really hard; the
KGB could do it, but your average crook is unlikely to be in touch
with the KGB. There is a problem with credit card blanks being stolen,
and it isn't too hard to imprint them. */
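/* Scribe addendum: the salted one-way-function scheme from the
Counter-Actions section can be sketched in a few lines of Python. This
is only a sketch of the idea, not the actual Unix crypt(3) algorithm:
SHA-256 and a random 8-byte salt stand in for the historical DES-based
hash and the real salt, and the function names are ours. */

```python
import hashlib
import os

def store_password(password, salt=None):
    """Hash a password with a per-user salt; keep only (salt, digest)."""
    if salt is None:
        salt = os.urandom(8)          # salt is stored in the clear
    digest = hashlib.sha256(salt + password.encode()).hexdigest()
    return salt, digest

def check_password(attempt, salt, digest):
    """Re-hash the attempt with the stored salt and compare digests."""
    return hashlib.sha256(salt + attempt.encode()).hexdigest() == digest

# The same password hashes differently for two users, so a precomputed
# table of encrypted common passwords can't be matched against the file:
s1, d1 = store_password("hunter2")
s2, d2 = store_password("hunter2")
assert d1 != d2                       # different salts -> different digests
assert check_password("hunter2", s1, d1)
assert not check_password("moon", s1, d1)
```

Note the one-way property: the file holds only salts and digests, and a
login attempt is verified by re-hashing, never by decrypting.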