Strom Lee - CS162-DN
Guang Li - CS162-DC

April 20, 2005


*******************************
OVERVIEW:

* Continued from last lecture:
   $ Protection Countermeasures
   $ Inference Control
   $ Confinement Problem
   $ Viruses

* new Topic: Encryption
   $ basic steps to encryption/decryption
   $ types of cryptographic systems
   $ Key distribution

*******************************


Protection Countermeasures: 
The following methods, although not perfect, can be used to reduce frequency and
severity of security compromises.

1.  Logging/Audit trailing - 
	Record all important actions and uses of privilege in an indelible file.  
	This serves as an automatic log of events and can be used to catch imposters
	during their initial attempts and failures.  For example, log every attempt
	to log in with an incorrect password and all super-user logins.  This is
	also called an audit trail.

Problems -
	a.  Super-user can modify or erase audit trails.
		To solve this, we can make the audit trail permanent (either have a
		hard copy, or record the data on write-once media).
	b.  If the system has been broken, the audit trail itself may be vulnerable.
		The CS department at UCB stores its audit trail on a file server, 
		which is separate from the user machine.

2.  Human interaction at key steps -
	Examples of this include having two separate missile launch keys.  Since
	the keyholes are farther apart than a human can reach, two people are 
	required to activate the launch.  Another example is a human verifying an 
	electronic funds transfer.

3.  Minimum privilege principle -
	Things done on a "need to know" basis.  Each piece of the system is only
	given the minimum amount of information (only what it needs to know to run),
	for the minimum possible amount of time.  For example, the file system can't
	modify the memory map and the memory manager can't touch the disk 
	allocation tables, reducing accidental or intentional damage to the system.
	Capabilities are an implementation of this minimum privilege idea.

Problems -
	This type of security (i.e. information containment) is very hard to fool-proof.
	a.  Trojan horses can write characters to a tty, or take page faults in a
	     pattern that, like Morse code, signals another process.  This is similar to
	     the data communication through interrupts we explored in HW #1. 

4.  Correctness proofs -
	Go through the code line by line and verify it's doing what the spec says it's 
	supposed to do.  This method was introduced by Bob Floyd and works by proving 
	assertions (pre-conditions and post-conditions) about each step of the process. 

Problems -
	This method has many problems.  It is only practical for small, simple
	programs (a few hundred lines of code).  People have tried to automate the
	proving method, but with little success.  Programs are just too complex and
	large for correctness proofs to be effective and completed in a reasonable
	amount of time.  The artificial intelligence technology needed for automated
	proving also advances unevenly: progress is very fast on the easy problems
	and then hits a wall (see graph).  Also, just because something is working 
	according to spec doesn't mean that the spec is correct.  This method also 
	can't deal with a Trojan horse.

   AI difficulty
	|	        | impossible to do
	|	        |
	|	        |
	|	        |
	| easy to do	|	
	|______________/
	|________________________  Technological Advancements

(DIGRESSION)
An example of AI difficulty -
	Although we have the technology to transliterate from language to
	language, building an actual speech interpreter is nearly impossible for 
	computers.  All sorts of semantic and grammatical issues pop up 
	(e.g. sentence structure, connotation, etc.).  Therefore, the easy part of 
	this technology is actually to transliterate from language to language, and 
	the hard (impossible) part is to make the translation meaningful to the 
	person who is trying to understand what the speaker means to say.  
	There's also a problem with speech recognition, which today is at best 
	60% reliable without individual customization.
(END DIGRESSION)

5.  Callback -
	Used to avoid abuse of accounts.  E.g., if an IBM employee dials in, the
	computer disconnects and calls back to a given home number to ensure
	it's the right person.  (This was back in the day with those annoying, 
	beeping dial-up modem thingies...).

6.  Consistency or Plausibility check -
	These checks are used in many credit card companies to prevent fraudulent
	credit card use.  The companies monitor card usage and look for strange
	patterns in spending.  Any anomalous behavior is then recorded and the
	company verifies the transactions with the card owner.  E.g., suppose you
	live in Berkeley and never spend more than $100 at a time; if the card
	company finds that your card was used to buy a car in New York, it flags
	the purchase.

Problems -
	This security measure can be an inconvenience to the user.  For example, 
	you could be using your card in a different spending pattern, and the 
	company would hassle you to verify these payments.

Inference controls
	The goal of inference controls is to allow users to get statistical information
	(e.g. averages) out of a database, but not get individual data.  E.g. the
	average salary of all people in a certain city, but not the salary of an 
	individual person.

Problems - The system is designed to only answer statistical queries but not
	queries about individuals.  However, a person can design a set of queries
	that could be used to infer individual information.  E.g., to find an
	individual's salary, one queries for:
	a.  The average salary of X.
	b.  The average salary of X-delta, i.e. X with one individual (delta) removed.
	c.  The size of X.
	The individual's salary is then |X| * avg(X) - (|X| - 1) * avg(X-delta).
There is no good solution to this problem, but we can do the
	following to reduce the accuracy of inferences.
	1.  Randomize data - Fudge the real data slightly to introduce a small 
	     amount of variation to the data.
	2.  Permit only queries on predefined groups - e.g. queries for a
	     pre-determined set of zip codes.
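
	The three-query attack above can be sketched in a few lines of Python.
	This is only an illustration: the SALARIES table, the names in it, and the
	helper functions are all invented here, and a real statistical database
	would expose its queries differently.

```python
# Hypothetical toy database: names and salaries are made up for illustration.
SALARIES = {"alice": 50000, "bob": 60000, "carol": 70000, "dave": 100000}


def average_salary(group):
    """Statistical query: average salary over a set of names."""
    return sum(SALARIES[name] for name in group) / len(group)


def group_size(group):
    """Statistical query: number of people in the group."""
    return len(group)


def infer_salary(group, target):
    """Recover one person's salary using only the three statistical queries."""
    n = group_size(group)                        # query (c): size of X
    avg_all = average_salary(group)              # query (a): average over X
    avg_rest = average_salary(group - {target})  # query (b): average over X-delta
    # total(X) - total(X-delta) is exactly the target's salary
    return n * avg_all - (n - 1) * avg_rest


everyone = set(SALARIES)
print(infer_salary(everyone, "dave"))  # prints 100000.0
```

	Randomizing the stored data, as in countermeasure 1, would make the
	recovered value only approximate rather than exact.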

The Confinement Problem
	The problem of a mutually suspicious customer and service.  How do we
	keep both sides private from each other?  We want to ensure
	that the service can only reach information provided by the customer, and
	that the service is protected from the customer.  This idea is the concept
	of an information utility and is currently re-surfacing as server-based
	software.
Problems -
	1.  The service may not perform as advertised.
	2.  The service may leak (i.e. transmit confidential data).
		Possible leak sources:
		a.  If the service has memory, it can collect data.  For example, it
		     can write data to a permanent file or it can write to a temporary
		     file which can be read by a spy trying to steal information.
		b.  The service can send a message to a process controlled by the
		     service's owner (i.e. the spy).
		c.  The information can be encoded in the bill rendered for service.
		d.  If the file system has interlocks, the service can lock and unlock
		     a file and the spy can watch it like Morse code.  As mentioned
		     earlier, we were introduced to this method through HW #1.
		e.  The service can vary the paging rate, which affects performance.
		     An example would be traversing a matrix in the wrong order to
		     force page faults.  By watching the paging rate, a spy could
		     read the data as a kind of Morse code.


Viruses
        - Really only appear in PCs.  PCs transfer executable files and code
          around, e.g. in email (because everything's executable in Windows).

        - User executes this code, and bad things happen.
             a. Virus usually replicates itself elsewhere
             b. and does something unpleasant to your machine.

        - General technique is to search for known viruses by looking for 
          their object code. (search in an anti-virus software's list)

        - Problem 1: Viruses encrypt themselves.
            Solution 1: Search for the decryption code.

        - Problem 2: Viruses may change the decryption code.
            Solution 2: Interpretively execute the suspected virus code 
                        for some period of time, to see if the code decrypts 
                        itself into something that is recognized as a known virus.

        - There is no good defense against an unknown virus, since the code 
          patterns can't be recognized.
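
        The general technique (searching for the object code of known viruses)
        might look roughly like the sketch below. The signature names and byte
        patterns are invented for illustration and do not correspond to any
        real virus; the XOR step is only a stand-in for a virus encrypting
        itself, to show why plain pattern matching then fails (Problem 1).

```python
# Hypothetical signature list: these byte strings are invented for illustration
# and do not correspond to any real virus.
KNOWN_SIGNATURES = {
    "demo-virus-a": bytes.fromhex("deadbeef0102"),
    "demo-virus-b": b"\x90\x90\xeb\xfe",
}


def scan(data):
    """Return the names of all known signatures found in the given bytes."""
    return sorted(name for name, sig in KNOWN_SIGNATURES.items() if sig in data)


clean_file = b"just an ordinary program image"
infected_file = b"header" + bytes.fromhex("deadbeef0102") + b"payload"

# Stand-in for a self-encrypting virus: the same bytes XORed with a fixed value.
encrypted_copy = bytes(b ^ 0x55 for b in infected_file)

print(scan(clean_file))      # no known pattern present
print(scan(infected_file))   # matches demo-virus-a
print(scan(encrypted_copy))  # the pattern is hidden, so nothing matches
```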


Topic: Encryption

        - Prof. Smith recommends:
          a) Kahn, "The Codebreakers". It's easy to read up until WWI.
          b) See also Whitfield Diffie and Martin Hellman, "Privacy and 
             Authentication: An Introduction to Cryptography", Proc. IEEE, 67, 3, 
             March, 1979, pp. 397-427.

        - Popular approach to security in computer systems: encryption. Store 
          and transmit information in an encoded form.
          a) Cryptography - the use of transformation of data intended to make 
             the data useless to one's opponents.
          b) Note that encryption is not new - it has been used since the times of 
             the Romans: "Caesar Cipher".

                             sent over network
                     Key1           |                 Key2
                      V             V                  V
   clear text ->  encrypt --->  cipher text --->   decrypt ---> clear text
                                    |
                                    V
                                listener

        - The basic mechanism (see above diagram)
          1. Start with text to be protected.  The initial readable text is called 
             clear text.
          2. Encrypt the clear text so that it doesn't make any sense at all. 
             The nonsense text is called cipher text. The encryption is controlled
             by a secret password or number; this is called the encryption key.
          3. The encrypted text can be stored in a readable file, or transmitted 
             over unprotected channels.
          4. To make sense of the cipher text, it must be decrypted back into clear
             text. This is done with some other algorithm that uses another secret 
             password or number, called the decryption key.

        - All of this only works under three conditions:
          1. The encryption function cannot easily be inverted (cannot get back to 
             clear text unless you know the decryption key).
          2. The encryption and decryption must be done in some safe place so the
             clear text can't be stolen.
          3. The keys must be protected. In most systems, one key can be computed from 
             the other (usually the encryption and decryption keys are identical),
             so neither key can be allowed to leak out.

        - Types of Cryptographic Systems:
          1. (Simple) Substitution: There is a function f(x) which maps each letter 
             of the plaintext (or group of letters) x into f(x). f(x) must be 1-1 or one
             to many. If f(x)=x+1 (mod 26), i.e. each letter is shifted by a fixed
             amount, it is called a Caesar Cipher.
             ** How to break: by using tables of frequencies of letters, doubles, 
                triples, etc.
                 -> Mapping "to many" disguises frequency.
          2. Transposition: Permute (or transpose) the input in blocks to obtain the output.
             ** How to break: Look for permutations that rejoin commonly used
                letter pairs, such as "th".
          3. Polyalphabetic Ciphers - substitution cipher, where f(i,x) is a function of i, 
             which is the sequence number of the letter in the text. Typically periodic 
             in i. Can get long periods by using two functions with relatively prime
             periods.
             ** How to break:
              a) Solved in two steps. First look for repeated strings, and count the number 
                 of letters between them. The greatest common divisor of the distances between
                 repeats is the likely period (the true period divides it). (Or can look at
                 frequencies of letters K apart, until they look ok; then K is the period of
                 the cipher.) Then solve each of the K ciphers separately, using frequency
                 methods.
              b) Old fashioned coding machines (e.g. Hagelin machines) worked as polyalphabetic
                 cipher - had rotating wheels with relatively prime number of cogs.  Code was 
                 product of path through wheels.

          4. Running Key Cipher - use key as long as message - e.g. text of book. (but not random)
             ** How to break: use a probable word; substitute it everywhere (i.e. XOR it with
                the cipher text) and see if a recognizable word pops out. If so, work backward
                and forward by context. Or, use frequency methods - but frequencies are now
                products of key and message frequencies, so quite hard.
          5. Codes - take linguistic units of input (e.g. words) and use a code book (large table)
             to map them into output - e.g. letter groups. (Can also encode phrases.)
             ** Hard to solve.  Try frequency counts. Also known plaintext method.
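
        As a small illustration of the simplest case in the list above, here is a
        sketch of a Caesar cipher and the frequency method for breaking it. The
        function names and the sample message are made up for this example, and
        the guess that the most frequent cipher letter stands for 'e' is only a
        heuristic: it works for reasonably long English text and can fail on
        short or unusual messages. The f(x)=x+1 case corresponds to shift=1.

```python
from collections import Counter
import string

ALPHABET = string.ascii_lowercase


def caesar(text, shift):
    """Shift each lowercase letter by `shift` positions; leave other characters alone."""
    out = []
    for ch in text:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)


def guess_shift(cipher_text):
    """Guess the shift by assuming the most frequent cipher letter stands for 'e'."""
    counts = Counter(ch for ch in cipher_text if ch in ALPHABET)
    most_common_letter = counts.most_common(1)[0][0]
    return (ALPHABET.index(most_common_letter) - ALPHABET.index("e")) % 26


message = "meet me at the secret entrance at seven"
cipher_text = caesar(message, 3)    # encrypt with a shift of 3
shift = guess_shift(cipher_text)    # attacker sees only cipher_text
print(cipher_text)
print(caesar(cipher_text, -shift))  # decrypt by shifting back
```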

        - Typical approaches to cryptanalysis: frequency counts, probable words, known plaintext.
          (e.g. we have the encrypted and plain text -> deduce mapping)

        - Other approaches:
          1. Traffic analysis (a listener can see that there is some message traffic, and
             that alone leaks information)
          2. Playback - listener records legitimate messages and plays them back later.
             Confuses people.
          3. Operator error and personnel security -> offer the operator $1 million, and
             he might tell you the key.

        - In most systems, key distribution is the biggest weakness.

        - Any encryption scheme where the effort to break it exceeds the reward can be 
          considered secure (you don't want to spend lots of effort breaking the encrypted
          message that contains a person's shopping list, thus the encrypted shopping list
          is considered secure)

        - The only perfect method of encryption is to use a random key as long as the message,
          used only once (a one-time pad). All other systems can be broken, given enough of a
          message.
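
        A minimal sketch of the random-key-as-long-as-the-message idea, assuming
        the key and the message are combined with XOR (an assumption of this
        example; the notes only mention XOR for the running key cipher). The
        helper name xor_bytes and the sample message are invented here.

```python
import secrets


def xor_bytes(data, key):
    """XOR two equal-length byte strings."""
    if len(data) != len(key):
        raise ValueError("key must be exactly as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


clear_text = b"attack at dawn"
key = secrets.token_bytes(len(clear_text))  # random key, as long as the message

cipher_text = xor_bytes(clear_text, key)    # encrypt
recovered = xor_bytes(cipher_text, key)     # decrypt with the same key

print(recovered == clear_text)  # prints True
```

        The key must never be reused for a second message, and both sides still
        need a safe way to share it, which is the key distribution problem.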

        - Error control is a major problem - if bits are dropped, it is very hard to recover the message.

        - Problem: how to establish secure channel (i.e. distribute keys) in the first place?

        - Method for Key Distribution:
          * Let KS be the key server.
          * Use key KX to communicate between user X and key server KS.
          * Users A and B.
          * Let "**" mean encryption. 
          * Let ID be the message ID.
          *  Let KAB be the new key that we want to safely distribute.
          
          ~ Step1: A to KS [A asks server for key to communicate with B]
                   {A,(ID,B)**KA} --> A's name is sent in the clear so the server knows which
                   key to use; the part (ID,B) is encrypted with A's key KA, which the server
                   knows, so the server can recover the message.
          ~ Step2: KS to A: [gives the key to A, with a unique ID, so that the message is 
                   identifiable.]
                   {(ID,KAB,(KAB,A)**KB)**KA}
                   -> the whole reply (ID,KAB,(KAB,A)**KB) is encrypted with A's key (KA), so
                      A can recover it. A can read only the unique ID and the new key
                      KAB; it can't read the inner part ((KAB,A)**KB) because that is
                      encrypted with B's key KB and intended for B.
          ~ Step3: A to B: [send key to B, so only B can read it.]
                   {(KAB,A)**KB} -> When B receives this, B can recover the distributed key
                   KAB by decrypting the message with its own key KB.
          ~ Finally: both A and B have now safely obtained the new key KAB.


                                                 KS
                                              ^
                                             /  /
                          1. {A,(ID,B)**KA} /  /
                                           /  / 2.{(ID,KAB,(KAB,A)**KB)**KA}
                                          /  / 
                                         /  V
                                          A ---------------> B
                                            3. {(KAB,A)**KB}
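

        The three-message exchange can be traced with a small simulation. This
        sketch does no real cryptography: the Sealed class is a made-up stand-in
        for "payload ** key" that simply refuses to open with the wrong key, and
        the names key_server, SERVER_KEYS, and request_id are assumptions of
        this example. It only shows who can read which part of each message.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Sealed:
    """Stand-in for `payload ** key`: a payload that only the right key opens."""
    key: bytes
    payload: tuple


def encrypt(payload, key):
    return Sealed(key, payload)


def decrypt(sealed, key):
    if sealed.key != key:
        raise PermissionError("wrong key")
    return sealed.payload


# Long-term keys that each user shares with the key server KS.
KA = secrets.token_bytes(16)
KB = secrets.token_bytes(16)
SERVER_KEYS = {"A": KA, "B": KB}


def key_server(sender, sealed_request):
    """Step 2: KS answers with {(ID, KAB, (KAB, A)**KB)**KA}."""
    msg_id, peer = decrypt(sealed_request, SERVER_KEYS[sender])
    kab = secrets.token_bytes(16)  # the new key KAB
    ticket = encrypt((kab, sender), SERVER_KEYS[peer])
    return encrypt((msg_id, kab, ticket), SERVER_KEYS[sender])


# Step 1: A -> KS: {A, (ID, B)**KA}
request_id = 1
reply = key_server("A", encrypt((request_id, "B"), KA))

# A opens the reply, checks the ID, and keeps KAB; the ticket stays sealed for B.
reply_id, kab_at_a, ticket = decrypt(reply, KA)
assert reply_id == request_id

# Step 3: A -> B: {(KAB, A)**KB}
kab_at_b, claimed_sender = decrypt(ticket, KB)

print(kab_at_a == kab_at_b, claimed_sender)  # prints: True A
```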