Lecture Notes for April 20, 2005
***************************************************************
ANNOUNCEMENTS:
* Professor Smith still has the midterms that weren’t picked up.
* Karl will give a part of the lecture on security next week.
* HKN passed out the evaluation forms for CS162.
****************************************************************

Countermeasures for security problems:
	+There is no perfect solution to the problem of security.
	Here are a few things you can do to reduce
	vulnerability.

	LOGGING:
	+All commands get logged: what happened, who the users were,
	and which account was used. All important actions and uses of
	privilege get recorded in an indelible file.

	+You should keep these logs in a safe place, such as a hardcopy from
	a printer, or on a network machine to which ordinary users do not
	have access. This type of logging can be used to catch imposters
	during their initial attempts and failures. For example, all attempts
	to specify an incorrect login or password can be stored.
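
	As a rough illustration of the last point, the sketch below keeps an
	append-only record of login attempts and reports accounts with repeated
	failures. The names (LoginLog, record, suspicious) and the threshold of
	three failures are assumptions made for this example only.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LoginLog:
    """In-memory stand-in for a log kept where users cannot edit it."""
    entries: list = field(default_factory=list)

    def record(self, user: str, success: bool) -> None:
        # Entries are only ever appended, never changed or removed.
        self.entries.append((user, success))

    def suspicious(self, threshold: int = 3) -> list:
        # Accounts with at least `threshold` failed attempts.
        failures = Counter(u for u, ok in self.entries if not ok)
        return sorted(u for u, n in failures.items() if n >= threshold)


log = LoginLog()
for _ in range(3):
    log.record("alice", False)   # three bad passwords for alice
log.record("bob", True)
print(log.suspicious())
```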

	AUDIT TRAIL:
	+It is basically the same thing as logging. An audit trail is a
	chronological record of system resource usage. This includes user
	logins, file accesses, various other activities, and whether any actual
	or attempted security violations occurred, covering both legitimate
	and unauthorized activity.
		+Just like a log, an audit trail can be edited or removed by the
		superuser. Therefore, store a hardcopy such as a printout, or write
		it to write-once media.
		+At UC Berkeley, the audit trail is stored on a file server
		that is separate from the user machines. If the system has been
		broken into, the audit trail itself may be vulnerable.
       
	+Even better is to get humans involved at key steps. For example, two people
	may be required to open a lock or safe. This is what is done for electronic
	funds transfers.

PRINCIPLE OF MINIMUM PRIVILEGE:
	+Also known as the "need-to-know" principle. The principle of minimum
	privilege states that users should be given only the access they need to
	perform the necessary task, and only for the minimum amount of time.
	This helps a lot in minimizing accidental or intentional errors. For example,
	the file system should not be able to access the memory map, and the memory
	manager should not be able to access the disk allocation tables.
	+Capabilities are an implementation of this idea. Capabilities indicate
	which objects may be accessed, and in what ways, by each user. A list of
	<object, privilege> pairs, called a capability list, is stored with each user.
	The user process typically cannot access its capability list directly. The OS
	manages the list, which makes it difficult for a process to forge a capability.
	+In access-list systems, the default is usually for everyone to be able to
	access an object. In capability-based systems, the default is for no one to be
	able to access an object unless they have been given a capability. There is no
	way of even naming an object without a capability.
	+Capabilities have sometimes been used in systems that need to be very
	secure. However, capabilities can make it difficult to share information:
	nobody can get access to your stuff unless you explicitly give it to them.
	Capabilities are also difficult to revoke.
	+Example of a simple capability-based protection scheme: file descriptors.
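
	The capability-list idea can be sketched roughly as follows. This is a toy
	model, not how any particular OS does it: the Kernel class, its method
	names, and the process ids "A", "B", "C" are all hypothetical. A process
	only ever holds a small integer index into a list that the kernel owns,
	much like a file descriptor.

```python
class Kernel:
    """Toy OS that owns every capability list; processes never touch them."""

    def __init__(self):
        self._objects = {}      # object name -> contents
        self._clists = {}       # process id -> list of (object, privileges)

    def create(self, pid, name, contents=""):
        # The creator receives a capability with full rights.
        self._objects[name] = contents
        self._clists.setdefault(pid, []).append((name, {"read", "write"}))
        return len(self._clists[pid]) - 1   # index, like a file descriptor

    def grant(self, owner, index, other, privileges):
        # Sharing must be explicit: copy a (possibly weaker) capability.
        name, held = self._lookup(owner, index)
        self._clists.setdefault(other, []).append((name, set(privileges) & held))
        return len(self._clists[other]) - 1

    def read(self, pid, index):
        name, privileges = self._lookup(pid, index)
        if "read" not in privileges:
            raise PermissionError("capability lacks read privilege")
        return self._objects[name]

    def write(self, pid, index, contents):
        name, privileges = self._lookup(pid, index)
        if "write" not in privileges:
            raise PermissionError("capability lacks write privilege")
        self._objects[name] = contents

    def _lookup(self, pid, index):
        # Default is no access: an index not in the list names nothing.
        clist = self._clists.get(pid, [])
        if not 0 <= index < len(clist):
            raise PermissionError("no such capability")
        return clist[index]


kernel = Kernel()
fd = kernel.create("A", "notes.txt", "secret")
shared = kernel.grant("A", fd, "B", {"read"})   # B gets read-only access
print(kernel.read("B", shared))
```

	Note that the sketch has no way to take a capability back once granted,
	which mirrors the revocation difficulty mentioned above.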
	
	CONFINEMENT PROBLEM:
	This is the problem of assuring that a borrowed program does not steal,
	for its author, information that it processes for a borrower. One approach
	is to prove that an operating system enforces confinement, by preventing
	borrowed programs from writing information to storage in violation of a
	formally stated security policy.
	It is very hard to provide foolproof information containment. For example,
	a Trojan horse could write characters to a tty, or take page faults, in
	Morse code, as a signal to another process.

	CORRECTNESS PROOF: a mathematical proof of consistency between
	a specification and its implementation. These are very hard to do. Even so,
	a proof only shows that the system works according to its specification.
	It doesn't mean that the specification is necessarily right, and
	it doesn't deal with Trojan horses. Proofs work for small
	algorithms, so they are not very useful here.

	CALLBACK USED TO AVOID ABUSE OF ACCOUNT:
	The basic idea is that you dial in to the machine and tell it who you are,
	and then the machine calls you back. If you provide the wrong number to dial
	back, then it can be determined that you are not the right person. IBM used
	to use this feature: when an employee called in, the computer would
	disconnect and call back. You could only log in from a given home number.
	This requires an additional extension to the network, though.
 
	
CONSISTENCY OR PLAUSIBILITY CHECK:
	Application systems, for example those of credit card companies, do
	consistency or plausibility checks. These systems look for suspicious
	activity. If anything unusual is seen, the company confirms the activity
	with the customer. For example, a user suddenly spends $10,000
	whereas his usual purchases are under $100, or the user lives in the
	United States and there is account activity in Hong Kong.
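
	A plausibility check of this kind might look roughly like the sketch below.
	The rule (flag a purchase more than ten times the usual maximum, or one made
	outside the home country) and every name in it are assumptions for
	illustration; real systems use far richer rules.

```python
from dataclasses import dataclass


@dataclass
class Purchase:
    amount: float
    country: str


def is_suspicious(purchase, usual_max, home_country, factor=10.0):
    # Flag a purchase far above the customer's usual spending,
    # or one made outside the customer's home country.
    too_large = purchase.amount > factor * usual_max
    wrong_place = purchase.country != home_country
    return too_large or wrong_place


print(is_suspicious(Purchase(10_000, "US"), usual_max=100, home_country="US"))
print(is_suspicious(Purchase(40, "HK"), usual_max=100, home_country="US"))
print(is_suspicious(Purchase(40, "US"), usual_max=100, home_country="US"))
```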

INFERENCE CONTROLS:
	+You have a statistical database. For example, the US Census collects data
	about people, and researchers then run queries on this database, e.g.,
	what is the average salary of people aged 25?
	+But you do not want to provide information about individuals.
	These databases are designed not to provide individual data.
	+You can still design queries in such a way that they give you
	information about individuals.
	+E.g. (a) average salary of all X.
	      (b) average salary of X-delta, where delta describes only one individual.
	      (c) size of X.
	These three queries permit us to deduce delta's salary:
	salary(delta) = (c)*(a) - ((c)-1)*(b).
	This is a problem: how do you prevent the system from giving away an
	individual's information while still allowing statistical queries?

	+There is no good solution to this problem, but
	you can do a few things:

		+Randomize the data (slightly): but if you start randomizing numbers,
		you won't be able to give good precision.

		+Limit the queries: you can limit the class of queries that can be
		made, for example by offering only a set of predefined queries.
		+Allow aggregation only over certain groups.

THE CONFINEMENT PROBLEM:
	You want to make sure that a time-sharing system, or a service running
	on it, does not pass your data to an unauthorized user.
	+Problem of mutually suspicious customer and service.
	+We want to make sure that the service can only access the data that is
	provided by the user, and that the service is protected from the user as well.
	+The idea is the concept of an information utility. The idea is currently
	resurfacing as server-based software.
	+TWO PROBLEMS:
		+A program might not behave exactly as it is intended to.
		It might steal information, for example by transmitting confidential data.

		+LIST OF POSSIBLE LEAKS:
		There are still ways to leak information.
		+If the program has access to memory, it can collect
		unauthorized data as well.
		+It can write that data to a file, or to a remote server.
		+The program can even write the data to a temporary file. This
		temporary file can later be read by a spy program.
		+The service can signal another process controlled by its owner.
		If the file system has interlocks, the service can lock and
		unlock a file, and the spy program can watch whether the file is locked.
		Whenever the spy finds the file unlocked, it can access it.
		+The service can encode information in paging rates. It can
		intentionally vary its paging rate to signal the spy program.
		A high rate could mean 1 and a low rate could mean 0.
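
	The last two leaks are covert channels: one bit per time slot, carried by
	something the spy can observe. The sketch below only simulates that
	encoding with a stream of booleans (True could stand for "file locked" or
	"high paging rate"); it assumes perfectly synchronized time slots and no
	noise, and does not touch real locks or paging. The function names are
	invented for this example.

```python
def service_send(secret: str):
    # The confined service leaks each bit of the secret through a
    # side effect the spy can observe: True = 1, False = 0.
    for byte in secret.encode("ascii"):
        for shift in range(7, -1, -1):
            yield bool((byte >> shift) & 1)   # observable state for one time slot


def spy_receive(observations):
    # The spy samples the state once per time slot and rebuilds the bytes.
    bits = [1 if seen else 0 for seen in observations]
    out = bytearray()
    for i in range(0, len(bits) - 7, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return out.decode("ascii")


print(spy_receive(service_send("PIN 1234")))
```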

VIRUSES:
	+Most virus problems appear on PCs, which mostly run Windows.
	In Windows, everything is executable. The first thing a program does with
	data is to try to execute it. PCs transfer executable files and code, which
	programs execute. For example, a lot of machines get infected by email viruses.
	In Unix, code is code and data is data. Nothing gets executed unless you
	execute it explicitly.

	+Once a virus is on your machine, it replicates itself in different
	parts of the machine and does unpleasant things to it.

	+The general idea behind searching for viruses is that you look for their
	object code.
	+Anti-virus programs keep a list of the object code of commonly known viruses.
	+When you check a file for viruses, the software compares the binary
	image of the file's segments with the known images of viruses.

	+Some viruses are smart and encrypt themselves, so the
	anti-virus software cannot compare them directly.
	+The solution is to decrypt the code before you try to compare it.

	+Some viruses even change the decryption code.

	+The solution then is to execute the suspected virus code in isolation
	for a small amount of time, and see if the code decrypts itself into
	something that is recognized as a known virus.

	+There is no good defense against viruses.
	If a virus has a completely new code pattern that the anti-virus software
	doesn't know about, then the software won't catch it as a virus.
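
	Signature scanning, and why self-encryption defeats a naive scan, can be
	sketched as below. The signature patterns, the file contents, and the
	one-byte XOR "encryption" are all invented for illustration and do not
	correspond to any real virus or scanner.

```python
# Hypothetical signature list: name -> byte pattern. Real scanners use
# far larger databases; these patterns are invented for illustration.
SIGNATURES = {
    "demo-virus-a": bytes.fromhex("deadbeef"),
    "demo-virus-b": b"\x90\x90\xeb\xfe",
}


def scan(image: bytes) -> list:
    # Report every known pattern that occurs anywhere in the binary image.
    return sorted(name for name, pattern in SIGNATURES.items() if pattern in image)


def xor_bytes(data: bytes, key: int) -> bytes:
    # Stand-in for a virus "encrypting" its body with a one-byte key.
    return bytes(b ^ key for b in data)


infected = b"MZ" + b"\x00" * 8 + bytes.fromhex("deadbeef") + b"rest of file"
hidden = xor_bytes(infected, 0x5A)

print(scan(infected))                    # plain copy
print(scan(hidden))                      # encrypted copy
print(scan(xor_bytes(hidden, 0x5A)))     # after decrypting first
```

	The scan finds the pattern in the plain copy, misses it in the encrypted
	copy, and finds it again once the body has been decrypted.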
****************************************************************************************
ENCRYPTION


	+Reading on encryption recommended by Prof. Smith:
	"The Codebreakers" by David Kahn gives the history of cryptography
	going all the way back to the old days.
	"Privacy and Authentication: An Introduction to Cryptography" by
	Whitfield Diffie and Martin Hellman, Proc. IEEE, 67, 3,
	March 1979, pp. 397-427.

	+Popular approach to security in computer systems: encryption.
	+Definition: You start with clear text. You encrypt it with a key.
	This becomes cipher text. The idea is that other people shouldn't be able
	to read this cipher text. The text gets to the destination. The intended user
	decrypts it with the decryption key. It becomes plain text again and can be used.

	+You only send the data in encoded form. Cryptography makes the data
	useless to one's opponents. The idea of encryption is not new; it has
	been used since the time of the Romans ("Caesar cipher").


THE BASIC MECHANISM
	+Start with the initially readable text, called clear text. Encode it to
	make it cipher text.
	+A listener can see the cipher text, but it won't make any sense to him.
	+The encryption is controlled by the secret key.
	+Decode the cipher text with the key into clear text.
	+The encrypted text can be stored in a readable file, or transmitted over
	unprotected channels.

DIAGRAM
                     Key1                                          Key2
		      V				                    V
             	      V				                    V
		      V				                    V
clear text ->>> encrypt with Key1>>> becomes cipher text >>>decrypt it with Key2>>>clear text
				         V
                   		         V
				         V
				         V
			             listener
        

ALL ENCRYPTION WORKS UNDER THREE CONDITIONS:

	+The encryption function shouldn't be easily invertible. You should not be able
	to transform cipher text to clear text without the decryption key.

	+The encryption and decryption must be done in some safe
	place so the clear text can't be stolen.

	+The keys must be protected. In most systems, the encryption and decryption
	keys are the same, so you cannot afford to leak either of them.


TYPES OF CRYPTOGRAPHIC SYSTEMS:

	+Substitution: There is a function f(x) which maps each letter x of the
	plaintext (or group of letters) into f(x). f(x) must be 1-1 or one to
	many. If f(x)=x+1, then it's called a Caesar cipher.

	+Example: the quick brown fox jumps over the lazy dog >>>
			uif rvjdl cspxo gpy kvnqt pwfs uif mbaz eph

	+This type of encryption can be solved by using tables of frequencies
	of letters, doubles, triples, etc.
	+You use a frequency table to spot very common letters such as e.
	This works for a one-to-one map.

	+If f(x) is one to many, then the frequency tables get messy and are not helpful.
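
	The f(x)=x+1 example above can be reproduced with a few lines of code.
	This is a minimal sketch that shifts lowercase letters only and leaves
	everything else alone; the function name caesar is just a label chosen here.

```python
import string

ALPHABET = string.ascii_lowercase


def caesar(text: str, shift: int) -> str:
    # f(x) = x + shift (mod 26) on lowercase letters; other characters pass through.
    out = []
    for ch in text:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)


cipher = caesar("the quick brown fox jumps over the lazy dog", 1)
print(cipher)
print(caesar(cipher, -1))   # shifting back by one recovers the clear text
```

	Since there are only 26 possible shifts, trying every shift is already
	enough to break this particular cipher, even without frequency tables.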
	   
        +   Transposition: Permute (or transpose) the input in blocks to obtain the output.

	|----|----|-----|-----|----|
	|T   | H  | E   |     | Q  |
	|----|----|-----|-----|----|
	|U   | I  | C   | K   |    |
	|----|----|-----|-----|----|
	|B   | R  | O   | W   |  N |
	|----|----|-----|-----|----|
	|F   | O  |  X  |     |    |
	|----|----|-----|-----|----|

	The clear text is read horizontally (THE QUICK BROWN FOX)
	and the vertical text can be treated as cipher text (TUBF HIRO ECOX KW Q N) 
	
	+You can break it by trying various common permutations.

	+Look for permutations that rejoin commonly used letter pairs, such as "th".
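
	A block transposition of the kind shown in the grid can be sketched as
	follows. The sketch gives every character of the clear text, spaces
	included, its own grid cell and pads the last row with spaces, so its
	output need not match the hand-drawn grid letter for letter. The function
	names and the width of 5 are choices made for this example.

```python
def transpose_encrypt(text: str, width: int) -> str:
    # Write the text row by row into a grid `width` columns wide,
    # padding the last row with spaces, then read it column by column.
    padded = text.ljust(-(-len(text) // width) * width)
    rows = [padded[i:i + width] for i in range(0, len(padded), width)]
    return "".join(row[c] for c in range(width) for row in rows)


def transpose_decrypt(cipher: str, width: int) -> str:
    # The cipher text is the same grid read the other way round.
    height = len(cipher) // width
    cols = [cipher[i:i + height] for i in range(0, len(cipher), height)]
    return "".join(col[r] for r in range(height) for col in cols)


cipher = transpose_encrypt("THE QUICK BROWN FOX", 5)
print(repr(cipher))
print(repr(transpose_decrypt(cipher, 5)))
```

	Every letter of the clear text is still present in the cipher text, which
	is why letter-pair statistics such as "th" help an attacker find the
	permutation.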

	+A monoalphabetic cipher is not very secure. Frequency
	analysis makes it possible to break it. Therefore a better solution is
	a polyalphabetic cipher.

	+Polyalphabetic ciphers: a substitution cipher where f(i,x)
	is also a function of i, the sequence number of the
	letter in the text. Typically periodic in i. Can get
	long periods by using two functions with relatively prime
	periods.

	+Polyalphabetic ciphers can be broken in two steps:
	if we can find the length of the block (the length of the key),
	we can then apply frequency analysis to the letters one key length apart.

	+We look for repeated strings of letters and count the distances between them.
	+The greatest common divisor of the distances between repeated strings is
	the period, or a multiple of it.
	+We can also look at the frequencies of letters K apart, until they
	look ok; then K is the period of the cipher. Then solve each
	of the K ciphers separately, using frequency methods.

	+Old-fashioned coding machines (e.g. Hagelin machines)
	worked as polyalphabetic ciphers: they had rotating wheels
	with relatively prime numbers of cogs. The code was the product
	of the path through the wheels.
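
	As a concrete instance of a periodic polyalphabetic cipher, the sketch
	below uses f(i,x) = x + key[i mod period] (the classic Vigenere form, which
	is an assumption here; the notes do not fix a particular f). It also shows
	the first step of the attack: collecting distances between repeated
	three-letter strings and taking their greatest common divisor. All function
	names are made up for this example, and the input is assumed to be
	uppercase letters only.

```python
from functools import reduce
from math import gcd

A = ord("A")


def poly_encrypt(text: str, key: str) -> str:
    # f(i, x) = x + key[i mod period] (mod 26), uppercase letters only.
    return "".join(
        chr((ord(x) - A + ord(key[i % len(key)]) - A) % 26 + A)
        for i, x in enumerate(text)
    )


def repeat_gcd(cipher: str, n: int = 3) -> int:
    # GCD of the distances between successive occurrences of each
    # repeated n-letter string; returns 0 when nothing repeats.
    last_seen, distances = {}, []
    for i in range(len(cipher) - n + 1):
        gram = cipher[i:i + n]
        if gram in last_seen:
            distances.append(i - last_seen[gram])
        last_seen[gram] = i
    return reduce(gcd, distances, 0)


def columns(cipher: str, period: int) -> list:
    # Letters `period` apart share one alphabet; attack each column
    # separately with ordinary frequency counts.
    return [cipher[k::period] for k in range(period)]


cipher = poly_encrypt("THECATANDTHEDOG", "KEY")
print(cipher, repeat_gcd(cipher), columns(cipher, 3))
```

	With so short a message only one string repeats, so the GCD is just that
	one distance, a multiple of the key length rather than the key length
	itself. Longer cipher texts give more distances and a tighter estimate.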


RUNNING KEY CIPHER
	+The running key cipher is a type of polyalphabetic substitution
	cipher in which a text, typically from a book, is used to provide
	a very long key stream. Usually, the book to be used would be
	agreed on ahead of time, while the passage to use would be chosen
	randomly for each message and secretly indicated somewhere in the message.

	+Solve: use a probable word; substitute it everywhere
	(i.e. XOR it with the cipher text) and see if a recognizable word
	pops out. If so, work backward and forward by context. Or, use
	frequency methods, but the frequencies are now products of key and message
	frequencies, so this is quite hard.
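
	The probable-word attack can be sketched as below, assuming the cipher
	combines message and key text with XOR on ASCII bytes. The message, the
	key passage, and the function names are invented for the example. The
	attacker slides the guessed word along the cipher text and keeps the
	offsets where the implied key fragment looks like ordinary text.

```python
import string

PLAUSIBLE = set((string.ascii_letters + " ").encode("ascii"))


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def try_probable_word(cipher: bytes, word: bytes):
    # At each offset, assume `word` is the message text there and see
    # whether the key text this implies consists of letters and spaces.
    for i in range(len(cipher) - len(word) + 1):
        fragment = xor(cipher[i:i + len(word)], word)
        if all(b in PLAUSIBLE for b in fragment):
            yield i, fragment.decode("ascii")


message = b"attack the fort at dawn"
running_key = b"it was the best of times"     # passage from the agreed book
cipher = xor(message, running_key)

for offset, fragment in try_probable_word(cipher, b"attack"):
    print(offset, repr(fragment))
```

	Some offsets may produce letter strings by chance; the attacker still has
	to judge which fragment reads like real text and then extend it by context.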

CODES
	+In codes you work with linguistic units of the input. You have code
	books for the words, and you map words to output using the book.
	It's very hard to break a code without the code book. Codes can also
	encode phrases.

	+Typical approaches to breaking such encryption are:
	frequency counts, probable words, known plaintext.

         + OTHER APPROACHES TO BREAK CODES:

         +   Traffic Analysis:  Traffic analysis is the process of intercepting and 
	examining messages in order to deduce information from patterns in 
	communication. It can be performed even when the messages are 
	encrypted and cannot be decrypted. In general, the greater the number
	 of messages observed, or even intercepted and stored, the more that 
	can be inferred from the traffic.

         +   Playback: A listener plays back the same encrypted
	messages; this confuses the people who are exchanging messages.

         +   Sometimes operator errors can help break the encryption.
	  If the operator encrypts a message with the wrong key,
	  he might then encrypt the same message again with a different key.
	  Having two encryptions of the same message can help to decrypt it.

         + In most encryption systems, distributing the keys
	   safely is the biggest problem.

        + In any encryption system, if the effort to break the encryption is more
	  than the actual value of the information, then the encryption
	  is considered successful.
        + Example: For high school gossip, a simple monoalphabetic cipher is good enough.

        + The most secure way of encrypting is to use a random key as long as the message.
        + It is the hardest to maintain, though.
        + All the other systems can be broken given enough messages.

        + Error control is a major problem too. If bits get dropped or corrupted,
	  it becomes very hard to recover the message.
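
	The "random key as long as the message" scheme can be sketched in a few
	lines. XOR is assumed as the combining operation, and Python's standard
	secrets module supplies the random bytes; the message is made up.

```python
import secrets


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


message = b"meet at noon"
pad = secrets.token_bytes(len(message))   # random key, as long as the message

cipher = xor(message, pad)
print(xor(cipher, pad))                   # XOR with the same pad decrypts
```

	The maintenance burden is visible even here: both sides need a fresh pad
	of the full message length for every message, delivered safely ahead of time.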
    

     + PROBLEM:
	How do you distribute the keys securely in the first place?


     + One method for key distribution.

         +   Let KS be the key server. Use key KX to communicate
             between user X and key server KS. The users are A and B. Let
             "**" mean encryption. Let ID be the message ID. Let KAB
             be the new key.

         +   1. A to KS: {A,(ID,B)**KA} [A asks the server for a key to
             communicate with B.]

         +   2. KS to A: {(ID,KAB,(KAB,A)**KB)**KA} [The server gives the key to
             A, with the unique ID so that the message is identifiable.]

         +   3. A to B: {(KAB,A)**KB} [A sends the key to B, so only B can
             read it.]
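
	The three messages can be traced with a small symbolic model. The Sealed
	class below is not real cryptography: it only records which key a payload
	was "encrypted" under and refuses to open with any other key. The class,
	the function key_server, and the message ID 42 are all assumptions made
	for this sketch.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Sealed:
    """Symbolic stand-in for payload**key; not real cryptography."""
    key: bytes
    payload: tuple

    def open(self, key: bytes) -> tuple:
        if key != self.key:
            raise ValueError("wrong key")
        return self.payload


# Long-term keys KA and KB, each shared between one user and the server KS.
user_keys = {"A": secrets.token_bytes(16), "B": secrets.token_bytes(16)}


def key_server(sender: str, request: Sealed) -> Sealed:
    # Step 2: KS opens (ID,B)**KA, invents KAB, and replies under KA.
    msg_id, peer = request.open(user_keys[sender])
    kab = secrets.token_bytes(16)
    ticket = Sealed(user_keys[peer], (kab, sender))          # (KAB,A)**KB
    return Sealed(user_keys[sender], (msg_id, kab, ticket))


# Step 1: A to KS: {A,(ID,B)**KA}
reply = key_server("A", Sealed(user_keys["A"], (42, "B")))

# A checks the ID, keeps KAB, and forwards the ticket to B (step 3).
msg_id, kab_at_a, ticket = reply.open(user_keys["A"])
kab_at_b, claimed_sender = ticket.open(user_keys["B"])
print(msg_id == 42, kab_at_a == kab_at_b, claimed_sender)
```

	A never learns KB and B never learns KA; each side only ever opens
	material sealed under its own key with the server.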