Strom Lee - CS162-DN
Guang Li - CS162-DC
April 20, 2005

*******************************
OVERVIEW:
* Continued from last lecture:
  $ Protection Countermeasures
  $ Inference Control
  $ Confinement Problem
  $ Viruses
* New topic: Encryption
  $ basic steps to encryption/decryption
  $ types of cryptographic systems
  $ key distribution
*******************************

Protection Countermeasures:

The following methods, although not perfect, can be used to reduce the
frequency and severity of security compromises.

1. Logging/Audit trailing - Record all important actions and uses of
   privilege in an indelible file. This serves as an automatic log of
   events and can be used to catch imposters during their initial
   attempts and failures. For example, log every incorrect password
   attempt and every super-user login. This is also called an audit
   trail.

   Problems -
   a. The super-user can modify or erase audit trails. To solve this,
      we can make the audit trail permanent (either keep a hard copy,
      or record the data on write-once media).
   b. If the system has been broken into, the audit trail itself may be
      vulnerable. For this reason the CS department at UCB stores its
      audit trail on a file server that is separate from the user
      machine.

2. Human interaction at key steps - Examples of this include having two
   separate missile launch keys. Since the keyholes are farther apart
   than a human can reach, two people are required to activate the
   launch. Another example is a human verifying an electronic funds
   transfer.

3. Minimum privilege principle - Things are done on a "need to know"
   basis. Each piece of the system is given only the minimum amount of
   information (only what it needs to know to run), for the minimum
   possible amount of time. For example, the file system can't modify
   the memory map and the memory manager can't touch the disk
   allocation tables, which reduces accidental or intentional damage to
   the system. Capabilities are an implementation of this minimum
   privilege idea.

   Problems - This type of security (i.e.
   information containment) is very hard to make foolproof.
   a. Trojan horses can write characters to a tty, or take page faults
      and use them as a Morse-code-like signal to another process. This
      is similar to the data communication through interrupts we
      explored in HW #1.

4. Correctness proofs - Go through the code line by line and verify that
   it does what the spec says it is supposed to do. This method was
   introduced by Bob Floyd; it proves assertions about the program
   through pre-conditions, post-conditions, and the process that
   connects them.

   Problems - This method has a myriad of problems. It can only be done
   for small, simple programs (only a few hundred lines of code).
   People have tried to automate the proving method, but with little
   success. Programs are just too complex and large for correctness
   proofs to be effective and to be completed in a reasonable amount of
   time. The technology of artificial intelligence for auto-proving
   also advances slowly; AI progress is very fast at first and then
   hits a wall (see graph). Also, just because something works
   according to spec doesn't mean that the spec is correct. This method
   also can't deal with a Trojan horse.

   AI difficulty
   |
   |                  impossible to do
   |                |
   |                |
   |                |
   |   easy to do   |
   |______________/
   |________________________
     Technological Advancements

   (DIGRESSION)
   An example of AI difficulty - Although we have the technology to
   transliterate from language to language, building an actual speech
   interpreter is nearly impossible for computers. All sorts of
   semantic and grammatical issues pop up (e.g. sentence structure,
   connotation, etc.). Therefore, the easy part of this technology is
   to transliterate from language to language, and the hard
   (impossible) part is to make the translation meaningful to the
   person who is trying to understand what the speaker means to say.
   There is also a problem with speech recognition, which today is at
   best 60% reliable without individual customization.
   (END DIGRESSION)

5. Callback - Used to avoid abuse of accounts.
   E.g., if an IBM employee dials in, the computer disconnects and
   calls back a given home number to ensure it's the right person.
   (This was back in the day with those annoying, beeping dial-up modem
   thingies...)

6. Consistency or plausibility check - These checks are used by many
   credit card companies to prevent fraudulent credit card use. The
   companies monitor card usage and look for strange patterns in
   spending. Any anomalous behavior is recorded and the company
   verifies the transactions with the card owner. E.g., if you live in
   Berkeley and never spend more than $100 at a time, the card company
   will be suspicious when your card is used to buy a car in New York.

   Problems - This security measure can be an inconvenience to the
   user. For example, you could legitimately be using your card in a
   different spending pattern, and the company would hassle you to
   verify these payments.

Inference controls

The goal of inference controls is to allow users to get statistical
information (e.g. averages) out of a database, but not individual data.
E.g., the average salary of all people in a certain city, but not the
salary of an individual person.

Problems - The system is designed to answer only statistical queries,
not queries about individuals. However, a person can design a set of
queries that can be used to infer individual information. E.g., to find
an individual's salary, one queries for:
   a. The average salary of X.
   b. The average salary of X-delta, where delta describes only one
      individual.
   c. The size of X.
If X has n members, the individual's salary is then
n * avg(X) - (n-1) * avg(X-delta).

There is no good solution to this problem. But we can do the following
to reduce the accuracy of inferences:
1. Randomize data - Fudge the real data slightly to introduce a small
   amount of variation.
2. Permit only queries on predefined groups - e.g. queries for a
   pre-determined set of zip codes.

The Confinement Problem

The problem of a mutually suspicious customer and service. How do we
keep both sides private from each other?
We want to ensure that the service can only reach information provided
by the customer, and that the service is protected from the customer.
This idea is the concept of an information utility and is currently
resurfacing as server-based software.

Problems -
1. The service may not perform as advertised.
2. The service may leak (i.e. transmit confidential data). Possible
   leak sources:
   a. If the service has memory, it can collect data. For example, it
      can write data to a permanent file, or it can write to a
      temporary file which can be read by a spy trying to steal
      information.
   b. The service can send a message to a process controlled by the
      service's owner (i.e. the spy).
   c. The information can be encoded in the bill rendered for service.
   d. If the file system has interlocks, the service can lock and
      unlock a file and the spy can watch it like Morse code. As
      mentioned earlier, we were introduced to this method through
      HW #1.
   e. The service can vary the paging rate, which affects performance.
      An example would be transposing a matrix the wrong way. From the
      paging rate, a spy could gather data through some type of Morse
      code.

Viruses

- Really only appear on PCs. PCs transfer around executable files and
  code, e.g. in email (because everything's executable in Windows).
- The user executes this code, and bad things happen.
  a. The virus usually replicates itself elsewhere
  b. and does something unpleasant to your machine.
- The general technique is to search for known viruses by looking for
  their object code (searching against an anti-virus program's list).
- Problem 1: Viruses encrypt themselves.
  Solution 1: Search for the decryption code.
- Problem 2: Viruses may change the decryption code.
  Solution 2: Interpretively execute the suspected virus code for some
  portion of time, to see if the code decrypts itself into something
  that is recognized as a known virus.
- There is no good defense against an unknown virus, since its code
  patterns can't be recognized.

Topic: Encryption

- Prof.
  Smith recommends:
  a) Kahn, "The Codebreakers". It's easy to read up until WWI.
  b) See also Whitfield Diffie and Martin Hellman, "Privacy and
     Authentication: An Introduction to Cryptography", Proc. IEEE, 67,
     3, March, 1979, pp. 397-427.

- Popular approach to security in computer systems: encryption. Store
  and transmit information in an encoded form.
  a) Cryptography - the use of transformations of data intended to make
     the data useless to one's opponents.
  b) Note that encryption is not new - it has been used since the time
     of the Romans: the "Caesar Cipher".

                          sent over network
               Key1              |              Key2
                V                V                V
 clear text -> encrypt ---> cipher text ---> decrypt ---> clear text
                                 |
                                 V
                              listener

- The basic mechanism (see above diagram):
  1. Start with the text to be protected. The initial readable text is
     called clear text.
  2. Encrypt the clear text so that it doesn't make any sense at all.
     The nonsense text is called cipher text. The encryption is
     controlled by a secret password or number; this is called the
     encryption key.
  3. The encrypted text can be stored in a readable file, or
     transmitted over unprotected channels.
  4. To make sense of the cipher text, it must be decrypted back into
     clear text. This is done with some other algorithm that uses
     another secret password or number, called the decryption key.

- All of this only works under three conditions:
  1. The encryption function cannot easily be inverted (you cannot get
     back to the clear text unless you know the decryption key).
  2. The encryption and decryption must be done in some safe place so
     the clear text can't be stolen.
  3. The keys must be protected. In most systems, one key can be
     computed from the other (usually the encryption and decryption
     keys are identical), so we can't afford to let either key leak
     out.

- Types of Cryptographic Systems:
  1. (Simple) Substitution: There is a function f(x) which maps each
     letter x of the plaintext (or group of letters) into f(x). f(x)
     must be 1-1 or one to many. If f(x) = x + k for a fixed shift k
     (e.g. f(x) = x + 1), this is called a Caesar cipher.
     ** How to break: by using tables of frequencies of letters,
        doubles, triples, etc.
        -> Mapping "to many" disguises frequency.
  2. Transposition: Permute (or transpose) the input in blocks to
     obtain the output.
     ** How to break: Look for permutations that rejoin commonly used
        letter pairs, such as "th".
  3. Polyalphabetic Ciphers - a substitution cipher where f(i,x) is a
     function of i, the sequence number of the letter in the text.
     Typically periodic in i. Can get long periods by using two
     functions with relatively prime periods.
     ** How to break:
        a) Solved in two steps. First look for repeated strings, and
           count the number of letters between them. The period must
           divide every such distance, so the greatest common divisor
           of the distances between repeats is the likely period. (Or
           look at the frequencies of letters K apart, until they look
           ok; then K is the period of the cipher.) Then solve each of
           the K ciphers separately, using frequency methods.
        b) Old-fashioned coding machines (e.g. Hagelin machines) worked
           as polyalphabetic ciphers - they had rotating wheels with
           relatively prime numbers of cogs. The code was the product
           of the path through the wheels.
  4. Running Key Cipher - use a key as long as the message - e.g. the
     text of a book (but not random).
     ** How to break: use a probable word; substitute it everywhere
        (i.e. XOR it with the cipher text) and see if a recognizable
        word pops out. If so, work backward and forward by context. Or
        use frequency methods - but the frequencies are now products of
        key and message frequencies, so this is quite hard.
  5. Codes - take linguistic units of input (e.g. words) and use a code
     book (large table) to map them into output - e.g. letter groups.
     (Can also encode phrases.)
     ** Hard to solve. Try frequency counts. Also the known plaintext
        method.

- Typical approaches to cryptanalysis: frequency counts, probable
  words, known plaintext (e.g. we have the encrypted and the plain
  text -> deduce the mapping).

- Other approaches:
  1. Traffic analysis (an observer can see that there is some message
     traffic - that is a problem).
  2. Playback - listener plays back legit messages.
     Confuses people.
  3. Operator error and personnel security -> offer the operator
     $1 million, and he might tell you the key.

- In most systems, key distribution is the biggest weakness.

- Any encryption scheme where the effort to break it exceeds the reward
  can be considered secure. (Nobody wants to spend lots of effort
  breaking an encrypted message that contains a person's shopping
  list, so the encrypted shopping list is considered secure.)

- The only perfect method of encryption is to use a random key as long
  as the message (a one-time pad). All other systems can be broken,
  given enough of a message.

- Error control is a major problem - if you drop bits, you have big
  trouble recovering the message.

- Problem: how to establish a secure channel (i.e. distribute keys) in
  the first place?

- Method for Key Distribution:
  * Let KS be the key server.
  * Use key KX to communicate between user X and key server KS.
  * Users A and B.
  * Let "**" mean encryption.
  * Let ID be the message ID.
  * Let KAB be the new key that we want to safely distribute.

  ~ Step1: A to KS [A asks the server for a key to communicate with B]
      {A,(ID,B)**KA}
    --> A's name is sent in the clear so the server knows which key to
        use; (ID,B) is encrypted with A's key KA, which the server
        knows about, so the server can recover the message.

  ~ Step2: KS to A [gives the key to A, with a unique ID, so that the
    message is identifiable]
      {(ID,KAB,(KAB,A)**KB)**KA}
    --> The inner part (ID,KAB,(KAB,A)**KB) is encrypted with A's key
        (KA), so A can recover it. A can only recover the unique ID and
        the new key KAB. It can't recover ((KAB,A)**KB) because that is
        intended for B to recover with B's key KB.

  ~ Step3: A to B [send the key to B, so only B can read it]
      {(KAB,A)**KB}
    --> When B receives this, B can recover the distributed key KAB by
        decrypting the message with its own key KB.

  ~ Finally: both A and B now have safely obtained the new key KAB.

            KS
           ^ /
          / /   1. {A,(ID,B)**KA}
         / /
        / /     2. {(ID,KAB,(KAB,A)**KB)**KA}
       / /
      / V
     A ---------------> B
          3. {(KAB,A)**KB}
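
The three key distribution steps above can be sketched in Python. This
is only an illustration under stated assumptions, not the method used
in lecture: the names encrypt, decrypt, KeyServer, handle_request and
run_protocol are hypothetical, messages are encoded as JSON lists, and
the "**" operation is stood in for by a toy XOR stream derived from
SHA-256, which is NOT a secure cipher.

```python
import hashlib
import json
import secrets


def _keystream(key: bytes, n: int) -> bytes:
    """Derive n bytes from key (toy construction for the demo, NOT secure)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def encrypt(obj, key: bytes) -> str:
    """Stand-in for the notes' `obj ** key`: JSON-encode, then XOR with a stream."""
    data = json.dumps(obj).encode()
    stream = _keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, stream)).hex()


def decrypt(blob: str, key: bytes):
    """Inverse of encrypt(); recovers the object only with the same key."""
    data = bytes.fromhex(blob)
    stream = _keystream(key, len(data))
    return json.loads(bytes(a ^ b for a, b in zip(data, stream)).decode())


class KeyServer:
    """KS: shares one long-term key KX with each registered user X."""

    def __init__(self, user_keys):
        self.user_keys = user_keys

    def handle_request(self, sender: str, blob: str) -> str:
        # Step 1 arrives as {A, (ID,B)**KA}; the sender name is in the clear.
        msg_id, peer = decrypt(blob, self.user_keys[sender])
        kab = secrets.token_hex(16)  # fresh key KAB for A and B
        ticket = encrypt([kab, sender], self.user_keys[peer])  # (KAB,A)**KB
        # Step 2 reply: (ID, KAB, (KAB,A)**KB)**KA
        return encrypt([msg_id, kab, ticket], self.user_keys[sender])


def run_protocol():
    """Run steps 1-3 once and return (KAB seen by A, KAB seen by B, sender)."""
    ka, kb = secrets.token_bytes(16), secrets.token_bytes(16)
    ks = KeyServer({"A": ka, "B": kb})

    # Step 1: A -> KS
    msg_id = secrets.token_hex(4)
    reply = ks.handle_request("A", encrypt([msg_id, "B"], ka))

    # Step 2: A opens the reply with KA and checks the ID matches its request.
    got_id, kab_at_a, ticket = decrypt(reply, ka)
    assert got_id == msg_id, "reply does not match our request"

    # Step 3: A -> B; A forwards the ticket unopened, B opens it with KB.
    kab_at_b, claimed_sender = decrypt(ticket, kb)
    return kab_at_a, kab_at_b, claimed_sender
```

Calling run_protocol() returns the key as seen by A, the key as seen by
B, and the sender name B found inside the ticket; the first two should
be equal. Note that A never learns KB and B never learns KA: A only
forwards the ticket (KAB,A)**KB without being able to read it.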