4/20/05
CS162 Lecture Notes
Chiayu Peng

Topic: Protection and Security (Continued)

+ Countermeasures:
  1. Logging (audit trail): Record all important actions and uses of
     privilege in an indelible file.
     - It catches imposters during their initial attempts and failures.
       (E.g. record all attempts to specify an incorrect password, and all
       super-user logins.)
     - Make sure it's not erasable.
       ex 1) A superuser can modify or erase audit trails.
             - Make the audit trail unmodifiable, e.g. hard copy or
               write-once media.
       ex 2) If the system has been broken into, the audit trail itself
             may be vulnerable.
             - In CS at UCB, the audit trail is on a file server, separate
               from the user machines.
  2. Humans: Get humans involved at key steps/authorization.
     ex) EFT (electronic funds transfer) uses it.
  3. Principle of minimum privilege (``need-to-know'' principle): each
     piece of the system has access to the minimum amount of information,
     for the minimum possible amount of time.
     - It reduces the chances of accidental or intentional damage.
     - Capabilities are an implementation of this idea.
       ex) The file system can't touch the memory map, and the memory
           manager can't touch the disk allocation tables.
     - It is hard to provide complete information containment.
       ex 1) A trojan horse could write characters to a tty, or take page
             faults.
       ex 2) The timing of such events can encode a message, e.g. in Morse
             code, as a signal to another process.
  4. Correctness proofs (40 years old): insert assertions between
     statements.
     - It's very hard to do.
     - It works for proving small algorithms, i.e. of hundreds of lines,
       and is used in AI.
     - If the specifications are incorrect, it does not prove anything.
     - It doesn't deal with Trojan horses.
  5. Callbacks: used to avoid abuse of accounts.
     ex) At IBM, if an employee dials in, the computer disconnects and
         calls back. One can only log in from a given home number.
     - Needs extension to networks.
  6. Consistency/plausibility checks: used in application systems.
     ex) Credit card companies. E.g. is this user spending $10,000 when his
         largest previous purchase was $100?
         Is this user spending in Hong Kong on the same day that a
         transaction was recorded in NY?

+ Inference Controls
  + Goal: to be able to get statistical information (e.g. averages) out of
    a database, but not individual data.
    - The system can be designed to answer only such statistical queries.
      ex) The average salary of all people living in zip 94720.
  + Problem: How to secure the information?
    ex) One can design sets of queries that will generate individual info.
        (a) average salary of all X.
        (b) average salary of X-delta, where delta describes only one
            individual.
        (c) size of X.
        -> Together they permit us to deduce delta's salary:
           salary(delta) = size(X) * avg(X) - (size(X) - 1) * avg(X-delta).
  + No good solution to this problem.
  + Can do some things:
    - Offer limited information (wrong answer approach): slightly randomize
      the data, i.e. introduce small errors.
    - Restrict access by permitting only queries on predefined groups,
      e.g. zip codes.

+ The Confinement Problem (in Giant Time-Sharing Systems)
  + Problem of mutually suspicious customer and service.
    - Goal: Want to ensure that the service can only reach information
      provided by the customer, and that the service is protected from the
      customer.
    - Sol: Can use the concept of an information utility, which currently
      resurfaces as server-based software.
  + Two problems remain:
    1. The service may not perform as advertised.
       ex) income tax preparation
    2. Leaking data, i.e. the service transmits confidential data.
       - Possible data leaks:
         + If the service has memory, it can collect data by writing it
           into a file.
           - a permanent file
           - a temporary file which can be read by the spy
         + The service can send a message to a process controlled by its
           owner.
         + The information can be encoded in the bill rendered for service.
         + If the file system has interlocks, the service can lock and
           unlock a file, and the spy can watch to see if the file is
           locked.
           ex) Use Morse code.
         + The service can vary the paging rate (which affects
           performance).

+ Viruses (A Windows Problem)
  + In Windows, everything is executable.
    For the user's convenience, software automatically picks up executable
    content and runs it, even when that content is a virus program.
  + The user executes this code, and bad things happen.
    - The virus usually replicates itself elsewhere.
    - It does something unpleasant to your machine.
  + General technique: search for known viruses by looking for their
    object code.
  + Anti-virus software has a database of all the known viruses. It
    compares the image of each known virus with the file, possibly by
    hashing.
  + Problem 1: Viruses encrypt themselves.
    + Solution: Search for the decryption code.
  + Problem 2: Viruses may change the decryption code.
    + Solution: Interpretively execute the suspected virus code for some
      period of time, to see if the code decrypts itself into something
      that is recognized as a known virus.
  + There is no good defense against an unknown virus, since its code
    patterns can't be recognized.
  + Windows was probably written before viruses were created.
  + Windows is not a secure system.
  + Some have proposed removing the functionality that makes everything
    executable.

Topic: Encryption

+ I recommend Kahn, "The Codebreakers".
  - Fun to read up till WWI.
+ See also Whitfield Diffie and Martin Hellman, "Privacy and
  Authentication: An Introduction to Cryptography", Proc. IEEE, 67, 3,
  March, 1979, pp. 397-427.
+ Popular approach to security in computer systems: encryption. Store and
  transmit information in an encoded form.
+ Cryptography - the use of transformations of data intended to make the
  data useless to one's opponents.
+ Note that encryption is not new - it has been used since the times of the
  Romans - ``Caesar Cipher''.

                  Key1                          Key2
                   V                             V
    clear text -> encrypt ---> cipher text ---> decrypt ---> clear text
                                    V
                                 listener

+ The basic mechanism:
  + Start with the text to be protected. The initial readable text is
    called clear text.
  + Encrypt the clear text so that it doesn't make any sense at all. The
    nonsense text is called cipher text. The encryption is controlled by a
    secret password or number; this is called the encryption key.
  + The encrypted text can be stored in a readable file, or transmitted
    over unprotected channels.
  + To make sense of the cipher text, it must be decrypted back into clear
    text. This is done with some other algorithm that uses another secret
    password or number, called the decryption key.
+ All of this only works under three conditions:
  1. The encryption function cannot be easily inverted (cannot get back to
     the clear text unless you know the decryption key).
  2. The encryption and decryption must be done in some safe place so the
     clear text can't be stolen.
  3. The keys must be protected. In most systems, one key can be computed
     from the other (usually the encryption and decryption keys are
     identical), so we can't afford to let either key leak out.
+ Types of Cryptographic Systems:
  1. (Simple) Substitution: There is a function f(x) which maps each letter
     x of the plaintext (or group of letters) into f(x). f(x) must be 1-1
     or one to many. If f(x) = x+k for a fixed shift k (e.g. f(x) = x+1),
     it is called a Caesar Cipher.
     + Break by using tables of frequencies of letters, doubles, triples,
       etc.
     + Mapping "to many" disguises frequency.
     ex) Substitute each letter by the next letter alphabetically.
         - clear text:  THE QUICK BROWN FOX JUMPS
         - cipher text: UIF RVJDL CSPXO GPY KVNQT
  2. Transposition: Permute (or transpose) the input in blocks to obtain
     the output.
     + Break by looking for permutations that join commonly used letter
       pairs, such as "th".
     ex) Fill the matrix from left to right, top to bottom, and read the
         matrix from top to bottom, left to right ("-" marks a space).
         - clear text:  THE QUICK BROWN FOX JUMPS
         - cipher text: TUB JHIRFUECOOM KWXP Q N S

             T H E - Q
             U I C K -
             B R O W N
             - F O X -
             J U M P S

  3. Polyalphabetic Ciphers (Multiple Substitution) - a substitution
     cipher, where f(i,x) is a function of i, which is the sequence number
     of the letter in the text. Typically periodic, i.e.
     f(i+N,x) = f(i,x) for some period N. Can get long periods by using two
     functions with relatively prime periods.
     + Solved in two steps.
       1. Look for repeated strings, and count the number of letters
          between them.
          The greatest common divisor of the distances between repeated
          strings gives the likely period. (Or can look at frequencies of
          letters K apart, until they look ok; then K is the period of the
          cipher.)
       2. Then solve each of the N ciphers separately, using frequency
          methods.
     + Old fashioned coding machines (e.g. Hagelin machines) worked as
       polyalphabetic ciphers - they had rotating wheels with relatively
       prime numbers of cogs. The code was the product of the path through
       the wheels.
  4. Running Key Cipher - use a key as long as the message - e.g. the text
     of a book (but not random).
     + Solve: use a probable word; substitute it everywhere (i.e. XOR it
       with the cipher text) and see if a recognizable word pops out. If
       so, work backward and forward by context. Or, use frequency
       methods - but the frequencies are now products of key and message
       frequencies, so this is quite hard.
  Types 1 through 4 work at the level of letters.
  5. Codes (Word Substitution) - take linguistic units of the input (e.g.
     words) and use a code book (large table) to map them into output -
     e.g. letter groups. (Can also encode phrases.)
     + Hard to solve. Try frequency counts. Also the known plaintext
       method.
     + Since it's hard to solve, the practical suggestion is to steal the
       code book.
+ Typical approaches:
  + Use frequency counts, probable words, and known plaintext to reduce
    the mapping.
+ Other approaches:
  + Traffic analysis (an observer can see that there is some message
    traffic - that is a problem).
  + Playback - the listener plays back legitimate messages. Confuses
    people.
  + Operator error and personnel security.
+ Biggest problem: key distribution.
+ Any encryption scheme where the effort to break it exceeds the reward can
  be considered secure.
+ The only perfect method of encryption is to use a random key as long as
  the message. All other systems can be broken, given enough of a message.
  + However, a software random number generator is not good enough for
    this, because its output is only pseudo-random.
  + Error control is a major problem - if you drop bits, you have big
    trouble recovering the message.
+ Problem: how to establish a secure channel (i.e. distribute keys) in the
  first place?
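
The "random key as long as the message" scheme mentioned above can be sketched in a few lines. This is only an illustration, not part of the lecture: the helper name `xor_bytes` is made up here, and `secrets.token_bytes` draws from the operating system's random source, which is an assumption standing in for a truly random key (the caveat above about pseudo-random generators still applies).

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a key of the same length (hypothetical helper)."""
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"THE QUICK BROWN FOX JUMPS"

# Assumption: the OS random source stands in for a truly random key.
key = secrets.token_bytes(len(message))

cipher_text = xor_bytes(message, key)    # encrypt
recovered = xor_bytes(cipher_text, key)  # decrypt: XOR with the same key undoes it
```

Both ends need the same key, and the key is as long as the message and must never be reused, so this scheme makes the key distribution problem raised above harder, not easier.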
+ Method for Key Distribution
  + Let KS be the key server. Key KX is used to communicate between user X
    and the key server KS. The users are A and B. Let "**" mean encryption,
    let ID be the message ID, and let KAB be the new key.
    1. A to KS: {A,(ID,B)**KA}
       [A asks the server for a key to communicate with B.]
    2. KS to A: {(ID,KAB,(KAB,A)**KB)**KA}
       [The server gives the key to A, with a unique ID, so that the
       message is identifiable.]
    3. A to B: {(KAB,A)**KB}
       [A sends the key to B, so only B can read it.]
  - Every pair-wise conversation gets a new key.
  - Each user has an individual key shared with the key server.
  - That key is known only to the two of them.
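
The three-message exchange above can be traced with a small symbolic model. This is a sketch under stated assumptions, not a real implementation: encryption ("**") is modeled by a hypothetical `Sealed` box that only opens with the matching key, rather than by an actual cipher, and the names `KeyServer`, `seal`, `unseal`, `register`, and `handle_request` are invented for illustration.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Sealed:
    """Symbolic stand-in for payload**key (no real cipher is used)."""
    key: bytes
    payload: tuple


def seal(payload: tuple, key: bytes) -> Sealed:
    return Sealed(key, payload)


def unseal(box: Sealed, key: bytes) -> tuple:
    # Only the holder of the matching key may read the payload.
    if not secrets.compare_digest(box.key, key):
        raise PermissionError("wrong key")
    return box.payload


class KeyServer:
    """KS: shares one individual key KX with each registered user X."""

    def __init__(self) -> None:
        self.user_keys: dict[str, bytes] = {}

    def register(self, name: str) -> bytes:
        key = secrets.token_bytes(16)
        self.user_keys[name] = key
        return key

    def handle_request(self, sender: str, request: Sealed) -> Sealed:
        # Message 1 arrives as {A, (ID,B)**KA}.
        ka = self.user_keys[sender]
        msg_id, peer = unseal(request, ka)
        kab = secrets.token_bytes(16)                       # new key KAB
        ticket = seal((kab, sender), self.user_keys[peer])  # (KAB,A)**KB
        # Message 2: (ID,KAB,(KAB,A)**KB)**KA
        return seal((msg_id, kab, ticket), ka)


ks = KeyServer()
ka = ks.register("A")
kb = ks.register("B")

msg_id = secrets.token_hex(4)
reply = ks.handle_request("A", seal((msg_id, "B"), ka))  # messages 1 and 2
reply_id, kab_at_a, ticket = unseal(reply, ka)           # A checks the ID
kab_at_b, claimed_sender = unseal(ticket, kb)            # message 3, opened by B
```

After the exchange, A and B hold the same KAB. A forwards the ticket without being able to open it, since it is sealed under KB.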