CS 162 Lecture Notes
Prof. Alan Jay Smith

Topic: Protection and Security

+ The purpose of a protection system is to prevent accidental or
  intentional misuse of a system while permitting controlled sharing.
  It is relatively easy to provide complete isolation.
+ Accidents:
  + E.g. a program mistakenly overwrites the file containing the
    command interpreter. Nobody else can log in. Problems of this
    kind are easy to solve (we can do things to make the likelihood
    small).
  + You accidentally destroy a file you'd like to keep.
+ Malicious abuse:
  + Speed dialing of IP addresses.
  + Script kiddies.
  + In class I'll make jokes, but this is really a serious problem.
+ Three types of effects we are concerned with:
  + Unauthorized information modification.
  + Unauthorized denial of use.
  + Unauthorized information release.
+ The biggest complication in a general purpose, remotely accessed
  computer system is that the intruder in these definitions may be an
  otherwise legitimate user of the system.
+ Examples of problems - not solved - these are not computer
  problems:
  + Fake timesheets for paychecks.
  + Holding down the printer's repeat button to print extra
    paychecks.
  + Rounding off amounts and putting the remainder into a special
    account.
  + Making up deposit slips with your account number on them.
  + Making up checks with your name, but some other account number on
    them. (Paid out of the other account.)
+ Functional levels of information protection:
  + Unprotected system.
  + All-or-nothing system (i.e. sharing or complete isolation).
  + The simplest type of protection system is just a user/system
    distinction. Someone with "system" privileges can do anything; a
    user can do only what user-level access permits.
  + Obviously, this is unsatisfactory - we need to differentiate
    between users, and give more fine-grained access.
  + Controlled sharing.
  + User-programmed sharing controls - e.g. users want to put complex
    restrictions on use, such as time of day, or concurrence of
    another user.
+ Design principles for protection mechanisms:
  + 1) Economy of mechanism: keep the design as simple and small as
    possible. (KISS - keep it simple, stupid.)
  + 2) Fail-safe defaults: base access decisions on permission, not
    exclusion. If the system fails, the default is lack of access.
  + 3) Complete mediation: every access to every object must be
    checked for authority.
  + 4) Open design: the design should not be secret. Its
    effectiveness should not be impaired by knowledge of the
    mechanism.
  + 5) Separation of privilege: where feasible, a protection
    mechanism that requires two keys to unlock it is more robust than
    one that allows access to the presenter of only one.
  + 6) Least privilege: give no more than the required access.
  + 7) Least common mechanism: minimize the amount of mechanism
    common to more than one user and depended upon by all users.
    (These represent the information paths.)
  + 8) Psychological acceptability: the human interface must be
    convenient!
+ There are three aspects to a protection mechanism:
  + Authentication: confirm that the user is who he says he is.
  + Authorization determination: figure out what the user is and
    isn't allowed to do. Need a simple database for this.
  + Access enforcement: make sure there are no loopholes in the
    system.
  + Even the slightest flaw in any of these areas may ruin the whole
    protection mechanism.
+ Authentication is most often done with passwords. This is a
  relatively weak form of protection.
  + A password is a secret piece of information used to establish the
    identity of a user.
+ Passwords can be compromised in a number of ways:
  + Can be stolen (you write it down somewhere - wallet, address
    book, front of terminal).
    + Story of someone who looked in the wastebasket of the guy who
      added new passwords.
    + Story of the guy (on CTSS) who logged on at midnight (when the
      system administrator was on), expanded his segment, and got a
      copy of the administrator's password segment.
  + The line can be tapped and the password copied.
  + The password can be observed as it is typed in.
  + The password can be guessed (your name, your mother's name, your
    project, your birthday).
  + The password can be repeatedly tried. (Should not be permitted.)
  + The password file can be broken.
+ Counter-actions:
  + Passwords should not be stored in a directly readable form.
    One-way transformations should be used.
    + Use a one-way function. Given a copy of the password file, you
      can't invert the transformation to get back to the password.
      (Unix does this. The password file (/etc/passwd) is readable.)
    + Unix also uses "SALT" to extend the password. The idea is that
      the salt (e.g. user ID) is concatenated to the password before
      encryption - this prevents efficient generation of a file of
      encrypted common passwords for comparison.
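+ A minimal sketch of the salted one-way scheme, in Python rather
  than the real Unix crypt(); the "salt$hash" record format and the
  iteration count are illustrative assumptions, not what /etc/passwd
  actually stores:

    import hashlib
    import hmac
    import os

    def store_password(password: str) -> str:
        """Keep only 'salt$hash'; the one-way function can't be inverted."""
        salt = os.urandom(8).hex()                    # per-user SALT
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(),
                                     salt.encode(), 100_000)  # deliberately slow
        return f"{salt}${digest.hex()}"

    def check_password(password: str, record: str) -> bool:
        """Re-derive the hash from the candidate password and compare."""
        salt, stored = record.split("$")
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(),
                                     salt.encode(), 100_000)
        return hmac.compare_digest(digest.hex(), stored)

    record = store_password("hunter2")
    assert check_password("hunter2", record)
    assert not check_password("guest", record)

  Because each user's salt differs, an attacker can't precompute one
  table of encrypted common passwords, and the slow iterated hash
  discourages machine-based testing - which is the point of the next
  two bullets.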
+ Password testing should be slow, to discourage machine-based tests.
+ Limit the number of tests.
+ Passwords should be relatively long and obscure. Paradox: short
  passwords are easy to crack; long passwords are easily forgotten
  and usually written down.
+ Password-testing programs (using common names, words, etc.) often
  can break more than half of the passwords in a system.
+ Tell the user when he last logged in - he can tell if there was an
  intruder.
+ Note: we must protect the authorizer. The program that checks
  whether the password matches (or encrypts the password) must be
  incorruptible.
  + (If there is physical security, we must be sure the guards are
    incorruptible.)
+ Another form of identification: a badge or key.
  + Does not have to be kept secret.
  + Should not be forgeable or copyable.
  + Can be stolen, but the owner should know if it is.
  + Pain to carry.
  + Key paradox: a key must be cheap to make but hard to duplicate.
    This means there must be some trick (i.e. a secret) that has to
    be protected.
+ Once identification is complete, the system must be sure to protect
  the identity, since other parts of the system will rely on it.
  + That is, the system must be sure that once a given process is
    associated with a given user, the correspondence is retained.
  + E.g. you don't want it to be possible to forge mail.
+ Authorization determination: must indicate who is allowed to do
  what with what.
  + In general, this can be represented as an access matrix with one
    row per user and one column per file. Each entry indicates the
    privileges of that user on that object. In practice, a full
    access matrix would be too bulky, so it gets stored in one of two
    condensed ways:
    + access lists, or
    + capabilities.
+ Access lists: with each file (or other resource), indicate which
  users are allowed to perform which operations.
  + In the most general form, each file has a list of <user,
    privilege> pairs.
  + It would be tedious to have a separate entry for every user, so
    users are usually grouped into classes. For example, in Unix
    there are three classes - self, group, anybody else - and three
    types of permission (read/write/execute-search), giving nine bits
    per file.
  + Access lists are simple, and are used in almost all file systems.
  + Note that the overhead of checking an access list is "high". If
    the access list is in memory, it takes at least 50-100
    instructions; if it is stored on disk, it takes considerable
    time. We can't possibly check the access list on every access.
    E.g. for a file, we can check at "open", but not on every "read"
    or "write".
  + Semantically, an access list is equivalent to a guard at the gate
    - he checks you against a list.
  + Easy to determine who has access. Easy to revoke access.
  + Hard to determine what a given user can access (unless it is a
    segregated capability system, in which case the capabilities can
    be found).
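+ A sketch of the nine-bit Unix-style check, with a simplified inode
  standing in for the real kernel structures:

    from typing import NamedTuple

    # One octal digit of r/w/x bits per class: owner, group, other.
    R, W, X = 4, 2, 1

    class Inode(NamedTuple):
        owner: str
        group: str
        mode: int        # e.g. 0o640 = owner rw-, group r--, other ---

    def may_access(user: str, groups: set, f: Inode, want: int) -> bool:
        """Pick the one class the user falls into, then test its bits."""
        if user == f.owner:
            bits = (f.mode >> 6) & 7
        elif f.group in groups:
            bits = (f.mode >> 3) & 7
        else:
            bits = f.mode & 7
        return (bits & want) == want

    f = Inode(owner="alice", group="staff", mode=0o640)
    assert may_access("alice", {"staff"}, f, R | W)    # owner may read/write
    assert may_access("bob", {"staff"}, f, R)          # group may only read
    assert not may_access("carol", set(), f, R)        # others get nothing

  A real system runs this check once, at open time; the open file
  table then serves as a cheap proof of access for later reads and
  writes.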
+ Capabilities: with each user, indicate which files may be accessed,
  and in what ways.
  + Store a list of <object, privilege> pairs with each user. This is
    called a capability list, or C-list.
  + Semantically, a capability is like a key. You can access anything
    (i.e. open any door) for which you have a key.
  + In a capability system, you may be able to use capabilities as
    the names for things. The capability is mapped by the system to
    an object. In this case, you can't even name objects not referred
    to in your capability list.
    + The mapping can work like a page table: the system takes the
      number of the entry in the list, and the entry may hold an
      address.
    + Another example is the open file table - files are referenced
      through the table.
  + It is easy to determine what a given process or user can access.
  + It (may be) hard to determine who has access to a given object.
  + Capabilities can usually be passed along (e.g. via some sort of
    system call).
  + It (may be) hard to revoke access - this depends on whether the
    capabilities are segregated in C-lists or not.
    + The solution is to make a capability indirect through a
      well-known table.
      + E.g. the system open file table.
  + Implementations:
    + The problem is to ensure that capabilities can't be forged.
    + Tagged architecture: each capability has a tag, which can only
      be set by the system. Users can manipulate capabilities, but
      not set the tag.
    + Segregated architecture: capabilities are segregated, and are
      only touched by the system. The user refers to them indirectly
      (e.g. by index into the C-list).
      + Keep the capability lists in OS virtual memory.
  + Real systems:
    + Intel 432
    + Cambridge CAP system
    + IBM System/38
  + In capability-based systems, the default is for no one to be able
    to access a file unless they have been given a capability.
  + Capabilities have sometimes been used in systems that need to be
    very secure. However, capabilities can make it difficult to share
    information: nobody can get access to your stuff unless you
    explicitly give it to them.
  + Note that the overhead of using a capability is relatively low.
    It is possible to check the capability on every access. (The
    overhead is not zero, however, and capability-based systems tend
    to be slow.)
  + Example of a simple capability-based protection scheme: page
    tables. Protection is obtained in two ways:
    + (a) you can only reference stuff in the page tables, and
    + (b) you can only do what the protection bits permit.
  + Another example: the open file table.
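+ A sketch of a segregated C-list with the indirection trick above: a
  capability is just an index into a system-held table, so users can
  pass capabilities around but can't forge one, and revocation
  invalidates every outstanding copy at once. All names here are
  invented for illustration.

    # System-wide object table, touched only by "system" code.
    object_table = {}        # slot -> (object, rights), or None if revoked
    next_slot = 0

    def grant(obj, rights) -> int:
        """System call: mint a capability (a slot number) for an object."""
        global next_slot
        object_table[next_slot] = (obj, frozenset(rights))
        next_slot += 1
        return next_slot - 1

    def invoke(cap: int, op: str):
        """Every access is mediated by the table lookup."""
        entry = object_table.get(cap)
        if entry is None:
            raise PermissionError("capability revoked or invalid")
        obj, rights = entry
        if op not in rights:
            raise PermissionError("capability lacks right: " + op)
        return obj

    def revoke(cap: int):
        """Kill the well-known table slot; all copies die with it."""
        object_table[cap] = None

    cap = grant("payroll.db", {"read"})
    assert invoke(cap, "read") == "payroll.db"
    revoke(cap)
    try:
        invoke(cap, "read")
    except PermissionError:
        pass                 # every copy of the capability is now useless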
+ In Multics, there are 8 levels (rings) of protection. With each
  file and segment, there is a level associated with read, write,
  execute, and call. (A call is special, since it can only go to
  certain entry points.)
  + When you attempt to read, write, execute, or call, you generate
    the address of the target. Generating an address can involve
    several steps, if there is any indirection. The effective address
    has associated with it a level, which is the highest ring that
    could have modified the address. The protection level of the
    effective address is checked against the permitted access on
    every reference.
  + Protection levels are associated with segments, and appear in the
    segment tables. Thus, they can be accessed with no (or little)
    extra overhead.
  + A process is permitted to execute at a higher level of authority
    (lower ring number) if it was entered at a permitted entry point.
    (By entering at a permitted point, the called procedure can only
    be used in permitted ways. It can check its own arguments, etc.)
  + The Intel x86 architecture supports the same thing, but with 4
    levels.
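+ A sketch of the ring check under the convention above (a smaller
  ring number means more privilege); the segment names and table
  layout are invented:

    # Per segment and per access type, the highest ring number allowed.
    segment_table = {
        "kernel_data": {"read": 0, "write": 0, "execute": 0},
        "editor_code": {"read": 4, "write": 1, "execute": 4},
    }

    def check(segment: str, access: str, effective_ring: int) -> bool:
        """Allow the reference only if every ring that could have
        modified the effective address is privileged enough."""
        return effective_ring <= segment_table[segment][access]

    # A ring-4 user program may run the editor but not patch it,
    # and it can't touch kernel data at all:
    assert check("editor_code", "execute", effective_ring=4)
    assert not check("editor_code", "write", effective_ring=4)
    assert not check("kernel_data", "read", effective_ring=4)

  Crossing into a lower-numbered ring happens only through the
  permitted entry points, which is how the hardware keeps a caller
  from misusing a more privileged procedure.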
+ Access enforcement: some part of the system must be responsible for
  enforcing access controls and protecting the authorization and
  identification information.
  + Obviously, this portion of the system must be invoked by
    everybody, including those planning to cause trouble. Thus it
    should be as small and simple as possible. Example: the portion
    of the system that sets up memory mapping tables.
  + The portion of the system that provides and enforces protection
    is called the security kernel. Most systems, like Unix, let the
    whole OS run in ``God'' mode. As a consequence, those systems
    aren't very secure.
  + Paradox: the more powerful the protection mechanism, the larger
    and more complex the security kernel must be, and hence the more
    likely it is to have bugs!
+ In general, protecting a computer system is extremely difficult.
  There is no completely secure computer system in existence. Some
  common problems:
  + 1. Abuse of valid privileges. On Unix a super-user can do
    ANYTHING, and many people usually know the super-user password.
    Privileges aren't fine-grained enough. (Note that in Linux, some
    exclusions are forced - e.g. to do something to account "X", you
    must "su X" first.)
  + 2. Imposter or Trojan horse. Leave a program lying around that
    looks like the login process or some other standard program; when
    people type passwords, it remembers them for its owner. Or, give
    someone a program that steals information, e.g. a supposed editor
    that reads unauthorized files. Give the example of the checking
    account system that credited fractional cents to the account of
    its creator (``The Superman III scam'', or ``salami attack'').
    This can be called a Trojan horse: getting a legitimate user to
    unwittingly execute or utilize program code set up by the
    intruder.
  + 3. Listener. Eavesdrop on the terminal wire, or listen in on
    local network traffic, to steal information.
  + 4. Spoiler. Use up all resources and make the system crash. E.g.
    grab all file space or create zillions of processes. (This is
    also called a denial-of-service attack, e.g. a SYN flood.)
  + 5. Create a doctored version of some standard program (e.g. the
    shell) which gives special privileges to a given person. (This
    can be called a trap door, since it gives special privileges to
    those who know how to access it.)
+ Examples of penetration:
  + If a user gets on the permission lists of the /dev files, s/he
    gets access to the raw I/O devices, especially the disk. There is
    also a mechanism to get access to raw memory. This gives
    inappropriate permissions.
  + A user leaves a fake shell running on a terminal. Someone else
    tries to log on; the fake shell captures the password and then
    dies. The victim just logs on again.
  + Dial into a still-live dial-up line.
  + Walk up to a terminal that is still logged on.
  + Find an account with a null password. (Can tell from /etc/passwd
    which these are.)
  + Fake distributions - distribute a version of the software with
    doctored code.
  + Create a fake file system and have the system mount it. Can put a
    program there "owned" by the superuser, with the setuid bit set.
    A user runs the program and becomes superuser.
  + Page-mode buffering - can send commands to a terminal telling it
    to store the following characters and then send them back. When
    they are sent back, they are interpreted as coming from the user
    at that terminal. (This is very hard to defeat. There are lots of
    ways to send stuff to a terminal - e.g. mail.)
  + Log on through uucp as a remote system. (The system must restrict
    what can be done in this case.)
  + Buffer overflow - many systems are vulnerable to argument buffers
    overflowing. (They don't test for length.)
+ Example of a break-in (at Stanford):
  + The guest account had password "guest". The intruder got in that
    way.
  + Guest was able to write a certain scratch directory, which was on
    root's search path. (E.g. he wrote a file there named "blat" or
    some such, which was routinely called by root. The search
    executed the intruder's version instead of the real one. Gave him
    root permission.)
  + Used root to take on the identity of other users on the system.
    Found one with .rhost files on remote systems, so he could log on
    there.
  + Repeated the above sequence to move from machine to machine.
+ Once the system has been penetrated, it may be impossible to secure
  it again: hooks could have been left around for the imposter to
  regain control.
+ It is not always possible to tell when the system has been
  penetrated, since the villain can clean up all traces behind
  himself.
  + Apparently Western Electric denied that it had been robbed even
    after the guy had admitted it. Note that this is different from
    human systems, where people remember.
+ If we can never be sure that there are no bugs, then we can never
  be sure that the system is secure, since bugs could provide
  loopholes in the protection mechanisms.
+ Remind about protection problems. Mention that more than $450
  billion a day is transferred through EFT (electronic funds
  transfer). What's a person to do?
+ Countermeasures: nothing works perfectly, but here are some
  possibilities:
  + Logging: record all important actions and uses of privilege in an
    indelible file. This can be used to catch imposters during their
    initial attempts and failures. E.g. record all attempts to
    specify an incorrect password, and all super-user logins. Also
    called an audit trail.
    + But the superuser can modify or erase audit trails.
    + Make the audit trail unmodifiable - e.g. hard copy, or
      write-once media.
    + Note that if the system has been broken, the audit trail itself
      may be vulnerable.
    + In CS at UCB, the audit trail is on a file server, separate
      from the user machines.
    + Even better is to get humans involved at key steps (this is one
      of the solutions for EFT).
  + Principle of minimum privilege (the ``need-to-know'' principle):
    each piece of the system has access to the minimum amount of
    information, for the minimum possible amount of time. E.g. the
    file system can't touch the memory map; the memory manager can't
    touch the disk allocation tables. This reduces the chances of
    accidental or intentional damage. Capabilities are an
    implementation of this idea. It is very hard to provide
    fool-proof information containment: e.g. a Trojan horse could
    write characters to a tty, or take page faults, in Morse code, as
    a signal to another process.
  + Correctness proofs. These are very hard to do. Even so, a proof
    only shows that the system works according to spec. It doesn't
    mean that the spec is necessarily right, and it doesn't deal with
    Trojan horses.
  + Callback, used to avoid abuse of accounts: at IBM, if an employee
    dials in, the computer disconnects and calls back. One can only
    log in from a given home number.
    + This needs an extension to networks.
  + For application systems, e.g. credit cards, one can do a
    consistency or plausibility check. E.g. is this user spending
    $10,000 when his largest previous purchase was $100? Is this user
    spending in Hong Kong on the same day that a transaction was
    recorded in NY?
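+ A sketch of such a plausibility check; the 10x threshold and the
  impossible-travel window are invented numbers:

    from datetime import datetime

    def suspicious(history, amount, city, when):
        """Flag a charge that dwarfs past behavior or implies
        impossible travel. history is a list of (amount, city, time)."""
        if not history:
            return False
        largest = max(amt for amt, _, _ in history)
        if amount > 10 * largest:                  # $10,000 vs. a $100 habit
            return True
        _, last_city, last_when = history[-1]
        hours = abs((when - last_when).total_seconds()) / 3600
        return city != last_city and hours < 12    # Hong Kong and NY, same day

    past = [(80.0, "New York", datetime(1994, 3, 1, 10)),
            (100.0, "New York", datetime(1994, 3, 2, 9))]
    assert suspicious(past, 10_000.0, "New York", datetime(1994, 3, 2, 15))
    assert suspicious(past, 50.0, "Hong Kong", datetime(1994, 3, 2, 12))
    assert not suspicious(past, 120.0, "New York", datetime(1994, 3, 3, 9))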
+ Inference controls
  + The goal: suppose you want people to be able to get statistical
    information (e.g. averages) out of a database, but not individual
    data. E.g. the average salary of all people living in zip 94720.
    + The system can be designed to answer only such statistical
      queries, but not individual ones.
  + The problem: one can design sets of queries that will generate
    individual information. E.g. (a) the average salary of all X;
    (b) the average salary of X minus delta, where delta describes
    only one individual; (c) the size of X.
    + These three queries permit us to deduce delta's salary.
  + There is no good solution to this problem.
  + Can do some things:
    + Randomize the data (slightly) - i.e. introduce small errors.
    + Permit only queries on predefined groups - e.g. zip codes.
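+ The three-query attack, made concrete (the salary figures are
  fabricated for illustration):

    salaries = {"ann": 95_000, "bob": 70_000, "carl": 81_000, "dee": 110_000}

    def avg_salary(names):    # the only kind of query the system permits
        return sum(salaries[n] for n in names) / len(names)

    def size(names):          # another "harmless" statistical query
        return len(names)

    X = {"ann", "bob", "carl", "dee"}     # everyone in zip 94720
    delta = {"dee"}                       # a set describing one individual

    # (a) avg over X, (b) avg over X - delta, (c) |X|  =>  dee's salary:
    dee = size(X) * avg_salary(X) - (size(X) - 1) * avg_salary(X - delta)
    assert dee == salaries["dee"]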
+ The confinement problem
  + This is the problem of a mutually suspicious customer and
    service. We want to ensure that the service can only reach
    information provided by the customer, and that the service is
    protected from the customer.
  + The idea is the concept of an information utility. The idea is
    currently resurfacing as server-based software.
  + Two problems remain: the service may not perform as advertised,
    and it may leak - i.e. transmit - confidential data.
  + List of possible leaks:
    + If the service has memory, it can collect data.
      + E.g. it can write into a permanent file.
      + It can write to a temporary file which can be read by the
        spy.
    + The service can send a message to a process controlled by its
      owner.
    + The information can be encoded in the bill rendered for the
      service.
    + If the file system has interlocks, the service can lock and
      unlock a file, and the spy can watch to see whether the file is
      locked. This can be used like Morse code.
    + The service can vary its paging rate (which affects
      performance).
+ Viruses
  + These really only appear on PCs. PCs transfer around executable
    files and code - e.g. in email.
  + The user executes this code, and bad things happen.
  + A virus usually replicates itself elsewhere, and does something
    unpleasant to your machine.
  + The general detection technique is to search for known viruses by
    looking for their object code.
    + The problem is that viruses encrypt themselves.
    + The solution is to search for the decryption code.
      + Viruses may change the decryption code.
      + The solution is to interpretively execute the suspected virus
        code for some portion of time, to see if the code decrypts
        itself into something that is recognized as a common virus.
  + There is no good defense against an unknown virus, since its code
    patterns can't be recognized.

Topic: Encryption

+ I recommend Kahn, "The Codebreakers". See also Whitfield Diffie and
  Martin Hellman, "Privacy and Authentication: An Introduction to
  Cryptography", Proc. IEEE, 67, 3, March 1979, pp. 397-427.
+ A popular approach to security in computer systems: encryption.
  Store and transmit information in an encoded form.
+ Cryptography: the use of transformations of data intended to make
  the data useless to one's opponents.
+ Note that encryption is not new - it has been used since the times
  of the Romans (the ``Caesar cipher'').

                Key 1                        Key 2
                  |                            |
                  v                            v
    clear text -> encrypt --> cipher text --> decrypt -> clear text
                                   |
                                   v
                               listener

+ The basic mechanism (see the diagram above):
  + Start with the text to be protected. The initial readable text is
    called clear text.
  + Encrypt the clear text so that it doesn't make any sense at all.
    The nonsense text is called cipher text. The encryption is
    controlled by a secret password or number, called the encryption
    key.
  + The encrypted text can be stored in a readable file, or
    transmitted over unprotected channels.
  + To make sense of the cipher text, it must be decrypted back into
    clear text. This is done with some other algorithm that uses
    another secret password or number, called the decryption key.
+ All of this only works under three conditions:
  + The encryption function cannot easily be inverted (one cannot get
    back to the clear text without knowing the decryption key).
  + The encryption and decryption must be done in some safe place so
    the clear text can't be stolen.
  + The keys must be protected. In most systems, one key can be
    computed from the other (usually the encryption and decryption
    keys are identical), so we can't afford to let either key leak
    out.
+ Types of cryptographic systems:
  + (Simple) substitution: there is a function f(x) which maps each
    letter of the plaintext (or group of letters) into f(x). f(x)
    must be one-to-one or one-to-many. If f(x) = x + 1, it is called
    a Caesar cipher.
    + Solved by using tables of frequencies of letters, doubles,
      triples, etc.
    + Mapping "to many" disguises frequency.
  + Transposition: permute (or transpose) the input in blocks to
    obtain the output.
    + Look for permutations that rejoin commonly used letter pairs,
      such as "th".
  + Polyalphabetic ciphers: a substitution cipher where f(i,x) is a
    function of i, the sequence number of the letter in the text.
    Typically f is periodic in i. One can get long periods by using
    two functions with relatively prime periods.
    + Solved in two steps. First look for repeated strings, and count
      the number of letters between them. The greatest common divisor
      of the distances between the strings is the period. (Or one can
      look at frequencies of letters K apart, until they look right;
      then K is the period of the cipher.) Then solve each of the N
      ciphers separately, using frequency methods.
    + Old-fashioned coding machines (e.g. Hagelin machines) worked as
      polyalphabetic ciphers - they had rotating wheels with
      relatively prime numbers of cogs. The code was the product of
      the path through the wheels.
  + Running key cipher: use a key as long as the message - e.g. the
    text of a book. (But not random.)
    + To solve: use a probable word; substitute it everywhere (i.e.
      XOR it with the cipher text) and see if a recognizable word
      pops out. If so, work backward and forward by context. Or use
      frequency methods - but the frequencies are now products of the
      key and message frequencies, so this is quite hard.
  + Codes: take linguistic units of the input (e.g. words) and use a
    code book (a large table) to map them into output - e.g. letter
    groups. (Phrases can also be encoded.)
    + Hard to solve. Try frequency counts. Also the known-plaintext
      method.
+ Typical attack approaches: frequency counts, probable words, known
  plaintext.
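+ A sketch of the simplest substitution cipher and the frequency
  attack on it; with f(x) = x + k, trying the shift that maps the
  commonest ciphertext letter to 'e' usually recovers k, given enough
  English text:

    from collections import Counter
    import string

    def caesar(text: str, k: int) -> str:
        """Simple substitution with f(x) = x + k (mod 26): a Caesar cipher."""
        shifted = string.ascii_lowercase[k:] + string.ascii_lowercase[:k]
        return text.translate(str.maketrans(string.ascii_lowercase, shifted))

    def break_caesar(cipher: str) -> int:
        """Frequency attack: assume the commonest letter is really 'e'."""
        common = Counter(c for c in cipher if c.isalpha()).most_common(1)[0][0]
        return (ord(common) - ord("e")) % 26

    msg = "the enemy attacks at dawn and retreats before the sun sets"
    ct = caesar(msg, 3)
    k = break_caesar(ct)
    assert k == 3 and caesar(ct, -k % 26) == msg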
+ Other attack approaches:
  + Traffic analysis (just being able to see that there is message
    traffic can be a problem).
  + Playback - a listener plays back legitimate messages. This
    confuses people.
  + Operator error, and personnel security.
+ In most systems, key distribution is the biggest weakness.
+ Any encryption scheme where the effort to break it exceeds the
  reward can be considered secure.
+ The only perfect method of encryption is to use a random key as
  long as the message. All other systems can be broken, given enough
  of a message.
+ Error control is a major problem - if you drop bits, you have big
  trouble recovering the message.
+ Problem: how do we establish a secure channel (i.e. distribute
  keys) in the first place?
+ A method for key distribution:
  + Let KS be the key server. Use key KX to communicate between user
    X and the key server KS. The users are A and B. Let "**" mean
    encryption, let ID be a message ID, and let KAB be the new key.
  + 1. A to KS: {A, (ID,B)**KA} [A asks the server for a key to
    communicate with B.]
  + 2. KS to A: {(ID, KAB, (KAB,A)**KB)**KA} [The server gives the
    key to A, with the unique ID, so that the message is
    identifiable.]
  + 3. A to B: {(KAB,A)**KB} [A sends the key to B, encrypted so that
    only B can read it.]
+ The federal Data Encryption Standard (DES). It can be implemented
  efficiently in hardware and appears to be relatively safe.
  + A block cipher. Encrypts 64 bits at a time. Uses a 56-bit key.
  + Tell the story about the NSA and 56-bit keys. Note that the US
    government doesn't want cheap and effective encryption - it would
    no longer be able to read third-world traffic.
  + There are chips that encrypt/decrypt megabits per second.
  + DES is no longer considered safe enough by the NSA. The latest
    standard is the CLIPPER chip. For practical purposes, DES is more
    than adequate.
    + Sufficient security is obtained by two-level encryption with
      pairs of keys.
  + There is export control (ITAR - International Traffic in Arms
    Regulations) on DES products.
  + PGP - Pretty Good Privacy - is a freely distributed encryption
    system (public-key key distribution combined with a symmetric
    cipher).
+ Public key encryption: a newer mechanism for encryption, where
  knowing the encryption key doesn't help you find the decryption
  key, or vice versa.
  + There are two keys, which are inverses of each other. Whatever is
    encoded with one can be decoded with the other (the "private key"
    and the "public key").
  + The two keys are not derivable from each other.
  + Each user keeps one key secret and publicizes the other. You
    can't derive the private key from the public key. Public keys are
    made available to everyone, in a phone book for example.
+ A specific scheme for public key encryption (pages 471-472, chapter
  14, of Silberschatz and Galvin):
  + Encode: E(m) = (m**e) mod n = C
    + e and n are public; d is private.
  + Decode: D(C) = (C**d) mod n
  + Here m is the message; e and d are between 0 and n-1; e, d, and n
    are positive integers.
  + We must derive e, d, and n such that the decode above is the
    inverse of the encode:
    + Let n = p*q (p, q large primes).
    + d is a large integer relatively prime to (p-1)*(q-1) (i.e.
      GCD[d, (p-1)*(q-1)] == 1).
    + e is chosen such that (e*d) mod ((p-1)*(q-1)) == 1.
  + It can be shown that this makes E and D inverses.
  + This is safe because although n is known, p and q are not, and so
    d cannot be derived from e. (Factoring is believed to be hard.)
    + The proof requires number theory which I haven't bothered to
      look up.
+ Safe mail:
  + Use the public key of the destination user to encrypt the mail.
  + Anybody can encrypt mail for this user and be certain that only
    the user will be able to decipher it. It's a nice scheme because
    the user only has to remember one key, and all senders can use
    the same key. However, how does the receiver know for sure whom
    it's getting mail from?
+ Digital signatures: public keys can also be used to certify
  identity:
  + To certify your identity, use your private key to encrypt a text
    message, e.g. ``I agree to pay Mary Wallace $100 per year for the
    duration of life.''
  + You can give the encrypted message to anybody, and they can
    certify that it came from you by seeing if it decrypts with your
    public key. Anything that decrypts into readable text with your
    public key must have come from you! This can be made legally
    binding as a form of electronic signature.
  + Note that encrypting with only your private key leaves the mail
    or message readable by anyone.
    + If you encrypt with your private key, and then with someone
      else's public key, it can be read only by the intended
      recipient.
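+ The scheme above with toy numbers: p and q are far too small to be
  safe and are chosen only so the arithmetic is visible. Python's
  built-in pow(x, k, n) does the modular exponentiation (and, from
  Python 3.8, the modular inverse):

    from math import gcd

    p, q = 61, 53                 # "large primes" - toy-sized here
    n = p * q                     # 3233, public
    phi = (p - 1) * (q - 1)       # 3120; stays secret along with p and q

    d = 2753                      # private key: relatively prime to phi
    assert gcd(d, phi) == 1
    e = pow(d, -1, phi)           # public key: (e * d) mod phi == 1
    assert e == 17

    def encode(m): return pow(m, e, n)    # E(m) = m**e mod n
    def decode(c): return pow(c, d, n)    # D(C) = C**d mod n

    m = 1162                      # a message, with 0 <= m < n
    assert decode(encode(m)) == m         # E and D are inverses...

    # ...and they commute, which is what makes digital signatures work:
    signature = pow(m, d, n)              # "encrypt" with the private key
    assert pow(signature, e, n) == m      # anyone can check with the public key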
+ One public-key method believed to work: publish a large composite
  number (the public key); the private key is the factorization of
  that number. The factors are hard to obtain.
+ Encryption appears to be a great way to thwart listeners. It
  doesn't help with Trojan horses, though.
+ One-way encryption - used to encrypt the password file. We don't
  have to be able to decrypt it - we just compare the encryption of a
  submitted password with the stored one. One can't deduce what needs
  to be submitted. (I.e. the encryption algorithm should not be
  invertible.)
+ General problem: how do we know that an encryption mechanism is
  safe? It's extremely hard to prove. Mention the example of a scheme
  that was disproven after being widely accepted - the knapsack
  problem. This is a hot topic for research: theorists are trying to
  find provably hard problems and use them for proving the safety of
  encryption.
+ The CLIPPER chip
  + The chip contains:
    + 64-bit block encryption (the algorithm is classified).
    + Uses 80-bit keys.
    + Uses 32 rounds of scrambling (compared to 16 for DES).
    + Uses the following numbers:
      + F - an 80-bit key used by all Clipper chips.
      + N - a 30-bit serial number (per chip).
      + U - an 80-bit secret decryption key for this chip only.
  + A secure conversation occurs this way:
    + A session key K is negotiated (somehow).
    + E(M;K) is the encrypted message stream.
    + E(E(K;U), N; F) is a "law enforcement block". With F, we can
      get E(K;U) and N. From N (the serial number), we can get U
      (held by federal agencies), and then we can get K. From K, we
      can decrypt the messages.
    + The key U is the xor of U1 and U2. U1 and U2 are held by
      different federal agencies. One can get both U1 and U2 only
      with a court-ordered wiretap.
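+ A toy sketch of the key-escrow structure only: the real cipher
  (Skipjack) is classified, so plain XOR stands in for E here, and
  the serial number is a byte string rather than 30 bits. Nothing
  about the actual chip beyond the nesting E(E(K;U), N; F) and the
  U = U1 xor U2 split is represented:

    import os

    def E(data: bytes, key: bytes) -> bytes:
        """Stand-in cipher: XOR with a repeating key (self-inverse)."""
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

    F  = os.urandom(10)            # 80-bit family key, in every Clipper chip
    N  = b"chip#0042\x00"          # per-chip serial number (10 bytes here)
    U1 = os.urandom(10)            # escrow share held by agency 1
    U2 = os.urandom(10)            # escrow share held by agency 2
    U  = E(U1, U2)                 # chip key: U = U1 xor U2

    K = os.urandom(10)             # session key, negotiated somehow
    leaf = E(E(K, U) + N, F)       # law enforcement block: E(E(K;U), N; F)

    # With a court order, both shares reconstruct U, and F opens the block:
    inner = E(leaf, F)
    e_k, serial = inner[:10], inner[10:]
    assert serial == N
    assert E(e_k, E(U1, U2)) == K  # recover K, then decrypt E(M;K)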