CS 162 Lecture Notes Prof. Alan Jay Smith Topic: Protection and Security + The purpose of a protection system is to prevent accidental or intentional misuse of a system while permitting controlled sharing. It is relatively easy to provide complete isola- tion. + Accidents: + E.g. program mistakenly overwrites the file containing the command interpreter. Nobody else can log in. Prob- lems of this kind are easy to solve (can do things to make the likelihood small). + You accidently destroy a file you'd like to keep. + Malicious abuse. + Speed dialing of IP addresses + Script Kiddies + In class I'll make jokes, but this is really a serious prob- lem. + Three types of effects we are concerned with: + Unauthorized information modification + Unauthorized denial of use. + Unauthorized information release. + The biggest complication in a general purpose remote accessed computer system is that the intruder in these definitions may - .1 - be an otherwise legitimate user of the system. + Examples of problems - not solved - these are not computer problems: + Fake timesheets for paychecks + Repeat button printer to print extra paychecks + Round off amounts and put into special account. + Make up deposit slips with your account # on them. + Make up checks with your name, but some other account # on them. (Paid out of other account). + Functional Levels of Information Protection + Unprotected System + All or nothing system. (i.e. sharing or complete isola- tion). + The simplest type of protection system is just a user/system distinction. Someone with "system" privileges can do anything. A user can do stuff only permitted user level access. + Obviously, this is unsatisfactory - we need to differentiate between users, and given more fine grained access. + Controlled sharing. + User programmed sharing controllers - e.g. user want to put complex restrictions on use, such as time of day, or concurrence of another user. - .2 - + Design principles for protection mechanisms: + 1) Economy of mechanism: keep the design as simple and small as possible. (KISS - keep it simple, stupid.) + 2) Fail safe defaults - base access decisions on permis- sion, not exclusion. If the system fails, the default is lack of access. + 3) Complete mediation: every access to every object must be checked for authority. + 4) Open design: the design should not be secret. Its effectiveness should not be impaired by knowledge of the mechanism. + 5) Separation of privilege - where feasible, a protection mechanism that requires two keys to unlock it is more robust than one that allows access to the presenter of only one. + 6) Least privilege - give no more than the required ac- cess. + 7) Least common mechanism - minimize the amount of mechanism common to more than one user and depended upon by all users. (These represent the information paths.) + 8) Psychological acceptability - human interface must be convenient! + There are three aspects to a protection mechanism: + Authentication: Confirm that the user is who he says he is. + Authorization determination: must figure out what the - .3 - user is and isn't allowed to do. Need a simple database for this. + Access enforcement: must make sure there are no loo- pholes in the system. Even the slightest flaw in any of these areas may ruin the whole protection mechanism. + Authentication is most often done with passwords. This is a relatively weak form of protection. + A password is a secret piece of information used to es- tablish the identity of a user. + Passwords can be compromised in a number of ways: + Can be stolen (you write it down somewhere - wal- let, address book, front of terminal) + Story of someone who looked in waste baskett of guy who added new passwords. + Story of guy (on CTSS) who logged on at mid- night (when system administrator was on), ex- panded his segment and got copy of password segment of system admin. + Line can be tapped and password copied + Password can be observed as typed in. + Password can be guessed (your name, your mother's name, your project, your birthday). + Password can be repeatedly tried. (Should not be permitted). + Password file can be broken. - .4 - + Counter-Actions + Passwords should not be stored in a directly-readable form. One-way transformations should be used. + Use a one-way function. Given a copy of the password file, can't invert the transformation to get back to the password. (Unix does this. Password file (/etc/passwd) is readable.) + Unix also uses "SALT" to extend the password. Idea is that salt (e.g. user ID) is concatenated to password before encryption - this prevents ef- ficient generation of file of encrypted common passwords for comparison + Password testing should be slow, to discourage machine based tests. + Limit the number of tests. + Passwords should be relatively long and obscure. Para- dox: short passwords are easy to crack, long passwords are easily forgotten and usually written down. + Password testing programs (using common names, words, etc.) often can break more than half of the passwords in the system. + Tell the user when he last logged in - can tell if there was an intruder. + Note: we must protect the authorizer. The program that checks whether the password matches (or encrypts the pass- word) must be incorruptible. - .5 - + (If there is physical security, must be sure guards are incorruptible.) + Another form of identification: badge or key. + Does not have to be kept secret. + Should not be forgeable or copyable. + Can be stolen, but the owner should know if it is. + Pain to carry. + Key paradox: key must be cheap to make, hard to duplicate. This means there must be some trick (i.e. secret) that has to be protected. + Once identification is complete, the system must be sure to protect the identity since other parts of the system will rely on it. + That is, the system must be sure that once a given pro- cess is associated with a given user, that the correspon- dence is retained. + E.g. you don't want to be able to forge mail. + Authorization Determination: must indicate who is allowed to do what with what. + In general, can be represented as an access matrix with one row per user, one column per file. Each entry indi- cates the privileges of that user on that object. In practice, a full access matrix would be too bulky, so it gets stored in one of two condensed ways: - .6 - + access lists or + capabilities. + Access Lists: with each file (or other resource), indicate which users are allowed to perform which operations. + In the most general form, each file has a list of pairs. + It would be tedious to have a separate entry for every user, so they are usually grouped into classes. For ex- ample, in Unix there are three classes: self, group, an- ybody else, and three types of permission (read/write/execute-search) (nine bits per file). + Access lists are simple, and are used in almost all file systems. + Note that the overhead of checking an access list is "high". If access list is in memory, takes at least 50- 100 instructions. If stored on disk, then takes consid- erable time. Can't possibly check access list on every access. E.g. for file, can check for "open", but not for "read" or "write." + Semantically, an access list is equivalent to a guard at the gate - he checks you against a list. + Easy to determine who has access. Easy to revoke access. + Hard to determine what a given user can access. (unless a segregated system, in which capabilities can be found) + Capabilities: with each user, indicates which files may be - .7 - accessed, and in what ways. + Store a list of pairs with each user. This is called a capability list or C-List. + Semantically, a capability is like a key. You can access anything (i.e. open any door) for which you have a key. + In a capability system, you may be able to use capabili- ties as the names for things. The capability is mapped by the system to an object. In this case, you can't even name objects not referred to in your capability list. + Mapping can be like page table. System takes number of entry in list, which may be an address. + Another example is the Open File Table - ref files through the table. + In this case, it is easy to determine what a given process or user can access. + (may be) Hard to determine who has access. + Capabilities can usually be passed along. (e.g. some sort of system call.) + (May be) Hard to revoke access - depends on whether they're segregated in C-lists or not. + Solution is to make a capability indirect through a well known table. + E.g. system open file table + Implementions + Problem is to ensure that capabilities can't be forged. - .8 - + Tagged Architecture - each capability has a tag, which can only be set by the system. Users can mani- pulate capabilities, but not set tag. + Segregated Architecture - capabilities are segregat- ed, and are only touched by the system. User refers to them indirectly (e.g. by C-list.) + Keep capability lists in OS virtual memory. + Real Systems + Intel 432 + Cambridge CAP System + IBM System/38 + In capability-based systems, the default is for no-one to be able to access a file unless they have been given a capability. + Capabilities have sometimes been used in systems that need to be very secure. However, capabilities can make it difficult to share information: nobody can get access to your stuff unless you explicitly give it to them. + Note that overhead of a capability is relatively low. It is possible to use the capability for every access. (Overhead is not zero however, and capability based sys- tems tend to be slow.) + Example of a simple capability-based protection scheme: page tables. Protection is obtained in two ways: + (a) you can only reference stuff in the page tables, and + (b) you can only do what the protection bits permit. - .9 - + Another example - open file table. + In Multics, there are 8 levels (rings) of protection. With each file and segment, there is a level associated with read, write, execute, and call. (A call is special, since it can only go to certain entry points.) + When you attempt to read, write, execute, or call, you generate the address of the target. Generating an ad- dress can involve several steps, if there is any indirec- tion. The effective address has associated with it a level, which is the highest ring that could have modified the address. The protection level of the effective ad- dress is checked against the permitted access on every reference. + Protection levels are associated with segments, and ap- pear in the segment tables. Thus, they can be accessed with no (or little) extra overhead. + A process is permitted to execute at a higher level of authority (lower ring number) if it was entered at a per- mitted entry point. (By entering at a permitted point, called procedure can only be used in permitted ways. It can check its own arguments, etc.) + The Intel X86 architecture supports the same thing, but with 4 levels. + Access enforcement: some part of the system must be respon- sible for enforcing access controls and protecting the au- - .10 - thorization and identification information. + Obviously, this portion of the system must be invoked by everybody, including those planning to cause trouble. Thus it should be as small and simple as possible. Exam- ple: the portion of the system that sets up memory map- ping tables. + The portion of the system that provides and enforces pro- tection is called the security kernel. Most systems, like Unix, let the whole OS run in ``God'' mode. As a consequence, the systems aren't very secure. + Paradox: the more powerful the protection mechanism, the larger and more complex must be the security kernel, and hence the more likely it is to have bugs! + In general, protecting a computer system is extremely diffi- cult. There is no completely secure computer system in ex- istence. Some common problems: + 1. Abuse of valid privileges. On Unix a super-user can do ANYTHING, many people usually know super-user pass- word. Privileges aren't fine grained enough. (Note that in Linux, some exclusions are forced - e.g. to do some- thing to account "X", must "su X" first.) + 2. Imposter or Trojan Horse. Leave a program lying around that looks like the login process or some other standard program. When people type passwords, remember them for your owner. Or, give someone a program that steals information, e.g. supposed editor that reads unau- - .11 - thorized files. Give example of checking account system that credited fractional cents to the account of its creator (``The Superman III Scam'', or ``salami at- tack''). Can be called Trojan Horse - getting a legiti- mate user to unwittingly execute or utilize program code set up by the intruder. + 3. Listener. Eavesdrop on terminal wire, or listen in on local network traffic to steal information. + 4. Spoiler. Use up all resources and make system crash. E.g. grab all file space or create zillions of processes. (This is also called a denial of service at- tack e.g. SYN flood.) + 5. Create doctored version of some standard program (e.g. shell) which gives special privileges to given person. (This can be called a trap door, since it gives special privileges to those who know how to access it.) + Examples of penetration: + If user gets on permission lists of /dev files, s/he gets access to raw I/O devices, esp. disk. There is also a mechanism to get access to raw memory. This gives inap- propriate permissions. + User leaves fake shell on terminal. Someone else tries to log on. Fake shell captures password and then dies. User logs on again. + Dial into a still live dial-up line. + Walk up to terminal that is still logged on. - .12 - + Find account with null password. (Can tell from /etc/passwd which these are.) + Fake distributions - distribute a version of the software with doctored code. + Create a fake file system and have the system mount it. Can put a program there "owned" by the superuser, with setuid bit set. User runs program and becomes superuser. + Page mode buffering - can send commands to a terminal to store the following characters and then send them back. When they are sent back, they are interpreted as being from user at that terminal. (This is very hard to de- feat. Lots of ways to send stuff to a terminal - e.g. mail.) + Log on through uucp as remote system. (System must res- trict what can be done in this case.) + Buffer Overflow - many systems are vulnerable to argument buffers overflowing. (They don't test for length.) + Example of a break-in (at Stanford) + Guest account had password "guest." Got in that way. + Guest was able to write certain scratch file, which was on root search path. (e.g. wrote a file there named "blat" or some such, which was routinely called by root. Search executed intruder's version instead of real one. Gave him root permission.) + Used root to take on identity of other users on the sys- tem. Found one with .rhost files on remote systems, so - .13 - he could log on there. + Repeated the above sequence to move from machine to machine. + Once the system has been penetrated, it may be impossible to secure it again: hooks could have been left around for the imposter to regain control. + It is not always possible to tell when the system has been penetrated, since the villain can clean up all traces behind himself. + Apparently Western Electric denied that it had been robbed even after the guy had admitted it. Note that this is different from human systems, where people remember. + If we can never be sure that there are no bugs, then we can never be sure that the system is secure, since bugs could provide loopholes in the protection mechanisms. + Remind about protection problems. Mention that >>450 billion dollars a day is transferred through EFT. What's a person to do? + Countermeasures: nothing works perfectly, but here are some possibilities: + Logging: record all important actions and uses of - .14 - privilege in an indelible file. Can be used to catch im- posters during their initial attempts and failures. E.g. record all attempts to specify an incorrect password, all super-user logins. Also called audit trail. + But superuser can modify or erase audit trails. + Make audit trial unmodifyable - e.g. hard copy, write once media. + Note that if the system has been broken, the audit trail itself may be vulnerable. + In CS at UCB, audit trail is on file server, separate from user machine. + Even better is to get humans involved at key steps (this is one of the solutions for EFT). + Principle of minimum privilege (``need-to-know'' princi- ple): each piece of the system has access to the minimum amount of information, for the minimum possible amount of time. E.g. file system can't touch memory map, memory manager can't touch disk allocation tables. This reduces the chances of accidental or intentional damage. Capa- bilities are an implementation of this idea. It is very hard to provide fool-proof information containment: e.g. a trojan horse could write characters to a tty, or take page faults, in Morse code, as a signal to another pro- cess. + Correctness proofs. These are very hard to do. Even so, this only proves that the system works according to spec. It doesn't mean that the spec. is necessarily right, and - .15 - it doesn't deal with Trojan Horses. + Callback used to avoid abuse of accounts - at IBM, if an employee dials in, the computer disconnects and calls back. Can only log in from a given home number. + Need extension to network. + For applications systems, e.g. credit cards, can do con- sistency or plausibility check. E.g. is this user spend- ing $10,000 when his largest previous purchase was $100? Is this user spending in Hong Kong on the same day that a transaction was recorded in NY? + Inference Controls + The goal - suppose you want people to be able to get sta- tistical information (e.g. averages) out of a database, but not get individual data. E.g. the average salary of all people living in zip 94720. + System can be designed to answer only such statisti- cal queries, but not individual ones. + The problem - can design sets of queries that will gen- erate individual info. E.g. (a) average salary of all X. (b) average salary of X-delta, where delta describes only one individual. (c) size of X. + These three queries permit us to deduce delta's salary. + No good solution to this problem. + Can do some things: + Randomize data (slightly) - i.e. introduce small - .16 - errors. + Permit only queries on predefined groups - e.g. zip codes. + The Confinement Problem + Problem of mutually suspicious customer and service. Want to insure that the service can only reach informa- tion provided by customer, and that the service is pro- tected from the customer. + Idea is concept of information utility. Idea currently resurfacing as server based software. + Two problems remain: service may not perform as adver- tised, and it may leak - i.e. transmit confidential data. + List of possible leaks: + If the service has memory, it can collect data. + E.g. it can write into a permanent file. + It can write to a temporary file which can be read by the spy. + The service can send a message to a process con- trolled by its owner. + The information can be encoded in the bill rendered for service. + If the file system has interlocks, the service can lock and unlock a file, and the spy can watch to see if the file is locked. Can use like morse code. + The service can vary the paging rate (which affects performance). - .17 - + Viruses + Really only appear in PCs. PCs transfer around execut- able files and code - e.g. in email. + User executes this code, and bad things happen. + Virus usually replicates itself elsewhere + and does something unpleasant to your machine. + General technique is to search for known viruses by look- ing for their object code. + Problem is that viruses encrypt themselves. + Solution is to search for decryption code + Viruses may change the decryption code. + Solution is to interpretively execute the suspected virus code for some portion of time, to see if the code decrypts itself into something that is recognized as common virus. + There is no good defense against an unknown virus, since the code patterns can't be recognized. - .18 - Topic: Encryption + I recommend Kahn, "The Codebreakers". See also Whitfield Diffie and Martin Hellman, "Privacy and Authentication: An Introduction to Cryptography", Proc. IEEE, 67, 3, March, 1979, pp. 397-427. + Popular approach to security in computer systems: encryption. Store and transmit information in an encoded form. + Cryptography - the use of transformation of data intended to make the data useless to one's opponents. + Note that encryption is not new - has been used since times of the Romans - ``Caesar Cipher''. Key1 Key2 V V clear text -> encrypt ---> cipher text ---> decrypt ---> clear text V listener - .19 - + The basic mechanism: + Start with text to be protected. Initial readable text is called clear text. + diagram + Encrypt the clear text so that it doesn't make any sense at all. The nonsense text is called cipher text. The encryption is controlled by a secret password or number; this is called the encryption key. + The encrypted text can be stored in a readable file, or transmitted over unprotected channels. + To make sense of the cipher text, it must be decrypted back into clear text. This is done with some other algo- rithm that uses another secret password or number, called the decryption key. + All of this only works under three conditions: + The encryption function cannot easily be inverted (cannot get back to clear text unless you know the decryption key). + The encryption and decryption must be done in some safe place so the clear text can't be stolen. + The keys must be protected. In most systems, can compute one key from the other (usually the encryption and de- cryption keys are identical), so can't afford to let ei- ther key leak out. + Types of Crytographic Systems: - .20 - + (Simple) Substitution: There is a function f(x) which maps each letter of the plaintext (or group of letters) into f(x). f(x) must be 1-1 or one to many. If f(x)=x+1, then called a Caesar Cipher. + Solved by using tables of frequencies of letters, doubles, triples, etc. + Mapping "to many" disguises frequency. + Transposition: Permute (or transpose) the input in blocks to obtain the output. + Look for permutations that rejoin commonly used letter pairs, such as "th". + Polyalphabetic Ciphers - substition cipher, where f(i,x) is a function of i, which is the sequence number of the letter in the text. Typically periodic in i. Can get long periods by using two functions with relatively prime periods. + Solved in two steps. First look for repeated strings, and count the number of letters between them. Least common denominator of distance between strings is the period. (Or can look at frequencies of letters K apart, until they look ok, then K is period of cipher.) Then solve each of N ciphers separately, using frequency methods. + Old fashioned coding machines (e.g. Hagelin machines) worked as polyalphabetic cipher - had rotating wheels with relatively prime number of cogs. Code was pro- duct of path through wheels. - .21 - + Running Key Cipher - use key as long as message - e.g. text of book. (but not random) + Solve: use probably word; substitute it everywhere (i.e. XOR it with the cipher text) and see if a recognizable word pops out. If so, work backward and forward by context. Or, use frequency methods - but frequencies are now products of key and message fre- quencies, so quite hard. + Codes - take linguistic units of input (e.g. words) and use a code book (large table) to map them into output - e.g. letter groups. (Can also encode phrases.) + Hard to solve. Try frequency counts. Also known plaintext method. + Typical approaches: frequency counts, probable words, known plaintext. + Other approaches: + traffic analysis ( can see that there is some message traffic - that is a problem) + Playback - listener plays back legit messages. Confuses people. + Operator error and personnel security. + In most systems, key distribution is the biggest weakness. + Any encryption scheme where the effort to break it exceeds the reward can be considered secure. + The only perfect method of encryption is to use a random - .22 - key as long as the message. All other systems can be broken, given enough of a message. + Error control is a major problem - if you drop bits, have big trouble recovering message. + Problem: how to establish secure channel (i.e. distribute keys) in the first place? + Method for Key Distribution + Let KS be the key server. Use key KX to communicate between user X and key server KS. Users A and B. Let "**" mean encryption. Let ID be the message ID. Let KAB be the new key. + 1. A to KS: {A,(ID,B)**KA} [A asks server for key to com- municate with B] + 2. KS to A: {(ID,KAB,(KAB,A)**KB)**KA} [gives the key to A, with a unique ID, so that the message is identifi- able.) + 3. A to B: {(KAB,A)**KB} [send key to B, so only B can read it.] + Federal data encryption standard (DES). Can be implemented efficiently in hardware and appears to be relatively safe. + Block cipher. Encrypts 64 bits at a time. Uses 56 bit key. + Tell story about the NSA and 56-bit keys. Note that US - .23 - Govt. doesn't want cheap and effective encryption - would no longer be able to read third world traffic. + There are chips that encrypt/decrypt megabits per second. + DES no longer considered safe enough by NSA. The latest standard is the CLIPPER chip. For practical purposes, DES more than adequate. + Sufficient security is obtained by two level encryp- tion in pairs. + Export control (ITAR (international traffic in arms regu- lations) on DES products. + PGP - pretty good privacy - public domain encryption system. Based on DES. + Public key encryption: new mechanism for encryption where knowing the encryption key doesn't help you to find decryp- tion key, or vice versa. + There are two keys, which are inverses of each other. Whatever is encoded with one can be decoded with the oth- er. ("private key", "public key") + Two keys are not derivable from each other. + Each user keeps one key secret, publicizes the other. Can't derive private key from public key. Public keys are made available to everyone, in a phone book for exam- ple. + Specific scheme for public key encryption (pages 471-472, - .24 - chap 14, of Silberschatz and Galvin): + Encode: E(m)=(m**e) mod n = C + e and n are public; d is private. + Decode: D(C) = (C**d) mod n + where: m is message, e, d, are between 0 and n-1; e, d, n positive integers. + Must derive e, d, and n such that the above decode is in- verse of encode. + Let n=p*q (p, q large primes). + d is large integer relatively prime to (p-1)*(q-1) (i.e. GCD[d, (p-1)*(q-1)] == 1 + e is chosen such that (e*d) mod ((p-1)*(q-1)) ==1 + ICBS that this makes E and D inverses. + This is safe because although n is known, p & q are not known, and so e cannot be derived. (factoring is known to be hard.) + proof requires number theory which I haven't bothered to look up. + Safe mail: + Use public key of destination user to encrypt mail. + Anybody can encrypt mail for this user and be certain that only the user will be able to decipher it. It's a nice scheme because the user only has to remember one key, and all senders can use the same key. However, how does receiver know for sure who it's getting mail from? - .25 - + Digital signatures: can also use public keys to certify iden- tity: + To certify your identity, use your private key to encrypt a text message, e.g. ``I agree to pay Mary Wallace $100 per year for the duration of life.'' + You can give the encrypted message to anybody, and they can certify that it came from you by seeing if it de- crypts with your public key. Anything that decrypts into readable text with your public key must have come from you! This can be made legally binding as a form of elec- tronic signature. + Note that only encrypting with your private key permits the mail or message to be read by anyone. + If you encrypt with your private key, and then some- one else's public key, it can only be read by intend- ed recipient. + One public key method believed to work: Publish a large com- posite number (public key). Private key is factors of the number. Factors hard to obtain. + Encryption appears to be a great way to thwart listeners. It doesn't help with Trojan Horses, though. + One Way Encryption - use to encrypt password file. Don't have to be able to decrypt it - just compare encryption of submit- ted password with stored one. Can't deduce what needs to be - .26 - submitted. (I.e. encryption algorithm should not be invert- able.) + General problem: how do we know that an encryption mechanism is safe? It's extremely hard to prove. Mention example of scheme that was recently disproven after being widely accept- ed - knapsack problem. This is a hot topic for research: theorists are trying to find provably hard problems, and use them for proving safety of encryption. + CLIPPER Chip + Chip Contains: + 64-bit block encryption (algorithm classified) + Uses 80 bit keys. + Uses 32 rounds of scrambling (compared to 16 for DES) + Uses the following numbers: + F - 80-bit key used by all Clipper chips + N - 30-bit serial number (per chip) + U - 80-bit secret decryption key for this chip only. + Secure conversation occurs this way: + Session key K is negotiated (somehow). + E(M;K) is encrypted message stream. + E(E(K;U), N; F) is a "law enforcement block". With F, we can get E(K;U),N. From N, (the serial number), we can get U (held by federal agencies), and - .27 - then can get K. From K, we can decrypt messages. + Key U is xor of U1 and U2. U1 and U2 held by different federal agencies. Can get both U1 and U2 only with court ordered wiretap. - .28 - Topic: Performance Evaluation + Suggested reading: Heidelberger and Lavenberg, "Computer Per- formance Evaluation Methodology", IEEETC, C33, 12, December, 1984, pp. 1195-... + Performance modeling and measurement is needed through the entire life cycle of a system: design, debugging, installa- tion, tuning, data collection for next system. + Performance evaluation covers these areas: + Measurement + Analytic Modeling + Simulation Modeling + Tuning + Design Improvement + Good work in performance evaluation requires a good substan- tive understanding of the system under study. You can't just arrive with a bag of tricks (e.g. queueing theory) and do something useful. + Measurement + Advantage of dealing with something "real." Has all in- teractions, which would tend to escape the model. + Disadvantage of time and effort to get data (inc. facili- ties needed). - .29 - + Not that many published measurement studies. + Hardware Monitoring + Use some sort of hardware monitor (e.g. logic analyzer) to collect and partially reduce data. + Data is pulled off system with logic probes. + Note that signals have to be available. (Not from middle of chip). May have to design in probe points. + Signals can be used to count events, generate traces, sample, etc. + Hardware monitors typically have some hard logic (e.g. counters and gates) backed up by some programmable con- trol (e.g. a minicomputer). + HW monitors are difficult to get, expensive, and hard to use. Seldom used except by vendors and large computer centers. + Samples should be taken at random times, not regular in- tervals. The latter will fail to get correct measures of events which occur at regular intervals which are multi- ples or submultiples of the sampling interval. + Software Measurements + Code running on system is instrumented. + Can put counters or signallers in OS code, compile it into source code to be studied. Can sample at timer in- terrupts (e.g. PC). + Note that sampling won't sample something which isn't - .30 - interruptable. + Can use built in profiling facilities provided by some compilers. + Can instrument the microcode to collect data, if machine is microcoded. + Automatic facilities like compilers and profilers can do much of this. + Can generate significant overhead - e.g. 20% or more. (GTF) + Hardware Counters + Modern CPUs have counters built into the hardware. Can be set to count various things: branches, misses, cycles, instructions, etc. + Multics Measurements + Sampled using timer interrupts. + Used hardware counter for memory cycles. + Had external I/O channel which could be externally driven to do useful things like read certain regions of memory. + Used remote terminal emulator to drive system. + Diamond + Diamond is an internal DEC measurement tool - a hybrid monitor. Hardware probes read the PC, CPU mode, channel and I/O device activity, and system task ID. Software gets the user ID. A minicomputer reads the data and does - .31 - real time and later analysis. Can generate traces also. + IBM's GTF (General Trace Facility) + IBM - generates trace records of any system events - e.g. I/O interrupts, SIOs, opens, traps, SVCs, dispatches, closes, etc. + Designed for debugging. Generates lots of useless data, and not as much useful data as you would like. + IBM's SMF (System Management Facility) + Generates records for assorted events - jobs, tasks, opens, closes, etc. + Designed for accounting and management. Also gen- erates lots of useless data, and not as much useful data as you would like. + Some mainframes have console monitors which generate real time load information and measurements (e.g. queue lengths, channel utilization, etc.) + Workload Characterization + Important part of any workload study - must know what the workload is. + Three types of workloads for Performance Experiments: + Live workloads - good for measurement but poor for experiments- - uncontrolled. + Executable workloads consisting of real samples. - .32 - + Synthetic executable workloads. - can be parameter- ized versions of real workloads. + A synthetic workload may be needed as a projec- tion of a future workload. + To characterize a workload: + Decide what to characterize. (not easy - as you can see, many published papers are not interesting) + Figure out how to characterize those items + Figure out how to get the data (!!!) + Get the data. + Exploratory data analysis. + Cluster analysis + Statistical Methods + Means, variances, distributions + Techniques such as linear regression and factor analysis quite useful. + Can do statistical analysis on data to see if various models fit. + Seldom used - little intersection between class of com- petant experimental performance analysts and competent statisticians. + Analytic Performance Modeling + Build analytic model of some type of system of interest. + Calculate factors of interest as function of parameters - .33 - + Lots of research of this type. Some of it useful. + Models tend to be queueing models, stochastic process models. + Most of progress in queueing theory in last 30-40 years due to computer system modeling. + Advances are: + class of queueing network models which are easily solved, + efficient computational algorithms for these solutions, + good approximation methods for systems not easily solved. + Pros/Cons of Analytic Modeling + They are good for capacity planning, I/O subsystem model- ing, preliminary design aid. + Capacity planning is a big area of application - measure and analyze current system, set up validated model of current system and workload, project changes in workload, and see what sort of system design will handle it. + Analytic models do not capture the fine level of detail needed for some things such as hardware design and analysis - e.g. caches, CPU pipelines. + Queueing networks are powerful technique in analytic model- ing. - .34 - + Major Components of Queueing Network are: + Servers + Customers + Routing + Diagram of queueing network. + Many types of queueing networks can be easily solved, such as the following (called "BCMP" - Baskett Chandy Muntz & Palacios): + Customers can have a type T. + Routing is system can be of form p(i,t1,j,t2), where i and j are servers, and t1 and t2 are types. + Servers can be + FCFS exponential, with service rate a function of queue length + Processor sharing, any distribution + Infinite server - any distribution. + LCFSPR - last come, first serve, preemptive resume, any distribution + Solution is product of terms for each service sta- tion. (See JACM, 1975.) + If network is not of BCMP type, probably can't be solved exactly. There are approximation methods that can be used. + Simulation + Types: - .35 - + discrete event simulation + Trace Driven + Random Number (Stochastic) + continuous simulation (e.g. a diff eq) (not used for computer system modeling) + Monte carlo methods (e.g. sampling) + Simulation model has these components: + A model of the system, which has a state. + A set of events which cause changes in the state. + A method for generating such a sequence of events. + A measurement component, which records the statistics of interest. + Discrete Event general simulation model: + Events come from event list. (Events can come from trace). + Next event is taken off list. + System state is updated. + Event list is updated. (events added, deleted, their times changed) + Statistics are accumulated. + Next event is obtained from event list. + Example: simple discrete event simulation of M/G/1 queue. + Events come from trace and/or random number generator. + There are special languages for simulation (GPSS, SIMSCRIPT, - .36 - GASP, SLAM, SIMULA) + There are simulation modeling packages: RESQ (IBM), PAWS (UT Austin), QNAP (INRIA - Potier) + Analysis of simulation output + Regenerative simulation - find regeneration point, do standard IID statistics + Time Series Analysis - do time series analysis (e.g. au- tocorrelation anal) - also called Spectral method. + Repeat entire simulation run + Very long runs + Batch means - take long samples, and consider them IID + Back of the Envelope methods - .37 - Topic: Virtual Machines Reading: Robert Goldberg, "Survey of Virtual Machine Research", IEEE Computer, June, 1974, pp. 34-45. + A virtual machine is a software supported copy of the basic (hardware) machine. + Usually accomplished by allowing most instructions to run on real hardware. + Virtual Machine Monitor (VMM) - the piece of software that provides the pseudo-bare machine interfaces. + Diagram of virtual machine idea + Virtual machines run on base machines that are the same. + Emulators are used to provide dissimilar bare machine inter- face (i.e. different than the machine underneath). + Contrasts with operating systems, which provide extended machine interfaces - i.e. they are presumably better than the bare machine. + Uses of virtual machines + Run several different OS's at once (for different users) + Debug versions of OS while running other users. (inc. di- - .38 - agnostics, etc.) + Develop network software on single machine + Run multiple OS releases + Have students do systems programming. + High reliability due to high isolation between virtual machines + High security (for same reason) + Examples: + VM/370, M44/44X (on 7044), CP-40 (on 360/40), CP-67 (on 360/67), Hitac 8400, UMMPS, VM-Ware (Mendel Rosenblum) + Implementation + For performance reasons, run non-sensitive instructions to actually execute on the bare hardware. + Trap and simulate all sensitive instructions - i.e. any which could affect the VMM or any of the other VMs. + If it isn't possible to trap all sensitive instruc- tions, then may not be possible to build VM on that machine. + Memory Mapping + Must map memory of VM to real machine. Can be done with page tables or base and bounds registers. + The VMM itself may provide a paged machine. In this case, there are two levels of page tables - one from OS to VM, and one from VM to real machine. - .39 - + In actual operation, will create composed page table to map two tables in one operation. (So built in mapping hardware can be used.) This can work for ar- bitrary numbers of levels. + Note that on a page fault, you have to figure out who handles it. + I/O + It is necessary to trap and simulate I/O. + Want to permit I/O only to valid areas for the VM + Without interfering with other VMs + The I/O code (e.g. channel programs) must be properly interpreted or translated, since they use real ad- dresses. (For this reason, self modifying channel programs are usually prohibited, as too difficult to translate.) + I/O devices are usually simulated. Each user is given virtual I/O devices ("mini-disks"), which look like real hardware devices. + The VMM keeps a bit for each VM which specifies whether the VM is in user or supervisor state. It can thus provide ap- propriate simulations of sensitive instructions. + Attempts to execute sensitive instructions in user state cause abends. In supervisor state, they are executed ap- propriately. - .40 - + Hardware Virtualizer + Idea is to have hardware which dynamically maps N levels of virtual machines into the actual hardware. An associ- ative memory is used to keep recently evaluated mappings. The hardware virtualizer would therefore avoid most software simulation. + Define maps which map from VM interface to machine below. Define U map which maps from extended machine (OS) to VM below it. U is provided by OS. The hardware virtualizer composes as many levels of f maps as neces- sary (these map virtual machine to a machine). + VM Performance + Will run slower than real machine due to simulation of sensitive instructions. + Specific performance degradations: + support of privileged instructions + maintaining the status of the virtual processor (user/supervisor) + support of paging within virtual machines + console functions + acceptance and reflection of interrupts to individual VMs. + translation of channel programs + maintenance of clocks + Ways to enhance performance - .41 - + dedicate some resources, so that they don't have to be mapped or simulated. + give certain critical VMs priority to run + run virtual=real (same as #1) + let VM instead of OS do paging. (If OS does it, it gets done twice.) + modify OS to avoid costly (slow) features + extend VMM to provide performance enhancements (but not truly a VM any more. (e.g. VM assist on 370) + extend hardware to support VM + Special performance problems + Optimization within the OS may conflict with optimization within VMM. E.g. double paging anomally, buffer paging problem of IMS, disk optimization where disk is mapped, spooling by VMM and also by OS. - .42 - Topic: Current Research in Operating Systems + Most of what we have talked about in the area of Operating Systems is not new, but goes back 20-30 years. + What are people doing currently? + In a recent Operating Systems Conf. Proceedings (Proc. 17'th ACM Symposium on Operating Systems Principles, December, 1999), the principal topics include: + Managability, Availability and Performance in a mail ser- vice. + Performance of Web Proxy Caching + Performance of a stateless, thin-client architecture + Energy aware adaptation for mobile environments. + Active Networks (customized programs are executed within the network). + Building reliable, high performance communication sys- tems. + File system usage in Windows NT. + The Elephant file system. + File system security. + Integrating segmentation and paging protection. + Resource management on shared-memory multiprocessors. + A fast capability based operating system. + A naming system for dynamic networks and mobile units. + A distributed virtual machine for networked computers. + A modular router. - .43 - + Timer support for network processing. + CPU priority scheduling. + Scheduling for latency sensitive threads. + A small, real-time microkernel. + In a recent Operating Systems Conf. Proceedings (Proc. 16'th ACM Symposium on Operating Systems Principles, October, 1997), the principal topics include: + Performance Analysis - profiling and distributed/parallel programs + OS Kernels + Caching in Computer Networks + Transactions on Networks + Security for Java + Formal Analysis of Security + Running Commodity OS on Scalable Microprocessors + Transparent Distributed Shared Memory + Scheduler for Multimedia Applications + CPU Scheduling + Scalable Distributed File System + Log Structured File System + Reducing I/O latency + File Caching and Hoarding in Mobile Systems + Update Policies for Mobile Operation + Some titles fromn 1993 SOSP: - .44 - + Distributed file systems + RAID type file systems + Synchronization and its limitations in distributed sys- tems. + Distributed system design. + Distributed programming + Using threads. + Memory Management of an object oriented language + Relation between operating system structure and memory system performance (effects on the cache of OS code). + Concurrent compacting garbage collection. + Improved IPC (interprocess communcation) + Improved fault isolation. + Audio and video in a distributed system. + Authentication + Location info for distributed systems. + From ASPLOS (Architectural Support for Programming Languages and Operating Systems), 10/94, related to OS: + Data and control transfer in distributed systems + Scheduling and page migration for multiprocessor compute servers + Synchronization algorithms for multiprocessors + Software overhead in message passing. + Software support for exception handling. + Performance monitoring. - .45 - + In summary: + Networks and the Web + Performance Issues- memory, scheduling, networks. + Mobility + Energy Management + File Systems + Protection and Security + Misc: virtual machines, kernels. + Personal View of Important Issues: + The world is becoming one large distributed computer sys- tem with file migration, process migration, load balanc- ing, distributed transparent file system, etc. + This suggests that the important issues are: + Efficient ways to write reliable OS + With high performance + file migration algorithms + load balancing + distributed transparent file system implementa- tion + wireless and mobile systems + Supporting mobility + Location and naming issues + Energy management - .46 - Topic: Real Time Systems + Real Time Systems are Systems in which there is a real time DEADLINE + Typically a mechanical system is being controlled. + E.g. assembly line, anti-balistic missile defense + Real Time System must be able to: + Meet all Deadlines (with 100% or 99+% probability) + Handle the aggregate load. + If there are N events per second, must be able to handle them, whether or not each has a deadline. + This implies: + Deadline Scheduler + Avoidance of page faults - generally must lock deadline oriented code into memory. + Avoidance of I/O operations when near deadline is pend- ing. Usually must keep necessary info in electronic memory, and/or fetch it in advance. + This does not imply no cache memory + No matter what people say.... + Better to have system that sometimes runs at 5X or 4.9X, rather than one that is guaranteed to run at 1.0X all the time.