Lecture Notes for Wednesday, 4/13/2005 by Kun Yo Chiu & Kan Shun Sit

/* Announcement: Exams not ready yet; expected to be handed back on Monday. */

The mechanisms described above form the basis for tying together
distributed systems. So far, though, they've only been used for loose
coupling: each machine is completely autonomous, with separate
accounting, a separate file system, a separate password file, etc.
    Can send mail between machines.
    Can transfer files between machines (but only with special commands).
    Can execute commands remotely.
    Can log in remotely.

Loose coupling like this is OK for a network with only a few machines
spread all over the country, but not for a high-performance LAN where
every user has a private machine.

What would we like in a distributed computer system?
    Unified, transparent file system.
    Unified, transparent computation - from any terminal, you can run
    on any machine, transparently. (You shouldn't have to care which
    machine you're running on.)
    Load balancing, process migration, file migration.
    /* Processes can move to where resources are available, and files
    can move so we get better performance. You can do this on
    experimental LANs, but otherwise it's not reasonable because of
    issues like bandwidth. */

Local area networks can more or less provide that now. Wide area
networks cannot provide this transparently, due to performance
problems. May be possible in the future?

Distributed File Systems

Remote files appear to be local (except for performance). Issues:
    Failures - what happens when the remote system crashes?
    /* If your own machine crashes, there's nothing you can do; but if
    the remote machine crashes, you don't really know why. */
    Performance - remote is not the same as local. Can do some caching.

Sun's NFS (Network File System)

NFS permits the mounting of a remote file system as if it were local.
Therefore, by using mount commands, you can set up a transparent
distributed file system.
/* You've got big caches on both ends; a write is transmitted to the
server from your local copy. */

Caches file blocks and descriptors at both clients and servers.

Write-through caching: when a file is closed, all modified blocks are
sent immediately to the server disk. "Close" doesn't return until all
bytes are stored on disk.

Consistency is weak. Clients poll periodically for changes to a file
and may use an old version until the next poll. Simultaneous
conflicting updates are possible.

The server keeps no state about clients (except hints, for
performance). Each request carries enough information to do the entire
operation - e.g. Read(inode #, position).
/* This is good enough: all operations are given enough info to do
what needs to be done. E.g. "read a block" means read exactly that
block, not the next block - it's unambiguous and repeatable. When a
server crashes, this lets us pick up where we left off: any request
can be repeated with no ill effects, so you can read or write again if
something went wrong the first time. */

When the server crashes and restarts, it can immediately start
processing requests again. All requests are "idempotent" - they can be
repeated with no ill effects. So if a message may have been lost, the
client can resend (and the server possibly redo) the request.

/* Caches need to be consistent. We didn't discuss this in 61B/C, but
it comes up in 150/152 - cache consistency is usually studied in a
hardware context. With caching there is a main copy of something and a
cached copy; that's fine as long as the cache matches the main copy,
but once you write, the local copy differs from the main copy. We'd
like everything to stay consistent. Remember that caching can be local
in your CPU, in a file system, or even in a physical disk controller.
Question: What if you have two simultaneous modifications?
Answer: This is why you have locking. On a multiprocessor, a CPU locks
the bus for the read/write and everyone knows they can't touch it.
With a distributed system this is a little more complicated. (Short
discussion of a directory protocol: every system checks in with a
directory before it reads or writes a block. This would work, but then
you bottleneck on the directory.) */

Cache Consistency

The issue is that in a system with caching, there can be many copies
of a given piece of data. If any of those copies is written, it
becomes inconsistent with the other copies. The problem occurs in
distributed systems as well as in CPU cache memories.

Goal: the distributed system (or shared file system) yields the same
results as a unified, uniprocessor system.

Caching can occur in the disk controller, in a SAN or NAS, in the main
memory disk cache, etc.

Solutions:
    Only one cached copy at a time. Very inefficient if the file is
    read-only or read-mostly.
    Approaches: (a) many read copies, (b) one write copy, (c) update
    many write copies.
    Several solutions:
        i. "Open for write" causes all other cached copies to be
        deleted.
        ii. A write to a block deletes all other cached copies of the
        block ("write invalidate"; with "write through", the backing
        store copy is updated as well). Need to lock the block, delete
        the other copies, and then update, to avoid crossed updates.
        iii. A write to a block is broadcast to all other cached
        copies (and the backing store?) ("write update"). Need to lock
        all copies before updating any.
    Optimistic approaches:
        Don't do the locks - hope it works out.
        Don't do the locks, but keep the old copy. Back out if
        necessary.
        Leave it to the users to worry about. (Unix, database
        systems.)

Note: we need to know where/how to find all of the copies. In a
distributed system there is no way to do "snoopy" coherence - we can't
see all block and file requests. The system "owning" a file (usually
the processor on the system where the file lives) must keep track of
who has copies. If broadcast writes are used, it can give the list of
cached copies to any system writing the data.
/* Snooping works when there is a shared bus: every processor can
watch it, so all of them see every access to memory and know what has
changed in the caches. You can sort of do this with Ethernet too.
There are lots of approaches; we need one of them if we want a
consistent file system. */

"False sharing" - a block is shared, but the data in it isn't shared.
I.e. process A is using subset X of the block, and process B is using
subset Y of the block, but X and Y don't overlap. This problem can
have a big effect on the performance of a cache consistency scheme.

"Write merge" - if there are several writes to a block while it is in
the cache and before it is written back to disk, the writes can be
"merged" so that only the cumulative update is written back.

Process Migration

Want to take a running process and move it to another computer. Why is
this hard?
    Need to save and transfer state.
    Need to maintain all connections (to the OS, the network, other
    systems, I/O devices, the file system). Connections can be
    forwarded, or they can be reconnected.
    Obviously doesn't work if the CPU architecture is different.
    (Object code won't run.)
    Very hard if the OS is different - the systems have to be
    interoperable: all the same system calls, etc.
/* What if the other processor doesn't have the devices we need? You
have to have a unified, transparent file system - e.g. if you have a
CD mounted, the new machine should also be able to read that CD. You
don't have to move all the state immediately, but you need each piece
of state when you need it. Page faults are much worse here, and the
tables get weird, because now you might need to access the disk of
another computer. You could also checkpoint the whole thing, move it
over, and start again. It has to be the same architecture - you can't
go from Windows to a PowerPC chip on a Mac; the code won't run. What
if a network ID is hard-coded? A running process is talking to the
rest of the world, so moving it is like redirecting your email: things
need to be temporarily forwarded. Ideally the system realizes this and
goes to the new location directly, without forwarding. Process
migration sounds great, but it's hard. There is a package called MOSIX
that does some of this in Linux - process migration has been done in
Linux.
Question: Do you do migration for load balancing?
Answer: Yeah, typically. It's sometimes easier to move a process than
a file. Or also for reliability. */

Parallel Programming and Amdahl's Law

N processors will never get you an N-times speedup: part of the
computation is not parallelizable. (At the very least, the code that
distributes and gathers the computation is not parallelizable.)

Amdahl's Law - speedup is limited by the sequential component.

Parallel programming also requires communication and sharing, both of
which impede performance.

/* Digression: 10% of all electricity in Santa Clara goes to run
computers. There are service centers for server farms; the power
density is huge - millions of watts going in. The problem is the
aggregated heat production: 50K machines make a lot of heat, and if
you don't dissipate that heat, the building turns to slag. This is why
some chips are preferred: they use less power but do less per
instruction. Running a server farm is all about getting the heat out
of the building. It'd be nice if we had server-farm migration - move
it to Minnesota in the winter! A company that provides computational
power and moves it around to dissipate heat might work, over
fiber-optic links, maybe. Nowadays it might be better to just get a
bunch of computers than a heater for warmth; excess heat can even be
reused for electricity. Fry an egg or cook a chicken with your
processors?! If you overclock processors, you could immerse the entire
machine in a liquid so the heat dissipates. Systems have strange
bottlenecks - like dissipating heat.
On parallel computation: you can parallelize 90% of a program, but
that doesn't mean the system runs that much faster; nothing is 100%
parallelizable. Parallel processing requires communication, which
impedes performance. Parallel processing is coming because we can't
make transistors much smaller and they're getting hotter, so instead
we put in more processors. If you make what happens in a machine cycle
smaller, you do less per cycle. Systems with multiple CPUs require
code to become parallel. Parallel programming used to be research, but
now it is realistic if we want greater throughput. If you want to know
more about TCP/IP, check the last lecture, or just go to section. */

Topic: Protection and Security

/* About 2 lectures on this stuff, or Karl will give half a lecture.
Should you take the new security course? If you are interested, the
prof will be talking more about this stuff there. If you just wanted
protection and security, you could simply put a computer in a locked
room and limit access to it. */

+ The purpose of a protection system is to prevent accidental or
  intentional misuse of a system while permitting controlled sharing.
  It is relatively easy to provide complete isolation.
+ Accidents:
    + E.g. a program mistakenly overwrites the file containing the
      command interpreter, and nobody else can log in. Problems of
      this kind are easy to solve (we can do things to make the
      likelihood small).
    + You accidentally destroy a file you'd like to keep.
      /* You're lying if you claim you've never accidentally deleted a
      file. Some accidents you can recover from; some you can't. */
+ Malicious abuse:
    + Speed dialing of IP addresses.
    + Script kiddies that try to break into unpatched systems.
    + In class I'll make jokes, but this is really a serious problem.
    /* These are real problems - e.g. someone adjusting your grades,
    an exam that can't be given because the prof's system crashed,
    someone stealing your personal ID. */
+ Three types of effects we are concerned with:
    + Unauthorized information modification.
    + Unauthorized denial of use.
    + Unauthorized information release.
+ The biggest complication in a general-purpose, remotely accessed
  computer system is that the "intruder" in these definitions may be
  an otherwise legitimate user of the system.
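/* Scribe aside, back to the parallel-programming section above: the
Amdahl's Law limit can be checked with a quick calculation. This is a
minimal sketch; the function name amdahl_speedup and the 90% figure
(from the lecture's example) are ours. */

```python
def amdahl_speedup(p, n):
    """Speedup when fraction p of the work is parallelizable across n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 90% of the program parallelizable, speedup never reaches 10x,
# no matter how many processors you add: as n grows, speedup -> 1/(1 - p).
for n in (2, 10, 100, 1000):
    print(n, amdahl_speedup(0.9, n))
```

So parallelizing 90% of a program caps the speedup at 10x - exactly the
"you can parallelize 90%, but that doesn't mean the system runs that
much faster" point from the digression.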
+ Examples of problems - not solved, and not really computer problems:
    + Fake timesheets for paychecks.
    + Hitting the repeat button on the printer to print extra
      paychecks.
    + Rounding off amounts and putting the remainder into a special
      account.
    + Making up deposit slips with your account # on them.
    + Making up checks with your name, but some other account # on
      them (paid out of the other account).
/* Some stories: You can scam people with fake timesheets - that's not
a computer problem. Another story: a guy stood by a printer waiting
for his check and kept hitting repeat, so he got multiple paychecks.
Or write the routine that computes interest so that instead of
rounding, it always rounds down and deposits whatever is left over
into your account - this is what they did in Office Space. You only
get fractions of a penny, but do it enough and you'll be rich. Blank
deposit slips to trick the machine - again, not a computer problem.
Question: How do we prevent problems like this nowadays?
Answer: Who says we do? Society functions reasonably because doing the
right thing is to your advantage. In the US the economy rewards talent
well enough that it's better to earn an honest living than to be a
criminal; in the third world it's the opposite. That's the real reason
we don't do it: smart people are better off doing the right thing than
the bad thing - the risk/reward ratio is obviously not in the
criminal's favor. You can't be too successful at it, either. Just so
you know, Cuba doesn't extradite for certain crimes. */

+ Functional levels of information protection:
    + Unprotected system.
    + All-or-nothing system (i.e. full sharing or complete isolation).
    + The simplest type of protection system is just a user/system
      distinction: someone with "system" privileges can do anything; a
      user can do only what user-level access permits. Obviously this
      is unsatisfactory - we need to differentiate between users and
      give more fine-grained access.
    + Controlled sharing.
    + User-programmed sharing controls - e.g. a user wants to put
      complex restrictions on use, such as time of day, or concurrence
      of another user.

+ Design principles for protection mechanisms:
    + 1) Economy of mechanism: keep the design as simple and small as
      possible. (KISS - keep it simple, stupid.)
    + 2) Fail-safe defaults: base access decisions on permission, not
      exclusion. If the system fails, the default is lack of access.
    + 3) Complete mediation: every access to every object must be
      checked for authority.
    + 4) Open design: the design should not be secret. Its
      effectiveness should not be impaired by knowledge of the
      mechanism.
    + 5) Separation of privilege: where feasible, a protection
      mechanism that requires two keys to unlock it is more robust
      than one that grants access to the presenter of a single key.
    + 6) Least privilege: give no more than the required access.
    + 7) Least common mechanism: minimize the amount of mechanism
      common to more than one user and depended upon by all users.
      (These represent the information paths.)
    + 8) Psychological acceptability: the human interface must be
      convenient!
/* On open design and encryption: if people know the mechanism and
there is a flaw, they can discover it quickly. You concentrate all the
secrecy in the key. People can still break it, but at least you'll
know about it quickly. This didn't work for the Minotaur - in the
Theseus myth, the labyrinth relied on Theseus getting lost. Separation
of privilege is like the two keys needed to activate a missile silo.
Least common mechanism: the stuff that provides security should be
small - fewer moving parts. Psychological acceptability: this stuff is
used by people, and people have to put up with it; if it's too
inconvenient, they won't use it. If you have a system with a weekly
password that isn't easy to remember, people won't remember how to use
it. If you have a lot of security features you don't actually need,
they just become a pain.
*/

+ There are three aspects to a protection mechanism:
    + Authentication: confirm that the user is who he says he is.
    + Authorization determination: figure out what the user is and
      isn't allowed to do. This needs a simple database.
    + Access enforcement: make sure there are no loopholes in the
      system.
  Even the slightest flaw in any of these areas may ruin the whole
  protection mechanism.
/* Authentication: how do you make sure users are who they say they
are? There are 3 Alan Smiths on campus, just so you know - the other
two are in biology, though. Authentication is getting easier now that
laptops have built-in fingerprint readers, but how reliable are they?
Authentication is done with passwords today, but in the future it may
be biometrics: retinal scans, brain scans, fingerprints. */

+ Authentication is most often done with passwords. This is a
  relatively weak form of protection.
    + A password is a secret piece of information used to establish
      the identity of a user.
    + Passwords can be compromised in a number of ways:
        + Can be stolen (you write it down somewhere - wallet, address
          book, front of terminal).
            + Story of someone who looked in the waste basket of the
              guy who added new passwords.
            + Story of a guy (on CTSS) who logged on at midnight (when
              the system administrator was on), expanded his segment,
              and got a copy of the system admin's password segment.
        + The line can be tapped and the password copied.
        + The password can be observed as it is typed in.
        + The password can be guessed (your name, your mother's name,
          your project, your birthday).
        + The password can be repeatedly tried. (This should not be
          permitted.)
        + The password file can be broken.
    + Counter-actions:
        + Passwords should not be stored in a directly readable form;
          one-way transformations should be used.
            + Use a one-way function: given a copy of the password
              file, you can't invert the transformation to get back
              the passwords. (Unix does this. The password file
              (/etc/passwd) is readable.)
            + Unix also uses "salt" to extend the password. The idea
              is that the salt (e.g. the user ID) is concatenated to
              the password before encryption - this prevents efficient
              generation of a file of encrypted common passwords for
              comparison.
            /* In Unix/Linux, the plain text is turned into an
            encrypted form with no inversion back to plain text. It's
            a mapping from a -> b; you can always brute-force it by
            trying every input, but otherwise there is no way to find
            it. The attack salt defeats: copy the password file, make
            a list of, say, 10,000 common passwords, encrypt them, and
            match them against the real entries. With salt, the same
            password encrypts to different things for different users
            (if your login was "moon", you'd hash "moon" plus the
            password, while user "sun" hashes "sun" plus the same
            password), so the brute-force precomputation becomes much
            less efficient. SALT is a condiment - think of it as a
            condiment for passwords. */
        + Password testing should be slow, to discourage machine-based
          tests.
        + Limit the number of tests.
        + Passwords should be relatively long and obscure. Paradox:
          short passwords are easy to crack; long passwords are easily
          forgotten and usually written down.
        + Password-testing programs (using common names, words, etc.)
          can often break more than half of the passwords in a system.
        + Tell the user when he last logged in - he can tell if there
          was an intruder.
        /* At MIT the system generated passwords for you; luckily they
        were all pronounceable in English. Showing the last login time
        is useful because the user can tell if someone hacked them.
        Passwords aren't very hard to break - experimentally, you can
        break about half of them. There's going to be a whole course
        on privacy and protection. */
+ Note: we must protect the authorizer. The program that checks
  whether the password matches (or that encrypts the password) must be
  incorruptible.
    + (If there is physical security, we must be sure the guards are
      incorruptible.)
    /* We need to protect the authorizer: in a computer system, you
    have to make sure no one rewrites the code that checks passwords.
    It's like a guard at a gate - make sure no one kills the guard or
    bribes him. At Livermore, you need a badge and everything; the
    guard is supposed to check both the badge and you. Some guy
    decided this doesn't work too well and came in with a monkey on
    his badge - it worked for a while, but he got fired. It's not easy
    being a whistleblower. */
+ Another form of identification: badge or key.
    + Does not have to be kept secret.
    + Should not be forgeable or copyable.
    + Can be stolen, but the owner should know when it is.
    + Pain to carry.
    + Key paradox: a key must be cheap to make but hard to duplicate.
      This means there must be some trick (i.e. a secret) that has to
      be protected.
/* Question: What about keycards?
Answer: Not sure, but my guess is you can't just manufacture one that
works, because it's too complex - so realistically the question is
what the chances are of getting some blanks and programming them. Or
you could buy them. Forging things from scratch is really hard; the
KGB could do it, but your average crook is unlikely to be in touch
with the KGB. There is a problem with credit card blanks being stolen,
and it isn't too hard to imprint them. */
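/* Scribe addendum: the salted one-way-function scheme from the
Counter-Actions section can be sketched in a few lines of Python. This
is only a sketch of the idea, not the actual Unix crypt(3) algorithm:
SHA-256 and a random 8-byte salt stand in for the historical DES-based
hash and the real salt, and the function names are ours. */

```python
import hashlib
import os

def store_password(password, salt=None):
    """Hash a password with a per-user salt; keep only (salt, digest)."""
    if salt is None:
        salt = os.urandom(8)          # salt is stored in the clear
    digest = hashlib.sha256(salt + password.encode()).hexdigest()
    return salt, digest

def check_password(attempt, salt, digest):
    """Re-hash the attempt with the stored salt and compare digests."""
    return hashlib.sha256(salt + attempt.encode()).hexdigest() == digest

# The same password hashes differently for two users, so a precomputed
# table of encrypted common passwords can't be matched against the file:
s1, d1 = store_password("hunter2")
s2, d2 = store_password("hunter2")
assert d1 != d2                       # different salts -> different digests
assert check_password("hunter2", s1, d1)
assert not check_password("moon", s1, d1)
```

Note the one-way property: the file holds only salts and digests, and a
login attempt is verified by re-hashing, never by decrypting.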