CS162 Lecture, April 9th.
--------------------------------------------------------------------------------
Professor Alan Smith
--------------------------------------------------------------------------------
By: Aleksey Trofimov
--------------------------------------------------------------------------------

The midterm will cover up to networking. There will be nothing on protection. Nothing covered in class on Wednesday will be on the midterm.

Once memory is no longer shared, the system is a distributed system. With vanilla networking, there is a bunch of machines that can send files and text to each other and can run commands remotely. In a distributed system, we want more:
- A unified, transparent file system.
- Unified, transparent computation: from any terminal you can run on any machine transparently. Ideally you don't need to know which machine you're running on.
- Load balancing and process migration (from highly loaded machines to lightly loaded ones).
All of that can be done right now with local area networks. Wide area networks cannot provide transparency.

Distributed File Systems

A couple of issues:
- Failures. When something breaks, you don't know right away what it is. There are a lot of things that can fail: a distributed system can lose power somewhere, lose contact due to floods, disasters, etc.
- Performance is an issue, due to latency.

Sun's NFS (Network File System)

NFS permits mounting remote file systems as if they were local. To get decent performance, it needs lots of caches: caches on every level. It even caches writes, in a funny way: when a file is closed, the data is transferred to the remote disk, and close() returns only when the data has been written. File blocks are cached on both clients and servers.

All commands are absolute; there is no "READ NEXT". As a result, all requests are idempotent: they can be repeated with no ill effects. This is a nice feature: when the server crashes and restarts, clients can immediately reissue their requests. (See the first sketch at the end of this section.)

Cache consistency

Cache consistency was developed for CPU caching, but applies to distributed systems as well. The issue is that there may be many copies of any given piece of data; if any one of them is written, it becomes inconsistent with the others. Goal: the distributed system yields the same results as a unified, uniprocessor system. Research in this area sucks, because getting one wrong answer isn't better than getting many wrong answers (opinion of Prof. Smith).

Solution: only one cached copy at a time. This is correct, but very inefficient if the file is, say, read-only.

Approaches:
- Many read copies, one write copy.
- Update all cached copies on a write.

Several solutions (the second is sketched at the end of this section):
i. An open-for-write causes all other cached copies to be deleted.
ii. A write to a block deletes all other cached copies of the block. Write-through: update the backing-store copy at the same time. Need to lock the block, delete the other copies, and then update, to avoid cross-updates.
iii. A write to a block is broadcast to all other cached copies. Need to lock all copies before updating any.

Optimistic approaches:
- Don't lock; hope it works out.
- Don't lock, but keep the old copy and back out if necessary.
- Leave it to the users to worry about.

NOTE: need to know where and how to find all of the copies. In distributed systems there is no way to do "snoopy" coherence (note: coherency == consistency, a French misuse of the word), because no one can see all block and file requests. The system owning a file must keep track of who has copies. If broadcast-write is used, it can give the list of cached copies to any system writing the data.
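Why absolute commands make requests idempotent: here is a minimal sketch with entirely hypothetical names (this is not actual NFS protocol code). A read that names the file handle, offset, and count explicitly can simply be retried after a timeout or server crash, because the server keeps no per-client state.

    # Sketch: absolute, stateless requests are safe to retry.
    # All names are hypothetical; this is not the NFS protocol.

    class StatelessServer:
        def __init__(self, files):
            self.files = files  # file handle -> bytes; no per-client state

        def read(self, handle, offset, count):
            # Absolute request: everything needed is in the arguments,
            # so repeating it always returns the same bytes.
            return self.files[handle][offset:offset + count]

    def read_with_retry(server, handle, offset, count, attempts=3):
        for _ in range(attempts):
            try:
                return server.read(handle, offset, count)
            except TimeoutError:
                continue  # safe precisely because the request is idempotent
        raise TimeoutError("server unreachable")

    s = StatelessServer({"fh1": b"hello world"})
    assert read_with_retry(s, "fh1", offset=0, count=5) == b"hello"

A stateful "READ NEXT" would require the server to remember a file position for each client; after a crash that position is lost, and a retried request could silently return the wrong data.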
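And here is a minimal sketch of solution ii above, write-invalidate with write-through, assuming the system owning the file keeps a directory of which clients cache each block. Class and method names are made up for illustration.

    # Sketch: write-invalidate consistency (solution ii above).
    # The owner tracks who caches each block; all names are hypothetical.

    class BlockOwner:
        def __init__(self):
            self.store = {}      # block id -> data (the backing store)
            self.cached_by = {}  # block id -> set of client ids

        def read(self, client, block):
            self.cached_by.setdefault(block, set()).add(client)
            return self.store.get(block)

        def write(self, client, block, data):
            # Delete all other cached copies before updating, so no
            # stale copy survives the write.
            for other in self.cached_by.get(block, set()) - {client}:
                self.invalidate(other, block)
            self.cached_by[block] = {client}
            self.store[block] = data  # write-through to the backing store

        def invalidate(self, client, block):
            pass  # in a real system, a message to that client's cache

A real implementation would also hold a lock on the block across the invalidations and the update, as the notes above say, so that two writers cannot cross-update.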
File sharing: a block can be shared even when the data in it isn't. Process A uses subset X of the block and process B uses subset Y of the block, but X and Y don't overlap. The same is true for files.

Write merge: if there are several writes to a block while it is in the cache and before it is written back to disk, the writes can be merged so that only the cumulative update is written back. Apply the writes in order, then do one actual write once all of them have been collected. (A sketch appears at the end of this section.)

Process migration: we want to take a running process and move it to another computer.
- It's hard to do because the process won't know where its open files are; open file descriptors are broken, and they are hard to follow.
- Communication is hard, because you never know if the other computer will still be there once the process is sent out.
- Need to safely transfer the process state.
- Need to maintain all connections: connections can be forwarded, or connections can be reconnected.
- Doesn't work if the CPU architecture is different. Very hard if the OS is different; the two machines have to have the same system calls.

Parallel programming and Amdahl's Law

N processors will never get you an N-times speedup, because part of the computation is not parallelizable. Amdahl's Law: speedup is limited by the sequential component. If a fraction s of the work is sequential, then speedup(N) = 1 / (s + (1 - s)/N), which can never exceed 1/s; for example, with s = 10% and N = 100 processors, the speedup is only about 9.2x. Parallel programming also requires communication and sharing, both of which impede performance. This is hard because a lot of things aren't parallelizable: most user code is serial and needs to be computed piece by piece.

NETWORKING AND PARALLEL SYSTEMS ARE DONE...
--------------------------------------------------------------------------------
PROTECTION AND SECURITY
--------------------------------------------------------------------------------

The purpose of a protection system is to prevent accidental or intentional misuse of a system while permitting controlled sharing. It is relatively easy to provide complete isolation.

Accidents: a program mistakenly overwrites the file containing the command interpreter.

Malicious abuse: speed-dialing IP addresses; script kiddies (high school kiddies...) who don't know what they're doing. In class I'll make jokes, but this is really a serious problem.

Three types of effects we are concerned with:
- Unauthorized information modification
- Unauthorized denial of use
- Unauthorized information release

The biggest complication in a general-purpose, remotely accessed computer system is that the "intruder" in these definitions may be an otherwise legitimate user of the system.

Examples of problems that are not solved, because they are not computer problems:
- Fake timesheets for paychecks.
- Hit the repeat button on the printer to print extra paychecks.
- Round off amounts and put the difference into a special account.
- Make up deposit slips with your account number on them.

Functional levels of information protection:
- Unprotected systems.
- All-or-nothing systems: everyone shares, or complete isolation. Bad, because we want controlled sharing.
- Controlled sharing. This is what exists today.
- User-programmed sharing controls: the user wants to put complex restrictions on use, such as time of day or the concurrence of another user.

Design principles of protection mechanisms (a sketch of fail-safe defaults and complete mediation follows this list):
- Economy of mechanism (KISS: keep it simple, stupid).
- Fail-safe defaults: base access decisions on permission, not exclusion.
- Complete mediation: ideally, every access should be checked.
- Open design: the design should not be a secret; its effectiveness shouldn't be compromised by disclosure.
- Separation of privilege: where feasible, a protection mechanism that requires two keys to unlock is more robust than one that allows access to the presenter of only one.
- Least privilege: give no extra permissions.
- Least common mechanism: minimize the amount of mechanism common to, and depended upon by, all users.
- Psychological acceptability: the human interface must be convenient.
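Here is the write-merge sketch promised above: several writes land on the in-cache copy, and only the cumulative result is written back to disk. A minimal sketch with hypothetical names.

    # Sketch: merging several writes to a cached block into one
    # cumulative write-back. All names are hypothetical.

    class CachedBlock:
        def __init__(self, block_size):
            self.data = bytearray(block_size)
            self.dirty = False

        def write(self, offset, payload):
            # Apply each write in order to the in-cache copy.
            self.data[offset:offset + len(payload)] = payload
            self.dirty = True

        def write_back(self, disk_write):
            # One cumulative disk write, however many updates occurred.
            if self.dirty:
                disk_write(bytes(self.data))
                self.dirty = False

    blk = CachedBlock(16)
    blk.write(0, b"abc")
    blk.write(4, b"de")
    blk.write(0, b"xyz")  # overwrites the first update in the cache
    blk.write_back(lambda d: print("one disk write:", d))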
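As a small illustration of fail-safe defaults and complete mediation, here is a sketch of a permission-based access check: anything not explicitly granted is denied, and every access path goes through the single check. All names and the table contents are hypothetical.

    # Sketch: fail-safe defaults + complete mediation.
    # Hypothetical names; not from any real system.

    permissions = {
        ("alice", "grades.txt"): {"read", "write"},
        ("bob",   "grades.txt"): {"read"},
    }

    def check_access(user, obj, op):
        # Fail-safe default: access is granted only if explicitly
        # permitted; a missing entry means deny, not allow.
        return op in permissions.get((user, obj), set())

    def read_file(user, obj):
        # Complete mediation: every access goes through the check.
        if not check_access(user, obj, "read"):
            raise PermissionError(f"{user} may not read {obj}")
        return "...file contents..."

    assert check_access("bob", "grades.txt", "read")
    assert not check_access("bob", "grades.txt", "write")  # never granted
    assert not check_access("eve", "grades.txt", "read")   # unknown user: denied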
There are three aspects to a protection mechanism:
- Authentication: confirm that the user is who he says he is.
- Authorization determination: figure out what the user is and isn't allowed to do.
- Access enforcement: make sure there are no loopholes in the system.
Even the slightest flaw in any of these areas may ruin the whole protection mechanism.

Authentication is most commonly done with passwords. A password is a secret piece of information used to establish the identity of a user.

Passwords suck because:
- They can be stolen: the line can be tapped and the password logged.
- A password can be observed.
- A password can be guessed.
- The password file can be broken (decrypted or whatever).

Counter-actions (a sketch of the first two follows at the end of these notes):
- Passwords shouldn't be written down in a direct way: the password file stores an encrypted form, and something (a salt) is concatenated to the password before encryption.
- Password testing should be slow.
- Passwords should be relatively long and obscure. Paradox: short passwords are easy to break, while long ones usually get written down. Password-testing programs can often break more than half of all passwords.
- Tell the user when he last logged in.

Note: you must protect the authorizer. The program that checks whether the password matches must be incorruptible.

Another form of identification: a badge or key.
- Does not have to be kept secret.
- Should not be forgeable or copyable.
- Can be stolen, but the owner should know when it is.
- A pain to carry.
Key paradox: a key must be cheap to make but hard to duplicate. This means there must be some trick, and the trick has to be protected.

Once identification is complete, the system must be sure to protect that identity, since other parts of the system will rely on it. That is, the system must ensure that once a given process is associated with a given user, the correspondence is retained. You don't want anyone to be able to forge mail.
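Here is a minimal sketch of the first two counter-actions (salting and slow testing), using Python's standard hashlib module. The iteration count and record layout are illustrative choices, not what any particular system uses.

    # Sketch: salted, deliberately slow password checking.
    # Parameters and layout are illustrative, not from a real system.
    import hashlib, hmac, os

    ITERATIONS = 200_000  # many hash rounds make each guess slow

    def store_password(password):
        salt = os.urandom(16)  # the "something concatenated" before encryption
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
        return salt, digest    # what the password file records; no plaintext

    def check_password(password, salt, digest):
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
        # Compare without leaking timing information.
        return hmac.compare_digest(candidate, digest)

    salt, digest = store_password("long and obscure phrase")
    assert check_password("long and obscure phrase", salt, digest)
    assert not check_password("guess", salt, digest)

The salt defeats precomputed dictionaries: two users with the same password get different file entries, so each entry must be attacked separately. The iteration count is what makes password testing slow for a guesser.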