+-------------+
|ANNOUNCEMENTS|
+-------------+

+ The final midterm is on May 9th at 7 PM in 10 Evans.
+ The TAs and Professor Smith have asked the readers to finish all reading and grading by next week (the week of May 1st). Hopefully, everything will be posted. The last chance to complain about grades will be the day of the midterm.
+ For those who do not yet have their midterms, Prof. Smith has them.
+ Today, Prof. Smith will lecture for about half the time; Karl will lecture for the other half.

+---------------+
|SMITH’S LECTURE|
+---------------+

---Protection and Security (cont.)---

Federal Data Encryption Standard (DES)
--------------------------------------

+ DES was developed in the early 1970s at IBM, with input from the National Security Agency (NSA), and was adopted as a federal standard in 1977.
+ By the 1980s, there was considerable concern that it was inadequate.
+ DES is a block cipher that takes 64 bits in and produces 64 bits out using a 56-bit key.
+ There is reason to believe that DES was designed to be hard but not impossible to break so that, should the NSA be interested in DES-encrypted communications, it could decrypt them.
  - It is widely suspected that the NSA could break DES from early on; of course, it is much simpler today.
  - There are chips available on the market that can do the job.
+ The 56-bit key is quite short; however, unless the communication is of very high value, DES should be adequate.
! Interesting note: There is something called the International Traffic in Arms Regulations (ITAR), which is intended to restrict the export of arms; the federal government, however, has decided that encryption and encryption technology fall under the same category.
  - This has interesting consequences: the developer of Pretty Good Privacy (PGP) was the subject of a federal criminal investigation, which was eventually dropped without charges.
+ One of the things that the US government would like is for everyone to use inadequate encryption.
+ DES splits the 64-bit block into two 32-bit halves and, within each round, works on still smaller blocks (its substitution boxes emit 4-bit blocks).
  A chip then takes these blocks, permutes them, and performs other operations on them, based on the key.
  - Smith has heard that, by placing two DES chips one after the other (two-level DES), sufficiently strong encryption can be achieved. (In practice, the variant that was standardized is triple DES, because double encryption is weakened by meet-in-the-middle attacks.)
! Story: In World War II, for battlefield encryption, Navajo Indians were recruited to be codetalkers: they would speak in Navajo.
  - Apparently, the Navajo code was never broken.
  - One of the benefits of this was that there were a limited number of Navajo speakers, meaning that the Japanese could not easily infiltrate the system.
! Aside: New Guinea has about 800 languages and five language families, meaning that those from one part of New Guinea could not understand those from another.
! Story: About 40 years ago, some computer scientists thought that they could do inter-language translation using word-for-word substitution; they failed miserably.
  - The problem was that languages differ not only in words but also in grammar, structure, and numerous other respects.

Public Key Encryption
---------------------

+ Public key encryption has been quite popular for the 20 or so years it's been around.
+ As opposed to symmetric encryption as described in the last lecture, public key encryption does not require two users to exchange a secret key.
+ In public key encryption, there are two keys, a public key and a private key, which are inverses of one another.
+ The public key can be given freely to anyone whereas the private key is known only to the owner of the key pair.
+ If Alice wishes to send a message to Bob, she encrypts it using his public key and Bob decrypts it with his private key.
! Details: Encoding and decoding are highly mathematical:
  - First, a few values must be defined or calculated:
    + Let both p and q be large primes.
    + Let n be p * q.
    + Let d be a large number that is relatively prime to (p - 1) * (q - 1).
    + Let e be a number such that (e * d) mod ((p - 1) * (q - 1)) = 1.
  - Of these values, n and e are public while d is private.
  - To encode a message m into ciphertext C, we set C = (m ^ e) mod n.
  - To decode ciphertext C into m, we set m = (C ^ d) mod n.
  - From number theory, it can be shown that encryption and decryption are inverses of one another.
  - The reason that public key encryption is safe is that factoring is hard: although n is known, p and q are not. Knowing p and q would let an attacker compute (p - 1) * (q - 1) and, from it and e, the private value d.
    + As long as factoring remains hard, this system (known as RSA) is quite secure.
! Aside: A cryptosystem based on the knapsack problem was once considered secure but was later broken relatively easily; the general knapsack problem itself remains hard.
  - Public key encryption also allows for authenticity verification: since the public key and the private key are inverses, if Bob encrypts a message using his private key, anyone can decrypt it using his public key; if the result is nonsensical, either Bob was not the sender or the message has been tampered with.
    + If Bob wants only Alice to be able to verify authenticity, he simply encrypts first using his private key and then using Alice's public key.
+ Public key encryption can be used for safe mail, whereby Alice sends Bob a message encrypted using Bob's public key; Bob can then decrypt this message using his private key.
  - Authentication can also be achieved using the previously described authenticity verification method.
+ Public key encryption generally uses values of n that are 1024 bits long; there are methods of generating large primes fairly quickly.

One-Way Encryption
------------------

+ The idea of one-way encryption is to take a message and convert it into ciphertext that cannot feasibly be converted back into the original message.
+ One-way encryption is frequently used for password files: a user's password is hashed into a string, which is then stored.
+ It is difficult to go from a hash back to the original message but it is not impossible.
  - Given the hash algorithm, it is always possible to build a mapping from messages to ciphertext and search it in reverse; however, such a mapping would be huge and unwieldy.
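
! Sketch (not from the lecture): the password-file idea under One-Way Encryption can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions: SHA-256 stands in for whatever one-way function a real system uses, and the names one_way, password_file, check_login, and build_reverse_table are hypothetical. The last function shows why reversing a hash amounts to building a (huge) table of candidate messages.

```python
import hashlib


def one_way(password: str) -> str:
    """Return the SHA-256 digest of a password as a hex string."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# A toy "password file": only the hashes are stored, never the passwords.
password_file = {"alice": one_way("correct horse")}


def check_login(user: str, attempt: str) -> bool:
    """Hash the attempt and compare it with the stored hash."""
    stored = password_file.get(user)
    return stored is not None and stored == one_way(attempt)


def build_reverse_table(candidates):
    """Precompute hash -> message for a (necessarily limited) candidate list."""
    return {one_way(c): c for c in candidates}


if __name__ == "__main__":
    print(check_login("alice", "correct horse"))
    print(check_login("alice", "wrong guess"))
    table = build_reverse_table(["1234", "password", "correct horse"])
    # The stored hash is only "reversed" because the password was a candidate.
    print(table.get(password_file["alice"]))
```

The table only recovers passwords that happen to be in the candidate list, which is why a complete mapping would be impractically large.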

Encryption Reliability
----------------------

+ How can one ensure that an encryption algorithm is safe?
+ In the past, encryption algorithms themselves would oftentimes be kept secret, thus making codebreaking much more difficult.
+ Current wisdom, however, has it that one should publish the algorithm in its entirety (minus the key, of course) and allow security experts to try to break it.
+ One of the major proponents of the older method is the motion picture industry, which includes a proprietary encryption scheme in all DVDs (Content Scrambling System or CSS); this was readily broken by a Norwegian student in only a couple of hundred lines of code.

CLIPPER Chip
------------

+ The CLIPPER Chip is intended as a substitute for DES.
+ Its algorithm is classified, and its design includes a key escrow mechanism (described below) so that the government, should it wish to, can readily decrypt any individual's messages.
+ Like DES, it takes 64-bit blocks; unlike DES, it uses 80-bit keys (compared to 56 for DES) and 32 rounds of scrambling (compared to 16).
+ There are three important values in a CLIPPER Chip:
  - F is the 80-bit key shared by all CLIPPER Chips.
  - N is a 30-bit serial number unique to each chip.
  - U is a secret decryption key unique to each chip.
+ Every session using CLIPPER follows a specific format:
  - First, a session key K is negotiated somehow.
  - For the purposes below, let E(M;K) denote the encryption E of message M using key K; all messages other than the "law enforcement block" (described below) are encrypted with key K.
  - Second, a "law enforcement block" is sent containing E(E(K;U), N; F).
    + Because key F is shared by all CLIPPER Chips, anyone holding F can recover E(K;U) and N.
    + Federal agencies can use N to uniquely find U and thus decrypt K.
    + K can then be used by the government to decrypt messages.
    + The trick here is that U is actually the XOR of U1 and U2, which are held by different federal agencies. Therefore, U can be reconstructed only when both agencies release their halves, which is supposed to require a court-ordered wiretap.
+ Because the government can readily decrypt any message, not many groups are enamored of this encryption system; the only people who are likely to use it are defense contractors... but only because they have to.

+--------------+
|KARL'S LECTURE|
+--------------+

---Security---

Adversaries
-----------

+ Suppose that Alice and Bob are trying to communicate with one another but Mallet is interested in their conversation.
+ There are two possible ways in which an adversary can compromise communication:
  - The adversary can listen to a message without being able to alter it (see Fig. 1a). This is akin to a telephone bug.
  - The adversary can intercept all messages and then alter them as he wishes (see Fig. 1b). In this scenario, the adversary has complete control over communications.

+---------------------------+   +--------------------------------+
|                           |   |                                |
|  Alice <-----------> Bob  |   |                                |
|              |            |   |  Alice <---> Mallet <---> Bob  |
|              v            |   |                                |
|           Mallet          |   |                                |
+---------------------------+   +--------------------------------+
         a) Listener                     b) Interceptor

         Figure 1. Two types of adversarial intervention

Trust
-----

+ In Figure 1, there are two examples in which Alice and Bob cannot trust the network because Mallet is part of the network.
+ Trust is akin to dependence: if a user trusts something, he depends on it to function correctly and securely.
+ Therefore, it is desirable to minimize trust and thus minimize dependence.
+ Accepting or assigning trust is usually divided into two parts:
  - The first is authentication--acknowledgement that what is received is from the correct individual.
  - The second is authorization--assigning to a sender the rights allowed to him.

Hash Functions
--------------

+ As mentioned earlier, a hash function is a one-way function that takes a message and converts it into a hash; there is no easy way to recover the original message from the hash.
+ One of the weaknesses of a hash function when applied to passwords is a dictionary attack.
  - Users have a tendency to choose easy-to-remember passwords, often in the form of words.
  - Therefore, it is possible to generate the hash values for a list of common words, creating a mapping from hash values to words; this mapping can then be used to discover some passwords.
  - Many systems therefore use what is known as a salt: if a user's password is "1234", the system, instead of hashing "1234", hashes "xx1234"--where "xx" is some value stored alongside the hash.
    + This makes pregenerated dictionaries useless; it is still possible to regenerate the hash value of each common word combined with a given salt... but this is slow.
+ One popular hash function is MD5; however, collision attacks against it have been demonstrated, so it should not be relied on where collision resistance matters.

Authenticity
------------

+ If Alice sends Bob a message, Bob may wish to ensure that the message in fact came from Alice.
+ More specifically, Bob wants to ensure that the message:
  1) came from Alice.
  2) was not altered by Mallet.
  3) was not replayed by Mallet (e.g., if Alice's message is "scramble the planes", Bob wants to make sure that Mallet did not duplicate the message and send it again two days later).
  4) was not re-ordered by Mallet.
+ The solution to the first two problems is called the Message Authentication Code (MAC).
  - The MAC is a hash computed over both the message and a secret key shared by Alice and Bob; it is attached to the message.
  - If Mallet generates or alters a message, he cannot generate the matching MAC because he does not have the secret key.
+ The solution to the last two problems is to add a sequence number and to send a "finished" message upon the completion of transmission.
  - This means that if a message is sent out of order or resent, Bob will notice that the sequence number is wrong and discard it.
+ With these solutions implemented, Mallet can still block communication or eavesdrop on messages between Alice and Bob but cannot corrupt the integrity of the messages.
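
! Sketch (not from the lecture): the MAC and sequence-number checks described under Authenticity can be put together as follows. This is a minimal sketch under stated assumptions: HMAC-SHA256 from the Python standard library is swapped in as one standard way to hash a message together with a secret key, and KEY, make_packet, and Receiver are hypothetical names. The "finished" message is not modeled.

```python
import hashlib
import hmac

# Hypothetical secret shared by Alice and Bob; Mallet does not know it.
KEY = b"shared-secret-known-to-alice-and-bob"


def make_packet(key: bytes, seq: int, message: bytes):
    """Return (seq, message, tag), where tag is a MAC over seq and message."""
    data = seq.to_bytes(8, "big") + message
    tag = hmac.new(key, data, hashlib.sha256).digest()
    return seq, message, tag


class Receiver:
    """Accepts a packet only if its MAC verifies and its sequence number
    is the next one expected."""

    def __init__(self, key: bytes):
        self.key = key
        self.expected_seq = 0

    def accept(self, seq: int, message: bytes, tag: bytes) -> bool:
        data = seq.to_bytes(8, "big") + message
        good = hmac.new(self.key, data, hashlib.sha256).digest()
        if not hmac.compare_digest(good, tag):
            return False  # forged or altered (problems 1 and 2)
        if seq != self.expected_seq:
            return False  # replayed or re-ordered (problems 3 and 4)
        self.expected_seq += 1
        return True


if __name__ == "__main__":
    bob = Receiver(KEY)
    packet = make_packet(KEY, 0, b"scramble the planes")
    print(bob.accept(*packet))  # first delivery
    print(bob.accept(*packet))  # Mallet replays the same packet
```

Covering the sequence number with the MAC matters: otherwise Mallet could renumber an old packet and replay it with a valid tag.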

Confidentiality
---------------

+ If Alice sends Bob a message, neither party wants someone else overhearing the conversation.
+ This means that, ideally, message authentication would be unnecessary.
+ The solution is naturally encryption, as described by Smith.

Secure Channel
--------------

+ If both authenticity and confidentiality are implemented, what results is a secure channel.
+ On a secure channel, Mallet should not be able to guess the secret key associated with the MAC even if he overhears many messages and their MACs.

Symmetric vs. Asymmetric Encryption
-----------------------------------

+ There are two main kinds of encryption, as already discussed by Smith:
  - Symmetric encryption uses one key which both encrypts and decrypts a message; because of this, users must keep the key secret.
  - Asymmetric encryption uses one key for encryption and another key for decryption; this way, the key used for encryption (the public key) can be freely distributed and only the key used for decryption (the private key) needs to be kept secret.
+ Some of the more common symmetric ciphers in use today are DES, AES, Blowfish, CAST, and ARCFOUR.
+ Some of the more common asymmetric algorithms in use today are RSA and DSA.
+ Symmetric ciphers are still commonly used because they generally provide better security--for a key of a given bit length, a symmetric cipher will be harder to break than an asymmetric cipher.
+ Asymmetric ciphers, however, are much more convenient since they do not require some secret means of distributing the secret key.
+ Also, symmetric ciphers are generally much faster than asymmetric ciphers.
+ As mentioned earlier, asymmetric ciphers rely on the difficulty of mathematical problems such as factoring large numbers to deter breaking.
+ In practice, asymmetric ciphers are used to negotiate a symmetric key, which is then used for the remainder of the conversation.
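
! Sketch (not from the lecture): the last point, using an asymmetric cipher to negotiate a symmetric key, can be worked through with the p, q, n, d, and e from Smith's section on Public Key Encryption. This is a toy under stated assumptions: the primes 61 and 53 are far too small to be secure, the repeating-XOR "symmetric cipher" is a stand-in with no real strength, and all function names are hypothetical.

```python
import math
import secrets


def make_keypair(p: int, q: int, d: int):
    """Return ((n, e), d) following the lecture's recipe for n, d, and e."""
    phi = (p - 1) * (q - 1)
    if math.gcd(d, phi) != 1:
        raise ValueError("d must be relatively prime to (p - 1) * (q - 1)")
    e = pow(d, -1, phi)  # modular inverse: (e * d) mod phi == 1 (Python 3.8+)
    return (p * q, e), d


def encrypt(m: int, public) -> int:
    n, e = public
    return pow(m, e, n)  # C = (m ^ e) mod n


def decrypt(c: int, public, d: int) -> int:
    n, _ = public
    return pow(c, d, n)  # m = (C ^ d) mod n


def xor_cipher(data: bytes, key: int) -> bytes:
    """Toy symmetric cipher: XOR with the 2-byte key, repeated. Not secure."""
    key_bytes = key.to_bytes(2, "big")
    return bytes(b ^ key_bytes[i % 2] for i, b in enumerate(data))


if __name__ == "__main__":
    bob_public, bob_private = make_keypair(61, 53, 2753)  # toy primes
    # Alice picks a session key and sends it under Bob's public key.
    session_key = secrets.randbelow(bob_public[0] - 2) + 2
    wire = encrypt(session_key, bob_public)
    # Bob recovers the session key; both sides now share it.
    recovered = decrypt(wire, bob_public, bob_private)
    ciphertext = xor_cipher(b"scramble the planes", session_key)
    print(xor_cipher(ciphertext, recovered))
```

With these toy values, n = 3233 and e = 17, so the message 65 encrypts to (65 ^ 17) mod 3233 = 2790.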

Digital Signatures
------------------

+ As mentioned earlier, public key encryption can be used to verify authenticity (see SMITH'S NOTES, section on Public Key Encryption).
+ In practice, if Alice sends a message to Bob using public key encryption, she first encrypts it using his public key and then using her private key, ensuring the authenticity of every message.

Certificates
------------

+ Using public key encryption requires that users know other users' public keys.
+ Because of the number of public key encryption users, it is impossible to know all of them.
+ Suppose that Bob wants to send Alice a message but does not know her public key.
  - One way to find it is through an intermediary, Charlie, who knows Alice's key.
  - If Bob trusts Charlie, he can ask Charlie for Alice's key; Charlie can send an encrypted message containing Alice's public key.
  - This works on a small scale, where there are only a handful of users who all trust each other, but cannot work on a userbase the size of the internet.
+ The solution generally used is certificates.
  - Suppose that eBay wants to assure a user of its authenticity.
  - eBay first asks a reputable certificate authority (e.g., Verisign) to verify it.
  - Verisign checks eBay's credentials and signs (that is, encrypts using Verisign's private key) a message that says "eBay's public key is xxx".
  - This message is given to eBay, which then gives it to users whenever they visit the site.
  - Users' browsers know Verisign's public key and can thus decrypt the message and discover eBay's public key.
  - Therefore, so long as a user trusts Verisign and knows its public key, he can find the public key of any website that uses Verisign.

---Network Security---

TCP Source Spoofing Attack
--------------------------

+ In TCP, the initiation of a connection requires a three-way handshake:
  1) host A sends a SYN to host B.
  2) host B sends a SYNACK to host A.
  3) host A sends an ACK to host B.
+ Every packet sent over TCP carries a sequence number.
  - Early on, sequence numbers were assigned predictably, so that an attacker could more or less guess the current sequence number of a host given its earlier ones.
  - Upon establishing a connection, each side chooses an initial sequence number (ISN); the responder's ISN is carried in its SYNACK and is used to synchronize messages.
+ A malicious host (C) can attempt to pretend to be host B:
  1) host C connects to host A and learns A's current ISN from the SYNACK packet.
  2) host C somehow disables host B.
  3) host C pretends to be host B and sends a SYN to host A.
  4) host A responds with a SYNACK to host B, which gets dropped because B is nonfunctional.
  5) host C, still posing as B, responds with an ACK that acknowledges the ISN it predicted from step 1.
  - If C guesses the ISN correctly, it has established a connection with A and can thus send data to A as if it were B; if not, it can guess again.
+ The modern solution to this problem is to randomly assign ISNs; this makes it difficult for a malicious host to guess them.

TCP SYN Flood Attack
--------------------

+ In TCP, the initiation of a connection requires a three-way handshake:
  1) host A sends a SYN to host B.
  2) host B sends a SYNACK to host A.
  3) host A sends an ACK to host B.
+ If step 3 does not occur, then host B keeps the half-open connection pending until it times out (approximately one minute).
+ TCP implementations also limit the maximum number of pending connections (six in this example); if any more connection attempts occur, they are dropped.
+ Therefore, it is possible to disable a machine by flooding it with SYN packets:
  1) hosts A1 through A6 all send SYNs to host B.
  2) host B responds to all the SYNs. Now, host B cannot accept more connections because it is waiting for ACKs from all six hosts.
+ Under the old system, a machine could be disabled by simply sending six SYNs every minute.
+ The solution is not to count a connection against the limit until the ACK of step 3 has been received; instead of reserving a slot upon receiving a SYN, most operating systems create a SYN cookie, of which there can be many.
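
! Sketch (not from the lecture): one way to picture a SYN cookie is as an ISN that the server can recompute later instead of remembering. This is a simplified sketch under stated assumptions: real SYN cookies also fold in a time counter and connection options, which are omitted here; HMAC-SHA256 is swapped in as the keyed hash; SERVER_SECRET, syn_cookie, on_syn, and on_ack are hypothetical names. It also shows why such ISNs are hard for an outsider to predict.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-server secret; an outsider cannot compute cookies without it.
SERVER_SECRET = secrets.token_bytes(16)


def syn_cookie(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
               secret: bytes = SERVER_SECRET) -> int:
    """Derive a 32-bit ISN from the connection addresses and a server secret."""
    data = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode("ascii")
    digest = hmac.new(secret, data, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def on_syn(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> int:
    """Step 2: answer a SYN with a SYNACK whose ISN is the cookie.
    Nothing is stored, so a flood of SYNs consumes no pending slots."""
    return syn_cookie(src_ip, src_port, dst_ip, dst_port)


def on_ack(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
           ack_number: int) -> bool:
    """Step 3: accept only if the ACK acknowledges our cookie (ISN + 1)."""
    expected = (syn_cookie(src_ip, src_port, dst_ip, dst_port) + 1) % 2**32
    return ack_number == expected


if __name__ == "__main__":
    isn = on_syn("10.0.0.1", 4000, "10.0.0.2", 80)
    print(on_ack("10.0.0.1", 4000, "10.0.0.2", 80, (isn + 1) % 2**32))
```

Only a client that actually received the SYNACK can send back the right acknowledgement number, so the server commits resources only after step 3.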

Worms
-----

+ In the recent case of Code Red, the growth of the worm was exponential until it reached saturation (see Fig. 2).

[Figure 2: plot of the number of scans per hour (vertical axis, 0 to 250,000) against the hour of the day (horizontal axis, 0 to 20); the curve stays near zero for the first hours, climbs steeply, and levels off at roughly 200,000.]

                  Figure 2. Scans caused by Code Red I

+ Worms have a way of interacting with each other that creates a kind of ecosystem.
  - When Code Red II was released, the effectiveness of Code Red I was slashed dramatically.
  - When Code Red II was practically eliminated, Code Red I returned to its former effectiveness.
+ The effectiveness of a worm can be dramatic.
  - Given the pervasiveness of the internet, it is possible for a worm to effectively infect the entire internet in an hour.
  - At the same time, because of the simplicity of worm-writing, many are written by amateurs.
! Aside: Worms are different from viruses in that worms can propagate themselves independently whereas viruses require some other program to initiate and propagate.
+ Traditionally, worms were detected by their signature, a section of code common to all copies of the worm.
  - A human had to generate the signature by hand after analyzing the code.
  - A signature scanner would then scan executables to see if their contents were malicious.
  - The biggest drawback to this approach was that it was painfully slow, oftentimes requiring an entire day for humans to generate a signature.
+ Modern techniques include the automatic generation of signatures--automated replacement of the hand-generated technique--and machine learning, whereby a system learns through experience what is a worm and what isn't.
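
! Sketch (not from the lecture): the traditional signature scanner described above reduces to searching data for known byte sequences. This is a minimal sketch under stated assumptions: the signature names and bytes below are made up for illustration, and scan is a hypothetical function name.

```python
# Hypothetical signatures; real ones come from analyzing captured worm code.
SIGNATURES = {
    "example-worm-a": bytes.fromhex("deadbeef4142"),
    "example-worm-b": b"\x90\x90\x90\x90EXAMPLE",
}


def scan(data: bytes, signatures=SIGNATURES):
    """Return the sorted names of all signatures whose bytes occur in data."""
    return sorted(name for name, pattern in signatures.items()
                  if pattern in data)


if __name__ == "__main__":
    suspect = b"\x00\x01" + bytes.fromhex("deadbeef4142") + b"\x02"
    print(scan(suspect))
    print(scan(b"an ordinary executable"))
```

The scanner can only flag worms whose signatures are already in the table, which is the limitation the machine learning approach tries to address.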

Format String Vulnerability
---------------------------

+ A common programming error is to write:
      printf(string);
  instead of:
      printf("%s", string);
+ This can cause problems because string might contain format specifiers such as "%s".
+ If string is user-supplied, it can enable a user to print data from memory by supplying a malicious string (e.g., if string is "%x%x%x", a user can print data from the stack).
+ Karl is currently going through all of Debian Linux using an automated process to find whether or not there are format string vulnerabilities.

Machine Learning on Novel Worms
-------------------------------

+ The old method of hand-generating signatures is oftentimes too slow in detecting worms; also, signatures can only catch worms that have already been found.
+ By using machine learning, worms can be detected sooner and more readily.
+ First, a system mines emails for their features such as the number of recipients or the types of attachments.
+ Second, the system uses these characteristics to determine whether or not an email is uncharacteristic.
+ Third, a parametric classifier filters the uncharacteristic emails to reduce the number of false positives. The parametric classifier is constantly retrained based on new data.
+ This method allows the detection of previously unseen worms and can block senders who forward such worms.

For More Information...
-----------------------

+ CS194, which covers the important features of computer security, will be offered next semester. For more information, see http://www.cs161.org.
+ The Cuckoo's Egg by Cliff Stoll details a man's pursuit of computer hackers.

+---------------+
|SMITH’S LECTURE|
+---------------+

---Virtual Machines---

+ The idea of a virtual machine is to simulate an actual machine in software.
  - Naturally, this can cause a noticeable decline in speed.
+ One of the solutions to the speed problem is to execute most instructions directly on the underlying machine.
+ The Virtual Machine Monitor (VMM) is the low-level layer that presents the simulated bare machine.
+ On a system without virtual machines, user programs are run on top of a privileged software nucleus, which runs on top of the bare machine (see Fig. 3a).
+ On a system with virtual machines, the VMM runs on top of the bare machine, the virtual machines run on top of the VMM, and so forth (see Fig. 3b).

+----------------------------------+   +----------------------------------+
|                                  |   |                                  |
|          +------------+          |   |          +------------+          |
|          |Bare Machine|          |   |          |Bare Machine|          |
|          +------------+          |   |          +------------+          |
|          | Privileged |          |   |          |  Virtual   |          |
|          |  Software  |          |   |          |  Machine   |          |
|          |  Nucleus   |          |   |          |  Monitor   |          |
|          +------------+          |   |          +------------+          |
|               |  |               |   |               |  |               |
|              /    \              |   |              /    \              |
|             /      \             |   |             /      \             |
|            /        \            |   |            /        \            |
|           /          \           |   |           /          \           |
|          /            \          |   |          /            \          |
|    +---------+    +---------+    |   |   +----------+    +----------+   |
|    |Extended |    |Extended |    |   |   | Virtual  |    | Virtual  |   |
|    | Machine |    | Machine |    |   |   | Machine  |    | Machine  |   |
|    |Interface|    |Interface|    |   |   |Interface |    |Interface |   |
|    +---------+    +---------+    |   |   +----------+    +----------+   |
|    |  User   |    |  User   |    |   |   |Privileged|    |Privileged|   |
|    |Programs |    |Programs |    |   |   | Software |    | Software |   |
|    +---------+    +---------+    |   |   | Nucleus  |    | Nucleus  |   |
|                                  |   |   +----------+    +----------+   |
+----------------------------------+   |                                  |
       a) System without VMs           |       ...             ...        |
                                       |                                  |
                                       +----------------------------------+
                                               b) System with VMs

          Figure 3. Systems with and without Virtual Machines

+ Semantically, virtual machines and emulators are often confused; for the sake of discussion, a virtual machine runs on hardware of the same kind as the machine it presents, while an emulator runs on hardware of a different kind.