CS 162 Lecture Notes
Topic: Networks and Communication Protocols

+ Two trends:
  + Lots of small machines.
  + Lots of computers everywhere, and a need to communicate.
+ Problem: communication and cooperation are difficult.
  + How do people on the same project share files?
  + How does new software get distributed to all users?
  + How is electronic mail handled?
+ Solution: tie machines together with networks, develop message protocols that allow communication and cooperation again.
+ Goal: ideally, we would like all computers to look like one very large, unified system. We could share files, communicate, etc. as if it were a timesharing system.
  + But we will always be able to tell the difference, due to performance.
+ Wide area networks (WANs) - networks that connect sites that are geographically apart.
+ Local area networks (LANs) - developed mid-70's to hook together personal computers. Most popular interconnection for LANs is Ethernet. LANs are used very differently than wide-area networks.
+ Examples of networks:
  + ARPAnet (Defense Advanced Research Projects Agency) - 1st and most famous network, developed early 70's but still in use. Connected together large timesharing systems all over the country using leased phone lines. Provided mail, file transfer, remote login.
    + ARPAnet used IMPs (Interface Message Processors) as routers and TIPs (Terminal Interface Processors) to connect from a terminal.
  + Usenet: Developed late 70's, early 80's. Unix systems phone each other up to send mail and transfer files.
  + CSnet: developed to be less expensive clone of ARPAnet, and tie together CS departments.
  + BITNET: ties together mostly sites using IBM equipment, including a lot of physics laboratories.
  + VNET - IBM's internal corporate network. Has highly secure gateways to connect to CSnet.
  + DECNET - DEC's network system. Name of product, and also refers to corporate communication system.
  + Misc. commercial nets, such as America Online, Prodigy, Compuserve.
  + Internetworks: mechanisms for tying together many existing networks, such as ARPAnet, Usenet, and LANs.
  + The Internetwork is the combination of most of the above. They are now widely interconnected.
+ Network Hardware
  + LAN
    + Usually Ethernet, which uses either shared cable, or wires from each machine to a hub or switch.
  + WAN
    + Point-to-point links (used by most early networks). Examples are leased phone lines (50k bits/sec, ARPAnet), RS232 connections, T1 and T3 lines, regular phone calls, satellite links, etc.
+ Network Topologies:
  + Fully connected: every site can talk directly to any other site (e.g. Usenet).
  + Partially connected: star and ring are most popular. Intermediate nodes must forward messages.
  + Multi-access bus / broadcast (used by most LANs today). A single cable or group of cables connects many machines together. Best example is Ethernet (one wire). Alternative is radio broadcast.
+ Network Performance Parameters
  + Networks are usually characterized in terms of two performance parameters:
    + Latency: the minimum time to get the minimum amount of information between two sites.
      + Note the difference between transmission latency, which is the time for a given bit to get from one end to the other after the connection is set up, and set-up latency, which is the time to get the first bit there.
    + Bandwidth: once information is flowing, how many bits per second can be transmitted (i.e. the marginal cost per bit).
  + Note also cost.
+ Protocols:
  + These are the key to networks. A protocol is just an agreement between the parties on the network about how information will be transmitted between them and what the information format is. There are many different protocols to do different things (e.g. mail, file transfer, remote login). Typically, protocols are built up in layers. Section 15.6 of the book lists the 7 ISO protocol layers.
  + Hierarchical protocols - relate layers in a given system.
    Can be network system, operating system, etc.
  + Peer to Peer Protocols - relate the same layer of different systems. Rely on lower layers to actually communicate across machines.
+ ISO Protocols
  + 1. Lowest protocol layer: physical layer. Determines the electrical mechanisms for transmitting bits: voltages, delays, currents, etc.
  + 2. Data Link protocol layer: how to get packets between two directly-connected components. Includes error detection and recovery from physical layer.
  + 3. Network layer - Responsible for providing connections (to nodes that are not directly connected) and routing packets. Takes care of addresses. (Takes care of route changes due to changing loads.)
  + 4. Transport layer - low level access to network. Breaks messages into packets, keeps packets in order, flow control, physical address generation. (Takes care of retransmission of lost or destroyed packets.)
  + 5. Session layer - process to process protocols.
  + 6. Presentation layer - resolves differences between sites in formats (e.g. character types, number representation, full/half duplex, etc.)
  + 7. Application layer - interacts with users. Supports electronic mail, distributed data bases, etc.
+ Wide area networks are usually built up of Local Area Networks (LANs), which are interconnected. Local area networks are usually some type of broadcast network.
+ Broadcast networks: single shared communication medium, no central controller to allocate access to it.
  + Simplest scheme is the Aloha mechanism: just broadcast blindly, use recovery protocols if packet doesn't get through. This system has stability problems: can't get more than 18% utilization of channel (1/2e), and system completely falls apart under heavy loads.
    + Aloha system uses satellite - no choice. Can't listen - delays too long (1/4 second).
    + Can be improved with "slotted aloha" - messages occupy slots. Utilization limit rises to 1/e (about 37%) - doubles bandwidth.
  + Ethernet (using a physical coax cable) adds two things.
    + First is carrier sense: listen before broadcasting, defer until channel is clear, then broadcast.
    + Also listen while broadcasting. Collision can still happen if two stations start up at exactly the same time. If collision detected, jam network so that everyone will know about collision (don't waste time transmitting junk). Then wait a "random" interval, retry. If repeated collisions, wait longer and longer intervals.
    + This is called CSMA/CD (carrier sense multiple access, with collision detection).
    + Ethernet Frame: destination address (6 bytes), source address (6 bytes), type (2 bytes), data (46-1500 bytes), frame check sequence (4 bytes).
    + Problem with Basic (original) Ethernet:
      + Reliability: if any station jams network, nobody can do anything, can't even figure out who's doing it.
      + Fairness: there's no guarantee against starvation. People with real-time needs don't like this.
      + Bandwidth limited to cable (10 Mbits/sec).
      + Original Ethernet limited to site which can be connected by cable (4000 feet).
        + Longer connections with switch.
      + Security - relatively easy to listen to all traffic, and/or tap cable.
    + More Recent Ethernet Designs:
      + Use switch to route, rather than shared cable.
      + Rates of 10Mbit, 100Mbit, 1Gbit/sec. 10G under development.
      + Wireless Ethernet (802.11a,b,g).
      + Continues to use most of Ethernet protocol - frame format, timeouts, collision detect, software stuff.
  + Ring networks: these are a type of broadcast network. An additional protocol built on top of a ring-structured set of point-to-point links.
    + Normally, an electronic token (special packet) circulates at high speed around the ring. If a station doesn't have anything to broadcast, it just retransmits everything it receives.
    + When ready to broadcast, a station waits until the token passes by. Instead of retransmitting the token, it sends its packet.
    + When packet has been transmitted, put token back on ring for next station to use.
    + Packet loops all the way around, gets swallowed by sender when it comes back again (recognize self as destination).
    + Problems with ring system:
      + If any station dies, token can't circulate so ring dies.
      + If token is missing, system dies.
      + Starvation is possible.
      + If a second token is created, system can get messed up.
+ We can use Ethernet or Ring Network for local net. Need some way of constructing (structuring) a wide area net, i.e. build an "internetwork".
+ Three methods for link between two machines:
  + Circuit switching - like telephone system - you have circuit between the source and destination machines.
  + Packet switching - communications are broken into packets and sent piece by piece.
    + Note that packet switching can be used to build a virtual circuit - looks like circuit, but actually packets on a shared medium.
  + Message switching - a virtual circuit exists for long enough to complete a message, and then the circuit is dropped. Or can use physical link.
+ Getting stuff where you want it.
  + Packets must be forwarded from machine to machine until they reach the destination. Machines that forward between networks are called gateways. Problem is how to get stuff where you want it.
+ Names vs. addresses vs. routes:
  + Name: a symbolic term for something: "Robert", or "ucbcory". Good for people to remember.
  + Address: where the thing is: in an internetwork situation, usually consists of the number of the network, the number of the site on the network, the id of the host at the site, and sometimes a more specific host (e.g. a workstation). E.g. jones@chaos.netnode.berkeley.edu
  + Route: directions for how to get there from here (a sequence of hosts and links to pass through to reach the destination).
    + Sometimes the sender has to provide the route, e.g. in UUCP: hplabs!hp-pcd!hpcvc0!cliff. All each machine has to be able to do is remember its neighbors and forward messages. This is clumsy for users.
    + It's better if the hosts of the internetwork can figure out the routing stuff for themselves. This involves a special protocol between the hosts to build routing tables. E.g. in the Internet, hosts send messages to nearest neighbors, build up tables of most direct paths from each host to each other host (fewest hops).
    + Difficulty with routing tables is that they get to be very large.
    + Note that routes can change dynamically. There can be more than one way to get from A to B. Note the problem of instability in routing if routes are changed for performance reasons.
    + In LANs, only gateways have to worry about routing: all the other hosts just ship packets to the gateway unless the destination is a host on the local net.
+ Communications Problems:
  + Packets can get lost:
    + Transmission errors.
      + Address is corrupted, and packet circulates forever.
      + Contents of packet are corrupted.
    + A host has all its packet buffers full so it has no place to put another incoming packet.
      + Can happen at intermediate host if packets are arriving on a fast network but have to be forwarded onto a much slower network. This is called network congestion.
      + Can happen at destination if user process can't work fast enough to process all the packets as they arrive.
    + Receiver is down, and sender sends anyway.
  + Packets can arrive out of order: if some hosts suddenly go down, or if routing tables change, packets might wander off into the network and come back much later. Most protocols include a time-to-live mechanism: after a certain time, packets are killed so that they don't wander endlessly.
+ Datagram protocols: used to deliver individual packets; the packets are not guaranteed to get through or to arrive in any particular order. This is useful for some applications, but not very many.
+ Most applications would like guarantees about delivery and order.
  + This is called a connection, and the protocols to implement it are called virtual circuit or transport protocols.
  + To do this, the sender and receiver must remember state about what has been happening.
  + Simple acknowledgement-based protocol:
    + Store a serial number in each packet. Sender assigns serial numbers, increments for each packet.
    + Sender sends one or more packets.
    + Receiver sends an acknowledgement (ack) packet for each packet or group of packets.
    + Sender waits for acknowledgement before sending next (group of) packets (must also save old packets!).
    + If sender doesn't receive acknowledgement within a reasonable time, it assumes that the packet got lost and retransmits it.
    + Retransmission could result in receiver getting two packets with same serial number: it checks serial numbers and throws away duplicates and out-of-order packets.
  + Sender and receiver must negotiate about how far ahead the sender can send: otherwise the receiver might run out of buffer space and have to discard packets. This is called the flow control problem.
  + No matter what the virtual circuit mechanism, setting up the connection is complex and time-consuming. It's tricky to get two hosts to agree to communicate with each other and get their state initialized correctly.
+ TCP/IP
  + TCP/IP is a collection of network protocols making up the Internet Protocol Suite.
  + History:
    + 1969 - Arpanet with 4 nodes (UCLA, SRI, UCSB, Utah).
    + 1972 - Arpanet Demo (50 hosts).
    + mid-1970s - TCP developed, running on Unix (DEC PDP-11).
    + early 80s - Berkeley Unix. Runs TCP.
    + 1983 - Arpanet converts to TCP/IP. In use by Sun.
  + ISO Levels:
    + Level 3 - Network Layer:
      + IP - Internet Protocol - provides host-to-host datagram delivery.
        + Provides packet routing, will insulate higher levels from network specific characteristics (e.g. packet size).
        + Fields of IP packet header include: version, header length, total length, ID (same for all fragments of datagram), time-to-live, checksum, source address, destination address.
        + IP address is 32 bits (newer version longer). Broken into four 8-bit segments.
          Addresses allocated in blocks.
      + ICMP - Internet Control Message Protocol - used by gateways and hosts to apprise other hosts of conditions related to their IP services (e.g. routing, congestion).
      + ARP - Address Resolution Protocol - maps an IP address to an associated Ethernet address (32 bits -> 48 bits).
      + RARP - Reverse ARP - maps an Ethernet address to an associated IP address.
    + Level 4 - Transport Layer:
      + TCP - Transmission Control Protocol: connection oriented, reliable, byte-stream protocol.
        + TCP packet header includes: source port (identifies process or service in sender), destination port, sequence number (32 bits), acknowledgement number, control flags (SYN (connection request), ACK, RST (reset), FIN (end)), window (window size - number of bytes that will be accepted), checksum.
        + Provides means to connect with a socket [IP address, port number].
        + Takes care of timeouts, retransmissions, flow control.
        + Some well known ports: 20, 21 (FTP), 23 (Telnet), 25 (SMTP).
      + UDP - User Datagram Protocol - unacknowledged transaction-oriented protocol parallel to TCP.
    + Levels 5-7: Session, Presentation and Application Layers:
      + SMTP - Simple Mail Transfer Protocol.
      + DNS - Domain Name Service - maps names to addresses.
        + Top level is Network Information Center (NIC) computers.
      + FTP - File Transfer Protocol.
      + Telnet - provides virtual terminal services.
+ The mechanisms described above form the basis for tying together distributed systems. So far, though, they've only been used for loose coupling:
  + Each machine is completely autonomous: separate accounting, separate file system, separate password file, etc.
  + Can send mail between machines.
  + Can transfer files between machines (but only with special commands).
  + Can execute commands remotely.
  + Can login remotely.
+ Loose coupling like this is OK for a network with only a few machines spread all over the country, but not for a high-performance LAN where every user has a private machine.
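To make the TCP header fields listed earlier more concrete, here is a minimal Python sketch that packs and then decodes the fixed 20-byte header. The field order and flag bit values follow the TCP specification rather than these notes. The function names, the client port 49152, and the sequence and window values are hypothetical, and the checksum is simply left at zero instead of being computed.

```python
import struct

# Control flag bits as defined by the TCP specification.
FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10

# Fixed 20-byte header in network byte order: source port, destination port,
# sequence number, acknowledgement number, data offset + flags, window,
# checksum, urgent pointer.
TCP_HEADER = struct.Struct("!HHIIHHHH")


def pack_tcp_header(src_port, dst_port, seq, ack, flags, window):
    """Build a header with no options; the checksum is left as 0 (not computed)."""
    offset_and_flags = (5 << 12) | flags  # data offset = 5 words = 20 bytes
    return TCP_HEADER.pack(src_port, dst_port, seq, ack,
                           offset_and_flags, window, 0, 0)


def parse_tcp_header(data):
    """Decode the fixed header fields from the first 20 bytes of a segment."""
    (src_port, dst_port, seq, ack,
     offset_and_flags, window, checksum, _urgent) = TCP_HEADER.unpack(data[:20])
    flag_names = [name for name, bit in
                  (("SYN", SYN), ("ACK", ACK), ("RST", RST), ("FIN", FIN))
                  if offset_and_flags & bit]
    return {"src_port": src_port, "dst_port": dst_port, "seq": seq,
            "ack": ack, "flags": flag_names, "window": window,
            "checksum": checksum}


# Hypothetical connection request from an arbitrary client port to SMTP (25).
header = pack_tcp_header(49152, 25, seq=1000, ack=0, flags=SYN, window=4096)
fields = parse_tcp_header(header)
print(fields["dst_port"], fields["flags"])
```

The destination port is what selects the service (25 for SMTP here); together with the IP address it forms the socket mentioned above.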
+ What would we like in a distributed computer system?
  + Unified, transparent file system.
  + Unified, transparent computation - from any terminal, you can run on any machine, transparently. (You actually shouldn't care which machine you're running on.)
  + Load balancing, process migration, file migration.
+ Local area networks can more or less provide that now.
+ Wide area networks cannot provide this transparently, due to performance problems. May be possible in future?
+ Distributed File Systems
  + Remote files appear to be local (except for performance).
  + Issues:
    + Failures - what happens when remote system crashes.
    + Performance - remote is not same as local. Can do some caching.
  + Sun's NFS (Network File System)
    + NFS permits the mounting of a remote file system as if it were local. Therefore, by using mount commands, can set up transparent distributed file system.
    + Caches file blocks, descriptors at both clients and servers.
    + Write-through caching. When file is closed, all modified blocks are sent immediately to server disk. "Close" doesn't return until all bytes stored on disk.
    + Consistency is weak. Polls periodically for changes to file; may use old version until polling. May have simultaneous conflicting updates.
    + Server keeps no state about client (except hints, for performance). Each read gives enough info to do entire operation (i.e. Read(I#, position)).
      + When server crashes and restarts, can start again processing requests immediately.
    + All requests are "idempotent" - i.e. can be repeated with no ill effects. So if message may be lost, can resend (and possibly redo).
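The stateless, idempotent request idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not NFS itself: the class names, the in-memory "disk", and the lossy channel are all hypothetical, and `append` is included only as a non-idempotent counter-example (it is not an operation described in these notes).

```python
class StatelessFileServer:
    """Hypothetical server that keeps no per-client state between requests."""

    def __init__(self):
        self.files = {}  # inode number -> bytearray of file contents

    def read(self, inode, position, count):
        # The request carries everything needed: which file, where, how much.
        return bytes(self.files[inode][position:position + count])

    def write(self, inode, position, data):
        # Writing at an explicit position is idempotent: repeating it
        # leaves the file in the same state.
        buf = self.files.setdefault(inode, bytearray())
        if len(buf) < position:
            buf.extend(b"\0" * (position - len(buf)))
        buf[position:position + len(data)] = data
        return len(data)

    def append(self, inode, data):
        # Counter-example: the result depends on how many times it runs.
        self.files.setdefault(inode, bytearray()).extend(data)
        return len(data)


class LossyChannel:
    """Hypothetical transport that loses the reply to the first `drops` calls."""

    def __init__(self, server, drops):
        self.server = server
        self.drops = drops

    def call(self, op, *args):
        result = getattr(self.server, op)(*args)  # the server does the work
        if self.drops > 0:
            self.drops -= 1
            raise TimeoutError("reply lost")  # the client never hears back
        return result


def call_with_retry(channel, op, *args, attempts=5):
    """Resend the same request until a reply arrives."""
    for _ in range(attempts):
        try:
            return channel.call(op, *args)
        except TimeoutError:
            continue
    raise TimeoutError("server unreachable")


server = StatelessFileServer()
call_with_retry(LossyChannel(server, drops=2), "write", 7, 0, b"hello")
call_with_retry(LossyChannel(server, drops=1), "append", 8, b"ab")
print(server.files[7], server.files[8])
```

The positioned write is executed three times but inode 7 still holds a single copy of the data, while the retried append leaves inode 8 with the data duplicated. That is why every request has to name its position explicitly rather than rely on a server-side file offset.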