Phase 4: Networks and Distributed Systems

LAST MODIFIED: 30 Apr 2002 by aswan

The final assignment is to experiment with networking. We will provide you with some low-level network communications facilities; you will build a nicer abstraction on top of those, and then use that abstraction in building a distributed chat program.

Each node in the network will be implemented as a separate instance of Nachos. Therefore you will have one JVM running for each network node; these JVMs will be running on the same actual machine, even though we are pretending as though they are distributed on a network.

Each Nachos "node" has a single connection to the network, which is implemented using UDP sockets. The class nachos.machine.NetworkLink provides this functionality for you. Each node has a unique link address which represents the physical network address of that node. You can access the network link using the method Machine.networkLink(). Calling getLinkxAddress() on the NetworkLink object will return the link address of this node; the type of the address is just an int. Since multiple Nachos nodes will be running on the same actual machine, you will need to use different swap files for each node to avoid interference. An easy way to do this is to append the network link address to the swap filename.

You can test out the basic network functionality by building in the proj4 directory and simultaneously running nachos in two different windows on the same machine. As Nachos initializes, the first line of text contains in it the address of its network link. For example, if you see:

nachos 5.0j initializing... config interrupt timer processor console network(1) user-check grader
this indicates that the network link address is 1 (the value in parenthesis after network in the above output).

The new machine files for this assignment include:

The new kernel files for this assignment, found in the network directory, include:

PostOffice provides a more convenient communications abstraction than the raw network link. It provides user-to-user communication on top of the network link's node-to-node communication, but does not provide reliable message delivery. PostOffice is useful for understanding how to use the NetworkLink and you may find it useful as a starting point for the design of your transport protocol.

Note that NetKernel extends VMKernel and NetProcess extends VMProcess, meaning that this project depends upon your phase 3 solution working correctly. If you wish, you may modify the classes to extend UserKernel and UserProcess instead, so that your code only depends upon project phase 2. In order for this to work, ensure that the total number of physical memory pages is large enough to hold the entire code for your test programs. This is controlled by the line

Processor.numPhysPages = 16
in the nachos.conf file.

  1. (75%) Implement the transport protocol documented in this specification. Connections are created via the two syscalls connect() and accept() (documented in syscall.h), which provide reliable, connection-oriented, byte-stream communication between connection endpoints. A connection endpoint is a link address combined with a port number. A user program connects to a remote node by calling connect() with a remote network link ID (also called a host ID) and a port number. The remote node must call the accept() system call to accept the connection; the system call takes a local port number on which connections can be made.

    When the connection is established, each system call returns a new file descriptor representing the connection (in UNIX terminology, this file descriptor is called a "socket"). The application can then send messages to the network by calling write() on the socket, and can read messages from the network by calling read() on the socket. The connection is closed by one end of the connection performing a close() operation on the socket file descriptor. After a connection is closed, any read() or write() calls on the socket on the closed socket should return -1, indicating an error. any further write() calls on the remote endpoint should return -1 but the remote end must be able to continue to receive any buffered data (via the read() system call).

    There is one important detail to consider here. If a program does something like:

         write(socket, buf, 100);
         close(socket);
    
    then although the connection has been closed, the 100 bytes of data have probably not been acknowledged yet by the remote host. Your protocol must ensure that all data written to a socket is delivered to the remote host, even if the socket is closed before all outgoing data is acknowledged. This also means that any read calls on the socket by the remote host must succeed until all of pending data has been read, even though the socket has been closed. This means that you cannot simply "drop" the state for a socket when it is closed; you have to keep the socket "alive" until all of the data is delivered!

    Your implementation of read() and write() on sockets must provide full-duplex, reliable, byte-stream communication with no message size limits. This means that packets can flow in both directions on the connection simultaneously. Also, if one node calls write() with a buffer of size N, then the remote host will receive exactly the same data by calling read() (although multiple calls to read() might be required). The read() call may not return all of the data at once. For example, if node A calls

    write(socket, buffer, 100);  /* Sender */
    
    and node B calls
    read(socket, buffer, 10); /* Receiver */ 
    
    then B will have to call read() ten times before receiving all of the data. Also, read() may return less data than the user requested. For example, if B calls
    read(socket, buffer, 1000); /* Receiver */ 
    
    then if only 100 bytes are available, read() will return only those 100 bytes. In general the amount of data returned by each call to read need not be equal to the corresponding write() call on the remote host. Also, read() does not block waiting for data to be received on the network; it returns immediately (with a return value of 0).

    To test your protocol in the presence of network loss, the Nachos network link can be made to randomly drop packets by modifying the value of the NetworkLink.reliability key in nachos.conf. This is a floating-point number, between 0 and 1, representing the likelihood that a packet will be successfully delivered.

  2. (25%) Demonstrate your message passing primitives work by using them to implement an IRC-like "network chat" application among multiple (at least 3) users. Network chat is like the UNIX talk utility, except it can be used for N-way conversations (not just 2-way); each user sits at a different machine, and anything anyone types is broadcast to everyone else connected to the chat room.

    You should write two programs: a chat server (called chatserver.c) and a chat client (called chat.c). The server runs on one node and accepts connections from chat clients. The chat client connects to the chat server and accepts user input from the console. After the user types a line of text, the line is transmitted to the chat server. The server then broadcasts the message to all of the clients connected to it. Each message received by the chat client should then be displayed on the console. Users can dynamically connect and disconnect from the chat room.