Assignment 3: Back-Propagation (Computational Credit)
Assigned Tuesday, February 7th   -   submit electronically via the traditional CS instructional account system

Note: Any clarifications will be posted to bSpace - check there first if you have a question.

INTRODUCTION

Your assignment is to implement a simple neural network, first without and then with hidden layers. It will support feedforward activation and backpropagation learning. To ease you through this project, it has been broken down into two separate parts, each with its own due date. If you are unable to complete the first part, a working solution will be made available to help you complete the second part.

Note that, as promised, you need not complete this homework in Java. However, there are several reasons why you should:
  1. If you do it in Java, you get to use our starter code, which makes your life easier.
  2. If you use another language, you may have to demo it so we see that it works as promised.
  3. If you use another language, you are much less likely to get partial credit, since it will be much harder to read your code.


    Part 1: Due Thursday, February 15th, by 11:59pm - this part will not be accepted more than 2 days late
    • Implement backpropagation for a simple single layer (no hidden nodes) network, with two input nodes and a single output node.
    • The structure of the network is already set up; however, the functions for training and backpropagation are incomplete.
    • Answer two quick questions about the consequences of adding a hidden layer.
    Part 2: Due Thursday, February 22nd, by 11:59pm
    • Using the updated starter code or your working code from Part 1, implement a multilayer network with one or more layers of hidden nodes and an arbitrary number of input and output nodes.
    • You will have to make important network architecture decisions and understand how a hidden layer changes the backpropagation algorithm.

    Each part will be graded by an autograder. You will also be able to test your program with a simple version of the autograder and track your progress through the assignment. It is important to note that a successful completion of Part 2 will also satisfy Part 1. You may want to understand Part 2 before you start programming Part 1 in order to plan ahead, but it is not necessary.

    IMPORTANT NOTE: Make sure you understand the backpropagation algorithm (as covered in lecture, section, the handout, and course readings) BEFORE you begin coding. A good, simple walkthrough of backpropagation can be found here.


    Part 1:
       Due Thursday, February 15th, by 11:59pm - this part will not be accepted more than 2 days late

    Implementing backpropagation:

    For this part you will complete a simple neural network with two input nodes and a single output node. For non-Java instructions, see below.

    The provided code already sets up the three-node net (plus a bias unit) for you; however, you are tasked with supporting feedforward activation and backpropagation learning. You should make the following assumptions:
    • Input and output patterns will be contained in an int[][] (an array of patterns, each of which is an array of integers).
    • The output unit uses the sigmoid activation function and should also have a bias.
    • The network has some training parameters that you may wish to vary during experimentation as appropriate. Be sure to include both learningRate and momentum in your weight updates (a sketch of the update rule appears after this list).
    • Weights should be updated on a per-pattern basis (i.e., you needn't implement batch learning).
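
    As a rough illustration only (a sketch under assumed names, not the starter-code API), the standard per-pattern update with momentum looks like the following. Here delta is the unit's error term, inputs[i] supplies the i-th incoming activation, and prevWeightChange[] remembers each weight's previous change; all three are assumptions about your design. However you divide this work among the starter methods, the arithmetic is the same:

      // Sketch of a per-pattern momentum update for a unit's incoming weights.
      public void updateWeights(double learningRate, double momentum) {
          for (int i = 0; i < inWeights.length; i++) {
              double dw = learningRate * delta * inputs[i].getActivation()
                        + momentum * prevWeightChange[i];   // momentum term
              inWeights[i] += dw;
              prevWeightChange[i] = dw;  // remembered for the next momentum term
          }
      }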
    1. Copy the starter code, in particular Net.java, Unit.java, TesterPart1.java, and runTest.sh. Take some time to examine the Net, Unit, and TesterPart1 classes these files define. Each contains a number of methods, most of which are marked as ** to be filled in **. You are required to implement the methods in Net.java and Unit.java. The third file, TesterPart1.java, is provided to help you test and debug the project, as well as to produce the output for the autograder. Note: all code must work with Java 1.5 (i.e., don't use features introduced in Java 6).

    2. Study the provided code and understand how the network and data structures are being used.
      Net.java already creates and links 4 units in the appropriate fashion: 2 input nodes, an output node, and a bias node. When testing how your network learns, the autograder makes only 2 calls to your code. The first is Net n = new Net(data, targets); to create the network, followed by n.train(); to train the network. The functions setTrainingParameters(...), logNetwork(), logWeights(), logActivationCalculation(), and logWeightUpdates() may also be called, but these are already written for you. Notice that logWeights() and logWeightUpdates() assume that outUnit.inWeights[0] is the weight from inUnit1 to outUnit, while outUnit.inWeights[2] is the weight from the bias to outUnit.

    3. Implement the backpropagation algorithm. There are 8 functions that need to be filled in. If you don't know where to start, try completing the functions in the same order that the test file checks them (illustrative sketches of a few of them appear after this step):
      in Unit.java:
      1. initialize(): Randomize all incoming weights to values chosen uniformly between -1 and 1 (the values of Net.MIN_WEIGHT and Net.MAX_WEIGHT).
      2. computeActivation(): Apply sigmoid function to weighted sum of inputs
      3. computeError(): Compute error for the output node
      4. computeWeightChange(): Calculate the current weight change
      5. updateWeights(): Update changes to weights for this pattern
      in Net.java:
      1. feedforward(): Present pattern and compute activations for rest of net
      2. computeError(): Present all patterns to network and calculate current error
      3. train(): Train the net according to the current training parameters, and output important information as you train:
        1. First, output the initial state of the network by calling logNetwork().
        2. Then, for each of the first 10 epochs (an epoch is one cycle through the training data), call logActivationCalculation() and logWeightUpdates() for every data point to show the progress of training. With 4 training data points, for example, each of these functions will be called 40 times (4 calls in each of the first 10 epochs). Only 10 epochs are required because that is long enough to verify that your program performs the computations correctly.
        This output will be used to grade your assignment, so be careful to get it right!
      If you need to see the equations (from the slides in class and section), they are highlighted in the following set of slides: (pdf) (powerpoint)
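
      To make the expected structure concrete, here is a rough sketch (not the required implementation) of initialize(), computeActivation(), and the logging skeleton of train(). The fields inWeights, inputs, and activation, the helper getActivation(), and the variables numEpochs and errorCriterion are assumptions about your design; the log functions and Net.MIN_WEIGHT/Net.MAX_WEIGHT come from the starter code described above.

        // In Unit.java -- a sketch; field names are assumed.
        public void initialize() {
            // Uniform random weights in [Net.MIN_WEIGHT, Net.MAX_WEIGHT] = [-1, 1].
            for (int i = 0; i < inWeights.length; i++) {
                inWeights[i] = Net.MIN_WEIGHT
                        + Math.random() * (Net.MAX_WEIGHT - Net.MIN_WEIGHT);
            }
        }

        public void computeActivation() {
            // Sigmoid of the weighted sum of incoming activations (the bias
            // contributes via a unit whose activation is fixed at 1.0).
            double net = 0.0;
            for (int i = 0; i < inWeights.length; i++) {
                net += inWeights[i] * inputs[i].getActivation();
            }
            activation = 1.0 / (1.0 + Math.exp(-net));
        }

        // In Net.java -- the logging skeleton of train(), per the spec above.
        public void train() {
            logNetwork();                              // initial state, exactly once
            for (int epoch = 0; epoch < numEpochs; epoch++) {
                for (int p = 0; p < patterns.length; p++) {
                    feedforward(patterns[p]);
                    // ... compute errors and update weights here ...
                    if (epoch < 10) {                  // first 10 epochs only
                        logActivationCalculation();
                        logWeightUpdates();
                    }
                }
                if (computeError() < errorCriterion) break;  // stop at the criterion
            }
        }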

    4. Test and Comment your Program. Make sure to comment your code clearly. If your code is wrong but your comments are specific enough (don't write a book), at least we will know your intent.
      There are two easy ways to test your program after you have compiled all of your code ( % javac -g *.java ).
      1. Call TesterPart1 with % java TesterPart1 to run a few very elementary tests on your various functions. Although they can't guarantee your functions are correct, they will at least make sure you are not making a silly error. Note that failing one of the earlier tests will also cause later tests to fail.
      2. Train your network with various parameters, making sure that changing the momentum, learning rate, etc., has the expected effect. Make the call:
        % java TesterPart1 training_file ne lr mom ec
        where:
        training_file is the training file containing the function to be learned (either "and.data", "or.data", "xor.data", or "same.data")   -   of course xor and same won't be learned yet
        ne is the number of epochs
        lr is the learning rate
        mom is the momentum
        ec is the error criterion
        for example (with arbitrary numbers for ne, lr, mom, ec):
        % java TesterPart1 and.data 500 .1 .5 .1


    What to submit for Part 1

    You should submit your assignment using the submit program available on the EECS instructional system; instructions for doing this can be found here. This assignment is a3-1. Be sure to include:
    • Your completed code, including Net.java and Unit.java. Remember to submit any other code files you have created. Do not submit the .class files.
    • The output (output.tgz) resulting from running the shell script runTest.sh. This will run your program on and.data, or.data, xor.data, and same.data. It will test the network on each of these with 1000 epochs, 3 different learning rates (0.02, 0.1, and 0.5), and 3 different momentum settings (0.0, 0.5, and 0.9), with an error criterion of 0.1.

      You call this script by running bash runTest.sh, and it will produce the output file output.tgz.

      If you are not using Java, you must run the tests yourself and produce a tarball of the resulting output. You should name your output files [train]-[lr]-[mom].out, where [train] is the training set, [lr] is the learning rate, and [mom] is the momentum. You should create a tarball by running tar -czf output.tgz *.out.
    • Answers to these questions, in a file called answers.txt. Make sure this includes the usual information.
      1. Under what conditions is a network with hidden layers computationally identical to one without any hidden layers?
      2. How would you change Part 1 of this assignment? Was it too difficult? Too easy?
      3. (If relevant) What kinds of problems did you encounter?
        • Describe any design problems you may have had.
        • If your network doesn't work, where is the problem?
        • If you weren't able to get everything working or ran out of time, explain your approach, what you know, and what you might do differently next time.
    Note: If you cheat, we will catch you. Don't cheat!

    Non-Java instructions

    If you are not using Java, then your program must load the input files and produce output of the right sort. The format is as follows.

    Input

    Input files will look like this:
    DATA_DESCRIPTION
    [number of data points] [number of inputs per datum] [number of outputs per datum]
    DATA
    [input data separated by spaces] ; [output data separated by spaces]
    [input data separated by spaces] ; [output data separated by spaces]
    ...
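
    If you are writing your own loader, the following sketch shows one way to read this format in Java; the class name DataReader and all variable names are illustrative only, not part of the assignment.

      import java.io.*;

      public class DataReader {
          public static void main(String[] args) throws IOException {
              BufferedReader in = new BufferedReader(new FileReader(args[0]));
              in.readLine();                                  // "DATA_DESCRIPTION"
              String[] desc = in.readLine().trim().split("\\s+");
              int nPatterns = Integer.parseInt(desc[0]);
              int nInputs = Integer.parseInt(desc[1]);
              int nOutputs = Integer.parseInt(desc[2]);
              in.readLine();                                  // "DATA"
              int[][] patterns = new int[nPatterns][nInputs];
              int[][] targets = new int[nPatterns][nOutputs];
              for (int p = 0; p < nPatterns; p++) {
                  String[] halves = in.readLine().split(";"); // inputs ; outputs
                  String[] ins = halves[0].trim().split("\\s+");
                  String[] outs = halves[1].trim().split("\\s+");
                  for (int i = 0; i < nInputs; i++)
                      patterns[p][i] = Integer.parseInt(ins[i]);
                  for (int o = 0; o < nOutputs; o++)
                      targets[p][o] = Integer.parseInt(outs[o]);
              }
              in.close();
              System.out.println("Read " + nPatterns + " patterns.");
          }
      }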

    Output

    Output files should look like this:
    NETWORK
    [max number of epochs] [learning rate] [momentum] [error cutoff]
    WEIGHTS
    [output neuron weights in order, with bias unit last, separated by spaces]
    [then, for each datum for the first 10 epochs, you should output the following:]
    WEIGHTS
    [output neuron weights in order, with bias unit last, separated by spaces]
    ACTIVATION
    [activation of input neurons, separated by spaces]
    [activation of output neuron]
    MOMENTUM
    [momentum terms for all weights of output neuron, separated by spaces]
    WEIGHT CHANGE
    [change in weights for all weights of output neuron, separated by spaces]
    [...]

    for example:
    NETWORK
    10000 0.1 0.1 0.1
    WEIGHTS
    0.5 0.5 0.5 
    WEIGHTS
    0.5 0.5 0.5
    ACTIVATION
    1.0 1.0
    0.8175744761936437
    MOMENTUM
    0.0 0.0 0.0
    WEIGHT CHANGE
    0.0027208119642790096 0.0027208119642790096 0.0027208119642790096
    WEIGHTS
    [...] 

    If you are not using Java, you must run the tests yourself (see above) and produce a tarball of the resulting output. You should name your output files [train]-[lr]-[mom].out, where [train] is the training set, [lr] is the learning rate, and [mom] is the momentum. You should create a tarball by running tar -czf output.tgz *.out.

    Note: If you do not use Java, you may be asked to demo your program for grading. You also may not receive partial credit if your program does not produce output as it should. I recommend that you use Java!



    Part 2:
       Due Thursday, February 22nd, by 11:59pm

    Expanding the Network:

    In this part you will expand your neural network to include an arbitrary number of input, hidden, and output units. In the first part, four units were hard-coded into the Net class: two input units, one output unit, and one bias unit. Now that you understand the back-propagation algorithm, it is time to add hidden layers. You must overhaul the internal structure and make important architectural decisions.
    1. Copy the files in this directory, especially including TesterPart2.java, runTest2.sh, and the set of data files. If you don't like your solution to Part 1, also copy the updated files Net.java and Unit.java. The files will be updated the morning of Sunday, February 17th; however, if you want the files earlier, send an e-mail to l e o n |a| b a r r e t t n e x u s |.| c o m.

    2. Decide what data structures you will need to supplement the ones supplied. Issues you will need to consider (one possible layout is sketched after this list):
      • How are connections between units represented? Will you just use the knowledge that one layer is completely connected to the next layer, or will you explicitly add input and output connections as in Part 1?
      • Are input/hidden/output units represented differently? How do you distinguish between them?
      • Will you keep the bias Unit from Part 1, or do you plan on giving each unit its own bias property?
      • How will you make sure that the propagation of activation (feedforward) and propagation of error (backpropagation) are done in the right order?
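
      For example, one possible (but by no means required) layout for the member data is:

        Unit[]   inputLayer;    // in input units (perhaps plus a shared bias unit)
        Unit[][] hiddenLayers;  // hiddenLayers[l] holds the l-th hidden layer
        Unit[]   outputLayer;   // out output units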

    3. Start by updating the Net.java constructor
      To accommodate the additional flexibility, the Net constructor will need to be passed six arguments instead of the original two. The header for the new constructor follows (note: you may want to overload the constructor instead of replacing it). You may assume that neither hLayers nor nHid will be 0.
      /**
      * Constructor for Net class.
      *
      * Create Net with the specified architecture and data.
      *
      * @param in Number of input units
      * @param hLayers Number of hidden layers
      * @param nHid Number of Hidden units per layer
      * @param out Number of output units
      * @param patterns Array of input patterns (each an array of int)
      * @param targets Array of output patterns (each an array of int)
      */

      public Net (int in, int hLayers, int nHid, int out, int[][] patterns, int[][] targets) {
         // ** to be filled in ** //
      }
      You may also need to update functions associated with network construction. It is at this step that the bulk of your architectural decisions must be made, including how to store the arbitrary number of input, hidden, and output nodes. It may be easier to start with only a single hidden layer. (If you end up only being able to implement one hidden layer, you won't lose many points.)
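
      As a rough illustration only, a constructor using the layout sketched above might look like the following. Here makeLayer() and connect() are hypothetical helpers (for building a layer of units and fully connecting two layers) that you would write yourself, and the Net is assumed to keep its patterns and targets in like-named fields:

      public Net (int in, int hLayers, int nHid, int out,
                  int[][] patterns, int[][] targets) {
          this.patterns = patterns;
          this.targets = targets;
          inputLayer = makeLayer(in);
          hiddenLayers = new Unit[hLayers][];
          for (int l = 0; l < hLayers; l++)
              hiddenLayers[l] = makeLayer(nHid);
          outputLayer = makeLayer(out);
          // Fully connect consecutive layers: input -> hidden(s) -> output.
          connect(inputLayer, hiddenLayers[0]);
          for (int l = 1; l < hLayers; l++)
              connect(hiddenLayers[l - 1], hiddenLayers[l]);
          connect(hiddenLayers[hLayers - 1], outputLayer);
      }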

    4. Update feedforward activation.
      Once you've implemented feedforward, you will have a good idea if you've made the right decisions and have correctly set up your network.
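
      The key point is the order: each layer's activations must be computed before the layer after it. A sketch, using the layout above (setActivation() is an assumed method for clamping input units):

      public void feedforward(int[] pattern) {
          for (int i = 0; i < inputLayer.length; i++)
              inputLayer[i].setActivation(pattern[i]);  // clamp the inputs
          for (int l = 0; l < hiddenLayers.length; l++) // first hidden layer first
              for (int h = 0; h < hiddenLayers[l].length; h++)
                  hiddenLayers[l][h].computeActivation();
          for (int o = 0; o < outputLayer.length; o++)  // output layer last
              outputLayer[o].computeActivation();
      }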

    5. Update your backpropagation.
      • Although most of the changes are superficial (from explicit unit calls to iteration over an arbitrary number of units), the addition of a hidden layer may cause a few problems.
      • Hidden units calculate error differently from output units.
      • When there are hidden units, you must be careful about the order of units in which error is calculated.
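
      For reference, the standard sigmoid error terms are sketched below; getDelta() and weightsToDownstream are assumed names, not starter-code API. Because a hidden unit's error depends on the deltas of the layer above it, error must be computed at the output layer first and then backwards, one layer at a time.

      // Output units compare their activation a against the target t:
      double outputDelta(double a, double t) {
          return a * (1.0 - a) * (t - a);     // sigmoid derivative is a(1-a)
      }

      // Hidden units collect error flowing back through their outgoing weights,
      // so the downstream deltas must already have been computed:
      double hiddenDelta(double a, Unit[] downstream, double[] weightsToDownstream) {
          double sum = 0.0;
          for (int k = 0; k < downstream.length; k++)
              sum += weightsToDownstream[k] * downstream[k].getDelta();
          return a * (1.0 - a) * sum;
      }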

    6. Update the train() function
      This is most likely where you will run into problems with order.

    7. Update the logging functions. Because logging is tricky, I give example code for the following functions. Suppose we start with a Net that has the following member data:
      Unit[] inputLayer, hiddenLayer, outputLayer;
      • logNetwork() --
        You should uncomment the line that outputs the new network parameters.
      • logWeights() --
        You should output the weights of every unit, not just one output neuron. Use logUnitArrayWeights() on each layer, starting from the first hidden layer and going all the way down to the output layer. For example:
        System.out.println("WEIGHTS");
        logUnitArrayWeights(hiddenLayer);
        logUnitArrayWeights(outputLayer);
      • logActivationCalculation() --
        You should output the activations of every layer of neurons, starting from the input layer, and going all the way down to the output layer. You should use logUnitArrayActivationCalculation(). For example:
        System.out.println("ACTIVATION");
        logUnitArrayActivationCalculation(inputLayer);
        logUnitArrayActivationCalculation(hiddenLayer);
        logUnitArrayActivationCalculation(outputLayer);
      • logWeightUpdates() --
        You should output the weight momenta and changes of every layer of neurons, starting from the first hidden layer, and going all the way down to the output layer. You should use logUnitArrayWeightUpdates(). For example:
        System.out.println("MOMENTUM");
        logUnitArrayMomentum(hiddenLayer);
        logUnitArrayMomentum(outputLayer);
        System.out.println("WEIGHT CHANGE");
        logUnitArrayWeightUpdates(hiddenLayer);
        logUnitArrayWeightUpdates(outputLayer);
      • Remember, the grader depends on this output being in the correct format! There are helpful comments on these in the code.

    8. Once again, test and comment your program. Make sure to comment your code clearly. If your code is wrong but your comments are specific enough (don't write a book), at least we will know your intent.
      There are two easy ways to test your program after you have compiled all of your code ( % javac -g *.java ).
      1. You can call TesterPart2 with % java TesterPart2 to run a few very elementary tests on your various functions. These tests are much less sophisticated than for Part 1, since we know much less about exactly what data structures you are using. Instead of testing particular functions, we simply test to make sure that the network learns appropriately.
      2. To train your network with various parameters make the call:
        % java TesterPart2 training_file ne lr mom ec hidden_layers nhid
        where:
        training_file is the file containing the function to be learned (either "and.data", "or.data", "xor.data", "same.data", "auto8.data", or "multi.data")
        ne is the number of epochs
        lr is the learning rate
        mom is the momentum
        ec is the error criterion
        hidden_layers is the number of hidden layers
        nhid is the number of units in each hidden layer
        for example (with arbitrary numbers for ne, lr, mom, ec):
        % java TesterPart2 same.data 500 .1 .5 .1 1 2
      3. You can run a more complete test by running runTest2.sh, which will run your network on a set of different training examples. It will produce a bunch of output files, and also a tarball output2.tgz. You can then verify your work by running ~cs182/sp07/assignments/a3/runCheck2.sh, which will check both that your output format and your calculations are correct.

    9. Your program should now be able to correctly learn all 6 of these functions.

    What to submit for Part 2

    You should submit your assignment using the submit program available on the EECS instructional system; instructions for doing this can be found here. Be sure to include:
    • Your completed code, including Net.java and Unit.java. Remember to submit any other code files you have created. Do not submit the .class files.
    • The file output2.tgz that is created when you run runTest2.sh.
    • Answers to these questions, in a file called answers.txt. Make sure this includes the usual information.
      1. How would you change Part 2 of this assignment? Was it too difficult? Too easy?
      2. (If relevant) What kinds of problems did you encounter?
        • Describe any design problems you may have had.
        • If your network doesn't work, where is the problem?
        • If you weren't able to get everything working or ran out of time, explain your approach, what you know, and what you might do differently next time.
    Note: If you cheat, we will catch you. Don't cheat!

    Non-Java instructions

    Input

    The input file format remains the same.

    Output

    Output files should look like this:
    NETWORK
    [number of inputs] [number of hidden layers] [number of units per hidden layer] [number of outputs]
    [max number of epochs] [learning rate] [momentum] [error cutoff]
    WEIGHTS
    [neuron weights in order, with bias unit last, separated by spaces within each
    neuron, by semicolons between neurons, and by lines between layers.  This
    should start with the first hidden layer and continue down to the output layer]
    [then, for each datum for the first 10 epochs, you should output the following:]
    WEIGHTS
    [same as above]
    ACTIVATION
    [activation of all neurons, separated by spaces between neurons and lines between layers]
    MOMENTUM
    [momentum terms for all weights of non-input neurons, separated the same as the weights]
    WEIGHT CHANGE
    [change in weights for all weights of non-input neurons, separated the same as the weights]
    [...]

    for example:
    NETWORK
    2 2 2 1
    1000 0.02 0.0 0.1
    WEIGHTS
    0.22051690944146296 -0.8130003265513113 0.41217327487350186 ; 0.7289876739953414 0.10119771776036735 -0.13575908727810804 
    -0.1962154682981445 -0.09246885261677784 -0.5642219914245667 ; -0.6007576557089827 0.8182160455920622 -0.6408346562611702 
    -0.2764660909365326 0.3156927913649834 -0.6572684507723037 
    WEIGHTS
    0.22051690944146296 -0.8130003265513113 0.41217327487350186 ; 0.7289876739953414 0.10119771776036735 -0.13575908727810804 
    -0.1962154682981445 -0.09246885261677784 -0.5642219914245667 ; -0.6007576557089827 0.8182160455920622 -0.6408346562611702 
    -0.2764660909365326 0.3156927913649834 -0.6572684507723037 
    ACTIVATION
    1.0 1.0 
    0.45504419780056976 0.6669508558018775 
    0.32845506693166254 0.40890447884945574 
    0.3500118905082658 
    MOMENTUM
    0.0 0.0 0.0 ; 0.0 0.0 0.0 
    0.0 0.0 0.0 ; 0.0 0.0 0.0 
    0.0 0.0 0.0 
    WEIGHT CHANGE
    -2.4843429755797513E-5 -2.4843429755797513E-5 -2.4843429755797513E-5; 4.471891984316359E-5 4.471891984316359E-5 4.471891984316359E-5
    -8.206729396915873E-5 -1.2028469368609132E-4 -1.803501601950455E-4; 1.0268840314478733E-4 1.5050871693206405E-4 2.256668772860436E-4
    9.714033209945766E-4 0.0012093318347454834 0.0029574922684833605
    WEIGHTS
    [...] 

    If you are not using Java, you must run the tests yourself and produce a tarball of the resulting output. The tests can be found in runTest2.sh; you may want to simply modify this script to run your program for you. You should create a tarball by running tar -czf output2.tgz *.out.

    Note: If you do not use Java, you may be asked to demo your program for grading. You also may not receive partial credit if your program does not produce output as it should. I recommend that you use Java!

    - last edited: 2007-02-09