Note: please submit this assignment as a3-t.
In this assignment, we will experiment with a standard backpropagation
example, the auto-encoder.
See our Computing Resources page for information about downloading the Tlearn software for use at home, or the Instructional Computing Tlearn help page for how to use Tlearn in the Soda clusters.
Refer to R7 in your reader (Plunkett and Elman: Ch. 1 and Appendix B) for general instructions on how to run Tlearn.
The basic idea of this assignment is very simple: get Tlearn to produce output that is as close as possible to its input. As is conventional, we
will restrict ourselves to binary strings. The catch is that the network will
have a hidden layer that is significantly smaller than the input and output
layers.
The basic case will have 4 binary input units, 2 hidden units and 4 output
units. This is called the 4-2-4 encoder. Only one input unit will be
"on" (set to 1) at a time. We want the network to learn weights that
will cause the corresponding output unit to turn on.
We can envision the network as doing a kind of compact encoding. For example,
if some language had only 4 phonemes, the auditory system could get by with just 2 fibers for transmitting one phoneme at a time. More realistically, a phoneme from a language with 64 phonemes could be transmitted (since 2^6 = 64) by just 6 nerve fibers
from one brain region to another. One could imagine a complex neural structure
that computed which phoneme was most likely at each moment and another complex
structure that made use of phonemes to make up words. Since each phoneme has
different uses, we would need a separate unit for each one at the receiving end,
but the transmission could be done more compactly using the idea above.
The assignment is to experiment with how well backpropagation learning can do
at finding weights that will produce a good encoding.
The first part of this assignment is to analyze how the system does on the 4-2-4
encoder problem. Start Tlearn as in Assignment 2 and create a project file for the 4-2-4 encoder problem. Then experiment
with different values for the learning rate and momentum. (We suggest using
random initialization and sampling.)
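For reference, here is roughly what the network configuration (.cf) file for the 4-2-4 encoder might look like, modeled on the examples in Plunkett and Elman: node 0 is Tlearn's built-in bias node, nodes 1-2 are the hidden units, and nodes 3-6 are the output units. Treat this as a sketch and verify the exact syntax against Appendix B of the reading.

    NODES:
    nodes = 6
    inputs = 4
    outputs = 4
    output nodes are 3-6
    CONNECTIONS:
    groups = 0
    1-6 from 0
    1-2 from i1-i4
    3-6 from 1-2
    SPECIAL:
    selected = 1-6
    weight_limit = 1.00

The .data file (input patterns) and the .teach file (target patterns) are identical for an encoder: each of the 4 training patterns is a one-hot vector.

    distributed
    4
    1 0 0 0
    0 1 0 0
    0 0 1 0
    0 0 0 1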
- Explain how the parameter values affected the final error rate and number
of trials needed.
- Notice that separate runs might have wildly different outcomes, since the
initial weights are random. How does this relate to learning in animals?
- What in the network is mapping the input to the output? Another way of
asking this question is, how is the network encoding the input?
Now expand the task to solve the 8-3-8 encoder problem.
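If your .cf file follows the sketch above, only the sizes change for the 8-3-8 case (again a sketch; check the exact syntax against Appendix B):

    NODES:
    nodes = 11
    inputs = 8
    outputs = 8
    output nodes are 4-11
    CONNECTIONS:
    groups = 0
    1-11 from 0
    1-3 from i1-i8
    4-11 from 1-3
    SPECIAL:
    selected = 1-11
    weight_limit = 1.00

The .data and .teach files grow accordingly, to 8 one-hot patterns of length 8.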
- Experiment with and report on the effects of different parameter values.
- Do you notice any differences in moving to a larger task?
Finally, try Tlearn on the 9-3-9 encoder problem.
- Experiment with and report on the effects of different parameter values.
- Why can the network learn this, despite the fact that 3 bits are not enough to encode 9 distinct values?
- What pattern of activation accomplishes the mapping from
(1,0,0,0,0,0,0,0,0) to (1,0,0,0,0,0,0,0,0) in your network?
- Is this the only pattern that could give you the right result, in this
network?
- Would your answer to the previous question be different if the hidden layer were much larger?
Tlearn uses the backpropagation algorithm that we studied in class. The goal of this problem is to produce a hand simulation of one step in the learning of the 4-2-4 encoder.
- Pick a time about halfway through the training and copy (approximate) values for the weights learned up to that time.
- Compute the output values for the next teaching input and compare these
with the appropriate training values.
- Illustrate the calculations that Tlearn uses to update the weights for the
next iteration. Don't forget the momentum factor.
The assigned reading has further discussion as well as numerical values that may
be of use.
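If you would like to cross-check your hand calculation, the sketch below (plain Python with NumPy, not Tlearn itself) runs one forward pass and one weight update for a 4-2-4 encoder. It assumes the standard setup from class: the logistic activation f(x) = 1/(1 + e^-x), output deltas (t - o) * o * (1 - o), and the momentum update dw(t) = eta * delta * activation + alpha * dw(t-1). All of the weight values here are made-up placeholders; substitute the (approximate) weights you copied from your own run.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    eta, alpha = 0.1, 0.9  # learning rate and momentum; set these to your values

    # Made-up weights "halfway through training"; rows are receiving units, and
    # the last column of each matrix is the bias weight (Tlearn's node 0).
    W_hid = np.array([[ 0.5, -1.2,  0.8, -0.3,  0.1],   # hidden unit 1
                      [-0.7,  0.9, -0.4,  1.1, -0.2]])  # hidden unit 2
    W_out = np.array([[ 1.0, -0.6,  0.2],               # output unit 1
                      [-0.8,  0.7, -0.1],               # output unit 2
                      [ 0.3, -1.1,  0.4],               # output unit 3
                      [-0.5,  0.9, -0.3]])              # output unit 4
    dW_hid_prev = np.zeros_like(W_hid)  # previous changes, needed for momentum
    dW_out_prev = np.zeros_like(W_out)

    x = np.array([1.0, 0.0, 0.0, 0.0])  # next teaching input
    t = x.copy()                        # encoder target = the input itself

    # Forward pass (append 1.0 as the constant bias input).
    h = sigmoid(W_hid @ np.append(x, 1.0))
    o = sigmoid(W_out @ np.append(h, 1.0))
    print("output:", o.round(3), " target:", t)

    # Backward pass: delta terms for the output and hidden units.
    delta_o = (t - o) * o * (1.0 - o)
    delta_h = (W_out[:, :2].T @ delta_o) * h * (1.0 - h)

    # Weight changes with momentum, then the update itself.
    dW_out = eta * np.outer(delta_o, np.append(h, 1.0)) + alpha * dW_out_prev
    dW_hid = eta * np.outer(delta_h, np.append(x, 1.0)) + alpha * dW_hid_prev
    W_out += dW_out
    W_hid += dW_hid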
- You have just calculated a lot of numbers. What, if anything, do these
numbers mean in neural terms?
- What aspects of real neural systems are modeled by PDP systems?
- What aspects of PDP systems do NOT correspond to neural systems?
- Imagine a PDP system with the following structure. You have 7 input units
(i1, i2, i3, i4, i5, i6, i7), an arbitrarily large hidden layer, and two
output units (o1, o2). Two input units will be "on" (set to 1) at
any given time.
If the two active input nodes are next to each other, o1 should fire.
Otherwise, o2 should fire. Thus, if i3 and i4 are active, then o1 should be
active. If i3 and i5 are active, o2 should be active.
Can the system learn this? (The full set of 21 input patterns is sketched in the code after these questions.)
- Assume the system can, and has, learned this. Now imagine moving the input
nodes around. The connections and weights are all still the same, but now
the spatial ordering of the input units is (i4, i2, i7, i1, i3, i6, i5).
Does the response of the output units to the input change at all?
- Does the response of the outputs tell you anything about the spatial
ordering of the input nodes?
- How does this compare to human neural systems?
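For the adjacency question above, there are C(7,2) = 21 possible input patterns. The short Python sketch below (purely illustrative; the unit names come from the question, not from Tlearn) enumerates them together with their intended targets.

    from itertools import combinations

    # All 21 ways of turning on exactly two of the seven input units i1..i7.
    for a, b in combinations(range(1, 8), 2):
        x = [1 if i in (a, b) else 0 for i in range(1, 8)]
        adjacent = (b - a == 1)  # are the two active units next to each other?
        o1, o2 = (1, 0) if adjacent else (0, 1)
        print(x, "-> o1 =", o1, " o2 =", o2)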