Assignment 7: Reinforcement Learning (Computational)
Assigned Tuesday, April 8th
Due Thursday, April 17th, by 11:59 pm (submit electronically).


Questions

(To be answered based on the lectures and readings; you should not have to search the internet for the answers.)
  1. Why do we think that exponential discounting is not the way we humans discount future rewards? As part of your answer, describe how procrastination indicates that exponential discounting is incorrect.
  2. In what ways do we animals trade off exploration against exploitation of our environment?
  3. How is prediction error encoded in the brain?

Programming

You will implement the Q-learning reinforcement learning method. Your implementation should be general rather than tied to a single environment, but we also provide an applet that runs a "knight, treasure, monster" environment and uses your code to guide a knight. The knight wants to collect treasure, but he is pursued by a monster that wants to eat him. The monster follows a simple greedy policy: if it can see the knight, it heads straight for him. If it lands on the knight, it eats him (giving the knight a negative reward) and the episode ends. If the knight steps onto the treasure, he receives a positive reward.
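For orientation only, here is a minimal sketch of a general tabular Q-learner in Python. It is not the applet's required interface: the class name QLearner, the methods choose_action, best_action and update, and the hyperparameter defaults are all hypothetical. It shows the standard Q-learning update Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)), with no future value added on terminal transitions (for example, when the knight is eaten or reaches the treasure), and epsilon-greedy action selection.

```python
import random
from collections import defaultdict


class QLearner:
    """Tabular Q-learning agent (hypothetical interface, not the applet's API)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.epsilon = epsilon        # probability of taking a random (exploratory) action
        self.q = defaultdict(float)   # (state, action) -> estimated return, default 0.0
        self.rng = random.Random(seed)

    def best_action(self, state):
        # Greedy choice; ties are broken randomly so early learning is not biased.
        best = max(self.q[(state, a)] for a in self.actions)
        ties = [a for a in self.actions if self.q[(state, a)] == best]
        return self.rng.choice(ties)

    def choose_action(self, state):
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best_action(state)

    def update(self, state, action, reward, next_state, terminal):
        # A terminal transition has no future value to bootstrap from.
        if terminal:
            future = 0.0
        else:
            future = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * future
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# Usage with made-up state labels: a terminal reward of 10, then one step back.
agent = QLearner(["N", "S", "E", "W"], alpha=0.5, gamma=0.9, seed=0)
agent.update("s0", "E", 10.0, "s1", terminal=True)          # Q(s0,E) = 0.5 * 10
agent.update("s_prev", "N", 0.0, "s0", terminal=False)      # Q = 0.5 * (0.9 * 5)
print(agent.q[("s0", "E")], agent.q[("s_prev", "N")])
```

Keeping states and actions as opaque dictionary keys is one way to stay general: the learner never needs to know what a "knight" or a "monster" is.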

You can view the knight acting in the world on the "Play" tab of the applet. If you set it to "Smart Knight", the knight will use the policy learned by your Q-learning algorithm. If you set it to "Greedy Knight", he will instead use a greedy policy like the monster's.
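To make the contrast between the two knights concrete, the sketch below shows, under stated assumptions, a greedy "head straight for the target" step like the monster's and a "smart" step read off a learned Q-table. The grid coordinates, the four move names, the screen-style convention that y grows downward, and the function names greedy_step and smart_step are hypothetical; the applet's actual movement and visibility rules are not described here.

```python
# Hypothetical 4-connected moves; y grows downward (screen coordinates, an assumption).
MOVES = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}


def greedy_step(pos, target):
    """Move that most reduces Manhattan distance to target (ties: first in MOVES).

    Returns None when pos already equals target.
    """
    if pos == target:
        return None

    def dist_after(move):
        dx, dy = MOVES[move]
        return abs(pos[0] + dx - target[0]) + abs(pos[1] + dy - target[1])

    return min(MOVES, key=dist_after)


def smart_step(q, state):
    """Policy read off a learned Q-table: the action with the highest Q(state, a)."""
    return max(MOVES, key=lambda a: q.get((state, a), 0.0))


print(greedy_step((0, 0), (3, 0)), greedy_step((2, 2), (2, 0)))
print(smart_step({("s", "W"): 1.0}, "s"))
```

A greedy chaser only looks one step ahead at distance, whereas the Q-table policy reflects rewards the knight has learned to expect further in the future, such as avoiding a corridor the monster can cut off.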

You can train your Q-learner Knight agent via the Train tab of the applet.
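As a rough illustration of what a training run does, here is a self-contained sketch of episodic Q-learning training on a hypothetical one-dimensional corridor standing in for the applet's world: a fixed "monster" cell at one end gives a reward of -1 and ends the episode, and a treasure cell at the other end gives +1 and ends the episode. The layout, rewards, hyperparameters and the names step and train are assumptions made for this example, not the applet's settings.

```python
import random
from collections import defaultdict

ACTIONS = ("left", "right")
MONSTER, TREASURE, START = 0, 5, 2   # hypothetical corridor layout


def step(pos, action):
    """Toy environment transition: returns (next_pos, reward, terminal)."""
    nxt = pos - 1 if action == "left" else pos + 1
    if nxt == MONSTER:
        return nxt, -1.0, True    # eaten: negative reward, episode over
    if nxt == TREASURE:
        return nxt, 1.0, True     # treasure collected, episode over
    return nxt, 0.0, False


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)        # (state, action) -> value, initialized to 0.0

    def greedy(s):
        best = max(q[(s, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(s, a)] == best])

    for _ in range(episodes):
        s = START
        for _ in range(max_steps):
            # Epsilon-greedy behavior policy.
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(s)
            s2, r, done = step(s, a)
            future = 0.0 if done else max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * future - q[(s, a)])
            s = s2
            if done:
                break
    return q


q = train()
# Greedy action in each non-terminal cell after training.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(1, TREASURE)}
print(policy)
```

With these settings the learned greedy policy should move toward the treasure from every cell, since the discounted value of reaching the treasure exceeds anything reachable by stepping back toward the monster.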

Notes