You will implement the Q-learning reinforcement learning method. Your implementation will be fairly general, but we also give you an applet that runs a "knight, treasure, monster" environment and uses your code to guide a knight. The knight wants to collect treasure, but he is pursued by a monster that wants to eat him. The monster follows a simple greedy policy: if it can see the knight, it heads straight for him. If it lands on him, it eats him (giving the knight a negative reward) and the episode is over. If the knight steps onto the treasure, he gets a reward.
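The core of Q-learning is the tabular update Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). When s' ends the episode (for example, the knight is eaten), the future term is dropped. Below is a minimal sketch of that update. It assumes states and actions are small integer indices, and that the caller chooses the learning rate alpha and discount gamma. The class QTableSketch and its methods are hypothetical names and are not the interface RLearner.java expects.

```java
/**
 * Hypothetical tabular Q-learning sketch (not the RLearner.java interface).
 * Assumes states and actions are integer indices 0..n-1.
 */
class QTableSketch {
    final double[][] q;   // q[state][action], all values start at 0.0
    final double alpha;   // learning rate, 0 < alpha <= 1
    final double gamma;   // discount factor, 0 <= gamma <= 1

    QTableSketch(int numStates, int numActions, double alpha, double gamma) {
        this.q = new double[numStates][numActions];
        this.alpha = alpha;
        this.gamma = gamma;
    }

    /** Largest Q-value over all actions in the given state. */
    double maxQ(int state) {
        double best = Double.NEGATIVE_INFINITY;
        for (double v : q[state]) {
            best = Math.max(best, v);
        }
        return best;
    }

    /**
     * Applies one Q-learning update for the observed transition (s, a, r, s').
     * When s' is terminal (e.g. the knight was eaten and the episode ended),
     * there is no future value, so the target is just the reward r.
     */
    void update(int s, int a, double r, int sNext, boolean terminal) {
        double target = terminal ? r : r + gamma * maxQ(sNext);
        q[s][a] += alpha * (target - q[s][a]);
    }

    public static void main(String[] args) {
        QTableSketch learner = new QTableSketch(2, 2, 0.5, 0.9);
        // Knight is eaten: negative reward and the episode is over.
        learner.update(0, 1, -10.0, 1, true);
        System.out.println(learner.q[0][1]); // prints -5.0
    }
}
```

With alpha = 0.5, each update moves the stored value halfway toward the target. That is why a single terminal reward of -10 leaves -5.0 in the table.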
You can view the knight acting in the world on the "Play" tab of the applet. If you set it to "Smart Knight", the knight will use the policy learned by your Q-learning algorithm. If you set it to "Greedy Knight", he will instead use a greedy policy like the monster's.
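The greedy rule used by the monster (and, in spirit, by the Greedy Knight) amounts to "take one step straight toward the target." Below is a minimal illustration. It assumes integer grid coordinates and one-cell moves in any of 8 directions, and it leaves out the "can it see the knight" check. GreedyChaseSketch and step are hypothetical names, and the applet's real movement rules may differ.

```java
/**
 * Hypothetical greedy "head straight for the target" step on a grid.
 * Assumes integer (x, y) cells and one-cell moves in 8 directions;
 * the visibility check and the applet's real movement rules are omitted.
 */
class GreedyChaseSketch {
    /** Returns {dx, dy}, each -1, 0 or 1, moving one cell toward the target. */
    static int[] step(int fromX, int fromY, int toX, int toY) {
        return new int[] { Integer.signum(toX - fromX), Integer.signum(toY - fromY) };
    }

    public static void main(String[] args) {
        int[] move = step(2, 5, 4, 1);               // target has larger x, smaller y
        System.out.println(move[0] + "," + move[1]); // prints 1,-1
    }
}
```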
You can train your Q-learner Knight agent via the Train tab of the applet.
To compile and run the applet from the command line:
javac *.java
appletviewer SwingApplet.java
If using Eclipse, you will have to make a "Run as Applet" Run setting. You may need to resize the window (I do).
RLearner.java - implement Q-learning so as to learn a near-optimal policy.
RLPolicy.java - make a policy that can choose the best action in any state.
RLWorld - understand how it works, as you will need to use it to produce your training runs.
RLController - you should not have to modify it, but it helps to know how your code is being called so you can be confident it works correctly.
answers.txt - a file containing your answers to the questions.
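For RLPolicy.java, "the best action in any state" is the action with the highest learned Q-value. Below is a minimal sketch. It assumes the Q-values are stored as q[state][action], which is a hypothetical layout. The class GreedyPolicySketch and the method bestAction are illustrative names only, and RLPolicy's actual fields and methods may differ.

```java
/**
 * Hypothetical greedy policy over a Q-table: in a given state, pick the
 * action with the highest Q-value. Ties go to the lowest action index.
 * The q[state][action] layout is an assumption, not RLPolicy's real API.
 */
class GreedyPolicySketch {
    static int bestAction(double[][] q, int state) {
        double[] values = q[state];
        int best = 0;
        for (int a = 1; a < values.length; a++) {
            if (values[a] > values[best]) { // strict '>' keeps the earliest max
                best = a;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] q = { { 0.2, 1.5, -0.3 }, { 0.0, 0.0, 0.0 } };
        System.out.println(bestAction(q, 0)); // prints 1
        System.out.println(bestAction(q, 1)); // prints 0 (tie: lowest index wins)
    }
}
```

Breaking ties toward the lowest index keeps runs reproducible. Breaking them at random is a common alternative when many states still hold their initial all-zero values.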