Resources

Lecture schedule

Day, Date Lecture Topic Additional references
Thurs, Aug 23 Overview + review
Tues, Aug 28 ML review
Thurs, Aug 30 ML review
Tues, Sept 4 Multiplicative weights 1 iPython notebook
Thurs, Sept 6 Multiplicative weights 2 We roughly followed Section 2.1 of this paper to upper bound the regret of multiplicative weights.
Our lower bound proof followed Chapter 3 of Cesa-Bianchi and Lugosi.
Tues, Sept 11 Regularization/perturbation Notes from a MIT course on information theory and Chapter 3, Cover & Thomas cover the connections between entropy and volume that we discussed in lecture.
Thurs, Sept 13 Follow-the-Perturbed-Leader, intro to online optimization FTPL paper (introduces the algorithm and proof we covered), Cornell course notes containing FTPL proof outline
Tues, Sept 18 Online linear optimization Lectures 2, 3, and 4 from UW course
Thurs, Sept 20 Online convex optimization, intro to games Lecture from CS270 Spring 2016 for high-level overview
Tues, Sept 25 Learning and zero-sum games Proof of minimax theorem through exponential weights in UMich course notes. Original proof presented from Section 7.2 of CBL.
Thurs, Sept 27 Adaptation: The doubling trick
Tues, Oct 2 Adaptation to stochasticity: AdaHedge Paper on doubling trick adaptive Hedge, Paper on elegant AdaHedge
Thurs, Oct 4 Adapting the model order Peter Bartlett’s talk slides on model selection, (more technical) Pascal Massart’s lecture notes, Section 8 on model selection
Tues, Oct 9 Adapting the model order 2
Thurs, Oct 11 Intro to limited information feedback
Tues, Oct 16 Stochastic multi-armed bandits (MAB): UCB Section 2.2 in Bubeck, Cesa-Bianchi monograph
Thurs, Oct 18 Stochastic MAB: UCB and lower bound Section 2.3 in Bubeck, Cesa-Bianchi monograph
Tues, Oct 23 Stochastic MAB: Lower bound continued Section 2.3 in Bubeck, Cesa-Bianchi monograph, (more technical) original paper by Lai and Robbins.
Thurs, Oct 25 Stochastic Bayesian MAB: Thompson sampling Tutorial on Thompson sampling
Tues, Oct 30 Information theoretic analysis of Thompson sampling Paper on information-theoretic analysis
Thurs, Nov 1 Information theoretic analysis, continued
Tues, Nov 6 Adversarial bandits, EXP3 Section 3 in Bubeck, Cesa-Bianchi monograph
Thurs, Nov 8 Contextual bandits, EXP4 Section 4 in Bubeck, Cesa-Bianchi monograph
Tues, Nov 13 Reinforcement learning 1
Thurs, Nov 15 Reinforcement learning 2
Tues, Nov 20 Reinforcement learning 3
Tues, Nov 27 Advanced reinforcement learning 1
Thurs, Nov 29 Advanced reinforcement learning 2

Homework schedule

HW number Release date Due date HW links
1 Wed, Sept 5 Wed, Sept 12
2 Fri, Sept 14 Fri, Sept 28
3 Fri, Sept 28 Fri, Oct 12
4 Fri, Oct 12 Fri, Oct 26
5 Fri, Oct 26 Fri, Nov 9
6 Fri, Nov 9 Mon, Nov 26
7 (maybe optional) Mon, Nov 26 Fri, Dec 7

Midterm: October 20, exact duration TBD.

Project presentations: RRR week.

Additional references

1. “Prediction, Learning and Games” by Nicolo Cesa-Bianchi and Gabor Lugosi: online prediction and game theory.

2. “Regret Analysis of Stochastic & Non-Stochastic Multi-Armed Bandit Problems”, by Sebastien Bubeck & Nicolo Cesa-Bianchi: multi-armed bandits.

3. Lecture notes from Csaba Szepesvari and Tor Lattimore on multi-armed bandits and contextual bandits.

4. “Dynamic Programming and Optimal Control”, by Dimitri Bertsekas: control theory.

5. “Reinforcement Learning: An Introduction”, Richard Sutton & Andrew Barto: reinforcement learning.

Disclaimer: the references listed above are comprehensive overviews of their topics (for their time), and cover topics additional to the ones we will cover in the course. We are also covering a few newer topics that are not dealt with in these books. They are great to read through, but cannot be considered as official guidelines for what we will cover in lectures/HWs.