Resources

Lecture schedule

Day, Date	Lecture Topic	Additional references
Thurs, Aug 23	Overview + review
Tues, Aug 28	ML review
Thurs, Aug 30	ML review
Tues, Sept 4	Multiplicative weights 1	iPython notebook
Thurs, Sept 6	Multiplicative weights 2	We roughly followed Section 2.1 of this paper to upper bound the regret of multiplicative weights. Our lower bound proof followed Chapter 3 of Cesa-Bianchi and Lugosi.
Tues, Sept 11	Regularization/perturbation	Notes from a MIT course on information theory and Chapter 3, Cover & Thomas cover the connections between entropy and volume that we discussed in lecture.
Thurs, Sept 13	Follow-the-Perturbed-Leader, intro to online optimization	FTPL paper (introduces the algorithm and proof we covered), Cornell course notes containing FTPL proof outline
Tues, Sept 18	Online linear optimization	Lectures 2, 3, and 4 from UW course
Thurs, Sept 20	Online convex optimization, intro to games	Lecture from CS270 Spring 2016 for high-level overview
Tues, Sept 25	Learning and zero-sum games	Proof of minimax theorem through exponential weights in UMich course notes. Original proof presented from Section 7.2 of CBL.
Thurs, Sept 27	Adaptation: The doubling trick
Tues, Oct 2	Adaptation to stochasticity: AdaHedge	Paper on doubling trick adaptive Hedge, Paper on elegant AdaHedge
Thurs, Oct 4	Adapting the model order	Peter Bartlett’s talk slides on model selection, (more technical) Pascal Massart’s lecture notes, Section 8 on model selection
Tues, Oct 9	Adapting the model order 2
Thurs, Oct 11	Intro to limited information feedback
Tues, Oct 16	Stochastic multi-armed bandits (MAB): UCB	Section 2.2 in Bubeck, Cesa-Bianchi monograph
Thurs, Oct 18	Stochastic MAB: UCB and lower bound	Section 2.3 in Bubeck, Cesa-Bianchi monograph
Tues, Oct 23	Stochastic MAB: Lower bound continued	Section 2.3 in Bubeck, Cesa-Bianchi monograph, (more technical) original paper by Lai and Robbins.
Thurs, Oct 25	Stochastic Bayesian MAB: Thompson sampling	Tutorial on Thompson sampling
Tues, Oct 30	Information theoretic analysis of Thompson sampling	Paper on information-theoretic analysis
Thurs, Nov 1	Information theoretic analysis, continued
Tues, Nov 6	Adversarial bandits, EXP3	Section 3 in Bubeck, Cesa-Bianchi monograph
Thurs, Nov 8	Contextual bandits, EXP4	Section 4 in Bubeck, Cesa-Bianchi monograph
Tues, Nov 13	Reinforcement learning 1
Thurs, Nov 15	Reinforcement learning 2
Tues, Nov 20	Reinforcement learning 3
Tues, Nov 27	Advanced reinforcement learning 1
Thurs, Nov 29	Advanced reinforcement learning 2

Homework schedule

HW number	Release date	Due date	HW links
1	Wed, Sept 5	Wed, Sept 12
2	Fri, Sept 14	Fri, Sept 28
3	Fri, Sept 28	Fri, Oct 12
4	Fri, Oct 12	Fri, Oct 26
5	Fri, Oct 26	Fri, Nov 9
6	Fri, Nov 9	Mon, Nov 26
7 (maybe optional)	Mon, Nov 26	Fri, Dec 7

Midterm: October 20, exact duration TBD.

Project presentations: RRR week.

Additional references

1. “Prediction, Learning and Games” by Nicolo Cesa-Bianchi and Gabor Lugosi: online prediction and game theory.

2. “Regret Analysis of Stochastic & Non-Stochastic Multi-Armed Bandit Problems”, by Sebastien Bubeck & Nicolo Cesa-Bianchi: multi-armed bandits.

3. Lecture notes from Csaba Szepesvari and Tor Lattimore on multi-armed bandits and contextual bandits.

4. “Dynamic Programming and Optimal Control”, by Dimitri Bertsekas: control theory.

5. “Reinforcement Learning: An Introduction”, Richard Sutton & Andrew Barto: reinforcement learning.

Disclaimer: the references listed above are comprehensive overviews of their topics (for their time), and cover topics additional to the ones we will cover in the course. We are also covering a few newer topics that are not dealt with in these books. They are great to read through, but cannot be considered as official guidelines for what we will cover in lectures/HWs.