Resources
Day, Date | Lecture Topic | Additional references |
Thurs, Aug 23 | Overview + review | |
Tues, Aug 28 | ML review | |
Thurs, Aug 30 | ML review | |
Tues, Sept 4 | Multiplicative weights 1 | iPython notebook |
Thurs, Sept 6 | Multiplicative weights 2 | We roughly followed Section 2.1 of this paper to upper bound the regret of multiplicative weights. Our lower bound proof followed Chapter 3 of Cesa-Bianchi and Lugosi. |
Tues, Sept 11 | Regularization/perturbation | Notes from an MIT course on information theory and Chapter 3 of Cover & Thomas cover the connections between entropy and volume that we discussed in lecture. |
Thurs, Sept 13 | Follow-the-Perturbed-Leader, intro to online optimization | FTPL paper (introduces the algorithm and proof we covered), Cornell course notes containing FTPL proof outline |
Tues, Sept 18 | Online linear optimization | Lectures 2, 3, and 4 from UW course |
Thurs, Sept 20 | Online convex optimization, intro to games | Lecture from CS270 Spring 2016 for high-level overview |
Tues, Sept 25 | Learning and zero-sum games | Proof of the minimax theorem through exponential weights in UMich course notes. The proof we presented in lecture is from Section 7.2 of Cesa-Bianchi and Lugosi. |
Thurs, Sept 27 | Adaptation: The doubling trick | |
Tues, Oct 2 | Adaptation to stochasticity: AdaHedge | Paper on doubling trick adaptive Hedge, Paper on elegant AdaHedge |
Thurs, Oct 4 | Adapting the model order | Peter Bartlett’s talk slides on model selection, (more technical) Pascal Massart’s lecture notes, Section 8 on model selection |
Tues, Oct 9 | Adapting the model order 2 | |
Thurs, Oct 11 | Intro to limited information feedback | |
Tues, Oct 16 | Stochastic multi-armed bandits (MAB): UCB | Section 2.2 in Bubeck, Cesa-Bianchi monograph |
Thurs, Oct 18 | Stochastic MAB: UCB and lower bound | Section 2.3 in Bubeck, Cesa-Bianchi monograph |
Tues, Oct 23 | Stochastic MAB: Lower bound continued | Section 2.3 in Bubeck, Cesa-Bianchi monograph, (more technical) original paper by Lai and Robbins. |
Thurs, Oct 25 | Stochastic Bayesian MAB: Thompson sampling | Tutorial on Thompson sampling |
Tues, Oct 30 | Information theoretic analysis of Thompson sampling | Paper on information-theoretic analysis |
Thurs, Nov 1 | Information theoretic analysis, continued | |
Tues, Nov 6 | Adversarial bandits, EXP3 | Section 3 in Bubeck, Cesa-Bianchi monograph |
Thurs, Nov 8 | Contextual bandits, EXP4 | Section 4 in Bubeck, Cesa-Bianchi monograph |
Tues, Nov 13 | Reinforcement learning 1 | |
Thurs, Nov 15 | Reinforcement learning 2 | |
Tues, Nov 20 | Reinforcement learning 3 | |
Tues, Nov 27 | Advanced reinforcement learning 1 | |
Thurs, Nov 29 | Advanced reinforcement learning 2 | |
|
HW number | Release date | Due date | HW links |
1 | Wed, Sept 5 | Wed, Sept 12 | |
2 | Fri, Sept 14 | Fri, Sept 28 | |
3 | Fri, Sept 28 | Fri, Oct 12 | |
4 | Fri, Oct 12 | Fri, Oct 26 | |
5 | Fri, Oct 26 | Fri, Nov 9 | |
6 | Fri, Nov 9 | Mon, Nov 26 | |
7 (maybe optional) | Mon, Nov 26 | Fri, Dec 7 | |
|
Midterm: October 20, exact duration TBD.
Project presentations: RRR (Reading/Review/Recitation) week.
1. “Prediction, Learning, and Games” by Nicolo Cesa-Bianchi and Gabor Lugosi: online prediction and game theory.
2. “Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems” by Sebastien Bubeck and Nicolo Cesa-Bianchi: multi-armed bandits.
3. Lecture notes by Csaba Szepesvari and Tor Lattimore on multi-armed bandits and contextual bandits.
4. “Dynamic Programming and Optimal Control” by Dimitri Bertsekas: control theory.
5. “Reinforcement Learning: An Introduction” by Richard Sutton and Andrew Barto: reinforcement learning.
Disclaimer: the references listed above are comprehensive overviews of their topics (for their time) and go well beyond what we will cover in the course. We will also cover a few newer topics that these references do not address. They are great to read through, but should not be treated as official guidelines for what lectures and HWs will cover.