Reinforcement Learning 2016/2017

Typically, lecture slides will be added or updated one day before the lecture. Lectures are held on Tuesdays and Fridays, 12:10-13:00, in Teviot Lecture Theatre, Medical School, Doorway 5.

Basic Mathematical Background: Please review this cribsheet to make sure you understand the concepts therein. You may also find these resources useful as occasional reference material.

On Using MATLAB: Take a look at the handout Introduction to MATLAB (you may ignore the section about NETLAB). A further MATLAB tutorial is available at MTU Introduction to Matlab.
Note that the coursework will also require other tools and programming environments, which will be introduced and explained in lectures.

Date:	Lecture content:	Assignments and Deadlines:
January 17, 2017	Introduction Slides (pdf) Reading: Ch 1 of Sutton & Barto book (1st ed.)
January 20, 2017	Multi-armed Bandits; Review of Markov Chains; Introduction to Markov Decision Processes Slides (pdf) Reading: Ch 2, 3 of Sutton & Barto book (1st ed.)
January 24, 2017	Intro to MDPs Contd. Reading: Ch 2, 3 of Sutton & Barto book (1st ed.)
January 27, 2017	Dynamic Programming: Policy and Value Iteration; Monte Carlo methods Slides (pdf) Reading: Ch 4, 5 of Sutton & Barto book (1st ed.)
January 31, 2017	Temporal Difference Methods Slides (pdf) Reading: Ch 6 of Sutton & Barto book (1st ed.)
February 3, 2017	Discussion of On-policy/Off-policy Learning; TD Methods Contd. Slides (pdf) Reading: Ch 5, 6 of Sutton & Barto book (1st ed.)
February 7, 2017	[Tutorial] Worked examples Outline questions (pdf)	Course Assignment 1
February 10, 2017	[Tutorial] Worked examples, continued
February 14, 2017	[Tutorial] Introduction to the Arcade Learning Environment Reference: ALE Website Slides (pdf)
February 17, 2017	[Tutorial] Q+A regarding tools for HW1
February 28, 2017	Generalization and Function Approximation Slides (pdf) Reading: Ch 8 of Sutton & Barto book (1st ed.)
March 2, 2017		Assignment 1 Due (4 pm, submit electronically and hand in hardcopy to ITO)
March 3, 2017	Abstraction: Options and Hierarchy Slides (pdf) Reading: Case study, Sec 11.4 (Elevator Dispatching) in print version of S+B book Optional Readings: 1. R.S. Sutton, D. Precup, S. Singh, Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning, Artificial Intelligence, Vol. 112, pp. 181-211, 1999. (ElsevierLink) 2. A.G. Barto, S. Mahadevan, Recent Advances in Hierarchical Reinforcement Learning, Discrete Event Dynamic Systems 13(4):341-379, 2003. You can get the article via SpringerLink or get the preprint version here.	Course Assignment 2
March 7, 2017	Partial Observability and the Partially Observed Markov Decision Process (POMDP) Slides (pdf) (based on material associated with Thrun et al. book) Optional Reading: Chapter 15 of S. Thrun, W. Burgard, D. Fox, Probabilistic Robotics, MIT Press.
March 10, 2017	POMDPs Contd.
March 14, 2017	Inverse Reinforcement Learning Slides (pdf) Optional Reading: A.Y. Ng, S.J. Russell, Algorithms for inverse reinforcement learning. In Proc. ICML, pp. 663-670, 2000. Preprint here.
March 17, 2017	[Tutorial] Discussion and tools for Assignment 2 Slides (pdf)
March 21, 2017	Exploration and Controlled Sensing Slides (pdf)
March 24, 2017	[Office Hour with TA]
March 28, 2017	Multi-agent Reinforcement Learning Slides (pdf) Optional Reading: M. Bowling, M. Veloso, An analysis of stochastic game theory for multiagent reinforcement learning, CMU Technical Report CMU-CS-00-165, 2000.	Assignment 2 Due (4 pm, submit electronically and hand in hardcopy to ITO)
March 31, 2017	Policy Optimization [Not examinable] Slides (pdf)
April 4, 2017	Deep Reinforcement Learning [Not examinable] Slides (pdf) Optional Reading: V. Mnih et al., Human level control through deep reinforcement learning, Nature 518:529-533, 2015. Optional Reading: A. Tamar et al., Value iteration networks, In Proc. NIPS 2016. Slides
April 7, 2017	[Tutorial] Q+A and Review for Exam Slides (pdf)
