Reinforcement Learning 2016/2017

Lecture slides are typically added or updated one day before each lecture. Lectures are held on Tuesdays and Fridays, 12:10 - 13:00, in Teviot Lecture Theatre, Medical School, Doorway 5.
Basic Mathematical Background: Please review this cribsheet to make sure you understand the concepts therein. You may also find these resources useful as occasional reference material.
On Using Matlab: The handout Introduction to MATLAB covers the basics (you may ignore the section about NETLAB). A further MATLAB tutorial is available at MTU Introduction to Matlab.
Note that the coursework will also require other tools and programming environments, which will be introduced and explained in lectures.

Lecture content, assignments and deadlines:
January 17, 2017
Slides (pdf)
Reading: Ch 1 of Sutton & Barto book (1st ed.)
January 20, 2017
Multi-armed Bandits; Review of Markov Chains; Introduction to Markov Decision Processes
Slides (pdf)
Reading: Ch 2, 3 of Sutton & Barto book (1st ed.)
January 24, 2017
Intro to MDPs Contd.
Reading: Ch 2, 3 of Sutton & Barto book (1st ed.)
January 27, 2017
Dynamic Programming: Policy and Value Iteration; Monte Carlo methods
Slides (pdf)
Reading: Ch 4, 5 of Sutton & Barto book (1st ed.)
January 31, 2017
Temporal Difference Methods
Slides (pdf)
Reading: Ch 6 of Sutton & Barto book (1st ed.)
February 3, 2017
Discussion of On-policy/Off-policy Learning; TD Methods Contd.
Slides (pdf)
Reading: Ch 5, 6 of Sutton & Barto book (1st ed.)
February 7, 2017
[Tutorial] Worked examples
Outline questions (pdf)
Course Assignment 1
February 10, 2017
[Tutorial] Worked examples, continued
February 14, 2017
[Tutorial] Introduction to the Arcade Learning Environment
Reference: ALE Website
Slides (pdf)
February 17, 2017
[Tutorial] Q+A regarding tools for HW1
February 28, 2017
Generalization and Function Approximation
Slides (pdf)
Reading: Ch 8 of Sutton & Barto book (1st ed.)
March 2, 2017
Assignment 1 Due (4 pm, submit electronically and hand in hardcopy to ITO)
March 3, 2017
Abstraction: Options and Hierarchy
Slides (pdf)
Reading: Case study, Sec 11.4 (Elevator Dispatching) in the print version of the Sutton & Barto book
Optional Readings:
1. R.S. Sutton, D. Precup, S. Singh, Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning, Artificial Intelligence, Vol. 112, pp. 181-211, 1999. (ElsevierLink)
2. A.G. Barto, S. Mahadevan, Recent Advances in Hierarchical Reinforcement Learning, Discrete Event Dynamic Systems 13(4):341-379, 2003. You can get the article via SpringerLink or get the preprint version here.
Course Assignment 2
March 7, 2017
Partial Observability and the Partially Observed Markov Decision Process (POMDP)
Slides (pdf) (based on material associated with Thrun et al. book)
Optional Reading: Chapter 15 of S. Thrun, W. Burgard, D. Fox, Probabilistic Robotics, MIT Press.
March 10, 2017
POMDPs Contd.
March 14, 2017
Inverse Reinforcement Learning
Slides (pdf)
Optional Reading: A.Y. Ng, S.J. Russell, Algorithms for inverse reinforcement learning. In Proc. ICML, pp. 663-670, 2000.
Preprint here.
March 17, 2017
[Tutorial] Discussion and tools for Assignment 2
Slides (pdf)
March 21, 2017
Exploration and Controlled Sensing
Slides (pdf)
March 24, 2017
[Office Hour with TA]
March 28, 2017
Multi-agent Reinforcement Learning
Slides (pdf)
Optional Reading: M. Bowling, M. Veloso, An analysis of stochastic game theory for multiagent reinforcement learning, CMU Technical Report CMU-CS-00-165, 2000.
Assignment 2 Due (4 pm, submit electronically and hand in hardcopy to ITO)
March 31, 2017
Policy Optimization [Not examinable]
Slides (pdf)
April 4, 2017
Deep Reinforcement Learning [Not examinable]
Slides (pdf)
Optional Reading: V. Mnih et al., Human-level control through deep reinforcement learning, Nature 518:529-533, 2015.
Optional Reading: A. Tamar et al., Value iteration networks, In Proc. NIPS 2016. Slides
April 7, 2017
[Tutorial] Q+A and Review for Exam
Slides (pdf)

Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh