Reinforcement Learning Lecture Slides
- Lecture 1. Introduction
- Lecture 1 1up.
- Lecture 1 4up.
- Reading Sutton and Barto Chapter 1.
- Lecture 2. Bandit Problems
- Lecture 2 1up.
- Lecture 2 4up.
- See also Sutton and Barto Figures 2.1 and 2.4.
- Reading Sutton and Barto Chapter 2.
- Work by Quentin Stout et al. on bandit problems applicable to clinical trials. Here is the paper referred to in the lecture.
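Since this lecture centres on the k-armed bandit problem, a minimal sketch of ε-greedy action selection with sample-average value estimates (the standard method of Sutton and Barto, Chapter 2) may be a useful companion; the Gaussian reward model and all parameter values below are invented for illustration, not taken from the slides:

```python
import random

def epsilon_greedy_bandit(true_means, steps=10000, epsilon=0.1, seed=0):
    """Sample-average epsilon-greedy bandit (toy illustration).
    Each arm pays a Gaussian reward with the given mean and unit variance."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k                                  # estimated action values
    n = [0] * k                                    # pull counts per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore uniformly
        else:
            a = max(range(k), key=lambda i: q[i])  # exploit the greedy arm
        reward = rng.gauss(true_means[a], 1.0)
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]             # incremental sample mean
    return q, n
```

With a reasonable ε, the best arm ends up pulled far more often than the rest, and its value estimate converges to its true mean.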
- Lecture 3. Introduction to Reinforcement Learning
- Lecture 3 1up, 4up.
- Reading You should already have read Sutton and Barto Chapters 1 and 2. Also read Sutton and Barto Sections 3.1, 3.2, and 3.3.
- Lecture 4. Reinforcement Learning Formalism and Bellman Equations
- Lecture 4 1up, 4up.
- Read Sutton and Barto Chapter 3.
- Lecture 5. Bellman Equations
- See lecture 4 for Bellman Equations slides.
- Handout from Gillian with golf example, etc.
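Since Lectures 4 and 5 both centre on the Bellman equations, a reminder of their standard form (notation as in Sutton and Barto, Chapter 3) may help alongside the handout:

```latex
% Bellman expectation equation for the state-value function under policy pi:
v_\pi(s) = \sum_a \pi(a \mid s) \sum_{s',\, r} p(s', r \mid s, a)
           \bigl[\, r + \gamma\, v_\pi(s') \,\bigr]

% Bellman optimality equation for the optimal state-value function:
v_*(s) = \max_a \sum_{s',\, r} p(s', r \mid s, a)
         \bigl[\, r + \gamma\, v_*(s') \,\bigr]
```

The expectation equation averages over the policy's action choices; the optimality equation replaces that average with a max over actions.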
- Lecture 6. Dynamic Programming (Intro)
- Handout from Gillian and see Lecture 7 slides.
- Read Sutton and Barto Chapter 4.
- Lecture 7. Dynamic Programming
- Lecture 7 1up, 4up.
- Note not all diagrams, etc. are in these slides. See Gillian for
the complete set.
- Read Sutton and Barto Chapter 4.
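As a companion to the dynamic programming material, here is a minimal value-iteration sketch. The environment (a deterministic chain with a +1 reward for reaching the right-hand terminal state) and all parameters are invented for illustration:

```python
def value_iteration(n_states=4, gamma=0.9, theta=1e-8):
    """Value iteration (Sutton & Barto, Ch. 4) on a toy deterministic chain.
    Actions move one step left or right (clamped); entering the terminal
    rightmost state pays reward +1, everything else pays 0."""
    terminal = n_states - 1
    v = [0.0] * n_states              # v[terminal] stays 0 by convention
    while True:
        delta = 0.0
        for s in range(terminal):     # in-place sweep of non-terminal states
            best = float('-inf')
            for move in (-1, 1):                      # actions: left, right
                s2 = min(max(s + move, 0), terminal)  # deterministic step
                r = 1.0 if s2 == terminal else 0.0    # +1 on reaching the goal
                best = max(best, r + gamma * v[s2])   # Bellman optimality backup
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < theta:
            return v
```

On this chain the optimal values are simply the discounted distance to the goal: 0.9^2, 0.9, and 1 for the three non-terminal states.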
- Lecture 8. Monte Carlo Methods
- Lecture 8 1up, 4up.
- Note not all diagrams, etc. are in these slides. See Gillian for
the extra slides.
- Read Sutton and Barto Chapter 5.
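A minimal first-visit Monte Carlo prediction sketch may help alongside Chapter 5. The toy episodic chain here (state 0 to state 1 with reward 0, then termination with reward +1 or 0 at equal probability) is invented for illustration:

```python
import random

def first_visit_mc(episodes=5000, gamma=1.0, seed=0):
    """First-visit Monte Carlo prediction (Sutton & Barto, Ch. 5) on a toy
    chain: 0 -> 1 (reward 0) -> terminal (reward +1 or 0, 50/50)."""
    rng = random.Random(seed)
    returns = {0: [], 1: []}
    for _ in range(episodes):
        final_reward = 1.0 if rng.random() < 0.5 else 0.0
        # episode as (state, reward received on leaving that state) pairs
        episode = [(0, 0.0), (1, final_reward)]
        g = 0.0
        for t in range(len(episode) - 1, -1, -1):  # walk the episode backwards
            state, reward = episode[t]
            g = gamma * g + reward                 # return following time t
            if state not in (s for s, _ in episode[:t]):  # first-visit check
                returns[state].append(g)
    return {s: sum(rs) / len(rs) for s, rs in returns.items()}
```

Both states' values converge to the expected terminal reward of 0.5, since the only reward arrives at the end and gamma is 1.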
- Lecture 9. Continuing Monte Carlo Methods. There are no separate notes for Lecture 9.
- Lecture 10. Temporal Difference Learning
- Lecture 10 1up, 4up.
- Note not all diagrams, etc. are in these slides. See Gillian for
the complete set.
- Read Sutton and Barto Chapter 6.
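A minimal tabular TD(0) prediction sketch, on the same style of invented toy chain as above (state 0 to state 1 with reward 0, then termination with reward +1 or 0 at equal probability), may help make the bootstrapped update concrete:

```python
import random

def td0_prediction(episodes=10000, alpha=0.02, gamma=1.0, seed=0):
    """Tabular TD(0) prediction (Sutton & Barto, Ch. 6) on a toy chain:
    0 -> 1 (reward 0), then 1 -> terminal (reward +1 or 0, 50/50)."""
    rng = random.Random(seed)
    v = [0.0, 0.0]                           # state-value estimates
    for _ in range(episodes):
        # transition 0 -> 1, reward 0: bootstrap from the estimate of v[1]
        v[0] += alpha * (0.0 + gamma * v[1] - v[0])
        # transition 1 -> terminal; terminal value is 0 by definition
        r = 1.0 if rng.random() < 0.5 else 0.0
        v[1] += alpha * (r + gamma * 0.0 - v[1])
    return v
```

Unlike the Monte Carlo version, each update uses the next state's current estimate rather than waiting for the full return.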
- Lecture 11. Eligibility Traces.
- Lecture 11 page.
- This is just a header slide; the notes are available from Gillian.
- Read Sutton and Barto Chapter 7.
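To connect eligibility traces back to the TD(0) example above, here is a minimal tabular TD(λ) sketch with accumulating traces, on the same invented toy chain; the parameter values are illustrative only:

```python
import random

def td_lambda(episodes=10000, alpha=0.02, gamma=1.0, lam=0.8, seed=0):
    """Tabular TD(lambda) with accumulating eligibility traces
    (Sutton & Barto, Ch. 7) on a toy chain:
    0 -> 1 (reward 0), then 1 -> terminal (reward +1 or 0, 50/50)."""
    rng = random.Random(seed)
    v = [0.0, 0.0]                       # state-value estimates
    for _ in range(episodes):
        e = [0.0, 0.0]                   # eligibility traces, reset per episode
        # step 1: state 0 -> state 1, reward 0
        delta = 0.0 + gamma * v[1] - v[0]
        e[0] += 1.0                      # accumulate trace for the visited state
        for s in range(2):
            v[s] += alpha * delta * e[s]
            e[s] *= gamma * lam          # decay all traces
        # step 2: state 1 -> terminal (terminal value is 0)
        r = 1.0 if rng.random() < 0.5 else 0.0
        delta = r - v[1]
        e[1] += 1.0
        for s in range(2):
            v[s] += alpha * delta * e[s]
    return v
```

The decayed trace on state 0 lets the final TD error propagate back one extra step within the same episode, rather than only via bootstrapping across episodes.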
- Lecture 12. No lecture 12.
- Lecture 13. Using Q-Learning: A Q-Learning Example
- Lecture 13 1up, 4up
- Note not all diagrams, etc. are in these slides. See Gillian for
the complete set.
- Look at Mahadevan and Connell, Automatic programming of behavior-based robots using reinforcement learning, Artificial Intelligence 55(2-3), 311-365, 1992 (you may need to save this, uncompress it, and view it with gv), and at Hoar, Wyatt and Hayes, Multiple evaluation techniques for robot learning, DAI Research Paper RP850, 1997.
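To accompany the Q-learning example lecture, here is a minimal tabular Q-learning sketch. The toy chain environment (start at the left end, +1 reward for reaching the right-hand terminal state) and all parameters are invented for illustration:

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning (Watkins) on a toy chain. Actions: 0 = left,
    1 = right (clamped at the ends); reward +1 on entering the terminal
    rightmost state, 0 otherwise."""
    rng = random.Random(seed)
    terminal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]      # q[s][a]
    for _ in range(episodes):
        s = 0
        while s != terminal:
            if rng.random() < epsilon:
                a = rng.randrange(2)               # explore
            else:
                a = 0 if q[s][0] > q[s][1] else 1  # greedy (ties go right)
            s2 = min(max(s + (1 if a == 1 else -1), 0), terminal)
            r = 1.0 if s2 == terminal else 0.0
            # off-policy target: bootstrap from the best action in s2
            target = r if s2 == terminal else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q
```

The learned greedy policy moves right everywhere, and the Q-values of the right action fall off geometrically in gamma with distance from the goal.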
- Lecture 14. Eligibility Traces. Generalisation and Function Approximation
- See Lecture 11 for notes on eligibility traces.
- Lecture 14 1up.
- These are just a few of the slides; the others are available from Gillian.
- Read Sutton and Barto Chapter 8.
- Lecture 15. Generalisation and Function Approximation
- See Lecture 14 for notes on function approximation.
- Lecture 16. No lecture 16.
- Lecture 17. We finished off Generalisation and Function Approximation.
- Lecture 18. Planning or Model-Based Learning. Focussed Web Crawling.
- Lecture 19. SOFMs: Self-Organising Feature Maps (Kohonen)
- Lecture 19 1up. Note not all diagrams, etc. are in these slides. See Gillian for the complete set.
- Lecture 20. Combining Kohonen Nets and Reinforcement Learning
- Lecture 20 1up. We actually covered this material in Lecture 19; there was no separate Lecture 20. Note not all diagrams, etc. are in these slides. See Gillian for the complete set.
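As a companion to the SOFM material, here is a minimal one-dimensional Kohonen map sketch: scalar inputs drawn uniformly from [0, 1] are mapped onto a line of units, with the best-matching unit and its neighbours pulled toward each input under a shrinking Gaussian neighbourhood. The environment and all schedules are invented for illustration, not the lecture's code:

```python
import math
import random

def train_som_1d(n_units=10, samples=2000, lr=0.3, radius=2.0, seed=0):
    """Minimal 1-D Kohonen self-organising feature map (toy illustration)."""
    rng = random.Random(seed)
    w = [rng.random() for _ in range(n_units)]         # one weight per unit
    for t in range(samples):
        x = rng.random()                               # input drawn from [0, 1]
        frac = t / samples
        sigma = radius * (1.0 - frac) + 0.5            # shrinking neighbourhood
        eta = lr * (1.0 - frac) + 0.01                 # decaying learning rate
        bmu = min(range(n_units), key=lambda i: abs(w[i] - x))  # best match
        for i in range(n_units):
            # Gaussian neighbourhood centred on the winning unit
            h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
            w[i] += eta * h * (x - w[i])               # move toward the input
    return w
```

After training, the weights stay within the input range and spread out to cover it, which is the self-organising behaviour the lecture describes.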
- Andrew Smith's PhD Thesis page.