Please sign up to Nota Bene to access the lecture notes and comment pages. You should have received a sign-up link by email; if you have not, please contact me.
The tutorial sheets will be made available week by week. It is vital that you work through these and attend the tutorials. You are recommended to look through each tutorial sheet yourself first, then get together with other members of your tutorial group to discuss the questions and work out where your issues lie. For some tutorials, answers will be provided ahead of time to help you check what you have done; however, you should try to answer the questions independently at first. If you come up with different methods or different answers, that is also good: it is worth discussing in the tutorial what makes one method or one answer better than another.
The tutorials are driven and owned by you, not the tutor: the tutor's job is to help you improve your understanding, avoid pitfalls, and know what is required of you. Your tutor will start the tutorial by collating the issues to be addressed (adding some of his/her own).
Lectures are a conversation between the lecturer and members of the class. Lectures are more than a set of slides, and the slides are unlikely to reflect everything you need to understand. Please work through the multiple resources available. These may include other more comprehensive slides, notes and directions. Lectures will be used to motivate, solidify understanding and provide examples. You should be happy to ask questions in a lecture, even though it is not a small lecture hall. The other resources will be used to learn the details you need over the rest of the week. It will be important to work through the course text (Barber) and do the exercises listed. The solutions for the exercises in David Barber's book are available on DICE in the directory:
Furthermore, at MSc Level, there is an expectation that you will use your own initiative to find resources that help you learn what you need for this course.
The topics and content explored in the lectures and tutorials generally define the examinable scope, but the exams will expect a level of application of these methods, and generalisation of them, e.g. by putting together two things you have learnt. The books contain many other sections that are not covered in lectures. The outline below summarises the examinable topics.
The first lecture this week will introduce PMR. In particular we will look at the difference between what we study in PMR and in MLPR, and emphasise that the topics in PMR are somewhat broader, encompassing many topics from MLPR but extending to more general settings.
To prepare for lecture one, I suggest you review your probability theory etc., and have a look again at the MLPR notes.
After lecture 1, you should work through the notes provided, which also include an exam-like question on the lecture. You should work through all of Chapter 1 of Barber. Do Exercises 1.1 and 1.2 using the rules of probability (Def 1.1-1.5). Then work through 1.3 to 1.10. Convince yourself you can answer Exercises 1.14 to 1.18. Finally, do Exercise 1.19.
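As a quick self-check on the rules of probability, here is a minimal numerical sketch (the joint distribution is invented purely for illustration) showing the sum rule, the product rule and Bayes' rule on a pair of binary variables:

```python
import numpy as np

# An invented joint distribution p(x, y) over two binary variables;
# rows index x, columns index y. Entries are non-negative and sum to 1.
p_xy = np.array([[0.3, 0.1],
                 [0.2, 0.4]])

# Sum rule: marginalise out y to get p(x).
p_x = p_xy.sum(axis=1)                  # [0.4, 0.6]

# Product rule: p(x, y) = p(y | x) p(x), so p(y | x) = p(x, y) / p(x).
p_y_given_x = p_xy / p_x[:, None]

# Bayes' rule: p(x | y) = p(y | x) p(x) / p(y).
p_y = p_xy.sum(axis=0)
p_x_given_y = (p_y_given_x * p_x[:, None]) / p_y[None, :]

print(p_x)                # [0.4 0.6]
print(p_x_given_y[:, 0])  # p(x | y=0) = [0.6 0.4]
```

Working through Barber's Exercises 1.1 and 1.2 by hand and then checking your arithmetic against a small array computation like this is a good habit.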
Remember to use one another. Work through things together. Ask one another questions to make sure you fully understand what is going on. Try to convince others with your answers: if you can't convince someone else, you probably do not have a thorough enough explanation. This process of using one another is very important. Get in the habit of doing it.
Look at the Introduction 4-up slides from earlier years, especially the summary of distributions at the end. Going through the distributions made for a very dull lecture, so I have stopped doing so: they are better worked through in your own time. They are still important to know.
Finally, please review Chapter 2 in preparation for Lecture 2.
Additional resources that may be useful: Make your own dragon illusion (close one eye).
Self-check maths sheet: check if you can do the questions on tut0.pdf.
Lecture 2 will motivate and introduce factor graphs. After lecture 2, read Barber Section 4.4 and make sure you understand it thoroughly: it is a short section. Barber starts with belief networks and Markov networks and then mentions factor graphs. We will do things the other way around: Section 4.4 is the foundation for our understanding of belief networks and Markov networks, which we will cover next. After reading 4.4, do Exercises 4.1, 4.2 and 4.3; where they say "Markov network", just read "distribution" for now. For the distributions in Exercises 4.10, 4.11 and 4.12, just draw the corresponding factor graphs. Finally, convince yourself of what happens in a factor graph when we condition on a variable node taking a particular value, and when we marginalise (sum out) over a particular variable node.
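That last exercise can be grounded concretely. In the sketch below (a toy unnormalised distribution with invented factor tables), conditioning on a variable amounts to slicing every factor that touches it, and marginalising amounts to summing an axis out, which merges the factors connected to that variable:

```python
import numpy as np

# A toy factor graph over binary variables a, b, c:
#   p(a, b, c) proportional to f1(a, b) * f2(b, c)
f1 = np.array([[2.0, 1.0],
               [1.0, 3.0]])   # f1[a, b], invented values
f2 = np.array([[1.0, 2.0],
               [2.0, 1.0]])   # f2[b, c], invented values

# Unnormalised joint: multiply factors, aligned on shared variable b.
joint = f1[:, :, None] * f2[None, :, :]   # joint[a, b, c]
p = joint / joint.sum()

# Marginalising (summing out) b merges f1 and f2 into a single
# factor over (a, c): the two factor nodes collapse into one.
p_ac = p.sum(axis=1)

# Conditioning on b = 1 just slices each factor touching b; the
# result is still a (smaller) factor graph, here over a and c.
joint_b1 = f1[:, 1, None] * f2[None, 1, :]
p_ac_given_b1 = joint_b1 / joint_b1.sum()
```

Note how conditioning keeps the factorisation structure (each sliced factor stays separate), whereas marginalising couples together everything the summed-out variable touched.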
There are additional notes and example exam-like questions available in the notes on the lecture.
Finally start looking at Chapter 3 up to section 3.4.
Tutorials
No tutorials in week 1 (nor week 2).
The first tutorial sheet is here: Tutorial Sheet 1 includes work on factor graphs and elimination. The answers to tutorial sheet one are now available.
Lecture 3 was about conditional independence in factor graphs. Again, review Section 4.4. The paper by Brendan Frey will be useful in the long run: have a look now, though for the moment you will not understand the references to belief networks or Markov networks. Don't worry; it is just to help you get the gist.
The most important thing here is to start working through the tutorial sheet.
Lecture 4
Lecture 4 focused on elimination and message passing. In Barber, you should work through Sections 5.1.2 to 5.1.5, and 5.2.1 to 5.2.4. Section 5.4 is also important. Perhaps now is the time to start looking at code that implements these sorts of systems, to help ground what you are doing. See 5.6 regarding David Barber's code; it is useful to get a feel for it in action. Some of the questions on pp. 100-101 involve implementing that: see demoSumprod.m for example. You should be able to answer 5.1. Exercise 5.2 is a good thought experiment that will force you to think through the issues. In both of these, read "factor graph" or "probability distribution" instead of "Markov network" and it will be fine. Some people asked how big a network we can handle. As long as we can do message passing, computation is linear in the number of nodes, so we can go very big. There are web-scale applications of inference: Question 5.6 gives a very simple web-scale example which is effectively equivalent to the original PageRank algorithm (whoops, I've given you the answer!).
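To make the PageRank connection concrete, here is a minimal power-iteration sketch (the link structure and damping factor are invented for illustration): the rank vector is the stationary distribution of a random surfer, and each iteration passes the current "rank message" once through the chain, so the cost per iteration is linear in the number of links:

```python
import numpy as np

# Toy web of 4 pages: links[j] lists the pages that page j links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n, damping = 4, 0.85

# Column-stochastic transition matrix: column j spreads page j's
# rank uniformly over its out-links.
T = np.zeros((n, n))
for j, outs in links.items():
    for i in outs:
        T[i, j] = 1.0 / len(outs)

# Power iteration: repeatedly propagate rank, mixing in a uniform
# "teleport" term so the chain is irreducible.
r = np.full(n, 1.0 / n)
for _ in range(100):
    r = damping * (T @ r) + (1 - damping) / n

print(r, r.sum())   # r sums to 1
```

This is of course a sketch of the general idea, not of Barber's Question 5.6 specifically; the point is that each update touches each edge once, which is why such computations scale to very large graphs.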
In looking at other resources to help with your understanding you may have been frustrated by references to belief/Bayesian networks or Markov networks, which we haven't handled yet. Most courses teach these first and factor graphs later. We have done things the other way round because mixed factor graphs capture all that is needed from both belief networks and Markov networks. By focusing on factor graphs we only have to remember one rule for everything, rather than rules for each network type (I easily forget things, so I prefer approaches that reduce the amount I have to learn and maximise the amount I can work out). But don't worry: I'll tell you all about belief networks and Markov networks next, so you know what they are. When working with them, though, it's best just to convert them to factor graphs.
There are no tutorials next week.
Lecture 5
The focus this week will be inference, and also looking at different forms of graphical models. It would be good to work through the chapters in Barber on Belief Networks and the chapter on Graphical Models.
Friday Lecture Slot
There is no lecture today.
The second tutorial sheet for week 5 looks at belief networks and Markov Networks.
The answers to the second tutorial sheet are now available.
Lecture 6
We will continue looking at different types of networks in this lecture. You should look at Chapter 3 in Barber and attempt Exercises 3.1 to 3.4, and 3.13 (look up Markov equivalence). You should also work through Section 4.2 in Barber and do Questions 4.3, 4.4 and 4.6.
Lecture 7
Slides for lecture 7. At this stage we now move on to learning. First we will focus on learning in probabilistic models, by seeing that learning is actually the same as inference, just at a different level. Reading Sections 9.1 and 9.2 in Barber will help to clarify this. Question 3 on the tutorial sheet pulls this all together, so it will be worth following up on that.
Lecture 8
This lecture was a question-and-answer session in which we went through previous lectures and examined potential misunderstandings or issues. We discussed D-separation and factor-graph separation, and why conditioning on variables can make things that were previously conditionally independent become dependent.
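The "conditioning induces dependence" point is the classic explaining-away effect, and it is easy to verify numerically. In this sketch (a made-up model: two independent fair coin flips A and B, with C = A OR B), A and B are independent marginally but become dependent once we observe C:

```python
import numpy as np

# Two independent fair binary causes A, B, and an effect C = A or B.
p_a = p_b = 0.5
joint = np.zeros((2, 2, 2))          # joint[a, b, c]
for a in (0, 1):
    for b in (0, 1):
        joint[a, b, a | b] = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)

# Marginally (summing out C), A and B are independent:
p_ab = joint.sum(axis=2)             # equals outer([0.5,0.5],[0.5,0.5])

# Conditioned on C = 1 they become dependent:
p_ab_c1 = joint[:, :, 1] / joint[:, :, 1].sum()
# p(a=1 | c=1) = 2/3, but p(a=1 | b=1, c=1) = 1/2:
# learning that B = 1 "explains away" the evidence for A.
p_a1_given_c1 = p_ab_c1[1].sum()                        # 2/3
p_a1_given_b1_c1 = p_ab_c1[1, 1] / p_ab_c1[:, 1].sum()  # 1/2
```

In factor-graph terms, conditioning on the node for C leaves a factor that couples A and B, which is exactly why the separation rules treat observed collider-style variables differently.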
There are no tutorials next week (Innovative Learning Week). The third tutorial sheet for week 6 looks at Bayesian Learning in graphical models. The answers to tutorial sheet three are now available.
Lecture 9
We looked at the exponential family in this lecture, and introduced the idea of conjugate distributions. Exponential family distributions form the backbone of many exact Bayesian inference methods through the use of conjugacy. See the Wikipedia page on the conjugate prior for a good summary of conjugacy.
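The simplest instance of conjugacy is the Beta-Bernoulli pair: with a Beta prior on a coin's bias, the posterior after Bernoulli observations is again Beta, and the update just adds counts to the hyperparameters. A tiny sketch (prior and data invented for illustration):

```python
# Beta(alpha, beta) prior on the probability of heads.
alpha, beta = 2.0, 2.0
data = [1, 1, 0, 1, 0, 1, 1]        # invented coin flips (1 = heads)

# Conjugate update: posterior is Beta(alpha + #heads, beta + #tails).
heads = sum(data)
tails = len(data) - heads
alpha_post, beta_post = alpha + heads, beta + tails   # Beta(7, 4)

posterior_mean = alpha_post / (alpha_post + beta_post)
print(posterior_mean)   # 7/11, approximately 0.636
```

This "add the sufficient statistics to the hyperparameters" pattern is exactly what conjugacy in the exponential family buys you, and it is why exact Bayesian updates are tractable in these models.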
Lecture 10
This lecture focuses on the use of approximations to Bayesian learning and inference, relating maximum likelihood and maximum a posteriori methods to approximate inference with a delta-function posterior distribution. We use independent component analysis as an example. See Chapter 34 of MacKay for independent component analysis.
The fourth tutorial sheet looks at Gaussian distributions. The answers to tutorial sheet four are now available.
Lecture 11
We will continue looking at approximate learning and inference.
Lecture 12
Learning and Inference 2 slides. A further look at approximate learning methods, such as variational methods and loopy belief propagation. Look at the variational inference section of Barber for this, and Chapters 29 to 33 of MacKay. An example exam question and answer for variational methods (and belief propagation) is given in Variational.pdf.
The fifth tutorial sheet looks at Boltzmann Machines. You will need The Mnist Dataset and the code demobmsamp.m. The answers to tutorial sheet five are now available.
Lecture 13
Sampling and the Boltzmann machine. We look at Gibbs sampling, and introduce the Boltzmann machine and the restricted Boltzmann machine. The Boltzmann machine is not covered in detail in Barber. Chapter 43 of MacKay (available online) covers Boltzmann machines, and you should work through that. For restricted Boltzmann machines, the Deep Learning Tutorial is one resource. You should also look at the Practical Guide to Training RBMs.
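To see Gibbs sampling in a Boltzmann machine concretely, here is a minimal sketch on a three-unit machine with invented weights and biases. Each unit is resampled from its conditional p(s_i = 1 | rest) = sigmoid(sum_j W_ij s_j + b_i), and long-run averages of the samples estimate the marginals:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny Boltzmann machine: binary units s in {0,1}, symmetric weights W
# (zero diagonal), biases b; energy E(s) = -0.5 s'Ws - b's.
W = np.array([[ 0.0, 1.0, -1.0],
              [ 1.0, 0.0,  0.5],
              [-1.0, 0.5,  0.0]])   # invented weights
b = np.array([0.1, -0.2, 0.0])      # invented biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Gibbs sampling: sweep through the units, resampling each from its
# conditional given the current state of the others.
s = rng.integers(0, 2, size=3).astype(float)
samples = []
for sweep in range(5000):
    for i in range(3):
        p_on = sigmoid(W[i] @ s + b[i])
        s[i] = float(rng.random() < p_on)
    samples.append(s.copy())

# Discard burn-in, then estimate the marginals p(s_i = 1).
marginals = np.mean(samples[1000:], axis=0)
print(marginals)
```

This is a sketch of the sampler only; training (adjusting W and b from data, as in MacKay Chapter 43) uses statistics gathered from exactly this kind of sampling.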
Lecture 14
A continuing look at Boltzmann machines, then a look at decision theory. Chapter 36 of MacKay and the exercises therein provide a good introduction to the decision-theoretic elements.
The sixth tutorial sheet looks at decision theory and hidden Markov models. Answers.
Lecture 15
We now look at hidden Markov models.
Lecture 16
More on hidden Markov models.
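As a concrete anchor for the HMM material, here is a minimal forward-algorithm sketch on an invented two-state, two-symbol model. It computes the likelihood of an observation sequence by passing a message (the alpha vector) along the chain, which is just sum-product message passing on the HMM's factor graph:

```python
import numpy as np

# Invented toy HMM with 2 hidden states and 2 observation symbols.
A = np.array([[0.7, 0.3],    # A[i, j] = p(z_t = j | z_{t-1} = i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],    # B[j, x] = p(x_t = x | z_t = j)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])    # initial state distribution

obs = [0, 0, 1]              # invented observation sequence

# Forward recursion: alpha_t(j) = p(x_1..t, z_t = j).
alpha = pi * B[:, obs[0]]
for x in obs[1:]:
    alpha = (A.T @ alpha) * B[:, x]

likelihood = alpha.sum()     # p(x_1..T)
print(likelihood)
```

Each step costs O(K^2) for K states, so the whole pass is linear in the sequence length, which is the same "messages along a chain" story as elimination earlier in the course.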
The seventh tutorial sheet looks at linear dynamical systems. Answers.
Lecture 17
Assignment feedback.
Lecture 18
More hidden Markov models, and linear dynamical systems.
Last lecture: More on linear dynamical systems.
Informatics Forum, 10 Crichton Street, Edinburgh, EH8 9AB, Scotland, UK
Tel: +44 131 651 5661, Fax: +44 131 651 1426, E-mail: school-office@inf.ed.ac.uk