- Slides for Introduction.
- Slides for Factor Graphs.
- Slides for Independence in Factor Graphs.
- Slides for Inference in Factor Graphs.
- Slides for Belief Networks and Markov Networks.
- Slides for Learning as Inference.
- Slides for The Exponential Family.
- Slides for Approximate Learning and Inference.
- Slides for Approximate Learning and Inference 2.
- Slides for Sampling.
- Slides for Sampling 2.
- Slides for The Boltzmann Machine.
- Slides for Decision Theory.
- Slides for Hidden Markov Models.
- Slides for Linear Dynamical Systems.

Please sign up to Nota Bene to access the lecture notes and comment pages. You should have received a sign-up link by email; if you have not, please contact me.

The tutorial sheets will be made available week by week. It is vital that you work through these and attend the tutorials. You are recommended to first look through the tutorial sheet yourself, then get together with other members of the tutorial group to discuss the questions and work out what your issues are. For some tutorials, answers will be provided ahead of time to help you check what you have done; however, you should try to do things independently of the answers at first. If you came up with different methods or different answers to the questions, that is also good: it is worth discussing in the tutorial what makes one method or one answer better.

The tutorials are driven and owned by you, not the tutor: the tutor's job is to help you to improve your understanding, to avoid pitfalls, and to know what is required of you. Your tutor will start the tutorial by collating the issues to be addressed (adding some of his/her own).

Lectures are a conversation between the lecturer and members of the class. Lectures are more than a set of slides, and the slides alone are unlikely to cover everything you need to understand. Please work through the multiple resources available; these may include other more comprehensive slides, notes and directions. Lectures will be used to motivate, to solidify understanding and to provide examples. You should be happy to ask questions in a lecture, even in a large lecture hall. The other resources will be used to learn the details you need over the rest of the week. It is important to work through the course text (Barber) and do the exercises listed. The solutions for the exercises in David Barber's book are available on DICE in the directory:

/afs/inf.ed.ac.uk/group/teaching/mlprdata/Barber

Furthermore, at MSc Level, there is an expectation that you will use your own initiative to find resources that help you learn what you need for this course.

The topics and content explored in the lectures and tutorials generally define the examinable scope, but the exams will expect a level of *application* of these methods, and *generalisation* of the methods, e.g. by putting together two things you have learnt. The books contain many other sections that are not covered in lectures. The list below summarises the examinable topics.

- General understanding and context of probabilistic models. Modelling real problems. Capturing both structure and uncertainty in real problems. The impossibility of inference without prior assumptions.
- Splitting a model into factors, and representing probability distributions in factorised form. The chain rule as a factorised representation.
- Factor graphs as a representation of the factorised *structure* of a probability distribution. Modelling systems using factor graphs. Defining probability distributions by combining factor graphs with probability values, via specifying the values in factors.
- Directed and undirected factor graphs. Conditional independence. Direct dependence. Conditional independence relationships in factor graphs. Separation rules for determining conditional independence in factor graphs. The relationship between conditional independence in factor graphs and in corresponding probability distributions: there can be no conditional independence in the factor graph that is not always true for the distribution, but there can be conditional independence relationships in the distribution not encoded by the factor graph. The idea of a minimal factor graph. You should know the separation rules for factor (and other) graphs and have practiced using them.
- Inference (conditioning) in factor graphs. The elimination algorithms: the sum-product and max-product algorithms. The elimination algorithm in trees. Message passing rules for doing elimination to compute all marginals in trees. You should know the elimination algorithm and have practiced using it. You should know the message passing rules and have practiced using them.
- Markov Networks and Belief (Bayesian) Networks. Converting to factor graphs. Relationships between all types of graphs. Markov Equivalence. Separation rules for Markov Networks and Bayesian Networks. Forming Belief Networks using the Chain Rule of probability, conditional independence and conditional probability tables. Understanding the parameters in conditional probability tables. Computing the parametric complexity from a Bayesian Network.
- Bayesian Learning and Inference on Extrinsic Variables (parameters). The Plate Notation. Bayes theorem and the use of Bayes theorem. Exponential family models. Examples of exponential family distributions. The Gaussian distribution. The multivariate Gaussian. The closure of linear Gaussian models under conditioning and marginalisation. The form of multivariate Gaussian distributions. Conjugacy. Conjugate exponential models. Examples of conjugate priors. Bayesian Learning in Conjugate Exponential models as parameter updates. Online (Incremental) Bayesian Learning - updating prior to posterior one point at a time. You should practice doing Bayesian Learning and Inference for simple Conjugate exponential models.
- Maximum Likelihood and Maximum Posterior computation, and understanding it as an approximation to Bayesian Learning. Independent Component Analysis and its relationship to/difference from PCA. You should be able to compute maximum likelihood/posterior values/update rules for simple distributions via Lagrangian optimization.
- The Free Energy, and its use as a bound on the marginal log likelihood. The relationship between the free energy and the KL divergence between an approximation and the posterior. Minimizing the Free Energy as approximate inference. The difficulty of computing the entropy term in the Free Energy. The variational approximation. You should have practiced working with the variational free energy for simple factorising approximations and small distributions. The Bethe Free Energy, and knowledge that loopy belief propagation (if it converges) finds a fixed point of the Bethe Free Energy. You should have practiced belief propagation (as above) on some simple loopy examples.
- The Idea of Sampling. Sampling from the posterior as approximate inference. The Monte-Carlo approximation. The principle behind Markov Chain Monte Carlo. Ergodicity. Reversibility (Detailed Balance). All the forms of sampling covered in lectures. You should have practiced doing Gibbs sampling on some simple distributions.
- The Boltzmann Machine, The Restricted Boltzmann Machine, Block Sampling in the Restricted Boltzmann Machine. The derivative of a general exponential family distribution w.r.t. the parameters, and hence the derivative of the Boltzmann Machine. Gradient optimization of the Boltzmann Machine using Gibbs sampling. Gradient optimization of the restricted Boltzmann Machine using block Gibbs sampling. The contrastive divergence rule. The general procedure for pretraining a neural network using the restricted Boltzmann Machine. Stochastic gradient methods. The use of minibatches and online learning.
- Decision Theory. Utility and maximum utility. Different Forms of Utility Functions.
- Markov Models. Hidden Markov Models. Filtering, Smoothing and prediction in hidden Markov models. The forward backward algorithm as message passing. The viterbi alignment as the max product algorithm. You should practice the use of these methods on some simple forms of hidden Markov models.
- Linear state space models. Filtering, smoothing and prediction in linear state space models. You should have tried out this on some simple model forms.
- General understanding of probabilistic methods: you should be able to generalise beyond the specific examples on the course to similar forms of models, and hence show that you have an understanding which enables you to generate your own appropriate probabilistic models and methods.
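As a concrete instance of the online conjugate-exponential updating listed above, here is a minimal sketch (pure Python; the function names are mine, not from any course code) of Beta-Bernoulli learning, updating the prior to the posterior one observation at a time:

```python
# Online Bayesian learning of a coin's bias with a Beta prior.
# The Beta(a, b) prior is conjugate to the Bernoulli likelihood, so each
# observation updates the pseudo-counts in closed form.

def update(a, b, x):
    """Update Beta(a, b) pseudo-counts with one Bernoulli observation x in {0, 1}."""
    return (a + x, b + (1 - x))

def posterior_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Start from a uniform prior Beta(1, 1) and process data one point at a time.
a, b = 1.0, 1.0
for x in [1, 1, 0, 1]:          # three heads, one tail
    a, b = update(a, b, x)

# Posterior is Beta(4, 2), with mean 4/6.
print(posterior_mean(a, b))
```

Note that batch and online processing give the same posterior here: the order of the observations does not matter.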

The first lecture this week will introduce PMR. In particular we will look at the difference between what we study in PMR and in MLPR, and emphasise that the topics in PMR are somewhat broader, encompassing many topics from MLPR but going to more general settings.

To prepare for lecture one, I suggest you review your probability theory etc., and have a look again at the MLPR notes.

After lecture 1, you should work through the notes provided, which also include an exam-like question on the lecture. You should work through all of Chapter 1 of Barber. You should do Exercises 1.1 and 1.2 using the rules of probability (Def 1.1-1.5). Then you should work through 1.3 to 1.10. You should convince yourself you can answer Ex. 1.14 to 1.18. Finally, do Exercise 1.19.

Remember to use one another. Work through things together. Ask one another questions to make sure you fully understand what is going on. Try to convince others with your answers: if you can't convince someone else, you probably do not have a thorough enough explanation. This process of using one another is very important. Get in the habit of doing it.

Look at the Introduction 4-Up from earlier years, especially the summary of distributions at the end. This made a very dull lecture, so I have stopped going through the distributions as they are better to work through in your own time. They are still important to know.

Finally, please review Chapter 2 in preparation for Lecture 2.

Additional resources that may be useful: Make your own dragon illusion (close one eye).
**Self-check maths sheet** Check that you can do the questions on tut0.pdf.

Lecture 2 will motivate and introduce Factor Graphs. After lecture 2, read Barber Section 4.4 and make sure you understand it thoroughly: it is a short section. Barber starts with belief networks and Markov networks and then mentions factor graphs. We will do things the other way around: Section 4.4 is the foundation for our understanding of the belief networks and Markov networks that we will be covering next. After reading 4.4, do Exercises 4.1, 4.2 and 4.3. Here, where it says "Markov network", just read "distribution" for now. For the distributions in Ex. 4.10, 4.11 and 4.12, just draw the factor graphs corresponding to those distributions. Finally, convince yourself what happens in a factor graph when we condition on a variable node having a value, and when we marginalise (sum out) over a particular variable node.
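To ground that last point, here is one possible sketch (pure Python; the table representation and function names are my own, not Barber's code) of conditioning and marginalising a discrete factor stored as a table:

```python
# A factor over discrete variables, stored as a dict from value-tuples to numbers.
# Conditioning keeps only the entries consistent with the observed value;
# marginalising sums the factor over the eliminated variable's states.

def condition(factor, var, value):
    """Fix var = value, returning a factor over the remaining variables."""
    variables, table = factor
    i = variables.index(var)
    new_vars = variables[:i] + variables[i + 1:]
    new_table = {k[:i] + k[i + 1:]: v for k, v in table.items() if k[i] == value}
    return (new_vars, new_table)

def marginalise(factor, var):
    """Sum var out, returning a factor over the remaining variables."""
    variables, table = factor
    i = variables.index(var)
    new_vars = variables[:i] + variables[i + 1:]
    new_table = {}
    for k, v in table.items():
        key = k[:i] + k[i + 1:]
        new_table[key] = new_table.get(key, 0.0) + v
    return (new_vars, new_table)

# A factor phi(a, b) over two binary variables.
phi = (('a', 'b'), {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4})
print(condition(phi, 'b', 1))    # factor over 'a' alone: entries 0.2 and 0.4
print(marginalise(phi, 'b'))     # sums b out: approximately 0.3 and 0.7
```

Conditioning shrinks the table by selecting rows; marginalising shrinks it by adding rows together. Every inference operation on factor graphs reduces to combinations of these two steps plus factor multiplication.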

There are additional notes and example exam-like questions available in the notes on the lecture.

Finally start looking at Chapter 3 up to section 3.4.

**Tutorials** No tutorials in week 1 (nor week 2).

The first tutorial sheet is here: Tutorial Sheet 1 includes work on factor graphs and elimination. The answers to tutorial sheet one are now available.

Lecture 3 was about conditional independence in factor graphs. Again, review Section 4.4. The paper by Brendan Frey will be useful in the long run: have a look now, though for the moment you will not understand the references to Belief Networks or Markov Networks. Don't worry - it is just to help you get the gist.

The most important thing here is to start working through the tutorial sheet.

**Lecture 4**

Lecture 4 focused on elimination and message passing. In Barber, you should work through Sections 5.1.2 through 5.1.5, and 5.2.1 through 5.2.4. Section 5.4 is also important. Perhaps now is the time to start looking at code that implements these sorts of systems, to help you ground what you are doing. See 5.6 regarding David Barber's code; it is useful to get a feel for that in action. Some of the questions on p100-101 work with implementing it: see demoSumprod.m for example. You should be able to answer 5.1. Exercise 5.2 is a good thought experiment that will force you to think through the issues. In both of these, read "factor graph" or "probability distribution" instead of Markov network and it will be fine. Some people asked how big a network we can handle. Well, so long as we can do message passing, computation is linear in the number of nodes, so we can go very big. There are web-scale applications of inference. Question 5.6 gives a very simple web-scale example which is effectively equivalent to the original PageRank algorithm (whoops, I've given you the answer!).
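On the point about computation being linear in the number of nodes: a minimal sketch (pure Python; the function name and the pairwise factor values are invented for illustration) of sum-product message passing along a chain of binary variables, where the marginal of the last node is obtained with one message per edge:

```python
# Sum-product on a chain x1 - x2 - ... - xn: one message is passed per edge,
# so the cost grows linearly with the number of nodes, not exponentially.

def chain_marginal(prior, transition, n):
    """prior: unnormalised factor over x1's states;
    transition[i][j]: pairwise factor value phi(x_t = i, x_{t+1} = j).
    Returns the normalised marginal of x_n."""
    msg = prior[:]
    for _ in range(n - 1):
        # Message to the next node: sum out the current variable.
        msg = [sum(msg[i] * transition[i][j] for i in range(len(msg)))
               for j in range(len(transition[0]))]
    z = sum(msg)
    return [m / z for m in msg]

# A 1000-node chain is barely harder than a 10-node one.
prior = [0.5, 0.5]
T = [[0.9, 0.1], [0.2, 0.8]]
print(chain_marginal(prior, T, 1000))   # approaches the equilibrium [2/3, 1/3]
```

Doing the same marginal naively would mean summing over 2^999 joint configurations; the message-passing view is what makes very large (even web-scale) models tractable when the graph is sparse.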

In looking at other resources to help with your understanding, you may have been frustrated by references to belief/Bayesian networks or Markov networks, which we haven't handled yet. Most courses teach these first and factor graphs later. We have done things the other way round because mixed factor graphs capture all that is needed from both belief networks and Markov networks. So by focusing on factor graphs we only have to remember one set of rules for everything, rather than rules for each network type (I easily forget things, and so I prefer approaches that reduce the amount I have to learn and maximise the amount I can work out). But don't worry - I'll tell you all about belief networks and Markov networks next, so you know what they are. But in working with them, it's best just to convert them to factor graphs.

There are no tutorials next week.

**Lecture 5**

The focus this week will be inference, and also looking at different forms of graphical models. It would be good to work through the chapters in Barber on Belief Networks and the chapter on Graphical Models.

**Friday Lecture Slot**

There is no lecture today.

The second tutorial sheet for week 5 looks at belief networks and Markov Networks.

The answers to the second tutorial sheet are now available.

**Lecture 6**

We will continue looking at different types of networks in this lecture. You should look at Chapter 3 in Barber and attempt Exercises 3.1 to 3.4, and 3.13 (look up Markov equivalence). You should also work through Section 4.2 in Barber and do Questions 4.3, 4.4 and 4.6.

**Lecture 7**

Slides for lecture 7. At this stage we move on to learning. First we will focus on learning in probabilistic models, by seeing that learning is actually the same as inference, just at a different level. Reading Sections 9.1 and 9.2 in Barber will help to clarify that. Question 3 on the tutorial sheet pulls this all together, so it will be worth following up on that.

**Lecture 8**

The focus of this lecture was a question/answer session where we went through previous lectures and examined potential misunderstandings or issues. We discussed issues in D-separation and factor-graph separation, and why conditioning on variables can make things that were previously conditionally independent become dependent.
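That "conditioning induces dependence" effect ("explaining away") can be checked numerically. A small sketch (pure Python; the deterministic OR table is an invented example, not one from the lecture) for the collider a -> c <- b:

```python
# Explaining away: in the network a -> c <- b, a and b are marginally
# independent, but become dependent once c is observed.

from itertools import product

p_a = {0: 0.5, 1: 0.5}
p_b = {0: 0.5, 1: 0.5}

def p_c_given(a, b, c):
    # c is the logical OR of a and b: a deterministic CPT chosen for illustration.
    return 1.0 if c == (a | b) else 0.0

def joint(a, b, c):
    return p_a[a] * p_b[b] * p_c_given(a, b, c)

# Before conditioning, a and b are independent: p(a=1, b=1) = 1/4 = p(a=1) p(b=1).
p_ab = sum(joint(1, 1, c) for c in (0, 1))

# Condition on c = 1 and renormalise.
z = sum(joint(a, b, 1) for a, b in product((0, 1), repeat=2))
p_a1 = sum(joint(1, b, 1) for b in (0, 1)) / z
p_b1 = sum(joint(a, 1, 1) for a in (0, 1)) / z
p_a1b1 = joint(1, 1, 1) / z

print(p_a1b1, p_a1 * p_b1)   # 1/3 vs 4/9: no longer independent
```

Intuitively: once we know c = 1, learning that a = 1 already "explains" the observation, which lowers the probability that b = 1 as well, so a and b have become coupled.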

There are no tutorials next week (Innovative Learning Week). The third tutorial sheet for week 6 looks at Bayesian Learning in graphical models. The answers to tutorial sheet three are now available.

We looked at the exponential family in this lecture, and introduced the idea of conjugate distributions. Exponential family distributions form the backbone of many exact Bayesian inference methods through the use of conjugacy. See the Wikipedia page on the conjugate prior for a good summary of conjugacy.

This lecture focuses on the use of approximations to Bayesian learning and inference, relating maximum likelihood and maximum posterior methods to approximate inference with a delta-function posterior distribution. We use Independent Component Analysis as an example. See Chapter 34 of MacKay for Independent Component Analysis.

The fourth tutorial sheet looks at Gaussian distributions. The answers to tutorial sheet four are now available.

We will continue looking at approximate learning and inference.

Learning and Inference 2 slides: a further look at approximate learning methods, such as variational methods and loopy belief propagation. Look at the variational inference section of Barber for this, and Chapters 29 to 33 of MacKay. An example exam question and answer for variational methods (and belief propagation) is given in Variational.pdf.
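For reference, the decomposition underlying the variational approach can be written as follows (in one common sign convention; some texts instead minimise the negative of the bound):

```latex
\log p(v) \;=\;
\underbrace{\sum_{h} q(h)\,\log\frac{p(v,h)}{q(h)}}_{\text{free energy bound } F(q)}
\;+\;
\underbrace{\sum_{h} q(h)\,\log\frac{q(h)}{p(h\mid v)}}_{\mathrm{KL}\left(q(h)\,\|\,p(h\mid v)\right)}
\;\ge\; F(q).
```

Since the KL term is non-negative, optimising F(q) over a restricted family (e.g. a fully factorising q) simultaneously tightens the bound on the marginal log likelihood and drives q towards the true posterior.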

The fifth tutorial sheet looks at Boltzmann Machines. You will need The Mnist Dataset and the code demobmsamp.m. The answers to tutorial sheet five are now available.

Sampling and the Boltzmann Machine. We look at Gibbs sampling, and introduce the Boltzmann machine and the restricted Boltzmann machine. The Boltzmann machine is not covered in detail in Barber. Chapter 43 of MacKay (available online) covers Boltzmann machines, and you should work through that. For restricted Boltzmann machines, the Deep Learning Tutorial is one resource. You should also look at the Practical Guide to Training RBMs.
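As a sketch of the block sampling idea (pure Python; the weights are an invented toy example, not a trained model): because the hidden units of an RBM are conditionally independent given the visibles, and vice versa, each layer can be resampled in a single block:

```python
# Block Gibbs sampling in a restricted Boltzmann machine. The bipartite
# structure means p(h | v) and p(v | h) both factorise over units, so the
# sampler alternates two whole-layer updates instead of unit-by-unit sweeps.

import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_hidden(v, W, b_h):
    """Sample all hidden units at once given the visible layer."""
    return [1 if random.random() < sigmoid(b_h[j] + sum(W[i][j] * v[i]
                                                        for i in range(len(v)))) else 0
            for j in range(len(b_h))]

def sample_visible(h, W, b_v):
    """Sample all visible units at once given the hidden layer."""
    return [1 if random.random() < sigmoid(b_v[i] + sum(W[i][j] * h[j]
                                                        for j in range(len(h)))) else 0
            for i in range(len(b_v))]

random.seed(0)
W = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]   # toy 3-visible x 2-hidden weights
b_v, b_h = [0.0, 0.0, 0.0], [0.0, 0.0]

v = [1, 0, 1]
for _ in range(100):             # alternate the two block updates
    h = sample_hidden(v, W, b_h)
    v = sample_visible(h, W, b_v)
print(v, h)
```

In a general (non-restricted) Boltzmann machine the units within a layer interact, so this block structure is lost and Gibbs sampling has to update one unit at a time.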

A continuing look at Boltzmann machines, then a look at Decision Theory. Chapter 36 of MacKay and the exercises therein provide a good introduction to the decision-theoretic elements.

The sixth tutorial sheet looks at decision theory and hidden Markov models. Answers.

We now look at Hidden Markov Models.

More on hidden Markov models.

The seventh tutorial sheet looks at linear dynamical systems. Answers.

Assignment Feedback

**Lecture 18** More hidden Markov models, and linear dynamical systems.

Last lecture: More on linear dynamical systems.

Notation


Answer sheet 4.

Answer sheet 5.

Answer sheet 6 for week 8.

The tutorials this week will focus on questions you have about the assignment. Tutors: please familiarise yourselves with the assignment questions and answers above. Most questions are likely to focus on Q3, so tutors may want to focus their preparation on that question. Though the answers are provided, many students will still have uncertainties about how to work through them.

There will be no lectures next week, but there will be tutorials. See Tutorial Sheet 7.

Answer Sheet 7.

Finish off undirected graphical models, followed by a question and answer session. If there is time, I will then discuss Coding and Information Theory (not examinable): slides, slides4up.
