Lecture 10, Tuesday w6, 2014-10-21
==================================

Things we covered:

* Predicting outcomes with a Beta-Binomial model and a Dirichlet-Multinomial model. (A small code sketch of the predictive rule is at the end of these notes.)

* Effects of the "pseudo-counts" in these models.

* An alternative to an adaptive model: fit the parameters to the whole file first, and encode them in a header. You'll see the trade-offs for yourself in the assignment.

* We can keep a separate set of counts for each 'context' we predict in: for example, each possible setting of a small window of pixels. (There's a second sketch of this idea at the end of these notes.)

Next time we'll carry on talking about making predictions in a context, and about combining the predictions from contexts of different sizes.

Check your progress
-------------------

Do you think the Dirichlet parameters for something like characters or words from language should be large or small? Why?

Explain how setting the pseudo-counts to zero in the Beta-Binomial and Dirichlet-Multinomial models would break an arithmetic coding scheme.

Recommended reading
-------------------

We've now done the 'week 5' slides, except PPM, which I'll cover next time. Mark anything that's unclear or that needs expanding on NB.

Extra reading
-------------

If keen, you could read Section 28.3, pp. 351--353 of MacKay. This section discusses 'two-part codes', which send the parameters and then an encoding of the data, in more detail. The 'bits back' method is an ingenious way of getting around the inefficiency of 'sending the parameters twice'. Any extra details in these pages (not mentioned in lectures) are all non-examinable.
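
Code sketches
-------------

Here's a minimal sketch (not from the lecture; the symbol sequence and the choice alpha_k = 1 are just illustrative) of the Dirichlet-Multinomial predictive rule, P(next = k) = (n_k + alpha_k) / (N + sum_j alpha_j). The Beta-Binomial model is the special case K = 2. Larger pseudo-counts keep the predictions close to the prior for longer; smaller ones let the observed counts dominate quickly.

```python
import numpy as np

def predictive_probs(counts, alpha):
    """Dirichlet-Multinomial predictive distribution over K symbols.

    counts: observed counts n_k for each symbol so far.
    alpha:  Dirichlet pseudo-counts alpha_k (Beta-Binomial when K = 2).
    Returns P(next symbol = k) = (n_k + alpha_k) / (N + sum_j alpha_j).
    """
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha.sum())

# Adaptive use: predict each symbol, then update the counts.
symbols = [0, 0, 1, 0, 2]   # illustrative data over a K = 3 alphabet
counts = np.zeros(3)
for s in symbols:
    p = predictive_probs(counts, alpha=np.ones(3))  # alpha_k = 1 throughout
    print(f"P(symbol {s}) = {p[s]:.3f}")
    counts[s] += 1
```

Try replacing `np.ones(3)` with `100 * np.ones(3)` or `0.01 * np.ones(3)` to see how the size of the pseudo-counts changes how fast the model adapts.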
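
And a second sketch of per-context counts (again illustrative, not the lecture's code: the byte alphabet, the previous-byte context, and ALPHA = 0.5 are all assumptions I've made). Each context gets its own Dirichlet-Multinomial model, and -log2(p) is the ideal arithmetic-coding cost of each symbol:

```python
from collections import defaultdict
import numpy as np

K = 256        # byte alphabet (an illustrative choice)
ALPHA = 0.5    # pseudo-count per symbol (also illustrative)

# One count vector per context; here the context is just the previous byte.
context_counts = defaultdict(lambda: np.zeros(K))

def prob_next(context, symbol):
    """Predictive probability of `symbol` under its context's counts."""
    c = context_counts[context]
    return (c[symbol] + ALPHA) / (c.sum() + K * ALPHA)

data = b"abracadabra"
prev = None    # a dummy context for the first byte
total_bits = 0.0
for byte in data:
    total_bits += -np.log2(prob_next(prev, byte))  # ideal coding cost
    context_counts[prev][byte] += 1                # adapt this context only
    prev = byte
print(f"{total_bits:.1f} bits to code {len(data)} bytes")
```

Note what happens if ALPHA is set to zero: the first occurrence of any symbol in a context gets probability zero, which is worth keeping in mind for the second 'check your progress' question above.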