Lecture 4, Friday w2, 2014-09-26
================================

**This lecture was *fast***. I expect most people will have to go over the Source Coding Theorem themselves more slowly---more than once---before they get it. I know I did. The next few lectures are slower, and there's only one lecture next week. Make some time to work through the material so far.

**Main lecture points:**

* Information content, $\log(1/p_i) = -\log(p_i)$, as a random variable.
* Entropy: expected information content.
* Binary entropy: the special case for a Bernoulli distribution. A function of one number, $p$.
* Extended ensemble: a block of $N$ i.i.d. draws, each with entropy $H$.
* **Source coding theorem:**
    * The information content of a block is usually near its mean (law of large numbers).
    * *Typical set*: contains the blocks we usually see.
    * The size of this set is close to $2^{NH}$ as $N\rightarrow\infty$.
    * Therefore we can code a block with $NH$ bits, or $H$ bits/symbol.
    * And, as we showed, with no fewer.

(A short numerical sketch of the concentration behind the typical set appears at the end of these notes.)

Check your progress
-------------------

Things you should be able to do, and questions to think about after reviewing the lecture:

* Understand that $p(x)$ and $\log p(x)$ are random variables. See Q1, Tutorial sheet 1.
* Be able to compute the entropy of a discrete distribution. What do you do if the probability of an outcome is zero? Why? (There is a code sketch at the end of these notes.)
* The source coding theorem is about large blocks of symbols, $N\rightarrow\infty$. If we were only compressing one or a few symbols, could we do better than $H$ bits/symbol on average? Might we have to do worse?
* Understand the proof of the source coding theorem well enough that you could explain it to another MSc student not taking this course, so that they could actually follow each step and understand the implications.

Recommended reading
-------------------

We have finished the **'week 2' slides**, except for a detail I skipped on the binary entropy function, and the numerical note on log(sum(exp(·))); I will return to both. (A sketch of the log-sum-exp trick also appears at the end of these notes.) Ask on NB if anything else is unclear, or too compressed.

**MacKay Chapter 4** gives more detail on the Source Coding Theorem, so I recommend reading the rest of Chapter 4. MacKay's notation is slightly different. For example, I indexed the typical set with $m$, the number of standard deviations away from the mean; MacKay uses $\beta$, which absorbs the standard deviation of $h(x)$. Also, I didn't use the symbols $H_\delta$ or $S_\delta$.

IT exam questions won't require you to recall the detail of notation from my proof or MacKay's. However, you do need to understand the ideas behind the proof: you may have to explain parts of the argument, or answer questions relating to it.
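
Worked sketches
---------------

The lecture doesn't prescribe any software, so the sketches below are my own illustrations in Python with NumPy; the function and variable names are my choices, not notation from the slides.

First, computing the entropy of a discrete distribution. The convention is that an outcome with $p_i = 0$ contributes nothing, because $p\log(1/p) \rightarrow 0$ as $p \rightarrow 0$; evaluating $0 \times \log(1/0)$ naively in floating point would give `nan` instead.

```python
import numpy as np

def entropy(p, base=2):
    """Entropy of a discrete distribution, given as a vector of probabilities.

    Outcomes with p_i == 0 are dropped: p*log(1/p) -> 0 as p -> 0, so they
    contribute nothing (naive evaluation would give 0*inf = nan).
    """
    p = np.asarray(p, dtype=float)
    assert np.all(p >= 0) and np.isclose(np.sum(p), 1.0)
    nz = p[p > 0]                      # drop zero-probability outcomes
    return -np.sum(nz * np.log(nz)) / np.log(base)

print(entropy([0.5, 0.25, 0.25]))      # 1.5 bits
print(entropy([0.5, 0.5, 0.0]))        # 1.0 bits: the impossible outcome adds nothing
```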
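
Second, the concentration that the source coding theorem rests on. For a Bernoulli($p$) source, a block containing $k$ ones has information content $k\log_2(1/p) + (N-k)\log_2(1/(1-p))$. The sketch below (with an arbitrary choice of $p$ and block counts) estimates the per-symbol information content over many blocks: its mean sits at the binary entropy $H$, and its spread shrinks like $1/\sqrt{N}$, which is why almost all blocks fall in a typical set of roughly $2^{NH}$ members.

```python
import numpy as np

rng = np.random.default_rng(0)

p = 0.1                                      # Bernoulli(p) source
H = -(p*np.log2(p) + (1-p)*np.log2(1-p))     # binary entropy, ~0.469 bits

for N in [10, 100, 10_000]:
    # number of ones in each of 100,000 independent blocks of N symbols
    k = rng.binomial(N, p, size=100_000)
    # information content of each block: k*log2(1/p) + (N-k)*log2(1/(1-p))
    h = -(k*np.log2(p) + (N - k)*np.log2(1 - p))
    print(f"N={N:6d}  mean(h/N)={np.mean(h/N):.4f}  "
          f"std(h/N)={np.std(h/N):.4f}  H={H:.4f}")
```

The mean of $h/N$ matches $H$ at every $N$; only the standard deviation changes, falling by about a factor of 10 for every factor of 100 in $N$.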
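
Finally, the numerical note on log(sum(exp(·))). I'm assuming the note covers the standard stabilisation trick: computed naively, the exponentials underflow (or overflow) in floating point, whereas shifting by the maximum before exponentiating gives the same answer stably, since $\log\sum_i e^{a_i} = m + \log\sum_i e^{a_i - m}$ for any shift $m$.

```python
import numpy as np

def logsumexp(a):
    """Numerically stable log(sum(exp(a))): shift by the max before exponentiating."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

a = np.array([-1000.0, -1001.0])
print(np.log(np.sum(np.exp(a))))   # -inf: exp(-1000) underflows to zero
print(logsumexp(a))                # ~ -999.6867, the correct value
```

SciPy ships the same routine as `scipy.special.logsumexp`, so in practice you'd use that rather than rolling your own.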