Lecture 8, Tuesday w5, 2014-10-14
=================================

Things we covered:

* Motivation for stream codes: optimal predictions incorporate learning and knowledge of any dependencies in the source.
* The product rule, and its generalization to many variables, sometimes called the chain rule (written out at the end of these notes).
* Arithmetic coding:
    * Imagine all possible terminated strings arranged in a binary tree, where each node corresponds to an interval on the number line $[0,1)$.
    * The interval for a particular string can be identified by consulting a series of predictive distributions, one for each symbol in turn given the previous symbols. The computation is linear in the length of the string.
    * Binary strings of length $l$ also define intervals on the number line, with heights $2^{-l}$.
    * Encoding: find a binary string whose interval is unambiguously inside the interval for the target string.
    * Decoding: find the unique string whose interval entirely surrounds the given binary interval.
    * (A small sketch of the interval computation and of both steps appears after the reading list below.)

I forgot to mention the material covered at the top of p111 of MacKay, which I normally explain in class (Sorry!). It's a way of turning the guessing game we played into a compression system, if the guesses are made by a reproducible machine instead of an audience. Non-examinable, but a way of understanding how sequential guesses could be useful, which you may find interesting. The slides also reference some of the history of such guessing games. Shannon used such games to estimate the entropy of the English language (possibly as low as 1 bit / character!).

Check your progress
-------------------

Things to try to work out after the lecture:

* Imagine encoding a pair of symbols with a dependency described by some joint distribution, $P(x_1, x_2)$. If we encoded the symbols independently, implicitly assuming $P(x_1, x_2) = P(x_1)\, P(x_2)$, what would be the penalty? (We covered something last time on how to answer this question.)
* Does arithmetic coding define an instantaneous code?
* Does arithmetic coding define a complete code?
* In lectures, I skimmed over the details of how an interval was defined, and precisely what was required for a binary string to encode an interval defined by the model. The book (see reading, below) is more careful, representing an interval from the model as half-open, $[a,b)$. Why is that necessary, and how precisely are the intervals matched together?

Recommended reading
-------------------

We've nearly finished the `week 4' slides. I'll probably go over the bits I missed as part of a quick recap next time. Otherwise they may be self-explanatory(?).

Read **MacKay pp110--116**.
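
The chain rule mentioned above, written out: applying the product rule repeatedly factors any joint distribution into a sequence of predictive distributions, which is exactly the form that arithmetic coding consumes,

$$P(x_1, x_2, \ldots, x_N) \;=\; P(x_1)\, P(x_2 \mid x_1)\, P(x_3 \mid x_1, x_2) \cdots \;=\; \prod_{n=1}^{N} P(x_n \mid x_1, \ldots, x_{n-1}).$$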
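For anyone who wants to see the interval bookkeeping concretely, here is a minimal Python sketch (not examinable, and not how practical arithmetic coders are implemented — those use finite-precision integer arithmetic with rescaling). The names `predict`, `string_interval`, `encode`, `decode` and `laplace_predict` are all mine, not from the notes or the book: `predict(prefix)` is a stand-in for whatever predictive model you like, and `laplace_predict` is a toy adaptive model. For simplicity the decoder is told the message length, instead of relying on the termination symbol mentioned in the lecture.

```python
# A minimal sketch of arithmetic coding over exact fractions, assuming a
# predict(prefix) function that returns {symbol: probability} for the next
# symbol given the symbols seen so far.  Illustration only.

from fractions import Fraction


def string_interval(symbols, predict):
    """Interval [lo, hi) on [0, 1) for a whole string: shrink [0, 1) using
    each predictive distribution in turn, keeping the sub-interval of the
    symbol that actually occurred."""
    lo, hi = Fraction(0), Fraction(1)
    for i, sym in enumerate(symbols):
        probs = predict(symbols[:i])
        width = hi - lo
        cum = Fraction(0)                      # cumulative probability of earlier symbols
        for s in sorted(probs):                # fixed symbol order shared with the decoder
            if s == sym:
                lo, hi = lo + width * cum, lo + width * (cum + probs[s])
                break
            cum += probs[s]
    return lo, hi


def encode(symbols, predict):
    """Find a binary string whose dyadic interval lies unambiguously inside
    the interval for the target string."""
    lo, hi = string_interval(symbols, predict)
    mid = (lo + hi) / 2
    bits, blo, bwidth = [], Fraction(0), Fraction(1)
    while not (lo <= blo and blo + bwidth <= hi):
        bwidth /= 2                            # each extra bit halves the binary interval
        if mid < blo + bwidth:
            bits.append('0')                   # keep the lower half
        else:
            bits.append('1')                   # keep the upper half
            blo += bwidth
    return ''.join(bits)


def decode(bits, predict, length):
    """Recover the unique string (of known length, for simplicity) whose
    interval entirely surrounds the binary string's interval."""
    blo = Fraction(int(bits, 2), 2 ** len(bits))
    bhi = blo + Fraction(1, 2 ** len(bits))
    symbols, lo, hi = '', Fraction(0), Fraction(1)
    for _ in range(length):
        probs = predict(symbols)
        width = hi - lo
        cum = Fraction(0)
        for s in sorted(probs):
            s_lo, s_hi = lo + width * cum, lo + width * (cum + probs[s])
            if s_lo <= blo and bhi <= s_hi:    # this symbol's interval surrounds [blo, bhi)
                symbols, lo, hi = symbols + s, s_lo, s_hi
                break
            cum += probs[s]
    return symbols


def laplace_predict(prefix):
    """Toy adaptive model for a binary source (Laplace's rule of succession):
    P(next = '1') = (number of 1s seen + 1) / (number of symbols seen + 2)."""
    n, ones = len(prefix), prefix.count('1')
    return {'0': Fraction(n - ones + 1, n + 2), '1': Fraction(ones + 1, n + 2)}


if __name__ == '__main__':
    message = '0010'
    code = encode(message, laplace_predict)
    print(code)                                          # '01000' for this message and model
    print(decode(code, laplace_predict, len(message)))   # recovers '0010'
```

Because the model adapts to the symbols seen so far, exactly the same predictive distributions are available to the decoder as it rebuilds the string symbol by symbol; that is the sense in which optimal predictions "incorporate learning" in a stream code.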