# Programming notes for Information theory

This page is for code snippets and programming notes that may be helpful for the Informatics Information Theory course.

## Huffman codes

Minimal implementations sufficient to help answer the tutorial questions.

Matlab/Octave implementation: huffman.m
Python implementation: huffman.py

## Arbitrary symbol codes

Theo Varvadoukas, a student in the 2012–13 class, coded up the Sardinas–Patterson algorithm in Python: sardinas_patterson.py.

This code allows you to quickly check if a symbol code is uniquely decodable, whether or not it is a prefix code. The Kraft–McMillan inequality can rule out a code, but this algorithm actually checks it. An alternative to checking non-prefix codes is to only use prefix codes: if the lengths of the codewords satisfy the Kraft–McMillan inequality, then why not construct a prefix code with those lengths?

## log-Gamma function

The log-Gamma function can be useful for evaluating the log of large factorials (ln(x!)=gammaln(x+1)). For many purposes in this course you can approximate the factorials you encounter, which may give more insight than evaluating them. However, comparing to a numerically ‘exact’ answer can be useful too. Here are some implementations that I’m aware of:

C: lgamma is provided in math.h in C99 and POSIX.1-2001 versions of C.
C: or use gammaln in util.c in lightspeed (older version with more liberal license)
C: or use logfactorial / logGamma in random.c in BayeSys
Matlab/Octave: gammaln
Python+numpy+scipy: scipy.special.gammaln
R: lgamma

There is probably a function somewhere for your favourite programming language. If not, you could port or call one of the C implementations.

If you need to compute log gamma differences of very close numbers, see this blog post.

## Random numbers

Most programming languages come with a uniform random number generator that provides samples uniformly between 0 and 1. Uniform real-valued samples are often used as a basis for sampling from non-uniform distributions, although MacKay (p118) argues for a using raw bits in an arithmetic encoder instead.

C: The C standard library function rand provides a sampler for integers over a large range. The following two lines provide a u01rand ‘function’ for approximately generating Uniform[0,1] samples:

```#include <stdlib.h>
#define u01rand() ((double)rand()/RAND_MAX)```

There are known problems with the standard C generator though, and it should be avoided in serious scientific work. Really, I have actually been bitten by the C random number generator, it is dangerous! A better generator and several standard distributions are provided by the random.{h,c} files in BayeSys, which you could drop into your project. The GSL also provides random number routines, amongst many other things.

Other languages (recommended for exploratory code): Matlab provides rand and randn for uniform and Gaussian variates. Its statistics toolbox, for which you might not have a license, provides various other generators. If you use Octave (or R, or scipy, ...) instead, many of the same functions are available for free.

## Plotting

Being able to quickly draw graphs is an important skill that can help with the understanding of new topics. Many languages provide convenient plotting functionality, for example, Matlab/Octave, R and Python+matplotlib. If your favourite programming language does not, an alternative is to write out your data to text files and plot it using a package like gnuplot.

## Demos

Central limit theorem: a question in tutorial 1 was to add up 12 uniform[0,1] numbers many times, plot a histogram of the results and compare to a central limit theorem prediction. Here are example solutions in: a) Matlab/Octave, b) C with plotting done by gnuplot.