Programming notes for Information theory

This page is for code snippets and programming notes that may be helpful for the Informatics Information Theory course.

Huffman codes

Minimal implementations sufficient to help answer the tutorial questions.

Matlab/Octave implementation: huffman.m
Python implementation: huffman.py
Haskell implementation: huffman.hs

Arbitrary symbol codes

Theo Varvadoukas, a student in the 2012–13 class, coded up the Sardinas–Patterson algorithm in Python: sardinas_patterson.py.

This code allows you to quickly check if a symbol code is uniquely decodable, whether or not it is a prefix code. The Kraft–McMillan inequality can rule out a code, but this algorithm actually checks it. An alternative to checking non-prefix codes is to only use prefix codes: if the lengths of the codewords satisfy the Kraft–McMillan inequality, then why not construct a prefix code with those lengths?

log-Gamma function

The log-Gamma function can be useful for evaluating the log of large factorials (ln(x!)=gammaln(x+1)). For many purposes in this course you can approximate the factorials you encounter, which may give more insight than evaluating them. However, comparing to a numerically ‘exact’ answer can be useful too. Here are some implementations that I’m aware of:

C: lgamma is provided in math.h in C99 and POSIX.1-2001 versions of C.
C: or use gammaln in util.c in lightspeed (older version with more liberal license)
C: or use logfactorial / logGamma in random.c in BayeSys
Matlab/Octave: gammaln
Python+numpy+scipy: scipy.special.gammaln
R: lgamma

There is probably a function somewhere for your favourite programming language. If not, you could port or call one of the C implementations.

If you need to compute log gamma differences of very close numbers, see this blog post.

Random numbers

Most programming languages come with a uniform random number generator that provides samples uniformly between 0 and 1. Uniform real-valued samples are often used as a basis for sampling from non-uniform distributions, although MacKay (p118) argues for a using raw bits in an arithmetic encoder instead.

C: The C standard library function rand provides a sampler for integers over a large range. The following two lines provide a u01rand ‘function’ for approximately generating Uniform[0,1] samples:

#include <stdlib.h>
#define u01rand() ((double)rand()/RAND_MAX)

There are known problems with the standard C generator though, and it should be avoided in serious scientific work. Really, I have actually been bitten by the C random number generator, it is dangerous! A better generator and several standard distributions are provided by the random.{h,c} files in BayeSys, which you could drop into your project. The GSL also provides random number routines, amongst many other things.

Other languages (recommended for exploratory code): Matlab provides rand and randn for uniform and Gaussian variates. Its statistics toolbox, for which you might not have a license, provides various other generators. If you use Octave (or R, or scipy, ...) instead, many of the same functions are available for free.

Plotting

Being able to quickly draw graphs is an important skill that can help with the understanding of new topics. Many languages provide convenient plotting functionality, for example, Matlab/Octave, R and Python+matplotlib. If your favourite programming language does not, an alternative is to write out your data to text files and plot it using a package like gnuplot.

Demos

Central limit theorem: a question in tutorial 1 was to add up 12 uniform[0,1] numbers many times, plot a histogram of the results and compare to a central limit theorem prediction. Here are example solutions in: a) Matlab/Octave, b) C with plotting done by gnuplot.


This page maintained by Iain Murray.
Last updated: 2013/10/26 12:50:41


Home : Teaching : Courses : It 

Informatics Forum, 10 Crichton Street, Edinburgh, EH8 9AB, Scotland, UK
Tel: +44 131 651 5661, Fax: +44 131 651 1426, E-mail: school-office@inf.ed.ac.uk
Please contact our webadmin with any comments or corrections. Logging and Cookies
Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh