Proposer: Mary Cryan, 0131 6505153, mcryan AT inf DOT ed DOT ac DOT uk
Self-Proposed: No
Supervisor: Mary Cryan, 0131 6505153, mcryan AT inf DOT ed DOT ac DOT uk
Other Suggested Supervisors: Mark jerrum
Subject Areas: Algorithm Design,
Suitable for the following degrees: MSc in Informatics, MSc in Computer Science,
Principal goal of the project: To implement and test an elegant and clever Dynamic programming algorithm for sampling and counting Knapsack solutions (and for counting/sampling for related sets). Also to hopefully improve both the algorithm and its analysis.
Description of the project:
The project concerns the use of Dynamic Programming in an unusual setting - being used as a technique to sample from and/or count the number of solutions to certain combinatorial problems. The knapsack problem is very well-known in computer science as an optimization problem:
Given a integer bound B (representing the amount of volume in our knapsack), and a list of integers s1, ...., sn representing the sizes of objects that we would like to pack, a FEASIBLE packing is a set of objects whose total volume is at most B.
The OPTIMIZATION problem is to construct a feasible packing which occupies the maximum amount of space in the knapsack. The COUNTING problem is the problem of counting the number of feasible solutions for the input (there are potentially as many as 2^n packings, or as few as 1 packing (the empty packing), depending on the input). Both problems have been well-studied in complexity theory. The COUNTING problem is closely related to the SAMPLING problem (the problem of choosing a feasible solution randomly from among all the feasible solutions).
It is strongly believed that, in general, the problem of COUNTING the number of solutions cannot be exactly computed in polynomial-time. However there has been progress on developing polynomial-time algorithms which closely approximate the number of feasible solutions. Many algorithms have been developed for subcases of the problem, all based on the Markov-chain Monte Carlo technique. However, last year Martin Dyer developed a very simple and elegant Dynamic Programming Algorithm for counting knapsack solutions (and solutions of related problems). This algorithm is very elegant and yet has a substantially better running time than the other algorithms developed beforehand.
The idea of Dyer's algorithm is simple - First the input I is mapped to a new "rounded" instance I' of knapsack, with the following three properties:
(a) Every feasible solution to I is also a feasible solution to I'.
(b) The number of feasible solutions to I' is at most n times more than the number of solutions to I.
(c) *** The number of solutions to the "rounded" instance I' can be computed exactly in polynomial-time. ***
By implementing a straightforward Dynamic Programming algorithm, we can exactly compute the number of solutions to the "rounded" instance. This gives a weak approximation to the number of solutions for our original input, with approximation factor n. However - we can also use dynamic programming to choose a random sample uniformly at random from I'. Therefore, by taking a few samples, we can closly estimate the fraction of samples from I', which also belong to I. This technique is known as "dart-throwing", where we think of the large set of solutions (to I') as the dartboard, and the small set of solutions (for I) as being an inner circle of the dartboard. Throwing darts randomly at the board allows us to get a closer-and-closer estimate of the number of feasible solutions for our original input I. The same technique can be generalized for counting solutions to the multi-dimensional knapsack problem, and to count the number of CONTINGENCY TABLES.
The first task of this project is to implement and test the Dynamic Programming algorithms of Dyer for Knapsack solutions. I hope that the student who takes the project will also attempt to implement the algorithm for some cases of the multidimensional version of knapsack - maybe in two dimensions. This project will involve programming and the use of a pseudorandom generator for the "dart-throwing" part of the project. Another interesting goal is to see whether we can show something better than n in (b) above. The student could also implement and test the related algorithms for contingency tables. A very challenging goal is to consider the possiblity of derandomizing the counting algorithm, while allowing us to approximate the number of solutions to as close as we want.
The reference below might look a little intimidating on a first reading. But the idea behind it is very nice and simple and can be explained with less notation if you come and talk to me.
Resources Required: Nothing out of the ordinary
Degree of Difficulty: medium -> as hard as you want (lots of scope)
Background Needed: Programming skills, some mathematical confidence
References: