Lecture 17 - Thursday 15.11.12
Parallelizing Gaussian elimination and back substitution in the solution of systems of linear equations.
** NB This is the final lecture of the course **
** Reading: Grama 8.3 (5.5 in the old edition).

Lecture 16 - Monday 12.11.12
Cannon's matrix multiplication algorithm (a minimal sketch of its rotation pattern appears below, after the Lecture 7 entry). Discussion of exercise sheet 2.
Reading: as last time.

Lecture 15 - Thursday 8.11.12
Dense matrix distribution schemes and parallel matrix multiplication for the PRAM. A simple message passing algorithm.
** Reading: Grama 8.2, 8.2.1, 8.2.2 and 8.3.1 (parts of 3.4.1 on distribution
** schemes, 5.1, 5.4, 5.4.1, 5.4.2 and 5.5.1 in the old edition).

Lecture 14 - Monday 5.11.12
Mapping bitonic mergesort to the hypercube (where it fits perfectly) and the mesh (where it doesn't). Efficient mapping to the mesh required us to consider the frequencies with which various pairs of processes interact, and to take this into account when mapping.
** Reading: 9.2 of Grama (6.2 in the old edition).

Lecture 13 - Thursday 1.11.12
Bitonic mergesort, its internal structure and run-time (the network form is sketched in code below, after the Lecture 7 entry).
** Reading: 9.2 of Grama (6.2 in the old edition).

Lecture 12 - Monday 29.10.12
Odd-even transposition sort as a first example of a message based sort. Scaling down for cost optimality (compare-exchange becomes compare-split). Expression as a sorting network (see the sketch after the Lecture 7 entry).
Reading: 9.3, 9.3.1 and 9.1, 9.2 (6.3, 6.3.1 and 6.1, 6.2 in the first edition).

Lecture 11 - Thursday 25.10.12
A CRCW sorting algorithm inspired by quicksort, with "random" CW resolution used to randomise the pivot selection process, as in good sequential quicksort.
Reading: 9.4.2 in Grama (6.4.1 in the old edition). Then, if you want to read more (but this is *not* required reading), section 2.3.4 of Quinn's book, Parallel Computing: Theory and Practice, goes into more detail on the "tree->list" construction before the list prefix.

Lecture 10 - Monday 22.10.12
A PRAM sorting algorithm, based on mergesort, borrowing from enumeration sort to parallelise the merge step. We also began to look at the CRCW quicksort-based algorithm.
** Suggested reading: The CREW mergesort is in the supplementary
** note. It isn't in Grama, but you can find a discussion of CREW
** merging in Quinn's book Parallel Computing: Theory and Practice,
** p40-42. The more complex, but cost optimal, merge mentioned can be
** read about in S. Akl: Design and Analysis of Parallel Algorithms,
** p66-68 (but this is NOT required for the course).

Lecture 9 - Thursday 18.10.12
Implementation of a range of collective communication primitives, emphasising the common structure of ring -> mesh (row then column) -> hypercube (dimension by dimension).
** Suggested reading: Chapter 4 up to and including 4.5
** (which is chapter 3 up to and including 3.5 in the old edition).

Lecture 8 - Monday 15.10.12
Pointer jumping in PRAMs, which allows us to traverse linked lists, an apparently inherently linear-time task, in logarithmic time, computing useful things (e.g. ranks and prefixes) as we go (see the sketch after the Lecture 7 entry).
** Suggested reading: for pointer jumping, the rest of
** supplementary note 2.

Lecture 7 - Thursday 11.10.12
We covered overheads 53-59, on reduction and prefix (and their implementation) as useful primitives, and their application to the fractional knapsack problem. A sketch of the hypercube prefix follows this entry.
** Suggested reading: mainly from supplementary note 2, but also see
** Grama section 4.3 for the hypercube prefix implementation
** (or 3.3 in the first edition).
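
As a concrete illustration of the Lecture 7 material, here is a minimal MPI sketch of the hypercube prefix sum (the pattern Grama describes in section 4.3). It assumes a power-of-two number of processes, and the MPI framing and names are mine, not taken from the overheads.

    /* Hypercube prefix sum: a minimal sketch, assuming MPI and a
     * power-of-two number of processes. Illustrative names only. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, p;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        int value = rank + 1;   /* each process contributes one number */
        int result = value;     /* running prefix for this process     */
        int msg = value;        /* running subcube total passed along  */

        /* One exchange per hypercube dimension; the partner differs
         * from us in exactly the bit selected by d. */
        for (int d = 1; d < p; d <<= 1) {
            int partner = rank ^ d, recv;
            MPI_Sendrecv(&msg, 1, MPI_INT, partner, 0,
                         &recv, 1, MPI_INT, partner, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            msg += recv;          /* total over the merged subcube      */
            if (partner < rank)   /* lower partners precede us in order */
                result += recv;
        }
        printf("process %d: prefix = %d\n", rank, result);
        MPI_Finalize();
        return 0;
    }

Each process does one exchange per dimension, so the whole computation takes log p communication steps.
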
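For the pointer jumping of Lecture 8, a sequential C simulation of the synchronous PRAM rounds may help. The double buffering stands in for the PRAM's read-everything-then-write-everything step semantics; the particular list and names are made up for illustration.

    /* List ranking by pointer jumping, simulated sequentially: each
     * round of the while loop plays one synchronous PRAM step. */
    #include <stdio.h>
    #include <string.h>

    #define N 8
    #define NIL (-1)

    int main(void) {
        /* next[i] is the successor of node i; the list here is
         * 6 -> 4 -> 1 -> 0 -> 3 -> 5 -> 7 -> 2. */
        int next[N] = {3, 0, NIL, 5, 1, 7, 4, 2};
        int rank[N];
        for (int i = 0; i < N; i++)
            rank[i] = (next[i] == NIL) ? 0 : 1;

        int new_next[N], new_rank[N], active = 1;
        while (active) {                  /* O(log N) rounds in total */
            active = 0;
            for (int i = 0; i < N; i++) { /* one synchronous step     */
                new_rank[i] = rank[i];
                new_next[i] = next[i];
                if (next[i] != NIL) {
                    new_rank[i] = rank[i] + rank[next[i]];
                    new_next[i] = next[next[i]];   /* jump the pointer */
                    if (new_next[i] != NIL) active = 1;
                }
            }
            memcpy(rank, new_rank, sizeof rank);
            memcpy(next, new_next, sizeof next);
        }
        for (int i = 0; i < N; i++)
            printf("node %d: distance to end = %d\n", i, rank[i]);
        return 0;
    }
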
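The odd-even transposition sort of Lecture 12, written out in the sorting network form mentioned there: n alternating phases of compare-exchanges. This sequential version is only a sketch; in the message passing version each element (or block) lives on its own process, and compare-exchange becomes a pairwise exchange (compare-split for blocks).

    /* Odd-even transposition sort as a sequential simulation of the
     * n-phase sorting network. */
    #include <stdio.h>

    static void compare_exchange(int *a, int i, int j) {
        if (a[i] > a[j]) { int t = a[i]; a[i] = a[j]; a[j] = t; }
    }

    int main(void) {
        int a[] = {5, 9, 4, 3, 1, 8, 7, 2};
        int n = sizeof a / sizeof a[0];

        for (int phase = 0; phase < n; phase++) {
            /* even phases pair (0,1),(2,3),...;
             * odd phases pair (1,2),(3,4),... */
            int start = (phase % 2 == 0) ? 0 : 1;
            for (int i = start; i + 1 < n; i += 2)
                compare_exchange(a, i, i + 1);
        }
        for (int i = 0; i < n; i++) printf("%d ", a[i]);
        printf("\n");
        return 0;
    }
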
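The bitonic mergesort of Lecture 13 in its iterative network form, again as a sequential sketch; the indexing scheme here is the standard one, not necessarily the one on the overheads. Stage k merges bitonic sequences of length k with compare-exchanges at strides k/2, k/4, ..., 1, giving the familiar O(log^2 n) network depth; n must be a power of two.

    /* Bitonic mergesort as a sequential simulation of the network. */
    #include <stdio.h>

    int main(void) {
        int a[] = {5, 9, 4, 3, 1, 8, 7, 2};
        int n = sizeof a / sizeof a[0];

        for (int k = 2; k <= n; k <<= 1)          /* merge length    */
            for (int j = k >> 1; j > 0; j >>= 1)  /* exchange stride */
                for (int i = 0; i < n; i++) {
                    int partner = i ^ j;
                    if (partner > i) {
                        /* sort ascending if bit k of i is clear,
                         * descending otherwise */
                        int up = ((i & k) == 0);
                        if ((up && a[i] > a[partner]) ||
                            (!up && a[i] < a[partner])) {
                            int t = a[i];
                            a[i] = a[partner];
                            a[partner] = t;
                        }
                    }
                }

        for (int i = 0; i < n; i++) printf("%d ", a[i]);
        printf("\n");
        return 0;
    }
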
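Finally for this group, Cannon's algorithm from Lecture 16, simulated sequentially with one matrix element per "process" to keep the sketch short. A real implementation holds a block of each matrix per process and performs the rotations with messages (e.g. MPI_Cart_shift and MPI_Sendrecv_replace); the initial skew aligns the operands so that every grid position multiplies a useful pair at every step.

    /* Cannon's algorithm on an N x N grid, one element per "process". */
    #include <stdio.h>

    #define N 3

    static void rotate_row_left(double m[N][N], int i) {
        double t = m[i][0];
        for (int j = 0; j < N - 1; j++) m[i][j] = m[i][j + 1];
        m[i][N - 1] = t;
    }

    static void rotate_col_up(double m[N][N], int j) {
        double t = m[0][j];
        for (int i = 0; i < N - 1; i++) m[i][j] = m[i + 1][j];
        m[N - 1][j] = t;
    }

    int main(void) {
        double A[N][N] = {{1,2,3},{4,5,6},{7,8,9}};
        double B[N][N] = {{9,8,7},{6,5,4},{3,2,1}};
        double C[N][N] = {{0}};

        /* Initial skew: shift row i of A left by i and column j of B
         * up by j, so position (i,j) starts with a multiplying pair. */
        for (int i = 0; i < N; i++)
            for (int s = 0; s < i; s++) rotate_row_left(A, i);
        for (int j = 0; j < N; j++)
            for (int s = 0; s < j; s++) rotate_col_up(B, j);

        /* N compute-and-rotate steps: multiply-accumulate locally,
         * then rotate A's rows left and B's columns up by one. */
        for (int step = 0; step < N; step++) {
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    C[i][j] += A[i][j] * B[i][j];
            for (int i = 0; i < N; i++) rotate_row_left(A, i);
            for (int j = 0; j < N; j++) rotate_col_up(B, j);
        }

        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) printf("%6.1f ", C[i][j]);
            printf("\n");
        }
        return 0;
    }
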
Lecture 6 - Monday 8.10.12
A look at parallel algorithm design techniques in general, and divide & conquer and pipelining in particular. Also step-by-step parallelization and Amdahl's Law (a worked statement appears at the end of this page). Overheads 42-52.
** Suggested reading: read the second supplementary note, and
** also look at Grama chapter 3, up to (and including) 3.2.2.
** Although I have not followed this at all closely in lectures, it
** says some useful and interesting things, and might help give you
** a different perspective on the algorithms we examine later.
** This material was new in the second edition. There isn't really
** a corresponding section in the first edition, so you might want to
** check it out in the library.

Lecture 5 - Thursday 4.10.12
Scaling down of the hypercube summation algorithm for smaller p. We discovered that the two "obvious" mappings have asymptotically different costs. Mapping meshes to hypercubes as a general purpose approach to porting algorithms, using Gray codes to find an embedding which preserves the adjacency properties we want (a small demonstration appears at the end of this page). Overheads 33-41.
** Suggested reading: The Gray code mapping is discussed in Grama 2.7.1.

Lecture 4 - Monday 1.10.12
An introduction to message passing models and their abstraction: mesh and hypercube structure. Summation on the hypercube (sketched at the end of this page).
Reading: For message passing, mesh & hypercube, all of chapter 1, sections 2.3, 2.4.1->2.4.4, 2.5, 2.7 and 5.1->5.3 from the recommended text book by Grama et al. (some of this is for the next lecture too). These numbers refer to the second edition. The library also still has some copies of the first edition, where the roughly corresponding sections are chapter 1, sections 2.2-2.5 and 4.1->4.2.

Lecture 3 - Thursday 27.9.12
We discussed the related issues of cost and scalability. Good scalability (down) can be achieved easily, by round-robin scheduling, for cost optimal algorithms. Brent's Theorem helps us to predict the occasions on which scaled-down optimality can be achieved with non cost-optimal algorithms (essentially by exploiting the slack in the large scale version). We covered overheads 19-21.
Reading: For Brent, we're still in note 1. You can find a proof of Brent's Theorem on page 44 of "Parallel Computing: Theory and Practice" by MJ Quinn (it's in the library), though this (the proof) is not a required part of the course. If you are super-keen you could even look at the original paper: Brent R.P., The Parallel Evaluation of General Arithmetic Expressions, Journal of the ACM 21(2), 201-206, which is available on-line through the library's electronic journal collection.

Lecture 2 - Monday 24.9.12
We reviewed the PRAM model and its variants, and noted the asymptotic difference in time complexity between the CRCW-associative and EREW summation algorithms. We considered a constant time CRCW(+) PRAM sorting algorithm. We defined cost optimality. We covered overheads 11-18.
Reading: Same as last time really, we're still in note 1.

Lecture 1 - Thursday 20.9.12
We covered material from overheads 1-11, including an overview of the issues which arise in algorithm design (machine models, cost models, asymptotic analysis), and the complications which arise when we try to be precise. We introduced the PRAM model and its variants.
Reading: So far I have covered the first three and a half pages of my supplementary note 1. In the textbook, you might like to read chapter 1 for background, and browse through sections 2.1 -> 2.3 to give you a flavour of why things get much more complex with real parallel architectures (however, you will not need to learn this material in detail, as far as this course is concerned). 2.4.1 gives a brief summary of PRAM.
These numbers refer to the second edition. The library also still has some copies of the first edition, which contains the same material, but in different places (chapter 1 and 2.1-2.2).
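
Since Amdahl's Law from Lecture 6 is easy to misremember, here is the usual statement (notation mine: f is the inherently serial fraction of the run-time, p the number of processors):

    S(p) = \frac{1}{f + \frac{1 - f}{p}} \le \frac{1}{f}

So, for example, f = 0.1 caps the achievable speedup at 10, however many processors we use.
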
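For the Gray code embedding of Lecture 5, a small self-contained C demonstration (the function names are mine) of why the mapping works: sending ring position i to hypercube node i XOR (i >> 1), the binary-reflected Gray code, puts neighbouring ring positions on hypercube nodes exactly one bit apart, i.e. on physical neighbours. A mesh is embedded the same way, with one Gray code applied to the row index and another to the column index.

    /* Ring -> hypercube embedding via the binary-reflected Gray code. */
    #include <stdio.h>

    static unsigned gray(unsigned i) { return i ^ (i >> 1); }

    static int popcount(unsigned x) {   /* number of set bits */
        int c = 0;
        while (x) { c += x & 1; x >>= 1; }
        return c;
    }

    int main(void) {
        unsigned p = 8;   /* 8 ring positions -> 3-dimensional hypercube */
        for (unsigned i = 0; i < p; i++) {
            unsigned a = gray(i), b = gray((i + 1) % p);
            printf("ring %u -> node %u; next differs in %d bit(s)\n",
                   i, a, popcount(a ^ b));
        }
        return 0;
    }
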
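Finally, the hypercube summation of Lecture 4 as a minimal message passing sketch, phrased here in MPI (my choice, assuming a power-of-two number of processes): in each of the log p dimensions the upper half of every subcube sends its partial sum across to the lower half, so after log p steps the total sits at process 0.

    /* Hypercube summation: a minimal sketch, assuming MPI and a
     * power-of-two number of processes. Illustrative names only. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, p;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        int sum = rank + 1;   /* this process's contribution */

        for (int d = 1; d < p; d <<= 1) {
            if (rank & d) {   /* upper half: send partial sum, then stop */
                MPI_Send(&sum, 1, MPI_INT, rank ^ d, 0, MPI_COMM_WORLD);
                break;
            } else {          /* lower half: receive and accumulate */
                int recv;
                MPI_Recv(&recv, 1, MPI_INT, rank ^ d, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                sum += recv;
            }
        }
        if (rank == 0)
            printf("total = %d (expected %d)\n", sum, p * (p + 1) / 2);
        MPI_Finalize();
        return 0;
    }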