Lecture 17 - Monday 18.11.13
Finished off the Gaussian elimination algorithm and discussed the second exercise sheet.
** NB This is the final lecture of the course **
** Reading: same as last time.

Lecture 16 - Thursday 14.11.13
Cannon's matrix multiplication algorithm. Parallelizing Gaussian elimination in the solution of systems of linear equations.
** Reading: as last time, plus Grama 8.3 (5.5 in the old edition).

Lecture 15 - Monday 11.11.13
Dense matrix distribution schemes and parallel matrix multiplication for the PRAM. A simple message-passing algorithm.
** Reading: Grama 8.2, 8.2.1, 8.2.2 and 8.3.1 (parts of 3.4.1 on distribution schemes, 5.1, 5.4, 5.4.1, 5.4.2 and 5.5.1 in the old edition).

Lecture 14 - Thursday 7.11.13
Finished off BMS mapping, then looked at exercise sheets 1 and 2. Exercise 1 solutions can now be collected from the ITO, level 4 Appleton Tower.

Lecture 13 - Monday 4.11.13
Bitonic Mergesort analysis of circuit depth. Mapping BMS to the hypercube (where it fits perfectly) and the mesh (where it doesn't). Efficient mapping to the mesh required us to consider the frequencies with which various pairs of processes interact, and to take this into account when mapping.
** Reading: 9.2 of Grama (6.2 in the old edition).

Lecture 12 - Monday 28.10.13
Block-based version of odd-even transposition sort. Sorting network notation. Bitonic Mergesort (analysis to be done next time).
** Reading: 9.2 of Grama (6.2 in the old edition).

Lecture 11 - Monday 21.10.13
Finished off the CRCW quicksort, then odd-even transposition sort as a first example of a message-based sort.
Reading: 9.3, 9.3.1 and 9.1, 9.2 (6.3, 6.3.1 and 6.1, 6.2 in the first edition).

Lecture 10 - Thursday 17.10.13
A CRCW sorting algorithm inspired by quicksort, with "random" CW resolution used to randomise the pivot selection process, as in good sequential quicksort.
Reading: 9.4.2 in Grama (6.4.1 in the old edition). Then, if you want to read more (but this is *not* required reading), section 2.3.4 of Quinn's book, Parallel Computing: Theory and Practice, goes into more detail on the "tree->list" construction before the list prefix.

Lecture 9 - Monday 14.10.13
Finished off the communication algorithms from the previous lecture, then a PRAM sorting algorithm, based on mergesort, borrowing from enumeration sort to parallelise the merge step.
** Suggested reading: The CREW Mergesort is in the supplementary note. It isn't in Grama, but you can find a discussion of CREW merging in Quinn's book Parallel Computing: Theory and Practice, p40-42. The more complex, but cost optimal, merge mentioned can be read about in S. Akl: Design and Analysis of Parallel Algorithms, p66-68 (but this is NOT required for the course).

Lecture 8 - Thursday 10.10.13
Implementation of a range of collective communication primitives, emphasising the common structure of ring -> mesh (row then column) -> hypercube (dimension by dimension). We also spent some time talking about the first exercise sheet.
** Suggested reading: Chapter 4 up to and including 4.5 (which is chapter 3 up to and including 3.5 in the old edition).

Lecture 7 - Monday 7.10.13
Using prefix in the fractional knapsack problem. Pointer jumping in PRAMs, which allows us to traverse linked lists, an apparently inherently linear-time task, in logarithmic time, computing useful things (e.g. ranks and prefixes) as we go. A sketch of the idea follows below.
** Suggested reading: the rest of supplementary note 2.
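The following is a minimal sequential simulation of the pointer-jumping idea, written to illustrate the list-ranking computation mentioned above; the representation and names are my own, not taken from the lecture or the notes.

    # Sequential simulation of PRAM pointer jumping for list ranking.
    # next_[i] is the successor of node i; the tail points to itself.
    # On a PRAM each node has its own processor, so each iteration of
    # the while-loop is one O(1) parallel step, and since every jump
    # doubles the distance covered, O(log n) steps suffice.
    def list_ranks(next_):
        n = len(next_)
        # rank[i] will become the number of links from i to the tail.
        rank = [0 if next_[i] == i else 1 for i in range(n)]
        nxt = list(next_)
        while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
            # One parallel step: every node absorbs its successor's
            # rank, then jumps over that successor.
            rank = [rank[i] + rank[nxt[i]] for i in range(n)]
            nxt = [nxt[nxt[i]] for i in range(n)]
        return rank

    # The list 0 -> 2 -> 1 -> 3 (node 3 is the tail):
    print(list_ranks([2, 3, 1, 3]))  # prints [3, 1, 2, 0]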
Lecture 6 - Thursday 3.10.13
Pipelining, step-by-step parallelization and Amdahl's Law. Reduction and prefix operations (and their implementation) as useful primitives.
** Suggested reading: mainly from supplementary note 2, but also see Grama section 4.3 for the hypercube prefix implementation (or 3.3 in the first edition).

Lecture 5 - Monday 30.9.13
Finished off hypercube summation scale-down material. Mapping meshes to hypercubes as a general-purpose approach to porting algorithms, using Gray codes to find an embedding which preserves the adjacency properties we want. Finally, a look at Divide & Conquer as a parallel algorithm design strategy. Overheads 39-49.
** Suggested reading: The Gray code mapping is discussed in Grama 2.7.1. For design strategies, read the second supplementary note, and also look at Grama chapter 3, up to (and including) 3.2.2. Although I have not followed this at all closely in lectures, it says some useful and interesting things, and might help give you a different perspective on the algorithms we examine later. This material was new in the second edition. There isn't really a corresponding section in the first edition, so you might want to check it out in the library.

Lecture 4 - Thursday 26.9.13
An introduction to message passing models and their abstraction: mesh and hypercube structure. Summation on the hypercube. Scaling down of the hypercube summation algorithm for smaller p. We (nearly had time to) discover that the two "obvious" mappings have asymptotically different costs.
Reading: For message passing, mesh & hypercube, all of chapter 1, sections 2.3, 2.4.1->2.4.4, 2.5, 2.7 and 5.1->5.3 from the recommended textbook by Grama et al. (some of this is for the next lecture too). These numbers refer to the second edition. The library also still has some copies of the first edition, where the roughly corresponding sections are chapter 1, sections 2.2-2.5 and 4.1->4.2.

Lecture 3 - Monday 23.9.13
We discussed the related issues of cost and scalability. Good scalability (down) can be achieved easily, by round-robin scheduling, for cost-optimal algorithms. Brent's Theorem helps us to predict the occasions on which scaled-down optimality can be achieved with non-cost-optimal algorithms (essentially by exploiting the slack in the large-scale version); a small worked example follows below. We briefly began to look at cost models for non-shared-memory machines. We covered overheads 19-26.
Reading: For Brent, we're still in note 1. You can find a proof of Brent's Theorem on page 44 of "Parallel Computing: Theory and Practice" by MJ Quinn (it's in the library), though this (the proof) is not a required part of the course. If you are super-keen you could even look at the original paper: Brent R.P., The Parallel Evaluation of General Arithmetic Expressions, Journal of the ACM 21(2), 201-206, which is available on-line through the library's electronic journal collection.
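As a worked instance of the Brent's Theorem point above (the numbers are my own illustration, not the lecture's): in its commonly quoted form, an algorithm that performs W operations in total over T parallel steps, using as many processors as it likes, can be simulated by p processors, round-robin within each step, in time T_p <= W/p + T. Binary-tree summation of n = 1024 values has W = n - 1 = 1023 and T = log2 n = 10, so with p = 32 processors we get T_p <= 1023/32 + 10, about 42 steps, for a cost of p * T_p ~ 1344 = O(n). The scaled-down algorithm is therefore cost optimal (this works for any p <= n/log n), even though the original n/2-processor version, with cost O(n log n), is not.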
Lecture 2 - Thursday 19.9.13
We reviewed the PRAM model and its variants, and noted the asymptotic difference in time complexity between the CRCW-associative and EREW summation algorithms. We considered a constant-time CRCW(+) PRAM sorting algorithm (a small sketch of this appears at the end of this page). We defined cost optimality and explored simple examples. We covered overheads 11-18.
Reading: Same as last time really, we're still in note 1.

Lecture 1 - Monday 16.9.13
We covered material from overheads 1-11, including an overview of the issues which arise in algorithm design (machine models, cost models, asymptotic analysis), and the complications which arise when we try to be precise. We introduced the PRAM model and its variants.
Reading: So far I have covered the first three and a half pages of my supplementary note 1. In the textbook, you might like to read chapter 1 for background, and browse through sections 2.1 -> 2.3 to give you a flavour of why things get much more complex with real parallel architectures (however, you will not need to learn this material in detail, as far as this course is concerned). 2.4.1 gives a brief summary of PRAM. These numbers refer to the second edition. The library also still has some copies of the first edition, which contains the same material, but in different places (chapter 1 and 2.1-2.2).
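Finally, returning to the constant-time CRCW(+) sort mentioned under Lecture 2: below is a minimal sequential simulation of that style of algorithm (an enumeration sort; the code and names are my own reconstruction, not the lecture's).

    # Sequential simulation of a constant-time CRCW(+) PRAM sort
    # (enumeration sort). On the PRAM, each of the n^2 processor
    # pairs (i, j) does its comparison simultaneously, and the "+"
    # write-resolution rule sums all concurrent writes to rank[j],
    # so the whole sort takes O(1) parallel steps on n^2 processors.
    def crcw_plus_sort(a):
        n = len(a)
        rank = [0] * n
        # One parallel step: processor (i, j) writes 1 to rank[j]
        # iff a[i] should precede a[j]; ties are broken by index so
        # that all ranks come out distinct.
        for i in range(n):
            for j in range(n):
                if a[i] < a[j] or (a[i] == a[j] and i < j):
                    rank[j] += 1
        # One more parallel step: processor j copies a[j] into its
        # computed position.
        out = [None] * n
        for j in range(n):
            out[rank[j]] = a[j]
        return out

    print(crcw_plus_sort([5, 3, 8, 3, 1]))  # prints [1, 3, 3, 5, 8]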