Lecture 17 - Thursday 15.11.12
Parallelizing Gaussian elimination and back substitution in the solution of systems of linear equations.
** NB This is the final lecture of the course **
** Reading: Grama 8.3 (5.5 in the old edition).

Lecture 16 - Monday 12.11.12
Cannon's matrix multiplication algorithm (a minimal sketch of its rotation pattern appears below, after the Lecture 7 entry). Discussion of exercise sheet 2.
Reading: as last time.

Lecture 15 - Thursday 8.11.12
Dense matrix distribution schemes and parallel matrix multiplication for the PRAM. A simple message passing algorithm.
** Reading: Grama 8.2, 8.2.1, 8.2.2 and 8.3.1 (parts of 3.4.1 on distribution
** schemes, 5.1, 5.4, 5.4.1, 5.4.2 and 5.5.1 in the old edition).

Lecture 14 - Monday 5.11.12
Mapping bitonic mergesort to the hypercube (where it fits perfectly) and the mesh (where it doesn't). Efficient mapping to the mesh required us to consider the frequencies with which various pairs of processes interact, and to take this into account when mapping.
** Reading: 9.2 of Grama (6.2 in the old edition).

Lecture 13 - Thursday 1.11.12
Bitonic mergesort, its internal structure and run-time (the network form is sketched in code below, after the Lecture 7 entry).
** Reading: 9.2 of Grama (6.2 in the old edition).

Lecture 12 - Monday 29.10.12
Odd-even transposition sort as a first example of a message based sort. Scaling down for cost optimality (compare-exchange becomes compare-split). Expression as a sorting network (see the sketch after the Lecture 7 entry).
Reading: 9.3, 9.3.1 and 9.1, 9.2 (6.3, 6.3.1 and 6.1, 6.2 in the first edition).

Lecture 11 - Thursday 25.10.12
A CRCW sorting algorithm inspired by quicksort, with "random" CW resolution used to randomise the pivot selection process, as in good sequential quicksort.
Reading: 9.4.2 in Grama (6.4.1 in the old edition). Then, if you want to read more (but this is *not* required reading), section 2.3.4 of Quinn's book, Parallel Computing: Theory and Practice, goes into more detail on the "tree->list" construction before the list prefix.

Lecture 10 - Monday 22.10.12
A PRAM sorting algorithm, based on mergesort, borrowing from enumeration sort to parallelise the merge step. We also began to look at the CRCW quicksort-based algorithm.
** Suggested reading: The CREW mergesort is in the supplementary
** note. It isn't in Grama, but you can find a discussion of CREW
** merging in Quinn's book Parallel Computing: Theory and Practice,
** p40-42. The more complex, but cost optimal, merge mentioned can be
** read about in S. Akl: Design and Analysis of Parallel Algorithms,
** p66-68 (but this is NOT required for the course).

Lecture 9 - Thursday 18.10.12
Implementation of a range of collective communication primitives, emphasising the common structure of ring -> mesh (row then column) -> hypercube (dimension by dimension).
** Suggested reading: Chapter 4 up to and including 4.5
** (which is chapter 3 up to and including 3.5 in the old edition).

Lecture 8 - Monday 15.10.12
Pointer jumping in PRAMs, which allows us to traverse linked lists, an apparently inherently linear-time task, in logarithmic time, computing useful things (e.g. ranks and prefixes) as we go (see the sketch after the Lecture 7 entry).
** Suggested reading: for pointer jumping, the rest of
** supplementary note 2.

Lecture 7 - Thursday 11.10.12
We covered overheads 53-59, on reduction and prefix (and their implementation) as useful primitives, and their application to the fractional knapsack problem. A sketch of the hypercube prefix follows this entry.
** Suggested reading: mainly from supplementary note 2, but also see
** Grama section 4.3 for the hypercube prefix implementation
** (or 3.3 in the first edition).
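
As a concrete illustration of the Lecture 7 material, here is a minimal MPI sketch of the hypercube prefix sum (the pattern Grama describes in section 4.3). It assumes a power-of-two number of processes, and the MPI framing and names are mine, not taken from the overheads.

    /* Hypercube prefix sum: a minimal sketch, assuming MPI and a
     * power-of-two number of processes. Illustrative names only. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, p;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        int value = rank + 1;   /* each process contributes one number */
        int result = value;     /* running prefix for this process     */
        int msg = value;        /* running subcube total passed along  */

        /* One exchange per hypercube dimension; the partner differs
         * from us in exactly the bit selected by d. */
        for (int d = 1; d < p; d <<= 1) {
            int partner = rank ^ d, recv;
            MPI_Sendrecv(&msg, 1, MPI_INT, partner, 0,
                         &recv, 1, MPI_INT, partner, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            msg += recv;          /* total over the merged subcube      */
            if (partner < rank)   /* lower partners precede us in order */
                result += recv;
        }
        printf("process %d: prefix = %d\n", rank, result);
        MPI_Finalize();
        return 0;
    }

Each process does one exchange per dimension, so the whole computation takes log p communication steps.
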
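For the pointer jumping of Lecture 8, a sequential C simulation of the synchronous PRAM rounds may help. The double buffering stands in for the PRAM's read-everything-then-write-everything step semantics; the particular list and names are made up for illustration.

    /* List ranking by pointer jumping, simulated sequentially: each
     * round of the while loop plays one synchronous PRAM step. */
    #include <stdio.h>
    #include <string.h>

    #define N 8
    #define NIL (-1)

    int main(void) {
        /* next[i] is the successor of node i; the list here is
         * 6 -> 4 -> 1 -> 0 -> 3 -> 5 -> 7 -> 2. */
        int next[N] = {3, 0, NIL, 5, 1, 7, 4, 2};
        int rank[N];
        for (int i = 0; i < N; i++)
            rank[i] = (next[i] == NIL) ? 0 : 1;

        int new_next[N], new_rank[N], active = 1;
        while (active) {                  /* O(log N) rounds in total */
            active = 0;
            for (int i = 0; i < N; i++) { /* one synchronous step     */
                new_rank[i] = rank[i];
                new_next[i] = next[i];
                if (next[i] != NIL) {
                    new_rank[i] = rank[i] + rank[next[i]];
                    new_next[i] = next[next[i]];   /* jump the pointer */
                    if (new_next[i] != NIL) active = 1;
                }
            }
            memcpy(rank, new_rank, sizeof rank);
            memcpy(next, new_next, sizeof next);
        }
        for (int i = 0; i < N; i++)
            printf("node %d: distance to end = %d\n", i, rank[i]);
        return 0;
    }
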
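The odd-even transposition sort of Lecture 12, written out in the sorting network form mentioned there: n alternating phases of compare-exchanges. This sequential version is only a sketch; in the message passing version each element (or block) lives on its own process, and compare-exchange becomes a pairwise exchange (compare-split for blocks).

    /* Odd-even transposition sort as a sequential simulation of the
     * n-phase sorting network. */
    #include <stdio.h>

    static void compare_exchange(int *a, int i, int j) {
        if (a[i] > a[j]) { int t = a[i]; a[i] = a[j]; a[j] = t; }
    }

    int main(void) {
        int a[] = {5, 9, 4, 3, 1, 8, 7, 2};
        int n = sizeof a / sizeof a[0];

        for (int phase = 0; phase < n; phase++) {
            /* even phases pair (0,1),(2,3),...;
             * odd phases pair (1,2),(3,4),... */
            int start = (phase % 2 == 0) ? 0 : 1;
            for (int i = start; i + 1 < n; i += 2)
                compare_exchange(a, i, i + 1);
        }
        for (int i = 0; i < n; i++) printf("%d ", a[i]);
        printf("\n");
        return 0;
    }
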
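The bitonic mergesort of Lecture 13 in its iterative network form, again as a sequential sketch; the indexing scheme here is the standard one, not necessarily the one on the overheads. Stage k merges bitonic sequences of length k with compare-exchanges at strides k/2, k/4, ..., 1, giving the familiar O(log^2 n) network depth; n must be a power of two.

    /* Bitonic mergesort as a sequential simulation of the network. */
    #include <stdio.h>

    int main(void) {
        int a[] = {5, 9, 4, 3, 1, 8, 7, 2};
        int n = sizeof a / sizeof a[0];

        for (int k = 2; k <= n; k <<= 1)          /* merge length    */
            for (int j = k >> 1; j > 0; j >>= 1)  /* exchange stride */
                for (int i = 0; i < n; i++) {
                    int partner = i ^ j;
                    if (partner > i) {
                        /* sort ascending if bit k of i is clear,
                         * descending otherwise */
                        int up = ((i & k) == 0);
                        if ((up && a[i] > a[partner]) ||
                            (!up && a[i] < a[partner])) {
                            int t = a[i];
                            a[i] = a[partner];
                            a[partner] = t;
                        }
                    }
                }

        for (int i = 0; i < n; i++) printf("%d ", a[i]);
        printf("\n");
        return 0;
    }
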
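Finally for this group, Cannon's algorithm from Lecture 16, simulated sequentially with one matrix element per "process" to keep the sketch short. A real implementation holds a block of each matrix per process and performs the rotations with messages (e.g. MPI_Cart_shift and MPI_Sendrecv_replace); the initial skew aligns the operands so that every grid position multiplies a useful pair at every step.

    /* Cannon's algorithm on an N x N grid, one element per "process". */
    #include <stdio.h>

    #define N 3

    static void rotate_row_left(double m[N][N], int i) {
        double t = m[i][0];
        for (int j = 0; j < N - 1; j++) m[i][j] = m[i][j + 1];
        m[i][N - 1] = t;
    }

    static void rotate_col_up(double m[N][N], int j) {
        double t = m[0][j];
        for (int i = 0; i < N - 1; i++) m[i][j] = m[i + 1][j];
        m[N - 1][j] = t;
    }

    int main(void) {
        double A[N][N] = {{1,2,3},{4,5,6},{7,8,9}};
        double B[N][N] = {{9,8,7},{6,5,4},{3,2,1}};
        double C[N][N] = {{0}};

        /* Initial skew: shift row i of A left by i and column j of B
         * up by j, so position (i,j) starts with a multiplying pair. */
        for (int i = 0; i < N; i++)
            for (int s = 0; s < i; s++) rotate_row_left(A, i);
        for (int j = 0; j < N; j++)
            for (int s = 0; s < j; s++) rotate_col_up(B, j);

        /* N compute-and-rotate steps: multiply-accumulate locally,
         * then rotate A's rows left and B's columns up by one. */
        for (int step = 0; step < N; step++) {
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    C[i][j] += A[i][j] * B[i][j];
            for (int i = 0; i < N; i++) rotate_row_left(A, i);
            for (int j = 0; j < N; j++) rotate_col_up(B, j);
        }

        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) printf("%6.1f ", C[i][j]);
            printf("\n");
        }
        return 0;
    }
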
Lecture 6 - Monday 8.10.12
A look at parallel algorithm design techniques in general, and divide & conquer and pipelining in particular. Also step-by-step parallelization and Amdahl's Law (a worked statement appears at the end of this page). Overheads 42-52.
** Suggested reading: read the second supplementary note, and
** also look at Grama chapter 3, up to (and including) 3.2.2.
** Although I have not followed this at all closely in lectures, it
** says some useful and interesting things, and might help give you
** a different perspective on the algorithms we examine later.
** This material was new in the second edition. There isn't really
** a corresponding section in the first edition, so you might want to
** check it out in the library.

Lecture 5 - Thursday 4.10.12
Scaling down of the hypercube summation algorithm for smaller p. We discovered that the two "obvious" mappings have asymptotically different costs. Mapping meshes to hypercubes as a general purpose approach to porting algorithms, using Gray codes to find an embedding which preserves the adjacency properties we want (a small demonstration appears at the end of this page). Overheads 33-41.
** Suggested reading: The Gray code mapping is discussed in Grama 2.7.1.

Lecture 4 - Monday 1.10.12
An introduction to message passing models and their abstraction: mesh and hypercube structure. Summation on the hypercube (sketched at the end of this page).
Reading: For message passing, mesh & hypercube, all of chapter 1, sections 2.3, 2.4.1->2.4.4, 2.5, 2.7 and 5.1->5.3 from the recommended text book by Grama et al. (some of this is for the next lecture too). These numbers refer to the second edition. The library also still has some copies of the first edition, where the roughly corresponding sections are chapter 1, sections 2.2-2.5 and 4.1->4.2.

Lecture 3 - Thursday 27.9.12
We discussed the related issues of cost and scalability. Good scalability (down) can be achieved easily, by round-robin scheduling, for cost optimal algorithms. Brent's Theorem helps us to predict the occasions on which scaled-down optimality can be achieved with non cost-optimal algorithms (essentially by exploiting the slack in the large scale version). We covered overheads 19-21.
Reading: For Brent, we're still in note 1. You can find a proof of Brent's Theorem on page 44 of "Parallel Computing: Theory and Practice" by MJ Quinn (it's in the library), though this (the proof) is not a required part of the course. If you are super-keen you could even look at the original paper: Brent R.P., The Parallel Evaluation of General Arithmetic Expressions, Journal of the ACM 21(2), 201-206, which is available on-line through the library's electronic journal collection.

Lecture 2 - Monday 24.9.12
We reviewed the PRAM model and its variants, and noted the asymptotic difference in time complexity between the CRCW-associative and EREW summation algorithms. We considered a constant time CRCW(+) PRAM sorting algorithm. We defined cost optimality. We covered overheads 11-18.
Reading: Same as last time really, we're still in note 1.

Lecture 1 - Thursday 20.9.12
We covered material from overheads 1-11, including an overview of the issues which arise in algorithm design (machine models, cost models, asymptotic analysis), and the complications which arise when we try to be precise. We introduced the PRAM model and its variants.
Reading: So far I have covered the first three and a half pages of my supplementary note 1. In the textbook, you might like to read chapter 1 for background, and browse through sections 2.1 -> 2.3 to give you a flavour of why things get much more complex with real parallel architectures (however, you will not need to learn this material in detail, as far as this course is concerned). 2.4.1 gives a brief summary of PRAM.
These numbers refer to the second edition. The library also still has some copies of the first edition, which contains the same material, but in different places (chapter 1 and 2.1-2.2).
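
Since Amdahl's Law from Lecture 6 is easy to misremember, here is the usual statement (notation mine: f is the inherently serial fraction of the run-time, p the number of processors):

    S(p) = \frac{1}{f + \frac{1 - f}{p}} \le \frac{1}{f}

So, for example, f = 0.1 caps the achievable speedup at 10, however many processors we use.
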
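For the Gray code embedding of Lecture 5, a small self-contained C demonstration (the function names are mine) of why the mapping works: sending ring position i to hypercube node i XOR (i >> 1), the binary-reflected Gray code, puts neighbouring ring positions on hypercube nodes exactly one bit apart, i.e. on physical neighbours. A mesh is embedded the same way, with one Gray code applied to the row index and another to the column index.

    /* Ring -> hypercube embedding via the binary-reflected Gray code. */
    #include <stdio.h>

    static unsigned gray(unsigned i) { return i ^ (i >> 1); }

    static int popcount(unsigned x) {   /* number of set bits */
        int c = 0;
        while (x) { c += x & 1; x >>= 1; }
        return c;
    }

    int main(void) {
        unsigned p = 8;   /* 8 ring positions -> 3-dimensional hypercube */
        for (unsigned i = 0; i < p; i++) {
            unsigned a = gray(i), b = gray((i + 1) % p);
            printf("ring %u -> node %u; next differs in %d bit(s)\n",
                   i, a, popcount(a ^ b));
        }
        return 0;
    }
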
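Finally, the hypercube summation of Lecture 4 as a minimal message passing sketch, phrased here in MPI (my choice, assuming a power-of-two number of processes): in each of the log p dimensions the upper half of every subcube sends its partial sum across to the lower half, so after log p steps the total sits at process 0.

    /* Hypercube summation: a minimal sketch, assuming MPI and a
     * power-of-two number of processes. Illustrative names only. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, p;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        int sum = rank + 1;   /* this process's contribution */

        for (int d = 1; d < p; d <<= 1) {
            if (rank & d) {   /* upper half: send partial sum, then stop */
                MPI_Send(&sum, 1, MPI_INT, rank ^ d, 0, MPI_COMM_WORLD);
                break;
            } else {          /* lower half: receive and accumulate */
                int recv;
                MPI_Recv(&recv, 1, MPI_INT, rank ^ d, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                sum += recv;
            }
        }
        if (rank == 0)
            printf("total = %d (expected %d)\n", sum, p * (p + 1) / 2);
        MPI_Finalize();
        return 0;
    }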