THIS IS LAST YEAR'S LECTURE LOG, PROVIDED IN ORDER TO HELP YOU PLAN AND READ AHEAD. THIS YEAR'S COURSE WILL FOLLOW THE SAME PATTERN. THE PRECISE SCHEDULE WILL VARY A LITTLE.

Lecture 17 - Thursday 18.11.10

Parallelizing Gaussian elimination and back substitution in the solution of systems of linear equations.

** NB This is the final lecture of the course **
** Reading: Grama 8.3 (5.5 in the old edition).

Lecture 16 - Monday 15.11.10

Dense matrix distribution schemes and parallel matrix multiplication for PRAM. A simple message passing algorithm, then a sketch of Cannon's algorithm.

** Reading: Grama 8.2, 8.2.1, 8.2.2 and 8.3.1 (parts of 3.4.1 on distribution
** schemes, 5.1, 5.4, 5.4.1, 5.4.2 and 5.5.1 in the old edition)

Lecture 15 - Thursday 11.11.10

Mapping bitonic mergesort to the hypercube (where it fits perfectly) and the mesh (where it doesn't). Efficient mapping to the mesh required us to consider the frequencies with which various pairs of processes interact, and to take this into account when mapping.

** Reading: 9.2 of Grama (6.2 in the old edition)

Lecture 14 - Monday 8.11.10

Feedback session on exercise sheet 1, and a look at exercise sheet 2.

Lecture 13 - Thursday 4.11.10

Bitonic mergesort, its internal structure and run-time.

** Reading: 9.2 of Grama (6.2 in the old edition)

Lecture 12 - Monday 1.11.10

Odd-even transposition sort as a first example of a message based sort. Scaling down for cost optimality (compare-exchange becomes compare-split). Expression as a sorting network.

Reading: 9.3, 9.3.1 and 9.1, 9.2 of Grama (6.3, 6.3.1 and 6.1, 6.2 in the first edition).

Lecture 11 - Thursday 28.10.10

A CRCW sorting algorithm inspired by quicksort, with "random" CW resolution used to randomise the pivot selection process, as in good sequential quicksort.

Reading: 9.4.2 in Grama (6.4.1 in the old edition).
Then, if you want to read more (but this is *not* required reading), section 2.3.4 of Quinn's book, Parallel Computing: Theory and Practice, goes into more detail on the "tree->list" construction before the list prefix.

Lecture 10 - Monday 25.10.10

A question and answer session on the first exercise sheet.

Lecture 9 - Thursday 21.10.10

A PRAM sorting algorithm, based on mergesort, borrowing from enumeration sort to parallelise the merge step.

** The CREW Mergesort is in the supplementary note. It
** isn't in Grama, but you can find a discussion of CREW
** merging in Quinn's book Parallel Computing: Theory
** and Practice, p40-42. The more complex, but cost
** optimal merge mentioned (but not examinable!) can
** be read about in S. Akl: Design and Analysis of Parallel
** Algorithms, p66-68.

Lecture 8 - Monday 18.10.10

Pointer jumping in PRAMs, which allows us to traverse apparently linear time linked lists in logarithmic time, computing useful things (e.g. ranks and prefixes) as we go. Implementation of a range of collective communication primitives, emphasising the common structure of ring -> mesh (row then column) -> hypercube (dimension by dimension).

** Suggested reading: for pointer jumping, the rest of
** supplementary note 2. For comms primitives,
** Chapter 4 up to and including 4.5 (which is
** chapter 3 up to and including 3.5 in the old edition).

Lecture 7 - Thursday 14.10.10

We covered overheads 51-57, reduction and prefix (and their implementation) as useful primitives. We applied prefix in a step-by-step parallelisation of a sequential "fractional knapsack" algorithm.

** Suggested reading: mainly supplementary note 2, but
** also see Grama section 4.3 (or 3.3 in the first edition)
** for the hypercube prefix implementation.

Lecture 6 - Monday 11.10.10

A look at parallel algorithm design techniques in general and divide & conquer and pipelining in particular. Overheads 42-50. Also a brief introduction to the first exercise sheet.
** Suggested reading.
** Read the second supplementary note, and
** also look at Grama chapter 3, up to (and including) 3.2.2.
** Although I have not followed this at all closely in lectures, it
** says some useful and interesting things, and might help give you
** a different perspective on the algorithms we examine later.
** This material was new in the second edition. There isn't really
** a corresponding section in the first edition, so you might want to
** check it out in the library.

Lecture 5 - Thursday 7.10.10

Scaling down of the hypercube summation algorithm for smaller p. We discovered that the two "obvious" mappings have asymptotically different costs. Mapping meshes to hypercubes as a general purpose approach to porting algorithms, using Gray codes to find an embedding which preserves the adjacency properties we want. Overheads 33-41.

** Suggested reading: The Gray code mapping is discussed in Grama 2.7.1.

Lecture 4 - Monday 4.10.10

An introduction to message passing models and their abstraction: mesh and hypercube structure. Summation on the hypercube. Scaling and quotient networks.

Reading: For message passing, mesh & hypercube, all of chapter 1, sections 2.3 & 2.4.1->2.4.4, 2.5, 2.7 and 5.1->5.3 from the recommended text book by Grama et al. (some of this is for the next lecture too). These numbers refer to the second edition. The library also still has some copies of the first edition, where the roughly corresponding sections are chapter 1, sections 2.2-2.5 and 4.1->4.2.

Lecture 3 - Thursday 30.9.10

We discussed the related issues of cost and scalability. Good scalability (down) can be achieved easily, by round-robin scheduling, for cost optimal algorithms. Brent's Theorem helps us to predict the occasions on which scaled-down optimality can be achieved with non cost-optimal algorithms (essentially by exploiting the slack in the large scale version). We covered overheads 19-21.

Reading: For Brent, we're still in note 1.
You can find a proof of Brent's Theorem on page 44 of "Parallel Computing: Theory and Practice" by MJ Quinn (it's in the library), though this (the proof) is not a required part of the course. If you are super-keen you could even look at the original paper: Brent R.P., The Parallel Evaluation of General Arithmetic Expressions, Journal of the ACM 21(2) 201-206, which is available on-line through the library's electronic journal collection.

Lecture 2 - Monday 27.9.10

We reviewed the PRAM model and its variants, and noted the asymptotic difference in time complexity between the CRCW-associative and EREW summation algorithms. We considered a constant time CRCW(+) PRAM sorting algorithm. We discussed the related issues of cost and scalability. We covered overheads 11-18.

Reading: Same as last time really, we're still in note 1.

Lecture 1 - Thursday 23.9.10

We covered material from overheads 1-10, including an overview of the issues which arise in algorithm design (machine models, cost models, asymptotic analysis), and the complications which arise when we try to be precise. We introduced the PRAM model and its variants.

Reading: So far I have covered the first three and a half pages of my supplementary note 1. In the textbook, you might like to read chapter 1 for background, and browse through sections 2.1 -> 2.3 to give you a flavour of why things get much more complex with real parallel architectures (however, you will not need to learn this material in detail, as far as this course is concerned). 2.4.1 gives a brief summary of PRAM. These numbers refer to the second edition. The library also still has some copies of the first edition, which contains the same material, but in different places (chapter 1 and 2.1-2.2).
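
If you would like something concrete to experiment with while reading ahead, the EREW summation mentioned under Lecture 2 can be sketched roughly as below. This is only an illustrative sequential simulation, assuming the usual pairwise (tree-shaped) scheme; it is not taken from the supplementary notes, and the name erew_sum is a hypothetical one chosen for this example. Each pass of the while loop stands for one synchronous PRAM step, so the number of rounds grows logarithmically with the input size.

```python
def erew_sum(values):
    """Simulate a pairwise (tree-shaped) EREW PRAM summation.

    Returns (total, rounds), where rounds counts the synchronous
    parallel steps the PRAM would need. Hypothetical helper, written
    for illustration only.
    """
    a = list(values)
    n = len(a)
    if n == 0:
        return 0, 0
    rounds = 0
    stride = 1
    while stride < n:
        # One PRAM step: the "processor" at each index i that is a
        # multiple of 2*stride adds a[i + stride] into a[i]. The cells
        # touched by different processors are disjoint, so every read
        # and every write is exclusive.
        for i in range(0, n - stride, 2 * stride):
            a[i] += a[i + stride]
        stride *= 2
        rounds += 1
    return a[0], rounds


if __name__ == "__main__":
    total, rounds = erew_sum(range(1, 9))
    print(total, rounds)
```

Running it on inputs of increasing size and watching the round count is a quick way to see the logarithmic behaviour, in contrast to the CRCW-associative variant discussed in the same lecture.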