Lecture 17 - Monday 18.11.13
Finished off the Gaussian elimination algorithm and discussed the second exercise sheet.
** NB This is the final lecture of the course **
** Reading: same as last time.

Lecture 16 - Thursday 14.11.13
Cannon's matrix multiplication algorithm. Parallelizing Gaussian elimination in the solution of systems of linear equations.
** Reading: as last time, plus Grama 8.3 (5.5 in the old edition).

Lecture 15 - Monday 11.11.13
Dense matrix distribution schemes and parallel matrix multiplication for the PRAM. A simple message-passing algorithm.
** Reading: Grama 8.2, 8.2.1, 8.2.2 and 8.3.1 (parts of 3.4.1 on distribution schemes, 5.1, 5.4, 5.4.1, 5.4.2 and 5.5.1 in the old edition).

Lecture 14 - Thursday 7.11.13
Finished off BMS mapping, then looked at exercise sheets 1 and 2. Exercise 1 solutions can now be collected from the ITO, level 4 Appleton Tower.

Lecture 13 - Monday 4.11.13
Bitonic Mergesort analysis of circuit depth. Mapping BMS to the hypercube (where it fits perfectly) and the mesh (where it doesn't). Efficient mapping to the mesh required us to consider the frequencies with which various pairs of processes interact, and to take this into account when mapping.
** Reading: 9.2 of Grama (6.2 in the old edition).

Lecture 12 - Monday 28.10.13
Block-based version of odd-even transposition sort. Sorting network notation. Bitonic Mergesort (analysis to be done next time).
** Reading: 9.2 of Grama (6.2 in the old edition).

Lecture 11 - Monday 21.10.13
Finished off the CRCW quicksort, then odd-even transposition sort as a first example of a message-based sort.
Reading: 9.3, 9.3.1 and 9.1, 9.2 (6.3, 6.3.1 and 6.1, 6.2 in the first edition).

Lecture 10 - Thursday 17.10.13
A CRCW sorting algorithm inspired by quicksort, with "random" CW resolution used to randomise the pivot selection process, as in good sequential quicksort.
Reading: 9.4.2 in Grama (6.4.1 in the old edition). Then, if you want to read more (but this is *not* required reading), section 2.3.4 of Quinn's book, Parallel Computing: Theory and Practice, goes into more detail on the "tree->list" construction before the list prefix.

Lecture 9 - Monday 14.10.13
Finished off the communication algorithms from the previous lecture, then a PRAM sorting algorithm, based on mergesort, borrowing from enumeration sort to parallelise the merge step.
** Suggested reading: The CREW Mergesort is in the supplementary note. It isn't in Grama, but you can find a discussion of CREW merging in Quinn's book Parallel Computing: Theory and Practice, p40-42. The more complex, but cost optimal, merge mentioned can be read about in S. Akl: Design and Analysis of Parallel Algorithms, p66-68 (but this is NOT required for the course).

Lecture 8 - Thursday 10.10.13
Implementation of a range of collective communication primitives, emphasising the common structure of ring -> mesh (row then column) -> hypercube (dimension by dimension). We also spent some time talking about the first exercise sheet.
** Suggested reading: Chapter 4 up to and including 4.5 (which is chapter 3 up to and including 3.5 in the old edition).

Lecture 7 - Monday 7.10.13
Using prefix in the fractional knapsack problem. Pointer jumping in PRAMs, which allows us to traverse linked lists, an apparently inherently linear-time task, in logarithmic time, computing useful things (e.g. ranks and prefixes) as we go. A sketch of the idea follows below.
** Suggested reading: the rest of supplementary note 2.
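The following is a minimal sequential simulation of the pointer-jumping idea, written to illustrate the list-ranking computation mentioned above; the representation and names are my own, not taken from the lecture or the notes.

    # Sequential simulation of PRAM pointer jumping for list ranking.
    # next_[i] is the successor of node i; the tail points to itself.
    # On a PRAM each node has its own processor, so each iteration of
    # the while-loop is one O(1) parallel step, and since every jump
    # doubles the distance covered, O(log n) steps suffice.
    def list_ranks(next_):
        n = len(next_)
        # rank[i] will become the number of links from i to the tail.
        rank = [0 if next_[i] == i else 1 for i in range(n)]
        nxt = list(next_)
        while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
            # One parallel step: every node absorbs its successor's
            # rank, then jumps over that successor.
            rank = [rank[i] + rank[nxt[i]] for i in range(n)]
            nxt = [nxt[nxt[i]] for i in range(n)]
        return rank

    # The list 0 -> 2 -> 1 -> 3 (node 3 is the tail):
    print(list_ranks([2, 3, 1, 3]))  # prints [3, 1, 2, 0]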
Lecture 6 - Thursday 3.10.13
Pipelining, step-by-step parallelization and Amdahl's Law. Reduction and prefix operations (and their implementation) as useful primitives.
** Suggested reading: mainly from supplementary note 2, but also see Grama section 4.3 for the hypercube prefix implementation (or 3.3 in the first edition).

Lecture 5 - Monday 30.9.13
Finished off hypercube summation scale-down material. Mapping meshes to hypercubes as a general-purpose approach to porting algorithms, using Gray codes to find an embedding which preserves the adjacency properties we want. Finally, a look at Divide & Conquer as a parallel algorithm design strategy. Overheads 39-49.
** Suggested reading: The Gray code mapping is discussed in Grama 2.7.1. For design strategies, read the second supplementary note, and also look at Grama chapter 3, up to (and including) 3.2.2. Although I have not followed this at all closely in lectures, it says some useful and interesting things, and might help give you a different perspective on the algorithms we examine later. This material was new in the second edition. There isn't really a corresponding section in the first edition, so you might want to check it out in the library.

Lecture 4 - Thursday 26.9.13
An introduction to message passing models and their abstraction: mesh and hypercube structure. Summation on the hypercube. Scaling down of the hypercube summation algorithm for smaller p. We (nearly had time to) discover that the two "obvious" mappings have asymptotically different costs.
Reading: For message passing, mesh & hypercube, all of chapter 1, sections 2.3, 2.4.1->2.4.4, 2.5, 2.7 and 5.1->5.3 from the recommended textbook by Grama et al. (some of this is for the next lecture too). These numbers refer to the second edition. The library also still has some copies of the first edition, where the roughly corresponding sections are chapter 1, sections 2.2-2.5 and 4.1->4.2.

Lecture 3 - Monday 23.9.13
We discussed the related issues of cost and scalability. Good scalability (down) can be achieved easily, by round-robin scheduling, for cost-optimal algorithms. Brent's Theorem helps us to predict the occasions on which scaled-down optimality can be achieved with non-cost-optimal algorithms (essentially by exploiting the slack in the large-scale version); a small worked example follows below. We briefly began to look at cost models for non-shared-memory machines. We covered overheads 19-26.
Reading: For Brent, we're still in note 1. You can find a proof of Brent's Theorem on page 44 of "Parallel Computing: Theory and Practice" by MJ Quinn (it's in the library), though this (the proof) is not a required part of the course. If you are super-keen you could even look at the original paper: Brent R.P., The Parallel Evaluation of General Arithmetic Expressions, Journal of the ACM 21(2), 201-206, which is available on-line through the library's electronic journal collection.
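As a worked instance of the Brent's Theorem point above (the numbers are my own illustration, not the lecture's): in its commonly quoted form, an algorithm that performs W operations in total over T parallel steps, using as many processors as it likes, can be simulated by p processors, round-robin within each step, in time T_p <= W/p + T. Binary-tree summation of n = 1024 values has W = n - 1 = 1023 and T = log2 n = 10, so with p = 32 processors we get T_p <= 1023/32 + 10, about 42 steps, for a cost of p * T_p ~ 1344 = O(n). The scaled-down algorithm is therefore cost optimal (this works for any p <= n/log n), even though the original n/2-processor version, with cost O(n log n), is not.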
Lecture 2 - Thursday 19.9.13
We reviewed the PRAM model and its variants, and noted the asymptotic difference in time complexity between the CRCW-associative and EREW summation algorithms. We considered a constant-time CRCW(+) PRAM sorting algorithm (a small sketch of this appears at the end of this page). We defined cost optimality and explored simple examples. We covered overheads 11-18.
Reading: Same as last time really, we're still in note 1.

Lecture 1 - Monday 16.9.13
We covered material from overheads 1-11, including an overview of the issues which arise in algorithm design (machine models, cost models, asymptotic analysis), and the complications which arise when we try to be precise. We introduced the PRAM model and its variants.
Reading: So far I have covered the first three and a half pages of my supplementary note 1. In the textbook, you might like to read chapter 1 for background, and browse through sections 2.1 -> 2.3 to give you a flavour of why things get much more complex with real parallel architectures (however, you will not need to learn this material in detail, as far as this course is concerned). 2.4.1 gives a brief summary of PRAM. These numbers refer to the second edition. The library also still has some copies of the first edition, which contains the same material, but in different places (chapter 1 and 2.1-2.2).
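Finally, returning to the constant-time CRCW(+) sort mentioned under Lecture 2: below is a minimal sequential simulation of that style of algorithm (an enumeration sort; the code and names are my own reconstruction, not the lecture's).

    # Sequential simulation of a constant-time CRCW(+) PRAM sort
    # (enumeration sort). On the PRAM, each of the n^2 processor
    # pairs (i, j) does its comparison simultaneously, and the "+"
    # write-resolution rule sums all concurrent writes to rank[j],
    # so the whole sort takes O(1) parallel steps on n^2 processors.
    def crcw_plus_sort(a):
        n = len(a)
        rank = [0] * n
        # One parallel step: processor (i, j) writes 1 to rank[j]
        # iff a[i] should precede a[j]; ties are broken by index so
        # that all ranks come out distinct.
        for i in range(n):
            for j in range(n):
                if a[i] < a[j] or (a[i] == a[j] and i < j):
                    rank[j] += 1
        # One more parallel step: processor j copies a[j] into its
        # computed position.
        out = [None] * n
        for j in range(n):
            out[rank[j]] = a[j]
        return out

    print(crcw_plus_sort([5, 3, 8, 3, 1]))  # prints [1, 3, 3, 5, 8]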