Both books are primarily about MapReduce and are relevant to the first five lectures and the last two.
- Hadoop: The Definitive Guide. Tom White. O'Reilly Media.
- Data-Intensive Text Processing with MapReduce (Synthesis Lectures on Human Language Technologies). Jimmy Lin, Chris Dyer. Morgan & Claypool Publishers.
Both of these books should be in the library. They can also be ordered from Amazon.
The papers are grouped by topic, in the same order as the lectures.
- Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters, OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December, 2004.
- M. Stonebraker, D. Abadi, D. J. DeWitt, S. Madden, E. Paulson, A. Pavlo, and A. Rasin, MapReduce and Parallel DBMSs: Friends or Foes?, Communications of the ACM, vol. 53, iss. 1, pp. 64-71, 2010.
- C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins, Pig Latin: A Not-So-Foreign Language for Data Processing, SIGMOD 2008.
File Systems and Storage
- S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google File System, 19th ACM Symposium on Operating Systems Principles, 2003.
- J. Howard, M. Kazar, S. Menees, D. Nichols, M. Satyanarayanan, R. Sidebotham, M. West. Scale and Performance in a Distributed File System, ACM Transactions on Computer Systems, Vol. 6, No. 1, February 1988, Pages 51-81.
Distributed Hash Tables
- Armbrust et al. Above the Clouds: A Berkeley View of Cloud Computing. February 12, 2009.
- P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, A. Warfield. Xen and the Art of Virtualization, SOSP 2003.
Querying and Databases
- Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: A Distributed Storage System for Structured Data, OSDI'06: Seventh Symposium on Operating System Design and Implementation, Seattle, WA, November, 2006.
- S. Viglas, J. Naughton. Rate-based Query Optimization for Streaming Information Sources, SIGMOD 2002.
- R. Sumbaly, J. Kreps, L. Gao, A. Feinberg, C. Soman, S. Shah. Serving Large-scale Batch Computed Data with Project Voldemort, FAST, 2012.
- Robert Morris. Counting Large Numbers of Events in Small Registers, Communications of the ACM, vol. 21, iss. 10, pp. 840-842, October 1978.
- J. Kang, J. Naughton, S. Viglas. Evaluating Window Joins over Unbounded Streams, ICDE 2003.
- G. S. Manku, R. Motwani. Approximate Frequency Counts over Data Streams, VLDB, 2002.
- L. Golab, M. T. Ozsu. Issues in Data Stream Management, SIGMOD Record, 32(2), 2003.
- M. Gaber, A. Zaslavsky, S. Krishnaswamy. Mining Data Streams: A Review, SIGMOD Record, 34(2), 2005.