Reading List in Parallel and Distributed Data Management
Required reading list
- Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications. IEEE/ACM Transactions on Networking (TON), 2003.
- Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. 6th Symposium on Operating Systems Design and Implementation (OSDI), 2004.
- Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins. Pig Latin: A Not-So-Foreign Language for Data Processing. SIGMOD, 2008.
- Leslie Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM, Volume 21, Issue 7 (July 1978), pages 558-565.
Optional reading list
Distributed database design
Vertical fragmentation
- S. Navathe, S. Ceri, G. Wiederhold, J. Dou. Vertical Partitioning Algorithms for Database Design. ACM Transactions on Database Systems (TODS), volume 9, issue 4, 1984.
- B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H. Jacobsen, N. Puz, D. Weaver, R. Yerneni. PNUTS: Yahoo!’s Hosted Data Serving Platform. PVLDB, 2008.
Query processing and optimization in distributed databases
Privacy-preserving join
- R. Agrawal, A. Evfimievski, R. Srikant. Information Sharing Across Private Databases. ACM SIGMOD International Conference on Management of Data, San Diego, California, 2003.
Data Replication
Adaptive scheme for replicating data
- S. Kadambi, J. Chen, B. Cooper, D. Lomax, R. Ramakrishnan, A. Silberstein, E. Tam, and H. Garcia-Molina. Where in the world is my data? VLDB, 2011.
Paxos
- Jonathan Kirsch and Yair Amir. Paxos for System Builders. CNDS, March 2008.
Zab
- Flavio P. Junqueira, Benjamin C. Reed, and Marco Serafini. Zab: High-performance broadcast for primary-backup systems. DSN-DCCS, 2011.
P2P
Designing a Super-peer Network
- Beverly Yang and Hector Garcia-Molina. Designing a Super-peer Network. IEEE International Conference on Data Engineering (ICDE), Bangalore, India, March 5-8, 2003.
Distributed information retrieval
Crawling
- P. Boldi, B. Codenotti, M. Santini, S. Vigna. UbiCrawler: A Scalable Fully Distributed Web Crawler. Software Practice & Experience 34(8): 711-726.
- A. Arasu, J. Cho, H. Garcia-Molina, A. Paepcke, S. Raghavan. Searching the Web. ACM Transactions on Internet Technology, Vol. 1, No. 1, August 2001, Pages 2–43.
Caching
- R. Baeza-Yates, A. Gionis, F. Junqueira, V. Murdock, V. Plachouras, F. Silvestri. The Impact of Caching on Search Engines. ACM SIGIR International Conference on Information Retrieval, Amsterdam, The Netherlands, 2007.
Open Source Systems
S4
- Leonardo Neumeyer, Bruce Robbins, Anish Nair, Anand Kesari. S4: Distributed Stream Computing Platform. 2010 IEEE International Conference on Data Mining Workshops.
Hyracks
- Vinayak Borkar, Michael Carey, Raman Grover, Nicola Onose, Rares Vernica. Hyracks: A Flexible and Extensible Foundation for Data-Intensive Computing. ICDE, 2011.
BigTable
- Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: A Distributed Storage System for Structured Data. OSDI, 2006.
Pregel
- Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: A System for Large-Scale Graph Processing. SIGMOD, 2010.