Stop Thinking, Just Do!

Sung-Soo Kim's Blog

Papers in Parallel and Distributed Data Management


6 January 2016

Reading List in Parallel and Distributed Data Management

Required reading list

  1. Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications. IEEE/ACM Transactions on Networking (TON), 2003.
  2. Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. 6th Symposium on Operating Systems Design and Implementation (OSDI), 2004.
  3. Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins. Pig Latin: A Not-So-Foreign Language for Data Processing. SIGMOD, 2008.
  4. Leslie Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM, Volume 21, Issue 7 (July 1978), pages 558-565.

Optional reading list

Distributed database design

Vertical fragmentation

Query processing and optimization in distributed databases

Privacy-preserving join

Data replication

Adaptive scheme for replicating data

Designing a Super-peer Network

  • Beverly Yang, Hector Garcia-Molina. Designing a Super-peer Network. IEEE International Conference on Data Engineering (ICDE 2003), Bangalore, India, March 5-8, 2003.

Distributed information retrieval

  • P. Boldi, B. Codenotti, M. Santini, S. Vigna. UbiCrawler: A Scalable Fully Distributed Web Crawler. Software: Practice and Experience, 34(8), pages 711-726.

  • A. Arasu, J. Cho, H. Garcia-Molina, A. Paepcke, S. Raghavan. Searching the Web. ACM Transactions on Internet Technology, Vol. 1, No. 1, August 2001, pages 2-43.

  • R. Baeza-Yates, A. Gionis, F. Junqueira, V. Murdock, V. Plachouras, F. Silvestri. The Impact of Caching on Search Engines. ACM SIGIR Conference on Research and Development in Information Retrieval, Amsterdam, The Netherlands, 2007.

Open Source Systems
