Stop Thinking, Just Do!

Sung-Soo Kim's Blog

Database Reading List


13 November 2015

History of Database Systems + Distributed Databases

  1. M. Stonebraker, et al., What Goes Around Comes Around, Readings in Database Systems, 4th Edition, 2006
  2. D. DeWitt, et al., Parallel Database Systems: The Future Of High Performance Database Systems, Communications of the ACM, 1992
  3. A. Halevy, et al., The Unreasonable Effectiveness Of Data, IEEE Intelligent Systems, 2009
  4. M. Stonebraker, et al., Intel “Big Data” Science And Technology Center Vision And Execution Plan, SIGMOD Record, 2013

Distributed Transactions

  1. P.A. Bernstein, et al., Concurrency Control In Distributed Database Systems, ACM Comput. Surv., 1981
  2. G. Samaras, et al., Two-Phase Commit Optimizations And Tradeoffs In The Commercial Environment, ICDE, 1993
  3. C. Mohan, et al., Transaction Management In The R* Distributed Database Management System, TODS, 1986
  4. B. Lampson, et al., A New Presumed Commit Optimization For Two Phase Commit, VLDB, 1993
  5. P. Helland, Life Beyond Distributed Transactions: An Apostate’s Opinion, CIDR, 2007

Consensus Protocols

  1. L. Lamport, Paxos Made Simple, ACM SIGACT News, 2001
  2. T. Chandra, et al., Paxos Made Live, PODC, 2007
  3. L. Lamport, The Part-Time Parliament, ACM TOCS, 1998
  4. H. Robinson, Consensus Protocols: Paxos, Online, 2009


  1. G. DeCandia, et al., Dynamo: Amazon’s Highly Available Key-Value Store, SOSP, 2007
  2. B. Cooper, et al., PNUTS: Yahoo!’s Hosted Data Serving Platform, VLDB, 2008
  3. W. Vogels, Eventually Consistent, ACM Queue, 2009
  4. A. Lakshman, et al., Cassandra: A Decentralized Structured Storage System, SIGOPS Operating Systems Review, 2010


  1. F. Chang, et al., Bigtable: A Distributed Storage System For Structured Data, OSDI, 2006
  2. J. Baker, et al., Megastore: Providing Scalable, Highly Available Storage For Interactive Services, CIDR, 2011
  3. M. Stonebraker, et al., SQL Databases v. NoSQL Databases, Communications of the ACM, 2010


  1. M. Stonebraker, et al., The End Of An Architectural Era (It’s Time For A Complete Rewrite), VLDB, 2007
  2. A. Thomson, et al., Calvin: Fast Distributed Transactions For Partitioned Database Systems, SIGMOD, 2012
  3. M. Aslett, How Will The Database Incumbents Respond To NoSQL And NewSQL?, 451 Group, 2010
  4. M. Stonebraker, New Opportunities For NewSQL, Communications of the ACM, 2012
  5. M. Stonebraker, et al., Ten Rules For Scalable Performance In ‘Simple Operation’ Datastores, Communications of the ACM, 2011
  6. A. Thomson, et al., The Case For Determinism In Database Systems, VLDB, 2010


  1. C. Curino, et al., Schism: A Workload-Driven Approach To Database Replication And Partitioning, VLDB, 2010
  2. A. Pavlo, et al., Skew-Aware Automatic Database Partitioning In Shared-Nothing, Parallel OLTP Systems, SIGMOD, 2012
  3. C. Curino, et al., Relational Cloud: A Database Service For The Cloud, CIDR, 2011
  4. A. Pavlo, et al., On Predictive Modeling For Optimizing Transaction Execution In Parallel OLTP Systems, VLDB, 2011

Distributed Data Stores I

  1. J.C. Corbett, et al., Spanner: Google’s Globally-Distributed Database, OSDI, 2012
  2. J. Shute, et al., F1: A Distributed SQL Database That Scales, VLDB, 2013
  3. M. Demirbas, Overview Of Spanner, Online, 2013

Distributed Data Stores II

  1. L. Qiao, et al., On Brewing Fresh Espresso: LinkedIn’s Distributed Data Serving Platform, SIGMOD, 2013
  2. N. Bronson, et al., TAO: Facebook’s Distributed Data Store For The Social Graph, USENIX ATC, 2013

Distributed Stream Processing

  1. T. Akidau, et al., MillWheel: Fault-Tolerant Stream Processing At Internet Scale, VLDB, 2013
  2. L. Abraham, et al., Scuba: Diving Into Data At Facebook, VLDB, 2013
  3. L. Neumeyer, et al., S4: Distributed Stream Computing Platform, ICDMW, 2010
  4. M. Zaharia, et al., Discretized Streams: An Efficient And Fault-Tolerant Model For Stream Processing On Large Clusters, HotCloud, 2012

Alternative Data Storage & Models

  1. D. Abadi, et al., Column-Stores vs. Row-Stores: How Different Are They Really?, SIGMOD, 2008
  2. P.G. Brown, Overview Of SciDB: Large Scale Array Storage, Processing And Analysis, SIGMOD, 2010

Data Warehouses I

  1. A. Pavlo, et al., A Comparison Of Approaches To Large-Scale Data Analysis, SIGMOD, 2009
  2. A. Abouzied, et al., HadoopDB: An Architectural Hybrid Of MapReduce And DBMS Technologies For Analytical Workloads, VLDB, 2009
  3. A. Thusoo, et al., Hive: A Warehousing Solution Over A MapReduce Framework, VLDB, 2009

Data Warehouses II

  1. S. Melnik, et al., Dremel: Interactive Analysis Of Web-Scale Datasets, VLDB, 2010
  2. R. Xin, et al., Shark: SQL And Rich Analytics At Scale, SIGMOD, 2013

Machine Learning Systems I

  1. G. Malewicz, et al., Pregel: A System For Large-Scale Graph Processing, SIGMOD, 2010
  2. M. Zaharia, et al., Resilient Distributed Datasets: A Fault-Tolerant Abstraction For In-Memory Cluster Computing, NSDI, 2012

Machine Learning Systems II

  1. Y. Low, et al., Distributed GraphLab: A Framework For Machine Learning And Data Mining In The Cloud, VLDB, 2012
  2. A. Kyrola, et al., GraphChi: Large-Scale Graph Computation On Just A PC, OSDI, 2012
  3. Y. Low, et al., GraphLab: A New Parallel Framework For Machine Learning, UAI, 2010

In Situ Data Processing

  1. I. Alagiannis, et al., NoDB: Efficient Query Execution On Raw Data Files, SIGMOD, 2012
  2. A. Abouzied, et al., Invisible Loading: Access-Driven Data Transfer From Raw Files Into Database Systems, EDBT, 2013


  1. A. Kemper, et al., HyPer: A Hybrid OLTP & OLAP Main Memory Database System Based On Virtual Memory Snapshots, ICDE, 2011
  2. V. Sikka, et al., Efficient Transaction Processing In SAP HANA Database: The End Of A Column Store Myth, SIGMOD, 2012
  3. J. Lee, et al., High-Performance Transaction Processing In SAP HANA, IEEE Data Engineering Bulletin, 2013
  4. J. Dittrich, et al., Towards A One-Size Fits All DB Architecture, CIDR, 2011
  5. M. Grund, et al., HYRISE: A Main Memory Hybrid Storage Engine, VLDB, 2010
  6. T. Mühlbauer, et al., ScyPer: Elastic OLAP Throughput On Transactional Data, DanaC, 2013


  1. M. Franklin, et al., CrowdDB: Answering Queries With Crowdsourcing, SIGMOD, 2011
  2. M. Stonebraker, et al., Data Curation At Scale: The Data Tamer System, CIDR, 2013
  3. A. Parameswaran, et al., CrowdScreen: Algorithms For Filtering Data With Humans, SIGMOD, 2012
  4. A. Marcus, et al., Human-Powered Sorts And Joins, VLDB, 2011
