Stop Thinking, Just Do!

Sung-Soo Kim's Blog

Papers to Read

7 May 2014


Data Partitioning/Placement

  • J Zhou, N Bruno, and W Lin Advanced partitioning techniques for massively distributed computation Proc ACM SIGMOD International Conference on Management of Data, pages 13-24, 2012
  • A Pavlo, C Curino, and S Zdonik Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems Proc ACM SIGMOD International Conference on Management of Data, pages 61-72, 2012
  • Mohamed Y Eltabakh, Yuanyuan Tian, Fatma Özcan, Rainer Gemulla, Aljoscha Krettek, and John McPherson CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop Proc VLDB, 4(9): 575-585, 2011

Distributed Queries

  • L Amsaleg, M Franklin, A Tomasic, Dynamic Query Operator Scheduling for Wide-Area Remote Access, Distributed and Parallel Databases, 6(3): 217-246, 1998
  • A Halevy, Answering queries using views: A survey, VLDB J, 10(4): 270-294, 2001
  • R Avnur and J M Hellerstein, Eddies: Continuously adaptive query processing, Proc ACM SIGMOD Int Conf on Management of Data, pages 261-272, 2000
  • M A Shah, J M Hellerstein, S Chandrasekaran, and M J Franklin, Flux: An adaptive partitioning operator for continuous query systems, Proc 19th Int Conf On Data Engineering, pages 25-36, 2003
  • F Porto, ES Laber, and P Valduriez, Cherry picking: A semantic query processing strategy for the evaluation of expensive predicates, Proc Brazilian Symposium on Databases, pages 356-370, 2003
  • F Tian and D J DeWitt, Tuple routing strategies for distributed Eddies, Proc 29th Int Conf On Very Large Data Bases, pages 333-344, 2003
  • J R Thomsen, M L Yiu, and C S Jensen Effective caching of shortest paths for location-based services Proc ACM SIGMOD International Conference on Management of Data, pages 313-324, 2012
  • H Herodotou, N Borisov, and S Babu Query optimization techniques for partitioned tables Proc ACM SIGMOD International Conference on Management of Data, pages 49-60, 2011

Distributed Transactions

  • Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, Daniel J Abadi: Calvin: fast distributed transactions for partitioned database systems, Proc ACM SIGMOD Int Conf on Management of Data, pages 1-12, 2012
  • Daniel Peng, Frank Dabek: Large-scale Incremental Processing Using Distributed Transactions and Notifications OSDI, pages 251-264, 2010
  • Jun Rao, Eugene J Shekita, Sandeep Tata: Using Paxos to Build a Scalable, Consistent, and Highly Available Datastore Proc VLDB, 4(4): 243-254, 2011
  • Peter Bailis, Shivaram Venkataraman, Michael J Franklin, Joseph M Hellerstein, Ion Stoica: Probabilistically Bounded Staleness for Practical Partial Quorums Proc VLDB, 5(8): 776-787, 2012
  • A Pavlo, E PC Jones, and S Zdonik On Predictive Modeling for Optimizing Transaction Execution in Parallel OLTP Systems Proc VLDB, 5(2): 85-96, 2012
  • HT Vo, S Wang, D Agrawal, G Chen, BC Ooi LogBase: A Scalable Log-structured Database System in the Cloud Proc VLDB, 5(10): 1004-1015, 2012
  • Stacy Patterson, Aaron J Elmore, Faisal Nawab, Divyakant Agrawal, Amr El Abbadi Serializability, not Serial: Concurrency Control and Availability in Multi-Datacenter Datastores Proc VLDB, 5(11): 1459-1470, 2012
  • Ippokratis Pandis, Pınar Tözün, Ryan Johnson, and Anastasia Ailamaki PLP: Page Latch-free Shared-everything OLTP Proc VLDB, 4(10): 610-621, 2011

Data Replication

  • Yuri Breitbart, Raghavan Komondoor, Rajeev Rastogi, S Seshadri, Abraham Silberschatz: Update Propagation Protocols for Replicated Databases, Proc ACM SIGMOD Int Conf on Management of Data, pages 97-108, 1999
  • Carlo Curino, Yang Zhang, Evan P C Jones, Samuel Madden: Schism: a Workload-Driven Approach to Database Replication and Partitioning Proc VLDB, 3(1): 48-57, 2010
  • M Waldvogel, P Hurley, D Bauer, Dynamic Replica Management in Distributed Hash Tables, IBM Technical Report RZ 3502, 2003
  • M P Consens, K Ioannidou, J LeFevre, and N Polyzotis Divergent physical design tuning for replicated databases Proc ACM SIGMOD International Conference on Management of Data, pages 49-60, 2012
  • Peter Bailis, Shivaram Venkataraman, Michael J Franklin, Joseph M Hellerstein, Ion Stoica Probabilistically Bounded Staleness for Practical Partial Quorums Proc VLDB, 5(8): 776-787, 2012
  • Sudarshan Kadambi, Jianjun Chen, Brian F Cooper, David Lomax, Raghu Ramakrishnan, Adam Silberstein, Erwin Tam, and Hector Garcia-Molina Where in the World is My Data?, Proc VLDB, 4(11): 1040-1050, 2011

Parallel Data Management

  • F Akal, K Böhm, and H-J Schek, OLAP query evaluation in a database cluster: A performance study on intra-query parallelism, Proc 6th East European Conf Advances in Databases and Information Systems, pages 218-231, 2002
  • U Röhm, K Böhm, and H-J Schek, OLAP query routing and physical design in a database cluster, Advances in Database Technology, Proc 7th Int Conf On Extending Database Technology, pages 254-268, 2000
  • A Lima, M Mattoso, and P Valduriez, OLAP query processing in a database cluster, Proc 20th Int Euro-Par Conf, pages 355-362, 2004
  • C Furtado, A Lima, E Pacitti, P Valduriez and M Mattoso, Physical and virtual partitioning in OLAP database clusters, Proc Int Symp Computer Architecture and High Performance Computing, pages 143-150, 2005
  • C Furtado, A Lima, E Pacitti, P Valduriez and M Mattoso, Adaptive hybrid partitioning for OLAP query processing in a database cluster, Int J High Perf Comput And Networking, 5(4): 251-262, 2008
  • H Köhler, J Yang, and X Zhou Efficient parallel skyline processing using hyperplane projections Proc ACM SIGMOD International Conference on Management of Data, pages 85-96, 2011
  • P Upadhyaya, YC Kwon, and M Balazinska A latency and fault-tolerance optimizer for online parallel query plans Proc ACM SIGMOD International Conference on Management of Data, pages 241-252, 2011
  • E Soroush, M Balazinska, and D Wang ArrayStore: a storage manager for complex parallel array processing Proc ACM SIGMOD International Conference on Management of Data, pages 253-264, 2011
  • Martina-Cezara Albutiu, Alfons Kemper, Thomas Neumann Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems Proc VLDB, 5(10): 1064-1075, 2012

Database Integration

  • R J Miller, L M Haas, and M A Hernandez Schema Mapping as Query Discovery, In Proc Int Conf on Very Large Data Bases, 2000
  • A Doan, P Domingos, and A Halevy Learning to Match the Schemas of Databases: A Multistrategy Approach, Machine Learning, 50(3): 279-301, 2003
  • R McCann, B AlShelbi, Q Le, H Nguyen, L Vu, and A Doan Maveric: Mapping Maintenance for Data Integration Systems, In Proc Int Conf on Very Large Data Bases, 2005
  • H Galhardas, D Florescu, D Shasha, E Simon, and C-A Saita Declarative Data Cleaning: Language, Models, and Algorithms, In Proc Int Conf on Very Large Data Bases, 2001
  • V Raman and J Hellerstein, Potter’s wheel: An interactive data cleaning system, Proc 27th Int Conf On Very Large Data Bases, pages 381-390, 2001
  • S Chaudhuri, K Ganjam, V Ganti, and R Motwani Robust and Efficient Fuzzy Match for Online Data Cleaning In Proc ACM SIGMOD Int Conf on Management of Data, 2003
  • L M Haas, D Kossmann, E L Wimmers, and J Yang Optimizing Queries Across Diverse Data Sources, In Proc Int Conf on Very Large Data Bases, pages 276-285, 1997
  • Zachary G Ives, Daniela Florescu, Marc Friedman, Alon Levy, Daniel S Weld An Adaptive Query Execution System for Data Integration, In Proc ACM SIGMOD Int Conf on Management of Data, 1999
  • Zachary G Ives, Alon Y Halevy, Daniel S Weld Adapting to Source Properties in Processing Data Integration Queries, Proc ACM SIGMOD Int Conf on Management of Data, pages 395-406, 2004
  • L Qian, M J Cafarella, and H V Jagadish Sample-driven schema mapping Proc ACM SIGMOD Int Conf on Management of Data, pages 73-84, 2012
  • H Elmeleegy, A Elmagarmid, and J Lee Leveraging query logs for schema mapping generation in U-MAP Proc ACM SIGMOD Int Conf on Management of Data, pages 121-132, 2011
  • B Alexe, B ten Cate, P G Kolaitis, and W-C Tan Designing and refining schema mappings via data examples Proc ACM SIGMOD Int Conf on Management of Data, pages 133-144, 2011
  • M Zhang, M Hadjieleftheriou, B C Ooi, C M Procopiuc, and D Srivastava Automatic discovery of attributes in relational databases Proc ACM SIGMOD Int Conf on Management of Data, pages 109-120, 2011
  • W Fan, J Li, S Ma, N Tang, and W Yu Interaction between record matching and data repairing Proc ACM SIGMOD Int Conf on Management of Data, pages 469-480, 2011
  • Jiannan Wang, Guoliang Li, Jeffrey Xu Yu, and Jianhua Feng Entity Matching: How Similar Is Similar, Proc VLDB, 4(10): 622-633, 2011
  • Vibhor Rastogi, Nilesh N Dalvi, Minos N Garofalakis Large-Scale Collective Entity Matching Proc VLDB, 4(4): 208-218, 2011

Peer-to-Peer Data Management

  • A Kementsietsidis, M Arenas, R J Miller: Mapping Data in Peer-to-Peer Systems: Semantics and Algorithmic Issues, In Proc ACM SIGMOD Int Conf on Management of Data, pages 325-336, 2003
  • B Yang, H Garcia-Molina, Comparing Hybrid Peer-to-Peer Systems, In Proc of 27th International Conference on Very Large Data Bases, 2001
  • A Crespo, H Garcia-Molina, Routing Indices For Peer-to-Peer Systems, In Proc International Conference on Distributed Computing Systems, 2002
  • S Ratnasamy, P Francis, M Handley, R Karp, S Shenker, A Scalable Content-Addressable Network, In Proc ACM SIGCOMM Conf on Applications, Technologies, Architectures, and Protocols for Computer Communication, 2001
  • BY Zhao, L Huang, J Stribling, SC Rhea, A D Joseph, and JD Kubiatowicz, Tapestry: A Resilient Global-Scale Overlay for Service Deployment, IEEE J on Selected Areas in Comm, 22(1), January 2004
  • K Aberer, P Cudre-Mauroux, and M Hauswirth, The Chatty Web: Emergent Semantics Through Gossiping, In Proc 12th Int World Wide Web Conf, 2003
  • B Gedik and L Liu, PeerCQ: A Decentralized and Self-Configuring Peer-to-Peer Information Monitoring System In Proc 23rd Int Conf on Distributed Computing Systems, 2003
  • WS Ng, B C Ooi, K-L Tan, and A Zhou, PeerDB: A P2P-based System for Distributed Data Sharing In Proc 19th Int Conf on Data Eng, 2003
  • P Kalnis, WS Ng, B C Ooi, D Papadias, and K-L Tan, An Adaptive Peer-to-Peer Network for Distributed Caching of OLAP Results, In Proc ACM SIGMOD Int Conf on Management of Data, 2002

Stream Data Management

  • C Cranor, T Johnson, O Spatscheck, and V Shkapenyuk Gigascope: high performance network monitoring with an SQL interface In Proc ACM SIGMOD Int Conf on Management of Data, pages 647-651, 2003
  • E Rundensteiner, L Ding, T Sutherland, Y Zhu, B Pielech, and N Mehta CAPE: continuous query engine with heterogeneous-grained adaptivity In Proc 30th Int Conf on Very Large Data Bases, pages 1353-1356, 2004
  • D Abadi, Y Ahmad, M Balazinska, U Cetintemel, M Cherniack, J-H Hwang, W Lindner, A Rasin, N Tatbul, Y Xing, and S Zdonik The design of the Borealis stream processing engine In Proc 1st Biennial Conf on Innovative Data Syst Res, 2005
  • L Golab and M T Özsu Update-pattern-aware modeling and processing of continuous queries In Proc ACM SIGMOD Int Conf on Management of Data, pages 658-669, 2005
  • A Ayad and J Naughton Static optimization of conjunctive queries with sliding windows over unbounded streaming information sources In Proc ACM SIGMOD Int Conf on Management of Data, pages 419-430, 2004
  • M Datar, A Gionis, P Indyk, and R Motwani Maintaining stream statistics over sliding windows In Proc 13th SIAM-ACM Symp on Discrete Algorithms, pages 635-644, 2002
  • L Golab and M T Özsu Processing sliding window multi-joins in continuous queries over data streams In Proc 29th Int Conf on Very Large Data Bases, pages 500-511, 2003
  • B Babcock, S Babu, M Datar, and R Motwani Chain: Operator scheduling for memory minimization in data stream systems In Proc ACM SIGMOD Int Conf on Management of Data, pages 253-264, 2003
  • D Carney, U Cetintemel, A Rasin, S Zdonik, M Cherniack, and M Stonebraker Operator scheduling in a data stream manager In Proc 29th Int Conf on Very Large Data Bases, pages 838-849, 2003
  • N Tatbul, U Cetintemel, S Zdonik, M Cherniack, and M Stonebraker Load shedding in a data stream manager In Proc 29th Int Conf on Very Large Data Bases, pages 309-320, 2003
  • J-H Hwang, M Balazinska, A Rasin, U Cetintemel, M Stonebraker, and S Zdonik High-availability algorithms for distributed stream processing In Proc 21st Int Conf on Data Engineering, pages 779-790, 2005

MapReduce-based Data Management

  • Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung: The Google file system SOSP, pages 29-43, 2003
  • K Shvachko, H Kuang, S Radia, R Chansler, The Hadoop Distributed File System, IEEE 26th Symposium on Mass Storage Systems and Technologies, 2010
  • Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins: Pig latin: a not-so-foreign language for data processing Proc ACM SIGMOD Int Conf on Management of Data, pages 1099-1110, 2008
  • Alan Gates, Olga Natkovich, Shubham Chopra, Pradeep Kamath, Shravan Narayanamurthy, Christopher Olston, Benjamin Reed, Santhosh Srinivasan, Utkarsh Srivastava: Building a High-Level Dataflow System on top of MapReduce: The Pig Experience Proc VLDB 2(2): 1414-1425, 2009
  • Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels Dynamo: Amazon’s Highly Available Key-Value Store SOSP, 2007
  • Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C Hsieh, Deborah A Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, Robert E Gruber: Bigtable: A Distributed Storage System for Structured Data, ACM Trans Comput Syst, 26(2): Article 4, 2008
  • Brian F Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana Yerneni: PNUTS: Yahoo!’s hosted data serving platform Proc 34th Int Conf on Very Large Data Bases, pages 1277-1288, 2008
  • Iman Elghandour, Ashraf Aboulnaga ReStore: Reusing Results of MapReduce Jobs Proc VLDB, 5(6): 586-597, 2012
