Summary
- Article Source: Proceedings of the VLDB Endowment
- Editor-in-Chief: Michael Böhlen, Christoph Koch
Proceedings of the VLDB Endowment, Volume 6, 2012-2013
MapReduce Papers
- Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong Zhang, Joel Saltz: Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce. 1009-1020
- Ahmed Eldawy, Mohamed Mokbel: A Demonstration of SpatialHadoop: An Efficient MapReduce Framework for Spatial Data. 1230-1233
- Yoonjae Park, Jun-Ki Min, Kyuseok Shim: Parallel Computation of Skyline and Reverse Skyline Queries Using MapReduce. 2002-2013
Hadoop Papers
- Andrii Cherniak, Huma Zaidi, Vladimir Zadorozhny: Optimization Strategies for A/B Testing on HADOOP. 973-984
- Khaled Elmeleegy: Piranha: Optimizing Short Jobs in Hadoop. 985-996
- Kai Ren, YongChul Kwon, Magdalena Balazinska, Bill Howe: Hadoop’s Adolescence. 853-864
- K Ashwin Kumar, Jonathan Gluck, Amol Deshpande, Jimmy Lin: Hone: “Scaling Down” Hadoop on Shared-Memory Systems. 1354-1357
Proceedings of the VLDB Endowment, Volume 7, 2013-2014
MapReduce Papers
- Guoping Wang, Chee-Yong Chan: Multi-Query Optimization in MapReduce Framework. 145-156
- Makoto Onizuka, Hiroyuki Kato, Soichiro Hidaka, Keisuke Nakano, Zhenjiang Hu: Optimization for iterative queries on MapReduce. 241-252
Data-intensive Computing Systems
Required Readings
- Textbook [1] Chapter 3: The Hadoop Distributed FileSystem
- Textbook [1] Chapter 6: How MapReduce Works
- Textbook [1] Chapter 8: MapReduce Features
- Pig Latin: A Not-so-foreign Language for Data Processing
  By Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins. SIGMOD Conference 2008
- Building a High-Level Dataflow System on top of MapReduce: The Pig Experience
  By Alan Gates, Olga Natkovich, Shubham Chopra, Pradeep Kamath, Shravan Narayanam, Christopher Olston, Benjamin Reed, Santhosh Srinivasan, Utkarsh Srivastava. VLDB Conference 2009
- Query Optimization
  Read Section 5 (Size-Distribution Estimator) of this paper by Yannis Ioannidis
- A Case for Flash Memory SSD in Enterprise Database Applications
  By Sang-Won Lee, Bongki Moon, Chanik Park, Jae-Myung Kim, and Sang-Woo Kim
- CoHadoop: Flexible Data Placement and its Exploitation in Hadoop
  By Mohamed Y. Eltabakh, Yuanyuan Tian, Fatma Ozcan, Rainer Gemulla, Aljoscha Krettek, and John McPherson
- Anatomy of the Google search engine (early version)
  By Sergey Brin and Lawrence Page
Recommended Readings
- Big data: The next frontier for competition, McKinsey report, 2011
- EMC’s articles on the Digital Universe. See the bottom right-hand corner for a series of interesting articles such as: The Diverse and Exploding Digital Universe
- Different subsystems in big data processing, Think Big Analytics.
References
[1] Hadoop: The Definitive Guide, by Tom White. O’Reilly Media. May 2012.