Stop Thinking, Just Do!

Sungsoo Kim's Blog

MapReduce and Hadoop Papers in the VLDB

23 March 2014


Summary

Proceedings of the VLDB Endowment, Volume 6, 2012-2013

MapReduce Papers

  1. Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong Zhang, Joel Saltz: Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce. 1009-1020
  2. Ahmed Eldawy, Mohamed Mokbel: A Demonstration of SpatialHadoop: An Efficient MapReduce Framework for Spatial Data. 1230-1233
  3. Yoonjae Park, Jun-Ki Min, Kyuseok Shim: Parallel Computation of Skyline and Reverse Skyline Queries Using MapReduce. 2002-2013

Hadoop Papers

  1. Andrii Cherniak, Huma Zaidi, Vladimir Zadorozhny: Optimization Strategies for A/B Testing on HADOOP. 973-984
  2. Khaled Elmeleegy: Piranha: Optimizing Short Jobs in Hadoop. 985-996
  3. Kai Ren, YongChul Kwon, Magdalena Balazinska, Bill Howe: Hadoop’s Adolescence. 853-864
  4. K Ashwin Kumar, Jonathan Gluck, Amol Deshpande, Jimmy Lin: Hone: “Scaling Down” Hadoop on Shared-Memory Systems. 1354-1357

Proceedings of the VLDB Endowment, Volume 7, 2013-2014

MapReduce Papers

  1. Guoping Wang, Chee-Yong Chan: Multi-Query Optimization in MapReduce Framework. 145-156
  2. Makoto Onizuka, Hiroyuki Kato, Soichiro Hidaka, Keisuke Nakano, Zhenjiang Hu: Optimization for iterative queries on MapReduce. 241-252

Data-intensive Computing Systems

Required Readings

  1. Textbook [1] Chapter 3: The Hadoop Distributed Filesystem
  2. Textbook [1] Chapter 6: How MapReduce Works
  3. Textbook [1] Chapter 8: MapReduce Features
  4. Pig Latin: A Not-so-foreign Language for Data Processing
    By Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins. SIGMOD Conference 2008
  5. Building a High-Level Dataflow System on top of MapReduce: The Pig Experience
    By Alan Gates, Olga Natkovich, Shubham Chopra, Pradeep Kamath, Shravan Narayanamurthy, Christopher Olston, Benjamin Reed, Santhosh Srinivasan, Utkarsh Srivastava. VLDB Conference 2009
  6. Query Optimization
    Read Section 5 (Size-Distribution Estimator) of this paper by Yannis Ioannidis
  7. A Case for Flash Memory SSD in Enterprise Database Applications
    By Sang-Won Lee, Bongki Moon, Chanik Park, Jae-Myung Kim, and Sang-Woo Kim
  8. CoHadoop: Flexible Data Placement and its Exploitation in Hadoop
    By Mohamed Y. Eltabakh, Yuanyuan Tian, Fatma Ozcan, Rainer Gemulla, Aljoscha Krettek, and John McPherson
  9. Anatomy of the Google search engine (early version)
    By Sergey Brin and Lawrence Page
  1. Big data: The next frontier for competition, McKinsey report, 2011
  2. EMC’s articles on the Digital Universe. See the bottom right-hand corner for a series of interesting articles such as: The Diverse and Exploding Digital Universe
  3. Different subsystems in big data processing, Think Big Analytics.

References

[1] Hadoop: The Definitive Guide, by Tom White. O’Reilly Media. May 2012.

