Papers about MapReduce

Here are papers about MapReduce, Hadoop, and related algorithms (in reverse chronological order):
<ul>
<li>Ganesha: Black-Box Diagnosis for MapReduce Systems, Xinghao Pan, Jiaqi Tan, Soila Kavulya, Rajeev Gandhi, Priya Narasimhan, Carnegie Mellon University, 2009.</li>
<li>Mochi: Visual Log-Analysis Based Tools for Debugging Hadoop, Jiaqi Tan, Xinghao Pan, Soila Kavulya, Rajeev Gandhi, Priya Narasimhan, Carnegie Mellon University, 2009.</li>
<li>2009 Hadoop Sort Benchmarks, Owen O’Malley and Arun C Murthy, 2009.</li>
<li>Ranking and Semi-supervised Classification on Large Scale Graphs Using Map-Reduce, Delip Rao and David Yarowsky, Dept. of Computer Science, Johns Hopkins University, 2009.</li>
<li>SALSA: Analyzing Logs as State Machines, Jiaqi Tan, Xinghao Pan, Soila Kavulya, Rajeev Gandhi, Priya Narasimhan, Carnegie Mellon University, 2008.</li>
<li>Improving MapReduce Performance in Heterogeneous Environments, Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz and Ion Stoica, UC Berkeley, 2008.</li>
<li>MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean and Sanjay Ghemawat, Google, 2004.</li>
</ul></div>

References

[1] Andrew Pavlo et al, “A comparison of approaches to large-scale data analysis”, In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, Pages 165-178, 2009.
[2] Jacob Leverich et al, “On the energy (in)efficiency of Hadoop clusters”, ACM SIGOPS Operating Systems Review, Volume 44 Issue 1, Pages 61-65, January 2010.
[3] Yanpei Chen et al, “To compress or not to compress - compute vs. IO tradeoffs for mapreduce energy efficiency”, In Proceedings of the first ACM SIGCOMM workshop on Green networking, Pages 23-28, 2010.
[4] Hung-chih Yang et al, “Map-reduce-merge: simplified relational data processing on large clusters”, In Proceedings of the 2007 ACM SIGMOD international conference on Management of data, Pages 1029-1040, 2007.
[5] Grzegorz Malewicz et al, “Pregel: a system for large-scale graph processing”, In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, Pages 135-146, 2010.
[6] http://giraph.apache.org/
[7] http://hama.apache.org/
[8] Foto N. Afrati et al, “Optimizing joins in a map-reduce environment”, Proceedings of the 13th International Conference on Extending Database Technology, Pages 99-110, 2010.
[9] Rares Vernica et al, “Efficient parallel set-similarity joins using MapReduce”, In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, Pages 495-506, 2010.
[10] Spyros Blanas et al, “A comparison of join algorithms for log processing in MapReduce”, In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, Pages 975-986, 2010.

[11] Y. Lin et al, “Llama: Leveraging Columnar Storage for Scalable Join Processing in the MapReduce Framework”, In Proceedings of the 2011 ACM SIGMOD, 2011.
[12] A. Okcan et al, “Processing Theta-Joins using MapReduce”, In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data Pages 949-960, 2011.
[13] Qifa Ke et al, “Optimizing data partitioning for data-parallel computing”, In Proceedings of the 13th USENIX Conference on Hot Topics in Operating Systems (HotOS’13), Pages 13-13, 2011.
[14] Ronnie Chaiken et al, “SCOPE: easy and efficient parallel processing of massive data sets”, Proceedings of the VLDB Endowment, Volume 1 Issue 2, Pages 1265-1276, August 2008.
[15] Christopher Olston et al, “Pig latin: a not-so-foreign language for data processing”, In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, Pages 1099-1110, 2008.
[16] http://developer.yahoo.com/blogs/hadoop/next-generation-apache-hadoop-mapreduce-3061.html
[17] http://hadoop.apache.org/docs/stable/fair_scheduler.html
[18] Shrinivas B. Joshi, “Apache hadoop performance-tuning methodologies and best practices”, In Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering, Pages 241-242, 2012.
[19] J. Dittrich et al, “Hadoop++: Making a yellow elephant run like a cheetah (without it even noticing)”, In Proceedings of the VLDB Endowment, 3(1-2):515–529, 2010.
[20] F. Chang et al, “Bigtable: A distributed storage system for structured data”, ACM Transactions on Computer Systems, 26(2):1–26, 2008.
[21] Y. He et al, “RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems”, In Proceedings of the 2011 IEEE ICDE, 2011.
[22] Y. Lin et al, “Llama: Leveraging Columnar Storage for Scalable Join Processing in the MapReduce Framework”, In Proceedings of the 2011 ACM SIGMOD, 2011.
[23] https://issues.apache.org/jira/browse/HADOOP-7206
[24] https://code.google.com/p/snappy/
