Research Topics in Database Management Systems
Course description
This seminar focuses on recent research results at the intersection of data management and systems. There is no formal textbook for this course. We will mostly be reading and discussing recently published papers in venues such as SIGMOD, VLDB, and ICDE. An important component of the course is an individual research project, where you will pick one topic of interest in the area of database management systems and explore it in depth.
This course mainly discusses the latest research findings in data management and builds on the foundations introduced in CSE 5242, the Advanced Database Management Systems course. The course lacks the structures that usually guide students to success (such as a textbook, exams, or help from a GTA), so it is suited only to students who are motivated to study and conduct independent research.
Schedule
| # | Lecture material |
|---|---|
| 1 | Overview, class logistics. |
| 2 | Florin Rusu, Yu Cheng: A Survey on Array Storage, Query Languages, and Systems. arXiv:1302.0103 [cs.DB] (2013) |
| 3 | Emad Soroush, Magdalena Balazinska, Daniel L. Wang: ArrayStore: a storage manager for complex parallel array processing. SIGMOD Conference 2011: 253-264 |
| 4 | Jennie Duggan, Olga Papaemmanouil, Leilani Battle, Michael Stonebraker: Skew-Aware Join Optimization for Array Databases. SIGMOD Conference 2015: 123-135 |
| 5 | Ning Liu, Jason Cope, Philip H. Carns, Christopher D. Carothers, Robert B. Ross, Gary Grider, Adam Crume, Carlos Maltzahn: On the role of burst buffers in leadership-class storage systems. MSST 2012 |
| 6 | Jeff Shute, Radek Vingralek, Bart Samwel, Ben Handy, Chad Whipkey, Eric Rollins, Mircea Oancea, Kyle Littlefield, David Menestrina, Stephan Ellner, John Cieslewicz, Ian Rae, Traian Stancescu, Himani Apte: F1: A Distributed SQL Database That Scales. PVLDB 6(11): 1068-1079 (2013) |
| 7 | Avrilia Floratou, Umar Farooq Minhas, Fatma Özcan: SQL-on-Hadoop: Full Circle Back to Shared-Nothing Database Architectures. PVLDB 7(12): 1295-1306 (2014) |
| 8 | Jiexing Li, Jeffrey F. Naughton, Rimma V. Nehme: Resource Bricolage for Parallel Database Systems. PVLDB 8(1): 25-36 (2014) |
| 9 | Simon Loesing, Markus Pilman, Thomas Etter, Donald Kossmann: On the Design and Scalability of Distributed Shared-Data Databases. SIGMOD Conference 2015: 663-676 |
| 10 | Philip A. Bernstein, Sudipto Das, Bailu Ding, Markus Pilman: Optimizing Optimistic Concurrency Control for Tree-Structured, Log-Structured Databases. SIGMOD Conference 2015: 1295-1309 |
| 11 | Patrick E. O'Neil, Edward Cheng, Dieter Gawlick, Elizabeth J. O'Neil: The Log-Structured Merge-Tree (LSM-Tree). Acta Inf. 33(4): 351-385 (1996) |
| 12 | Jeremy Condit, Edmund B. Nightingale, Christopher Frost, Engin Ipek, Benjamin C. Lee, Doug Burger, Derrick Coetzee: Better I/O through byte-addressable, persistent memory. SOSP 2009: 133-146 |
| 13 | Shimin Chen, Qin Jin: Persistent B+-Trees in Non-Volatile Main Memory. PVLDB 8(7): 786-797 (2015) |
| 14 | Joy Arulraj, Andrew Pavlo, Subramanya Dulloor: Let's Talk About Storage & Recovery Methods for Non-Volatile Memory Database Systems. SIGMOD Conference 2015: 707-722 |
| 15 | Jose M. Faleiro, Daniel J. Abadi: Rethinking serializable multiversion concurrency control. PVLDB 8(11): 1190-1201 (2015) |
| 16 | Kai Zhang, Kaibo Wang, Yuan Yuan, Lei Guo, Rubao Lee, Xiaodong Zhang: Mega-KV: A Case for GPUs to Maximize the Throughput of In-Memory Key-Value Stores. PVLDB 8(11): 1226-1237 (2015) |
| 17 | Ziqiang Feng, Eric Lo, Ben Kao, Wenjian Xu: ByteSlice: Pushing the Envelop of Main Memory Data Processing with a New Storage Layout. SIGMOD Conference 2015: 31-46 |
| 18 | Alekh Jindal, Endre Palatinus, Vladimir Pavlov, Jens Dittrich: A Comparison of Knives for Bread Slicing. PVLDB 6(6): 361-372 (2013) |
| 19 | Eleni Petraki, Stratos Idreos, Stefan Manegold: Holistic Indexing in Main-memory Column-stores. SIGMOD Conference 2015: 1153-1166 |
| 20 | Ingo Müller, Peter Sanders, Arnaud Lacurie, Wolfgang Lehner, Franz Färber: Cache-Efficient Aggregation: Hashing Is Sorting. SIGMOD Conference 2015: 1123-1136 |
| 21 | Project presentations (#1): ZD, DS. |
| 22 | Project presentations (#2): PR, LY, HX. |
| 23 | Project presentations (#3): SF, FL, SC. |
| 24 | Ronald Barber, Guy M. Lohman, Ippokratis Pandis, Vijayshankar Raman, Richard Sidle, Gopi K. Attaluri, Naresh Chainani, Sam Lightstone, David Sharpe: Memory-Efficient Hash Joins. PVLDB 8(4): 353-364 (2014) |
| 25 | Shumo Chu, Magdalena Balazinska, Dan Suciu: From Theory to Practice: Efficient Join Query Evaluation in a Parallel Database System. SIGMOD Conference 2015: 63-78 |
| 26 | Molham Aref, Balder ten Cate, Todd J. Green, Benny Kimelfeld, Dan Olteanu, Emir Pasalic, Todd L. Veldhuizen, Geoffrey Washburn: Design and Implementation of the LogicBlox System. SIGMOD Conference 2015: 1371-1382 |
Paper summaries
To make the most of our in-class time, you are expected to submit a summary of the assigned reading before each class. When answering the questions, don’t paraphrase (or copy verbatim) what is written in the paper: a paper's actual contributions frequently differ from what its authors claimed when they wrote it.
Paper summaries will be graded on a scale from zero to two. Zero is reserved for summaries that have not been submitted or are unreadable. One reflects a summary that could be improved in length, clarity, or insight. Two represents a solid effort at summarizing the paper. A few exceptionally insightful summaries will receive one bonus point.
Each summary must answer exactly the following questions. Remember that summaries are graded on clarity and insight, not length!
- What is your name?
- What is the paper you are summarizing?
- What problem was this paper addressing?
- What was the existing solution to this problem?
- What solution was this paper proposing?
- What are the conclusions you draw from the results?
- List three things you appreciated when reading this paper.
- List three things you believe can be improved in this paper.
Answers to all questions are due by 1am on the day the paper is discussed. Upload your answers to Carmen as a single plaintext file. Please include the questions in the submitted file. No Microsoft Word or Adobe Acrobat files will be accepted.
Class project
You will also work on an individual research project on a topic of mutual research interest. (Group projects are not allowed.) I can provide a list of interesting topics, and I am happy to discuss any ideas you have.
It is your responsibility to meet with the instructor periodically throughout the semester to discuss the general direction and progress of your class project. You must take the initiative to actively explore the topic you choose; otherwise you will accomplish little in the project, and your class project grade will be adversely impacted.
Final report: The final project report should be at most twelve pages of text and figures in 11-point font. This limit includes any references to publications, URLs, manuals, etc. I will be looking for answers to the following questions:
- What is the problem you are solving?
- What have others done already to solve this or a similar problem?
- What is your solution, and what did you accomplish during the last three months?
- What are the results? Does your solution improve over what prior work has already accomplished?
- In retrospect, what could you have done better in this project?
- If someone else looks at this problem in the future, what are the aspects of the problem that you did not have time to explore?
Source code: Before submitting your source code, please delete any intermediate files and executable binaries. (These will not work on any platform other than your own system.) If you have worked with a large codebase (PostgreSQL, Impala, MySQL, etc.), please submit only a diff of your changes, and state which “base” version you modified. Examples include “PostgreSQL 9.4.0” or “Linux 3.x development branch, git commit f3f62a38ce”.
If the source code is small (a few MBs), please upload it with your report on Carmen. For larger code, Ohio State offers BuckeyeBox, a version of the Box file sharing service, which you can access using your Carmen credentials. You do not have to use this service, as long as you include a link to your source code.