
Sungsoo Kim's Blog

Self-designing Data Systems for the AI Era


26 December 2023

This is diiP’s eighth distinguished lecture, delivered by Prof. Stratos Idreos, associate professor at Harvard University. More information and materials are available on our website: https://u-paris.fr/diip/diip-seminars/

Abstract

Data systems are everywhere. A data system is a collection of data structures and algorithms working together to achieve complex data processing tasks. For example, by using data systems with the right data structure design for the problem at hand, we can reduce the monthly cloud bill of large-scale data applications by hundreds of thousands of dollars. We can accelerate data science tasks by dramatically speeding up the computation of statistics over large amounts of data. We can train drastically more neural networks within a given time budget, improving accuracy. However, knowing the right data system design for any given scenario is a notoriously hard problem: there is a massive space of possible designs, and no single design is perfect across all data, AI models, and hardware contexts. In addition, building a new system for any given (fixed) design may take several years.

We will discuss our quest for the first principles of AI system design. We will show that it is possible to reason about this massive design space. This allows us to create a self-designing system that can take drastically different shapes to optimize for the workload, hardware, and available cloud budget using a grammar for systems. These shapes include designs that are discovered automatically and do not (always) exist in the literature or industry, yet they can be more than 10x faster for modern AI and big data applications. We will discuss examples from diverse AI areas, including image storage and classification, neural networks, statistics, and big data systems.

Bio

Stratos Idreos is an associate professor of Computer Science at Harvard University, where he leads the Data Systems Laboratory. For his Ph.D. thesis on adaptive indexing, Stratos received the 2011 ACM SIGMOD Jim Gray Doctoral Dissertation Award and the 2011 ERCIM Cor Baayen Award from the European Research Consortium for Informatics and Mathematics. In 2015 he received the IEEE TCDE Rising Star Award from the IEEE Technical Committee on Data Engineering for his work on adaptive data systems, and in 2022 he received the ACM SIGMOD Test of Time Award for the NoDB concept. Stratos is also a recipient of the National Science Foundation CAREER Award and the Department of Energy Early Career Award. He was PC Chair of ACM SIGMOD 2021 and IEEE ICDE 2022. He is the founding editor of the ACM/IMS Journal of Data Science and chairs the ACM SoCC Steering Committee. Finally, Stratos received the 2020 ACM SIGMOD Contributions Award for his work on reproducible research.
