Deep Learning Under Distribution Shift

Machine Learning in Finance Workshop: 2020 Virtual Edition
Hosted by Bloomberg, The Fu Foundation School of Engineering & Applied Science (SEAS), The Data Science Institute at Columbia University
Zachary C. Lipton: Deep Learning Under Distribution Shift

Abstract

We might hope that when faced with unexpected inputs, well-designed software systems would fire off warnings. However, ML systems, which depend strongly on properties of their inputs (e.g. the i.i.d. assumption), tend to fail silently. Faced with distribution shift, we wish (i) to detect and (ii) to quantify the shift, and (iii) to correct our classifiers on the fly — when possible. This talk will describe a line of recent work on tackling distribution shift. First, I will focus on recent work on label shift, a more classic problem, where strong assumptions enable principled methods. Then I will discuss how recent tools from generative adversarial networks have been appropriated (and misappropriated) to tackle dataset shift—characterizing and (partially) repairing a foundational flaw in the method.

Bio

Zachary Chase Lipton is the BP Junior Chair Assistant Professor of Operations Research and Machine Learning at Carnegie Mellon University. His research spans core machine learning methods and their social impact and addresses diverse application areas, including clinical medicine and natural language processing. Current research focuses include robustness under distribution shift, breast cancer screening, the effective and equitable allocation of organs, and the intersection of causal thinking and the messy high-dimensional data that characterizes modern deep learning applications. He is the founder of the Approximately Correct blog (approximatelycorrect.com) and a co-author of Dive Into Deep Learning, an interactive open-source book drafted entirely through Jupyter notebooks. Find on Twitter (@zacharylipton) or GitHub (@zackchase).

Stop Thinking, Just Do!