Large Multimodal Models

CVPR2023 Tutorial on Vision Foundation Models
CVPR 2023 Tutorial on “Recent Advances in Vision Foundation Models”
Large Multimodal Models: Towards Building and Surpassing Multimodal GPT-4
By Chunyuan Li (Microsoft)

Abstract

Deep learning algorithms are well-known to have a propensity for fitting the training data very well and memorize idiosyncratic properties in the training examples. From a scientific perspective, understanding memorization in deep neural networks shed light on how those models generalize. From a practical perspective, understanding memorization is crucial to address privacy and security issues related to deploying models in real world applications. In this talk, we present a series of studies centered at quantifying memorization in neural language models. We explain why in many real world tasks, memorization is necessary for optimal generalization. We also present quantitative studies on memorization, forgetting and unlearning of both vision and language models, to better understand the behaviors and implications of memorization in those models.

Stop Thinking, Just Do!