Measuring Multimodal Reasoning with the MMMU Benchmarks
Abstract
Multimodal models have made significant strides in solving complex problems, but robust benchmarks are crucial for tracking progress and identifying ongoing challenges. In this talk, we will introduce MMMU and MMMU-Pro, two benchmarks designed to rigorously evaluate multimodal AI on expert-level reasoning tasks. MMMU comprises 11.5K multimodal questions spanning 30 subjects and 183 subfields, pushing models to demonstrate advanced reasoning and apply domain-specific knowledge. MMMU-Pro raises the bar further by filtering out questions answerable from text alone, augmenting the answer options, and incorporating vision-only inputs, resulting in a more robust evaluation setup. Together, the MMMU benchmarks expose the limitations of current models and guide the development of future multimodal systems toward expert-level Artificial General Intelligence.
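For readers who want to inspect the data described above, the following is a minimal sketch of loading one MMMU subject split, assuming the publicly released Hugging Face datasets (the "MMMU/MMMU" repository with per-subject configurations); the field names shown follow that dataset card and should be verified against the current release.

```python
# Minimal sketch: load and inspect one MMMU subject (assumes the
# "MMMU/MMMU" Hugging Face dataset with per-subject configs; field
# names follow the public dataset card and may change).
from datasets import load_dataset

subject = "Accounting"  # one of the 30 subjects
mmmu = load_dataset("MMMU/MMMU", subject, split="validation")

example = mmmu[0]
print(example["question"])   # question text, may reference interleaved images
print(example["options"])    # multiple-choice answer options
print(example["answer"])     # gold answer letter
print(example["image_1"])    # first associated image (PIL image), if present
```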