Multimodal Prompting with a 44-minute Movie

Abstract

This is a demo of long-context understanding, an experimental feature in our newest model, Gemini 1.5 Pro. It uses a 44-minute silent Buster Keaton movie, Sherlock Jr., and a series of multimodal prompts.

This demo is a continuous recording of a live model interaction. Sequences have been shortened, and response times are shown.

Token count details: The input video (696,161 tokens) and image (256 tokens) total 696,417 tokens. The text inputs add the remaining 121 tokens to the prompt, yielding the 696,538-token total shown in the interface.
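
The token arithmetic above can be checked with a few lines of Python. This is only a sketch built from the three figures quoted in this description; the variable names are illustrative and do not come from the Gemini interface or API.

```python
# Token counts quoted in the description (not fetched from any API).
VIDEO_TOKENS = 696_161     # 44-minute video input
IMAGE_TOKENS = 256         # single image input
INTERFACE_TOTAL = 696_538  # total shown in the interface

# Video and image together.
media_tokens = VIDEO_TOKENS + IMAGE_TOKENS

# Whatever remains of the interface total is attributed to the text prompts.
text_tokens = INTERFACE_TOTAL - media_tokens

print(f"video + image: {media_tokens:,}")
print(f"text prompts:  {text_tokens:,}")
```

The difference between the interface total and the media subtotal is the share contributed by the text prompts.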

To learn more about Gemini 1.5, visit https://goo.gle/3weBZhn
