Stop Thinking, Just Do!

Sungsoo Kim's Blog

The Rise of World Models and Spatial Intelligence


31 January 2026


Beyond LLMs: The Rise of World Models and Spatial Intelligence

Abstract

Why is the entire AI industry suddenly obsessed with World Models? While Large Language Models (LLMs) have transformed how we access abstract knowledge, they remain “wordsmiths in the dark”: eloquent but inexperienced, and ungrounded in the physical world. To reach the goal of Artificial General Intelligence (AGI), AI systems must be able to grok the physical world and act within it, a feat that cannot be achieved simply by reading text.

In this video, we dive deep into the next big target for labs like Google, OpenAI, Nvidia, Runway, and Luma: Spatial Intelligence.

What You’ll Learn:

  • The Three Essentials of Spatial Intelligence: Why world models must be generative, multimodal by design (making 3D a first-class citizen), and interactive.
  • The Two Paths to World Building:
      • Explicit 3D Representations: How companies like World Labs and Spatial create Gaussian splats or meshes for virtual production and simulation.
      • Predictive Pixel Generation: Why Google (Genie 3) and Runway are focused on autoregressive diffusion models that generate the “next frame” on demand, acting like a cloud-streamed game engine powered by a neural network.
  • Real-World Applications: From training robotic arms and self-driving cars (Tesla’s use of Gaussian splatting) to creating “just-in-time” personalized media feeds.
  • The Macro View: Exploring Nvidia’s Earth-2 for climate prediction and Google’s AlphaEarth Foundations, which uses petabytes of geospatial data to create a “searchable” planet.
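
To make the "next frame on demand" idea from the Predictive Pixel Generation path a little more concrete, here is a minimal Python sketch of an autoregressive, action-conditioned rollout. Everything in it is an assumption for illustration only: a "frame" is reduced to a single (x, y) position, `toy_next_frame` is a hand-written rule standing in for the neural network, and the names `rollout`, `ACTIONS`, and `context_len` are hypothetical. It is not how Genie 3 or Runway's models are implemented; it only shows the loop shape, where each predicted frame is fed back in as context for the next prediction.

```python
from collections import deque
from typing import Callable, List, Sequence, Tuple

# Toy "frame": just the (x, y) position of one agent (illustrative assumption).
Frame = Tuple[int, int]

# Hypothetical action set; a real system would accept richer user input.
ACTIONS = {
    "left": (-1, 0),
    "right": (1, 0),
    "up": (0, -1),
    "down": (0, 1),
    "stay": (0, 0),
}


def toy_next_frame(context: List[Frame], action: str) -> Frame:
    """Stand-in for the learned model: predict the next frame from
    recent frames plus the user's action."""
    x, y = context[-1]
    dx, dy = ACTIONS[action]
    return (x + dx, y + dy)


def rollout(
    first_frame: Frame,
    actions: Sequence[str],
    predict: Callable[[List[Frame], str], Frame] = toy_next_frame,
    context_len: int = 4,
) -> List[Frame]:
    """Generate frames one at a time, feeding each prediction back as input."""
    context = deque([first_frame], maxlen=context_len)  # sliding window of recent frames
    frames = [first_frame]
    for action in actions:
        next_frame = predict(list(context), action)
        context.append(next_frame)  # autoregressive step: output becomes input
        frames.append(next_frame)
    return frames


if __name__ == "__main__":
    print(rollout((0, 0), ["right", "right", "down", "stay"]))
```

The design point is that nothing here stores a 3D scene: the "world" exists only as the recent frames in `context`, which is the key contrast with the explicit Gaussian splat or mesh approach.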

The Next Gold Rush:

As we move toward a future where we can “conjure up any world” in a digital Holodeck, the focus has shifted to data collection. We discuss Meta’s Project Aria and the race to collect egocentric data to ground the next generation of humanoid robots.

World models are the glue connecting the world of bits and the world of atoms. Whether it’s for cinematic storytelling or robust robotics, this technology is redefining our interaction with reality.