Stop Thinking, Just Do!

Sungsoo Kim's Blog

Visual Geometry Grounded Transformer


11 December 2025


VGGT: Visual Geometry Grounded Transformer (Mar 2025)

Abstract

VGGT is a large feed-forward transformer that directly infers key 3D scene attributes, including camera parameters, depth maps, point maps, and 3D point tracks, from one or many input views in seconds. By bypassing traditional iterative optimization techniques such as Bundle Adjustment, it achieves state-of-the-art 3D reconstruction performance in a single forward pass. It also serves as a versatile backbone for downstream tasks such as video tracking and novel view synthesis.
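The abstract lists depth maps, camera parameters, and point maps as separate outputs, but they are geometrically linked. A per-pixel depth map, combined with a camera's intrinsics and extrinsics, determines a world-frame point map. The following is a minimal sketch of standard pinhole unprojection, not VGGT's own code. The function name `unproject_depth`, the OpenCV-style world-to-camera convention (x_cam = R·x_world + t), z-depth along the optical axis, and the plain-list representation are all assumptions made for illustration.

```python
from typing import List, Tuple

Matrix3 = List[List[float]]
Vec3 = Tuple[float, ...]


def unproject_depth(
    depth: List[List[float]],
    K: Matrix3,
    R: Matrix3,
    t: Vec3,
) -> List[List[Vec3]]:
    """Lift a depth map to a world-frame point map (hypothetical helper).

    Assumes a pinhole camera with intrinsics K and a world-to-camera
    extrinsic x_cam = R @ x_world + t. depth[v][u] is the z-depth of
    pixel (u, v) measured in the camera frame.
    """
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    point_map: List[List[Vec3]] = []
    for v, row in enumerate(depth):
        out_row: List[Vec3] = []
        for u, z in enumerate(row):
            # Back-project pixel (u, v) at depth z into camera coordinates.
            x_cam = ((u - cx) * z / fx, (v - cy) * z / fy, z)
            # Invert x_cam = R x_world + t, i.e. x_world = R^T (x_cam - t).
            d = [x_cam[i] - t[i] for i in range(3)]
            x_world = tuple(
                sum(R[j][i] * d[j] for j in range(3)) for i in range(3)
            )
            out_row.append(x_world)
        point_map.append(out_row)
    return point_map
```

Usage: with K = [[1, 0, 0.5], [0, 1, 0.5], [0, 0, 1]], an identity rotation, zero translation, and a constant depth of 2.0, pixel (0, 0) maps to the world point (-1.0, -1.0, 2.0). Looping in pure Python keeps the geometry explicit; a real pipeline would vectorize this over the whole image.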

Key Topics:

  • 3D Reconstruction
  • Visual Geometry
  • Transformers
  • Structure from Motion
  • Camera Pose Estimation