Knowledge Distillation - How LLMs train each other

30 July 2025


Abstract

Knowledge distillation, the technique by which large language models (LLMs) train one another, was prominently discussed at LlamaCon 2025.

You’ll learn:

  • What knowledge distillation really is (and what it’s not)
  • How it helps scale LLMs without bloating inference cost
  • The origin story from ensembles and model compression (2006) to Hinton’s “dark knowledge” paper (2015)
  • Why “soft labels” carry more information than one-hot targets (see the code sketch after this list)
  • How companies like Google, Meta, and DeepSeek apply distillation differently
  • The true meaning behind terms like temperature, behavioral cloning, and co-distillation
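
To make the “soft labels” and “temperature” bullets above concrete, here is a minimal sketch of a Hinton-style distillation loss in PyTorch. The function name `distillation_loss` and the parameters `T` and `alpha` are illustrative choices, not something taken from the video: the teacher’s temperature-scaled softmax supplies soft targets, and the student is trained on a mix of that signal and the usual one-hot cross-entropy.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    """Hinton-style knowledge distillation loss (illustrative sketch)."""
    # Soft labels from the teacher: softmax at temperature T > 1 spreads
    # probability mass over the "wrong" classes, exposing how similar the
    # teacher thinks each class is to the correct one ("dark knowledge").
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)

    # KL divergence between the two soft distributions, scaled by T^2 so its
    # gradients stay comparable in magnitude to the hard-label term.
    kd_loss = F.kl_div(soft_student, soft_targets, reduction="batchmean") * (T * T)

    # Standard cross-entropy against the one-hot ground-truth labels.
    ce_loss = F.cross_entropy(student_logits, targets)

    return alpha * kd_loss + (1 - alpha) * ce_loss
```

Because the teacher assigns non-zero probability to classes that merely resemble the true one, the soft targets give the student a richer training signal than a one-hot label alone can provide.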

Whether you’re building, training, or just trying to understand modern AI systems, this video gives you a deep but accessible introduction to how LLMs teach each other.