Knowledge Distillation - How LLMs train each other
Abstract
Knowledge distillation was prominently discussed at LlamaCon 2025.
You’ll learn:
- What knowledge distillation really is (and what it’s not)
- How it helps scale LLMs without bloating inference cost
- The origin story from ensembles and model compression (2006) to Hinton’s “dark knowledge” paper (2015)
- Why “soft labels” carry more information than one-hot targets (see the loss sketch after this list)
- How companies like Google, Meta, and DeepSeek apply distillation differently
- What terms like temperature, behavioral cloning, and co-distillation actually mean
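
To give a concrete feel for the soft-label and temperature ideas listed above, here is a minimal sketch of a classic distillation loss in the style of Hinton et al. (2015), written in PyTorch. The function name `distillation_loss` and the default values for `T` and `alpha` are illustrative assumptions, not code from the video.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    """Blend of a soft-label (teacher) and a hard-label (one-hot) objective."""
    # Soft labels: the teacher's distribution, softened with temperature T.
    # Higher T spreads probability mass over wrong-but-plausible classes,
    # exposing the "dark knowledge" that a one-hot target throws away.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    student_log_probs = F.log_softmax(student_logits / T, dim=-1)
    # KL divergence between the softened distributions; the T**2 factor
    # rescales gradients so the soft term stays comparable across T values.
    soft_loss = F.kl_div(student_log_probs, soft_targets,
                         reduction="batchmean") * (T ** 2)
    # Ordinary cross-entropy against the one-hot ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, targets)
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Example with random tensors: a batch of 4 examples, 10 classes.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
targets = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, targets)
loss.backward()
```

The `T ** 2` factor keeps the gradient scale of the soft term roughly constant as the temperature changes, which is why the two terms can be blended with a single `alpha`.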
Whether you’re building, training, or just trying to understand modern AI systems, this video gives you a deep but accessible introduction to how LLMs teach each other.