DeepSeek’s GRPO (Group Relative Policy Optimization)

Abstract

In this video, I break down DeepSeek’s Group Relative Policy Optimization (GRPO) from first principles, without assuming prior knowledge of Reinforcement Learning. By the end, you’ll understand the core RL building blocks that led to GRPO, including:

Policy Gradient Methods
The REINFORCE Algorithm
Actor-Critic Models
PPO (Proximal Policy Optimization)
GRPO (Group Relative Policy Optimization)
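
As a quick preview of the last item, the "group relative" part of GRPO can be sketched in a few lines: several answers are sampled for the same prompt, each gets a scalar reward, and each answer's advantage is its reward normalized by the group's mean and standard deviation, so no separate critic model is needed. This is a minimal sketch and not DeepSeek's implementation: the function name `group_relative_advantages`, the `eps` stabilizer, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and std.

    `rewards` holds one scalar reward per sampled answer to the same prompt.
    `eps` is an assumed stabilizer that avoids division by zero when every
    answer in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four sampled answers: two correct (1.0), two wrong (0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Answers that beat the group average get a positive advantage and the rest get a negative one. If every answer in a group scores the same, all advantages are zero and that group contributes no learning signal.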

Papers:

GRPO paper (DeepSeekMath): https://arxiv.org/pdf/2402.03300
DeepSeek-R1 paper: https://arxiv.org/pdf/2501.12948
PPO paper: https://arxiv.org/pdf/1707.06347
GAE paper: https://arxiv.org/pdf/1506.02438
TRPO paper: https://arxiv.org/pdf/1502.05477

1 August 2025