Stop Thinking, Just Do!

Sungsoo Kim's Blog

DeepSeek's GRPO (Group Relative Policy Optimization)


1 August 2025


Abstract

In this video, I break down DeepSeek's Group Relative Policy Optimization (GRPO) from first principles, without assuming prior knowledge of reinforcement learning (RL). By the end, you'll understand the core RL building blocks that led to GRPO, including:

  • Policy Gradient Methods
  • The REINFORCE Algorithm
  • Actor-Critic Models
  • PPO (Proximal Policy Optimization)
  • GRPO (Group Relative Policy Optimization)
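
As a rough illustration of the step from PPO to GRPO: instead of relying on a learned critic, GRPO samples a group of responses for the same prompt and scores each one relative to the rest of its group. It then feeds that group-relative advantage into a PPO-style clipped objective. The sketch below is a minimal, hedged illustration, not DeepSeek's implementation. It assumes one scalar outcome reward per response, uses the population standard deviation plus a small epsilon, and omits the KL-penalty term. The function names `group_relative_advantages` and `clipped_surrogate` are hypothetical.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Score each response against its own group: (r_i - mean) / (std + eps).

    Assumption: one scalar outcome reward per sampled response, and the
    population standard deviation (some implementations use the sample std).
    """
    g = len(rewards)
    mean = sum(rewards) / g
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / g)
    # eps avoids division by zero when every response gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective for one sample.

    ratio is pi_new(o|q) / pi_old(o|q); the result is
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A). The KL penalty is omitted here.
    """
    clipped_ratio = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Four sampled answers to one prompt: two judged correct (1.0), two wrong (0.0).
    adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
    print([round(a, 4) for a in adv])
    print(clipped_surrogate(1.5, adv[0]))
```

For the group `[1.0, 0.0, 0.0, 1.0]`, the mean is 0.5 and the standard deviation is 0.5, so the correct answers receive an advantage of about +1 and the wrong ones about -1. No value network is needed to obtain this baseline. This is the main resource saving GRPO offers over PPO's actor-critic setup.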

Papers: