Stop Thinking, Just Do!

Sungsoo Kim's Blog

How to Train Your Agent - Building Reliable Agents with RL

23 September 2025


How to Train Your Agent: Building Reliable Agents with RL — Kyle Corbitt, OpenPipe

Abstract

Have you ever launched an awesome agentic demo, only to realize no amount of prompting will make it reliable enough to deploy in production? Agent reliability is a famously difficult problem to solve!

In this talk we’ll learn how to use GRPO (Group Relative Policy Optimization) to help your agent learn from its successes and failures and improve over time. We’ve seen dramatic results with this technique, such as an email assistant agent whose success rate jumped from 74% to 94% after replacing o4-mini with an open-source model optimized using GRPO.

We’ll share case studies as well as practical lessons about the types of problems this works well for and the unexpected pitfalls to avoid.
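
For readers new to GRPO, here is a minimal sketch of its central step as it is commonly described: the agent attempts the same task several times, each attempt gets a reward, and each reward is scored relative to the rest of its group. This is not code from the talk or from OpenPipe. The function name, the 0/1 reward values, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each rollout relative to the other rollouts for the same task.

    Hypothetical helper: subtracts the group mean and divides by the group
    standard deviation (population std is an assumption; implementations vary).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every rollout got the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four attempts at one email task:
# 1.0 = the agent succeeded, 0.0 = it failed.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Successful attempts get a positive advantage and failed ones a negative
# advantage, so training pushes the model toward what worked.
print(advantages)
```

When every attempt in a group receives the same reward, all advantages come out as zero and that group contributes no learning signal, which is one reason the mix of successes and failures matters.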

About Kyle Corbitt

Kyle Corbitt is the co-founder and CEO of OpenPipe, the RL post-training company. OpenPipe has trained thousands of customer models for both enterprises and tech-forward startups.

Before founding OpenPipe, Kyle led the Startup School team at Y Combinator, which was responsible for the product and content that YC produces for early-stage companies. Prior to that he worked as an engineer at Google and studied ML at school.

Recorded at the AI Engineer World’s Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter