Stop Thinking, Just Do!

Sungsoo Kim's Blog

Long-horizon Reasoning Agent

17 December 2025


Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving (Dec 2025)

Abstract

Summary:

This paper introduces Intern-S1-MO, a multi-agent system for solving ultra-hard mathematical problems at the level of the International Mathematical Olympiad (IMO). To work around the context-length limits of current Large Reasoning Models (LRMs), the system uses a hierarchical reasoning structure that alternates between reasoning, summarization, and verification while maintaining a compact memory of proven lemmas. The authors also propose OREAL-H, a reinforcement learning framework that optimizes the model using trajectories explored online and continuous process-verification rewards. Intern-S1-MO matches the performance of silver medalists at IMO 2025 and achieves gold-medal-level performance at the Chinese Mathematical Olympiad (CMO) 2025.
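
To make the reason / summarize / verify cycle described above a little more concrete, here is a minimal Python sketch of such a loop. It is not the paper's implementation: the names `LemmaMemory`, `solve`, `reason`, `summarize`, and `verify` are hypothetical, and the three `toy_*` functions are hard-coded stand-ins for what would be model calls in a real system. The only idea it tries to show is that each round sees the problem plus a short list of verified lemmas, rather than the full transcript of earlier rounds.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class LemmaMemory:
    """Hypothetical compact store of verified lemmas carried between rounds."""
    lemmas: List[str] = field(default_factory=list)

    def add(self, lemma: str) -> None:
        # Skip duplicates so the memory stays small.
        if lemma not in self.lemmas:
            self.lemmas.append(lemma)

    def as_context(self) -> str:
        return "\n".join(f"Lemma {i + 1}: {l}" for i, l in enumerate(self.lemmas))


def solve(
    problem: str,
    reason: Callable[[str, str], str],
    summarize: Callable[[str], Tuple[List[str], Optional[str]]],
    verify: Callable[[str, str], bool],
    max_rounds: int = 8,
) -> Tuple[Optional[str], List[str]]:
    """Alternate reasoning, summarization, and verification for a few rounds."""
    memory = LemmaMemory()
    for _ in range(max_rounds):
        # The reasoner receives only the problem and the compact lemma memory,
        # not the transcripts of earlier rounds.
        transcript = reason(problem, memory.as_context())
        candidates, answer = summarize(transcript)
        # Only claims that pass verification are kept in memory.
        for lemma in candidates:
            if verify(problem, lemma):
                memory.add(lemma)
        if answer is not None and verify(problem, answer):
            return answer, memory.lemmas
    return None, memory.lemmas


# Toy stand-ins (assumptions for illustration only, not real model calls).
def toy_reason(problem: str, context: str) -> str:
    if not context:
        return "LEMMA: n^2 + n is even\nLEMMA: unproven guess"
    return "LEMMA: n^2 + n is even\nANSWER: n(n+1)/2 is an integer"


def toy_summarize(transcript: str) -> Tuple[List[str], Optional[str]]:
    lemmas: List[str] = []
    answer: Optional[str] = None
    for line in transcript.splitlines():
        if line.startswith("LEMMA: "):
            lemmas.append(line[len("LEMMA: "):])
        elif line.startswith("ANSWER: "):
            answer = line[len("ANSWER: "):]
    return lemmas, answer


def toy_verify(problem: str, claim: str) -> bool:
    return "unproven" not in claim


answer, lemmas = solve(
    "Show that n(n+1)/2 is an integer for every integer n.",
    toy_reason, toy_summarize, toy_verify,
)
print(answer)
print(lemmas)
```

In this toy run the first round contributes one verified lemma and discards the unverified one, and the second round builds on the stored lemma to produce an answer. A real system would replace the three callables with model-backed agents and a much stricter verifier.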

Key Topics:

  • Mathematical Reasoning
  • Intern-S1-MO
  • Multi-agent Systems
  • Reinforcement Learning (RL)
  • OREAL-H Framework
  • Large Reasoning Models (LRMs)
  • Olympiad Math (IMO, CMO, AIME)