Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving (Dec 2025)
Abstract
- Title: Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving (Dec 2025)
- Link: http://arxiv.org/abs/2512.10739v2
- Date: December 2025
Summary:
This paper introduces Intern-S1-MO, a multi-agent system designed to solve ultra-hard mathematical problems at the level of the International Mathematical Olympiad (IMO). To overcome the context-length limits of current Large Reasoning Models (LRMs), the system uses a hierarchical reasoning structure that alternates between reasoning, summarization, and verification, and carries forward a compact memory of proven lemmas rather than the full reasoning history. The authors also propose OREAL-H, a reinforcement learning framework that optimizes the model using trajectories explored online and continuous process-verification rewards. Intern-S1-MO matches the performance of silver medalists on IMO 2025 and achieves gold-medal-level performance on the Chinese Mathematical Olympiad (CMO) 2025.
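The alternating reason, summarize, verify loop with a lemma memory can be pictured with the minimal sketch below. It assumes each stage is a black-box callable; `LemmaMemory`, `solve`, and the `toy_*` stubs are hypothetical illustrations of the general idea, not the authors' implementation or prompts. The point it shows is that each new reasoning round sees only the problem plus the verified lemmas, so context length stays bounded by the memory size instead of the total exploration length.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LemmaMemory:
    """Hypothetical compact store of lemmas that passed verification."""
    lemmas: List[str] = field(default_factory=list)

    def render(self) -> str:
        # Only verified lemmas are carried into the next round, not raw traces.
        return "\n".join(f"Lemma {i + 1}: {s}" for i, s in enumerate(self.lemmas))


def solve(
    problem: str,
    reason: Callable[[str, str], str],        # (problem, memory text) -> long trace
    summarize: Callable[[str], List[str]],    # trace -> candidate lemmas
    verify: Callable[[str], bool],            # candidate lemma -> accepted?
    is_solved: Callable[[LemmaMemory], bool],
    max_rounds: int = 8,
) -> Optional[LemmaMemory]:
    memory = LemmaMemory()
    for _ in range(max_rounds):
        # 1. Reason: a fresh attempt conditioned on the problem and the memory only.
        trace = reason(problem, memory.render())
        # 2. Summarize: compress the trace into short candidate lemmas.
        candidates = summarize(trace)
        # 3. Verify: keep only new candidates that the verifier accepts.
        for lemma in candidates:
            if verify(lemma) and lemma not in memory.lemmas:
                memory.lemmas.append(lemma)
        if is_solved(memory):
            return memory
    return None  # round budget exhausted without a complete proof


# Toy stubs (hypothetical) so the loop can be run end to end.
def toy_reason(problem: str, context: str) -> str:
    n = context.count("Lemma")  # number of lemmas proven so far
    return f"step{n} holds; guess{n}?" if n < 2 else "QED"


def toy_summarize(trace: str) -> List[str]:
    return [part.strip() for part in trace.split(";") if part.strip()]


def toy_verify(lemma: str) -> bool:
    return "?" not in lemma  # reject unproven guesses


def toy_is_solved(memory: LemmaMemory) -> bool:
    return "QED" in memory.lemmas


if __name__ == "__main__":
    result = solve("toy problem", toy_reason, toy_summarize, toy_verify, toy_is_solved)
    print(result.lemmas if result else "unsolved")
```

In this toy run the unverified "guess" candidates are discarded each round, and the proof completes on the third round once two lemmas are in memory; with `max_rounds=2` the loop returns `None`.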
Key Topics:
- Mathematical Reasoning
- Intern-S1-MO
- Multi-agent Systems
- Reinforcement Learning (RL)
- OREAL-H Framework
- Large Reasoning Models (LRMs)
- Olympiad Math (IMO, CMO, AIME)