SHANGHAI AI LAB × TSINGHUA UNIVERSITY

SU-01: The AI Math Prodigy

Engineering a 30B Mixture-of-Experts model that matches human gold medalists at USAMO 2026.

USAMO Score: 35/35
MoE Architecture: 30B-A3B (30B total parameters, about 3B active per token)
Token Persistence: 100k+ reasoning tokens

The "Specializable-Generalist" Paradigm

SU-01 addresses a fundamental question: can AI replicate the persistence and logical depth of human mathematicians? Going beyond memorization, SU-01 represents a breakthrough in specialized scaling laws. By following a four-step growth guide, we evolved a generally capable model into an elite mathematical reasoning expert.

Reasoning Over Tokens · Multi-stage RL

Core Objectives

  • SFT & Reverse-Perplexity Curriculum
  • Outcome vs. Process-based Rewards
  • 100k Token Test-Time Scaling

Step 01: Foundational Learning

Supervised Fine-Tuning (SFT)

The foundation was built on rigor rather than data volume: we used 338,000 highly refined trajectories drawn from elite sources such as Evan Chen's Olympiad materials.

Reverse-Perplexity Curriculum

Orders training so the model learns first from the patterns it finds most challenging (those with the highest perplexity), correcting superficial habits early.
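A minimal sketch of how a reverse-perplexity ordering could be computed, assuming it means sorting training trajectories by the model's own perplexity on them in descending order (hardest first). The `Trajectory` record and its fields are hypothetical; the source does not say how SU-01 measures or buckets perplexity.

```python
import math
from dataclasses import dataclass


@dataclass
class Trajectory:
    # Hypothetical record: an ID plus the model's per-token log-probabilities
    # (natural log) on the reference solution.
    traj_id: str
    token_logprobs: list[float]


def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(mean negative log-probability) over the tokens."""
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def reverse_perplexity_order(data: list[Trajectory]) -> list[Trajectory]:
    """Hardest first: trajectories the model finds least predictable lead."""
    return sorted(data, key=lambda t: perplexity(t.token_logprobs), reverse=True)


if __name__ == "__main__":
    data = [
        Trajectory("easy", [-0.1, -0.2, -0.1]),
        Trajectory("hard", [-2.0, -1.5, -2.5]),
        Trajectory("medium", [-0.8, -1.0, -0.9]),
    ]
    # Prints the IDs in the order hard, medium, easy.
    for t in reverse_perplexity_order(data):
        print(t.traj_id, round(perplexity(t.token_logprobs), 2))
```

A real pipeline would compute the log-probabilities with the model being trained and might re-sort as training progresses; this sketch only shows the ordering rule.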

Data Sophistication

Each trajectory records not just the answer but the complete thinking process: exploration, self-checking, and error correction.
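One way to picture such a record is a solution split into labeled phases. The schema below is a hypothetical sketch, not SU-01's data format: `StepKind`, `ProofTrajectory` and the bracketed tags are invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class StepKind(Enum):
    """Hypothetical labels for the phases of a full thinking process."""
    EXPLORATION = "exploration"
    SELF_CHECK = "self_check"
    CORRECTION = "correction"
    FINAL = "final"


@dataclass
class Step:
    kind: StepKind
    text: str


@dataclass
class ProofTrajectory:
    problem: str
    steps: list[Step] = field(default_factory=list)

    def has_full_process(self) -> bool:
        """True only if the record shows exploration, self-checking and
        error correction, not just a final answer."""
        kinds = {s.kind for s in self.steps}
        return {StepKind.EXPLORATION, StepKind.SELF_CHECK,
                StepKind.CORRECTION, StepKind.FINAL} <= kinds

    def to_training_text(self) -> str:
        """Serialize every step with a tag so the whole process is a training target."""
        return "\n".join(f"[{s.kind.value}] {s.text}" for s in self.steps)
```

A check like `has_full_process` could filter out answer-only records during curation; whether SU-01 filtered its data this way is not stated.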

Feature            General AI           SU-01 SFT
Learning Goal      Broad Info           Rigorous Proof
Difficulty Order   Random/Low           Reverse-Perplexity
Data Scale         Billions (General)   338k (Elite Logical)

The Evolution of Reward

SU-01 moves from "getting it right" to "doing it elegantly" through two-stage reinforcement learning (RL).
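The shift between the two RL stages can be sketched as a change in the reward function. This is a hypothetical form, not SU-01's published reward: stage 1 uses the binary outcome reward described below, and stage 2 assumes a `quality` score in [0, 1] (for example from a grader or self-critique) plus a weighting parameter `alpha`, both invented here.

```python
def stage_reward(stage: int, is_correct: bool, quality: float = 0.0,
                 alpha: float = 0.5) -> float:
    """Hypothetical two-stage reward.

    Stage 1 (coarse RL): binary outcome reward, 1 if correct else 0.
    Stage 2 (refined RL): correctness still gates the reward, but a correct
    proof earns a base `alpha` plus up to (1 - alpha) more for `quality`,
    an assumed rigor/elegance score in [0, 1].
    """
    if stage == 1:
        return 1.0 if is_correct else 0.0
    if stage == 2:
        if not 0.0 <= quality <= 1.0:
            raise ValueError("quality must be in [0, 1]")
        return alpha + (1.0 - alpha) * quality if is_correct else 0.0
    raise ValueError("stage must be 1 or 2")
```

With `alpha > 0`, any correct proof still outscores any incorrect one, so the stage-1 signal is preserved while correct proofs are ranked by elegance.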

Step 02

Coarse RL: Finding the Path

The model runs tens of thousands of simulated solving sessions to build "mathematical intuition." The reward is binary: 1 for a correct solution, 0 for an incorrect one.

  • Diverse Path Exploration (Rollouts)
  • Outcome-based Reward (0 or 1)
  • Reinforcement of Successful Solution Paths
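The source states only that rewards are binary and that many diverse rollouts are explored. A minimal sketch of how such outcome rewards are commonly turned into a learning signal, assuming a group-mean baseline (the RL algorithm SU-01 uses is not named here); `toy_solver` is a hypothetical stand-in for the model.

```python
import random
from typing import Callable


def sample_rollouts(solve: Callable[[random.Random], bool], n: int,
                    seed: int = 0) -> list[float]:
    """Run n independent attempts (rollouts) on one problem.

    Each attempt gets the binary outcome reward: 1.0 if correct, else 0.0.
    """
    rng = random.Random(seed)
    return [1.0 if solve(rng) else 0.0 for _ in range(n)]


def rollout_advantages(rewards: list[float]) -> list[float]:
    """Reward of each rollout minus the group mean (a simple baseline).

    Above-average rollouts get a positive advantage and are reinforced;
    below-average ones are discouraged. If every rollout scores the same,
    all advantages are 0 and the problem gives no learning signal.
    """
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


if __name__ == "__main__":
    def toy_solver(rng: random.Random) -> bool:
        # Hypothetical model stand-in: solves the problem about 30% of the time.
        return rng.random() < 0.3

    rewards = sample_rollouts(toy_solver, n=8)
    print(rewards)
    print(rollout_advantages(rewards))
```

The zero-advantage case is one reason diverse exploration matters: a problem where every rollout fails (or every one succeeds) teaches nothing under this baseline.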
Step 03

Refined RL: Masterpiece Solutions

Here the model moves from student to scholar: proofs are judged on quality and elegance, not just correctness, through self-critique and experience replay.
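The text names experience replay without specifying it. A minimal sketch, assuming replay means keeping the best-scoring proofs found so far and mixing them back into later training batches; `ProofReplayBuffer`, its `capacity` and the `score` values (for example from self-critique or a grader) are all hypothetical.

```python
import heapq
import random


class ProofReplayBuffer:
    """Bounded store of the highest-scoring proofs seen so far (hypothetical design)."""

    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        # Min-heap of (score, insertion_id, proof): the root is the weakest kept proof.
        self._heap: list[tuple[float, int, str]] = []
        self._next_id = 0
        self._rng = random.Random(seed)

    def add(self, proof: str, score: float) -> None:
        entry = (score, self._next_id, proof)
        self._next_id += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif score > self._heap[0][0]:
            # Buffer full: evict the lowest-scoring proof in favour of the better one.
            heapq.heapreplace(self._heap, entry)

    def sample(self, k: int) -> list[str]:
        """Draw up to k stored proofs uniformly at random for replay."""
        k = min(k, len(self._heap))
        return [proof for _, _, proof in self._rng.sample(self._heap, k)]

    def best(self) -> str:
        """Highest-scoring proof stored so far (raises ValueError if empty)."""
        return max(self._heap)[2]
```

For example, adding a draft proof scored 0.6 and its revised version scored 0.9 keeps both, and `best()` returns the revision; a weaker attempt that arrives once the buffer is full is discarded.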

Self-Critique Terminal
// AI (Draft)
"Proof risks becoming lengthy..."
// AI (Self-Critique)
"Wait, this can be transformed into complex plane rotation!"
// AI (Revised)
"Introducing complex number z... much more elegant."

Step 04: Real-World Review

Test-Time Scaling (TTS)

The ultimate demonstration of persistence: SU-01 doesn't put down its pen until the final bell rings, extending its thinking process to 100k+ tokens.
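The source does not describe how SU-01 sustains such long reasoning. The sketch below swaps in "budget forcing", a publicly described test-time scaling trick popularized by the s1 "simple test-time scaling" work: when the model tries to stop early, the stop is suppressed and a cue such as "Wait" is appended so it keeps checking. `next_chunk` is a hypothetical model interface, and whitespace-separated words stand in for tokens.

```python
from typing import Callable, Optional

# Hypothetical model interface: given the reasoning so far, return the next
# chunk of text, or None / "" when the model would like to stop thinking.
NextChunk = Callable[[str], Optional[str]]


def think_with_budget(next_chunk: NextChunk, min_tokens: int, max_tokens: int,
                      nudge: str = "Wait,") -> tuple[str, int]:
    """Budget forcing: refuse early stops until min_tokens is reached and
    cut off at max_tokens. Whitespace-separated words stand in for tokens."""
    if not nudge.split():
        raise ValueError("nudge must contain at least one word")
    words: list[str] = []
    while len(words) < max_tokens:
        new = (next_chunk(" ".join(words)) or "").split()
        if not new:
            if len(words) >= min_tokens:
                break  # the model wants to stop and the minimum budget is met
            new = nudge.split()  # suppress the stop; cue another round of checking
        words.extend(new[: max_tokens - len(words)])  # never exceed the hard cap
    return " ".join(words), len(words)
```

With a real model the budget would be counted in tokenizer tokens (the source reports traces above 100k), and re-sending the full context on every call would be replaced by incremental decoding.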

Thinking Process: 100k+ tokens
Performance Leap (IMO-ProofBench)
Basic Skill (SFT Only)     36.2%
All Training (Direct)      57.6%
With TTS Application       70.2%  (top tier)

"This length signifies not verbosity but a high degree of concentration: countless rounds of forming hypotheses, verifying them, and backtracking to correct errors."
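The gains implied by the IMO-ProofBench figures work out as follows (in percentage points; the first difference covers everything "All Training" adds on top of SFT):

```latex
\begin{aligned}
\Delta_{\text{training}} &= 57.6 - 36.2 = 21.4 \text{ points} \\
\Delta_{\text{TTS}} &= 70.2 - 57.6 = 12.6 \text{ points} \quad (12.6 / 57.6 \approx 21.9\% \text{ relative}) \\
\Delta_{\text{total}} &= 70.2 - 36.2 = 34.0 \text{ points} \quad (34.0 / 36.2 \approx 93.9\% \text{ relative})
\end{aligned}
```

Test-time scaling alone therefore adds more than a fifth to the fully trained model's score.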

The Future of Thinking

The growth of SU-01 delivers a clear message: true genius stems not from raw intelligence alone but from the habit of questioning one's own thought process. Even with a compact 30B architecture, rigorous logic and unwavering persistence can outperform much larger models.

01 Rigorous SFT
02 Goal Exploration
03 Refined Truth
04 Unwavering TTS