SHANGHAI AI LAB × TSINGHUA UNIVERSITY

SU-01: The AI Math Prodigy

Engineering a 30B Mixture-of-Experts model that matches human gold medalists at USAMO 2026.

35/35

USAMO Score

30B-A3B

MoE Architecture

100k+

Token Persistence

The "Specializable-Generalist" Paradigm

SU-01 addresses a fundamental question: can AI replicate the persistence and logical depth of human mathematicians? Rather than relying on memorization, SU-01 is presented as a breakthrough in specialized scaling laws. By applying a four-step growth guide (SFT, coarse RL, refined RL, and test-time scaling), we evolved a generally capable AI into an elite mathematical reasoning expert.

Reasoning Over Tokens · Multi-stage RL

Core Objectives

✓ SFT & Reverse-Perplexity Curriculum
✓ Outcome vs. Process-based Rewards
✓ 100k Token Test-Time Scaling

Step 01: Foundational Learning

Supervised Fine-Tuning (SFT)

The foundation was built on rigor, not data volume. We used 338,000 highly refined trajectories from elite sources such as Evan Chen's Olympiad materials.

Reverse-Perplexity Curriculum

The curriculum prioritizes the patterns the model finds most challenging, correcting superficial habits early.
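The ordering idea can be sketched in a few lines. This is a minimal illustration, not the SU-01 pipeline: it assumes "most challenging" means highest perplexity under the current model and that "reverse" means presenting those samples first. The sample records and the `logprobs` field are hypothetical.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def reverse_perplexity_order(samples):
    """Sort samples so the highest-perplexity (hardest) ones come first."""
    return sorted(samples, key=lambda s: perplexity(s["logprobs"]), reverse=True)

# Hypothetical per-token natural-log probabilities assigned by the current model.
samples = [
    {"id": "easy", "logprobs": [-0.1, -0.2, -0.1]},
    {"id": "hard", "logprobs": [-2.0, -1.5, -2.5]},
    {"id": "medium", "logprobs": [-0.8, -1.0, -0.9]},
]

ordered_ids = [s["id"] for s in reverse_perplexity_order(samples)]
print(ordered_ids)  # ['hard', 'medium', 'easy']
```

In practice the perplexity would be recomputed as the model improves, since a sample that is hard early in training may become easy later; how often SU-01 does this is not stated here.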

Data Sophistication

Not just answers, but complete thinking processes: exploration, self-checking, and error correction.

Feature	General AI	SU-01 SFT
Learning Goal	Broad Info	Rigorous Proof
Difficulty Order	Random/Low	Reverse-Perplexity
Data Scale	Billions (General)	338k (Elite Logical)

The Evolution of Reward

Transitioning from "getting it right" to "doing it elegantly" through two-stage Reinforcement Learning.

Step 02

Coarse RL: Finding the Path

The model runs tens of thousands of simulated sessions to build "Mathematical Intuition." The reward is binary: 1 for a correct result, 0 for an incorrect one.

Diverse Path Exploration (Rollouts)
Outcome-based Reward (0 or 1)
Reinforcement of Successful Memories
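A binary outcome reward over a batch of rollouts might look like the sketch below. The exact-string comparison is an assumption made for illustration; a real grader for olympiad problems would need a far more careful check, and the function names are hypothetical.

```python
def outcome_reward(final_answer, reference_answer):
    """Return 1 if the rollout's final answer matches the reference, else 0."""
    return 1 if final_answer.strip() == reference_answer.strip() else 0

def score_rollouts(rollouts, reference_answer):
    """Score every rollout and report the fraction that succeeded."""
    rewards = [outcome_reward(r, reference_answer) for r in rollouts]
    success_rate = sum(rewards) / len(rewards)
    return rewards, success_rate

# Four hypothetical rollouts for a problem whose reference answer is "42".
rollouts = ["42", " 41", "42 ", "7"]
rewards, success_rate = score_rollouts(rollouts, "42")
print(rewards, success_rate)  # [1, 0, 1, 0] 0.5
```

Because the reward only looks at the outcome, two rollouts with the same final answer score identically regardless of how clean the reasoning was, which is the gap the refined stage is meant to close.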

Step 03

Refined RL: Masterpiece Solutions

This stage moves the model from student to scholar: it evaluates proof quality and elegance through self-critique and experience replay.

Self-Critique Terminal

// AI (Draft)

"Proof risks becoming lengthy..."

// AI (Self-Critique)

"Wait, this can be transformed into a rotation in the complex plane!"

// AI (Revised)

"Introducing complex number z... much more elegant."
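The draft, critique, revise exchange above, combined with experience replay, can be sketched as follows. Everything here is a hypothetical stand-in: the critic and reviser are toy string functions, the quality scores are made up, and the replay buffer simply keeps the highest-scoring proofs. How SU-01 actually scores elegance or samples from its replay data is not described in this summary.

```python
import heapq
import itertools

class ReplayBuffer:
    """Keep the top-`capacity` proofs by quality score."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []  # min-heap of (score, tiebreak, proof)
        self._counter = itertools.count()

    def add(self, score, proof):
        entry = (score, next(self._counter), proof)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        else:
            # Push the new entry, then evict the lowest-scoring one.
            heapq.heappushpop(self._heap, entry)

    def best(self):
        """Return stored proofs, highest score first."""
        return [proof for _, _, proof in sorted(self._heap, reverse=True)]

def refine(draft, critique, revise, max_rounds=3):
    """Critique and revise until the critic has no objection or rounds run out."""
    proof = draft
    for _ in range(max_rounds):
        feedback = critique(proof)
        if feedback is None:
            break
        proof = revise(proof, feedback)
    return proof

# Toy critic and reviser (assumptions, standing in for model calls).
def toy_critique(proof):
    return "use a complex-plane rotation" if "coordinate bash" in proof else None

def toy_revise(proof, feedback):
    return proof.replace("coordinate bash", "complex-plane rotation")

final = refine("lengthy coordinate bash", toy_critique, toy_revise)

buffer = ReplayBuffer(capacity=2)
for score, proof in [(0.2, "lengthy coordinate bash"), (0.9, final), (0.5, "induction")]:
    buffer.add(score, proof)

print(final)          # lengthy complex-plane rotation
print(buffer.best())  # ['lengthy complex-plane rotation', 'induction']
```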

Step 04: Real-World Review

Test-Time Scaling (TTS)

The ultimate demonstration of persistence. SU-01 doesn't put down its pen until the final bell rings, scaling its thinking process to massive lengths.

100k+

Token Thinking Process

Performance Leap (IMO-ProofBench)

Basic Skill (SFT Only): 36.2%

All Training (Direct): 57.6%

With TTS Application (Top Tier): 70.2%

"This signifies not just verbosity, but a high degree of concentration involving countless hypothesis formations, verifications, and backtracking for corrections."
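One way to picture thinking under a large token budget is a loop that keeps taking reasoning steps until the budget is spent or a step reports that the proof is finished. This is a generic sketch under stated assumptions, not the SU-01 mechanism: `step_fn`, the 30,000-token step cost, and the step labels are all hypothetical.

```python
def think_with_budget(step_fn, token_budget):
    """Run reasoning steps until the budget is spent or a step reports completion."""
    used = 0
    trace = []
    while used < token_budget:
        text, n_tokens, done = step_fn(trace)
        if n_tokens <= 0:
            raise ValueError("each step must consume at least one token")
        if used + n_tokens > token_budget:
            break  # the next step would not fit in the remaining budget
        trace.append(text)
        used += n_tokens
        if done:
            break
    return trace, used

# Toy step function: cycles through step kinds, costs 30,000 tokens, never finishes.
def toy_step(trace):
    kinds = ["hypothesis", "verification", "backtrack"]
    return kinds[len(trace) % 3], 30_000, False

trace, used = think_with_budget(toy_step, token_budget=100_000)
print(trace, used)  # ['hypothesis', 'verification', 'backtrack'] 90000
```

With a 100,000-token budget the toy run fits three steps and stops before a fourth would overrun, which mirrors the idea of working until the final bell rather than stopping at the first plausible answer.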

The Future of Thinking

The growth of SU-01 delivers a powerful message: true genius stems not from high intelligence alone, but from the attitude of questioning one's own thought processes. Even with a compact 30B architecture, we can surpass giants through rigorous logic and unwavering persistence.

Rigorous SFT

Goal Exploration

Refined Truth

Unwavering TTS