Research Portal v2.0

Advancing the Frontiers of Mathematical Discovery

Original Research Query

"How can hierarchical agent orchestration provide a stateful workspace for solving postdoc-level mathematical problems?"

System Architecture

Moving beyond the chatbot paradigm to a collaborative, stateful research environment.

Core Design Principles

1. Beyond Proofs: Supports quasi-empirical activities like literature review and hypothesis brainstorming.

2. Native Artifacts: Generates professional LaTeX papers with margin notes and auditable history.

3. Negative Space: Preserves failed exploration paths as permanent, valuable knowledge assets.

4. Flexible Steering: Users can intervene asynchronously to modify high-level strategies.
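The "Negative Space" principle can be illustrated with a minimal sketch: an append-only log in which abandoned explorations are recorded rather than discarded, so later agents can consult what has already failed. The class and method names (`NegativeSpace`, `already_tried`) are illustrative assumptions, not the system's actual API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class FailedPath:
    """One abandoned exploration, kept as a permanent knowledge asset."""
    hypothesis: str
    reason_abandoned: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

class NegativeSpace:
    """Append-only log of failed paths; entries are never deleted."""
    def __init__(self) -> None:
        self._entries: list[FailedPath] = []

    def record(self, hypothesis: str, reason: str) -> FailedPath:
        entry = FailedPath(hypothesis, reason)
        self._entries.append(entry)
        return entry

    def already_tried(self, hypothesis: str) -> bool:
        # Lets a later agent skip an approach that previously failed.
        return any(e.hypothesis == hypothesis for e in self._entries)

log = NegativeSpace()
log.record("direct induction on knot genus", "base case fails for genus 0")
print(log.already_tried("direct induction on knot genus"))  # True
```

The design choice worth noting is immutability: entries are frozen and the log is append-only, which is what turns failure records into auditable assets rather than transient state.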

Agent Hierarchy

  • 01. Project Coordinator: Strategic oversight & user interface

  • 02. Workstream Coordinator: Execution of linear sub-tasks

  • 03. Specialized Sub-agents: Coding, Search, Reasoning
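The three-tier hierarchy above can be sketched as a simple delegation chain: a project coordinator decomposes a goal into a linear plan, a workstream coordinator executes that plan step by step, and stub functions stand in for the model-backed sub-agents. Everything here (class names, the example plan, the stub agents) is a hypothetical illustration of the orchestration pattern, not the real implementation.

```python
from typing import Callable

# Tier 3 interface: a sub-agent maps a task description to a result string.
SubAgent = Callable[[str], str]

class WorkstreamCoordinator:
    """Tier 2: executes a linear sequence of sub-tasks via sub-agents."""
    def __init__(self, sub_agents: dict[str, SubAgent]) -> None:
        self.sub_agents = sub_agents

    def execute(self, plan: list[tuple[str, str]]) -> list[str]:
        # plan is a list of (agent_name, task) pairs, run strictly in order.
        return [self.sub_agents[name](task) for name, task in plan]

class ProjectCoordinator:
    """Tier 1: strategic oversight; splits a goal into a workstream plan."""
    def __init__(self, workstream: WorkstreamCoordinator) -> None:
        self.workstream = workstream

    def run(self, goal: str) -> list[str]:
        plan = [
            ("search", f"survey literature for: {goal}"),
            ("reasoning", f"draft proof strategy for: {goal}"),
            ("coding", f"verify numerically: {goal}"),
        ]
        return self.workstream.execute(plan)

# Stub sub-agents standing in for real model-backed workers.
agents: dict[str, SubAgent] = {
    name: (lambda task, n=name: f"[{n}] {task}")
    for name in ("search", "reasoning", "coding")
}
results = ProjectCoordinator(WorkstreamCoordinator(agents)).run("knot bound")
print(results[0])  # [search] survey literature for: knot bound
```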

"A true co-mathematician extends the researcher's thought process."

Rigorous Verification & Benchmarking

The system enforces "Hard Programming Constraints": coding and reasoning agents cannot mark their own work complete. Results must pass an independent Reviewer Agent and satisfy "golden test cases."
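A minimal sketch of this two-gate acceptance rule: a candidate result is accepted only when both an independent reviewer check and every golden test case pass, and no code path lets the producing agent approve itself. The function names and the toy "square the input" task are assumptions for illustration only.

```python
from typing import Callable

Candidate = Callable[[int], int]

def golden_tests(candidate: Candidate) -> bool:
    # Golden cases: known (input, output) pairs the solution must reproduce.
    cases = [(2, 4), (3, 9), (10, 100)]
    return all(candidate(x) == y for x, y in cases)

def reviewer_agent(candidate: Candidate) -> bool:
    # Stand-in for an independent review; here just a held-out spot check.
    return candidate(7) == 49

def accept(candidate: Candidate) -> bool:
    # Hard constraint: both gates must pass; there is no self-completion path.
    return reviewer_agent(candidate) and golden_tests(candidate)

assert accept(lambda x: x * x)      # correct solution passes both gates
assert not accept(lambda x: x + x)  # wrong solution is rejected
```

The point of the pattern is that `accept` belongs to the reviewer side of the boundary, so a coding or reasoning agent can only submit candidates, never declare them done.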

48%
Accuracy on FrontierMath Tier 4 (postdoc-level problems)
87%
Success rate on internal unpublished research problems

Case Study: Lackenby's Proof

Mathematician Marc Lackenby used the system to bridge a logical gap in an unsolved Kourovka Notebook problem. When the Reviewer Agent identified a flaw, human intervention guided the system toward a final, defect-free proof.

[SYSTEM]: Reviewer Agent Rejected Proof #04
[REASON]: Topological pruning heuristic mismatch in Step 3.
[USER]: Suggesting alternate bridge via induction...
[SYSTEM]: Re-calculating workstream... SUCCESS.

Research Feed

An assetized library of key literature and methodologies powering the AI Co-Mathematician framework.

DeepMind Architecture Paper

AI Co-mathematician: A Next-Generation Assistant for Mathematical Research

Details the architecture of Google DeepMind's AI Co-Mathematician, focusing on its "Stateful Workspace" and hierarchical agent orchestration. Introduces Project Coordinators and specialized sub-agents working in a shared file system. Highlights "Negative Space" for tracking failed explorations and the "Reviewer Agent" loop for programmatic constraint-based logical rigor.

Epoch AI Benchmark

FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI

Presents a benchmark of computationally verifiable, research-level mathematical problems (Tier 4) created by Epoch AI in collaboration with mathematicians. It is the primary evaluation suite for the AI Co-Mathematician, which achieved 48% accuracy on these postdoc-level challenges using multi-agent search and reasoning-heavy models.

Neuro-Symbolic Model Report

AlphaProof and AlphaGeometry 2: AI achieves silver medalist status

Explores neuro-symbolic approaches to mathematical reasoning, introducing AlphaProof for proof verification in the Lean formal language and AlphaGeometry 2 for complex geometry problem-solving. These models provide core reasoning primitives for the AI Co-Mathematician's sub-agents.

Technical Analysis arXiv:2502.10245

Agentic Workflows for Formal Verification and Mathematical Discovery

Analyzes the transition from atomic model inferences to long-running agentic workflows in mathematics. Focuses on technical challenges like "Death Spirals" (infinite feedback loops) and "Reviewer-pleasing bias," proposing asynchronous orchestration methods for human steering.
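The "Death Spiral" failure mode described here (an agent looping on the same rejected fix) suggests a simple guard: track reviewer objections and escalate to the human once the same objection recurs. This detector, its threshold, and the example history are hypothetical, offered as one plausible shape for such a guard.

```python
from collections import Counter

def detect_death_spiral(rejections: list[str], threshold: int = 3) -> bool:
    """Flag a likely feedback loop: the same reviewer objection recurring.

    A hit should trigger asynchronous escalation to the human researcher
    rather than another automated retry.
    """
    counts = Counter(rejections)
    return any(n >= threshold for n in counts.values())

history = ["topological pruning heuristic mismatch in Step 3"] * 3
print(detect_death_spiral(history))  # True
```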

Collaboration Study arXiv:2504.08921

Closing the Gap in Formal Mathematics: A Study on Human-AI Collaborative Proving

Investigates the "Collaborative Efficacy" of the AI Co-Mathematician, evaluating how its LaTeX report generation with margin notes and historical failure logs (Negative Space) aids human intuition. Case studies with mathematicians illustrate its role as a "peer" by providing auditable evidence.


The Future of Agentic Knowledge Work

"Evaluated not just on speed of answering, but on Collaborative Efficacy and Stateful Exploration Capability."

Logically Rigorous: Beyond LLM Hallucinations

Human-AI Hybrid: Flexible Steering & Oversight