MATH and GSM8K.

Best AI models for math.

MATH (competition-level problems, formal proofs) is weighted heaviest, with GSM8K (grade-school word problems) as the floor. Models that score well on both handle algebra, calculus, and chain-of-thought arithmetic.

Benchmarks used: MATH (70%) · GSM8K (30%)

Showing top 19 models with published data on at least one of the benchmarks above. Scores are weighted averages on a 0–100 scale.
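
As a rough illustration of how such a weighted score could be computed, here is a minimal sketch. The 70/30 weights come from the benchmark list above; the function name `weighted_score` and the handling of a model with only one published benchmark (renormalizing over the weights that are present) are assumptions made for this example, not a description of how this leaderboard actually does it.

```python
from typing import Optional

# Weights as stated in the "Benchmarks used" line: MATH 70%, GSM8K 30%.
WEIGHTS = {"MATH": 0.70, "GSM8K": 0.30}


def weighted_score(scores: dict[str, Optional[float]]) -> Optional[float]:
    """Combine per-benchmark scores (0-100) into one weighted average.

    ASSUMPTION: when a benchmark score is missing (None or absent), the
    remaining weights are renormalized so the result stays on a 0-100 scale.
    The leaderboard's real rule for missing data is not stated in the source.
    """
    # Keep only benchmarks we have a weight for and that have a published score.
    available = {
        name: value
        for name, value in scores.items()
        if name in WEIGHTS and value is not None
    }
    if not available:
        return None  # no published data on any benchmark

    total_weight = sum(WEIGHTS[name] for name in available)
    weighted_sum = sum(WEIGHTS[name] * value for name, value in available.items())
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    print(weighted_score({"MATH": 80.0, "GSM8K": 90.0}))
    print(weighted_score({"MATH": 60.0, "GSM8K": None}))
```

With both scores present, a hypothetical model scoring 80 on MATH and 90 on GSM8K works out to 0.7 × 80 + 0.3 × 90 = 83. Under the renormalizing assumption, a model with only a MATH score simply keeps that score; a leaderboard that instead treated a missing benchmark as zero would rank such models much lower.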
