MATH and GSM8K.
Best AI models for mathematics.
MATH (competition problems, formal proofs) carries the most weight; GSM8K (grade-school word problems) serves as a floor. Models that win on both handle algebra, calculus, and chain-of-thought arithmetic.
Benchmarks used:
MATH · 70%
GSM8K · 30%
| # | Model | Score | By |
|---|---|---|---|
| 1 | | 97.3 | DeepSeek |
| 2 | | 96.0 | OpenAI |
| 3 | | 94.5 | DeepSeek |
| 4 | | 94.3 | DeepSeek |
| 5 | | 93.9 | DeepSeek |
| 6 | | 93.3 | xAI |
| 7 | | 92.8 | DeepSeek |
| 8 | | 92.0 | Google DeepMind |
| 9 | | 90.2 | DeepSeek |
| 10 | | 89.0 | Google DeepMind |
| 11 | | 87.5 | Anthropic |
| 12 | | 83.9 | DeepSeek |
| 13 | | 83.1 | Alibaba (Qwen Team) |
| 14 | | 82.0 | Anthropic |
| 15 | | 77.0 | Meta AI |
| 16 | | 76.6 | OpenAI |
| 17 | | 73.8 | Meta AI |
| 18 | | 73.0 | Mistral AI |
| 19 | | 41.8 | Mistral AI |
Showing top 19 models with published data on at least one of the benchmarks above. Scores are weighted averages on a 0–100 scale.
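
As a rough illustration of how such a weighted average could be computed, the sketch below combines the 70% MATH and 30% GSM8K weights listed above. The page does not say how a model with data on only one benchmark is scored; the sketch assumes the missing benchmark is skipped and the remaining weight is renormalized. The function name `weighted_score` and the sample inputs are hypothetical and are not taken from the table.

```python
from typing import Mapping, Optional

# Weights as listed under "Benchmarks used".
WEIGHTS = {"MATH": 0.70, "GSM8K": 0.30}


def weighted_score(results: Mapping[str, Optional[float]]) -> Optional[float]:
    """Combine per-benchmark scores (0-100 scale) into one weighted average.

    Assumption: a benchmark without a published score is skipped and the
    remaining weights are renormalized. The page does not state this rule.
    """
    available = {
        name: score
        for name, score in results.items()
        if name in WEIGHTS and score is not None
    }
    if not available:
        return None  # no published data on either benchmark
    total_weight = sum(WEIGHTS[name] for name in available)
    weighted_sum = sum(WEIGHTS[name] * score for name, score in available.items())
    return weighted_sum / total_weight


# Hypothetical inputs for illustration only.
both = weighted_score({"MATH": 90.0, "GSM8K": 95.0})       # uses both weights
math_only = weighted_score({"MATH": 90.0, "GSM8K": None})  # falls back to MATH alone
print(round(both, 1), round(math_only, 1))
```

Under this assumption, a model with only a MATH result is ranked by that result alone, which would explain how models with data on a single benchmark can still appear in the list. A different rule, such as treating a missing score as zero, would produce different rankings.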