MMLU and MMLU-Pro combined.
Best AI models for general knowledge.
MMLU measures breadth across 57 academic subjects; MMLU-Pro raises the difficulty. High scores mean the model knows a lot before it has to reason at all.
Benchmarks used:
MMLU · 50%
MMLU-Pro · 50%
| # | Model | Score | By |
|---|---|---|---|
| 1 | | 89.3 | OpenAI |
| 2 | | 88.7 | OpenAI |
| 3 | | 88.7 | Anthropic |
| 4 | | 86.4 | OpenAI |
| 5 | | 86.0 | Anthropic |
| 6 | | 86.0 | Meta AI |
| 7 | | 84.0 | Mistral AI |
| 8 | | 84.0 | DeepSeek |
| 9 | | 83.5 | Google DeepMind |
| 10 | | 82.2 | DeepSeek |
| 11 | | 82.0 | Anthropic |
| 12 | | 82.0 | OpenAI |
| 13 | | 81.9 | Google DeepMind |
| 14 | | 80.9 | Meta AI |
| 15 | | 79.9 | xAI |
| 16 | | 78.6 | Alibaba (Qwen Team) |
| 17 | Kimi K2 (open) | 78.5 | Moonshot AI |
| 18 | | 78.0 | Anthropic |
| 19 | | 77.8 | Mistral AI |
| 20 | | 77.5 | Meta AI |
| 21 | | 74.6 | Cohere |
| 22 | | 73.0 | Meta AI |
| 23 | | 73.0 | Meta AI |
| 24 | | 71.9 | Google DeepMind |
| 25 | | 70.0 | DeepSeek |
Showing top 25 models with published data on at least one of the benchmarks above. Scores are weighted averages on a 0–100 scale.
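The page does not say how a combined score is formed when a model has published data on only one of the two benchmarks. The sketch below shows one plausible reading, assuming the weight of a missing benchmark is dropped and the remaining weights are renormalized. The function name `combined_score`, the keys `mmlu` and `mmlu_pro`, and the input numbers are hypothetical and chosen only for illustration.

```python
from typing import Mapping, Optional

# Weights as listed above: MMLU 50%, MMLU-Pro 50%.
WEIGHTS = {"mmlu": 0.5, "mmlu_pro": 0.5}


def combined_score(
    scores: Mapping[str, Optional[float]],
    weights: Mapping[str, float] = WEIGHTS,
) -> Optional[float]:
    """Weighted average over the benchmarks that have a published score.

    Assumption (not confirmed by the source): a missing benchmark is
    skipped and the remaining weights are renormalized to sum to 1.
    """
    available = {
        name: value
        for name, value in scores.items()
        if name in weights and value is not None
    }
    if not available:
        return None  # no published data, so the model would not be listed
    total_weight = sum(weights[name] for name in available)
    weighted_sum = sum(weights[name] * value for name, value in available.items())
    return weighted_sum / total_weight


# Hypothetical inputs, not taken from the table above.
print(combined_score({"mmlu": 88.0, "mmlu_pro": 76.0}))  # 82.0
print(combined_score({"mmlu": 88.0, "mmlu_pro": None}))  # 88.0
```

Under this assumption, a model with only an MMLU result keeps its MMLU score as its combined score, which can place it above models that also report the harder MMLU-Pro. Keep that in mind when comparing neighboring rows.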