MMLU and MMLU-Pro combined.
Best AI models for general knowledge.
MMLU measures breadth across 57 academic subjects; MMLU-Pro raises the difficulty on the same topics. A high score means the model already knows a great deal before it does any reasoning.
Benchmarks used:
MMLU · 50%
MMLU-Pro · 50%
| # | Model | Score | From |
|---|---|---|---|
| 1 | | 89.3 | OpenAI |
| 2 | | 88.7 | OpenAI |
| 3 | | 88.7 | Anthropic |
| 4 | | 86.4 | OpenAI |
| 5 | | 86.0 | Anthropic |
| 6 | | 86.0 | Meta AI |
| 7 | | 84.0 | Mistral AI |
| 8 | | 84.0 | DeepSeek |
| 9 | | 83.5 | Google DeepMind |
| 10 | | 82.2 | DeepSeek |
| 11 | | 82.0 | Anthropic |
| 12 | | 82.0 | OpenAI |
| 13 | | 81.9 | Google DeepMind |
| 14 | | 80.9 | Meta AI |
| 15 | | 79.9 | xAI |
| 16 | | 78.6 | Alibaba (Qwen Team) |
| 17 | Kimi K2 (open) | 78.5 | Moonshot AI |
| 18 | | 78.0 | Anthropic |
| 19 | | 77.8 | Mistral AI |
| 20 | | 77.5 | Meta AI |
| 21 | | 74.6 | Cohere |
| 22 | | 73.0 | Meta AI |
| 23 | | 73.0 | Meta AI |
| 24 | | 71.9 | Google DeepMind |
| 25 | | 70.0 | DeepSeek |
Showing the top 25 models with published data on at least one of the benchmarks above. Scores are weighted averages on a 0–100 scale.
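
To make the 50/50 weighting concrete, here is a minimal sketch of how a combined score could be computed. The page does not say how a model with data on only one benchmark is handled, so this sketch assumes the weights are renormalized over the benchmarks that are available; the function name `combined_score` and the `WEIGHTS` mapping are hypothetical, and the numbers in the usage lines are illustrative inputs rather than leaderboard data.

```python
from typing import Mapping, Optional

# Benchmark weights as stated on the page: 50% MMLU, 50% MMLU-Pro.
WEIGHTS = {"MMLU": 0.5, "MMLU-Pro": 0.5}


def combined_score(
    scores: Mapping[str, Optional[float]],
    weights: Mapping[str, float] = WEIGHTS,
) -> Optional[float]:
    """Weighted average of benchmark scores on a 0-100 scale.

    ASSUMPTION: when a benchmark score is missing (None or absent),
    the remaining weights are renormalized so they sum to 1. The
    leaderboard's actual rule for missing data is not documented.
    """
    # Keep only benchmarks that have both a weight and a published score.
    available = {
        name: value
        for name, value in scores.items()
        if name in weights and value is not None
    }
    if not available:
        return None  # no published data on any listed benchmark

    total_weight = sum(weights[name] for name in available)
    weighted_sum = sum(weights[name] * value for name, value in available.items())
    return weighted_sum / total_weight


# Illustrative inputs only, not taken from the table above.
both = combined_score({"MMLU": 90.0, "MMLU-Pro": 80.0})
only_mmlu = combined_score({"MMLU": 88.0, "MMLU-Pro": None})
```

Under this assumption, a model with a single published score keeps that score unchanged, which would be consistent with the "at least one of the benchmarks" inclusion rule, though the page does not confirm it.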