MATH and GSM8K.
Best AI models for mathematics.
MATH (competition problems, formal proofs) carries more weight; GSM8K (grade-school word problems) sets the floor. Models that score well on both handle algebra, calculus, and multi-step arithmetic.
Benchmarks used:
MATH · 70%
GSM8K · 30%
| # | Model | Score | From |
|---|---|---|---|
| 1 | | 97.3 | DeepSeek |
| 2 | | 96.0 | OpenAI |
| 3 | | 94.5 | DeepSeek |
| 4 | | 94.3 | DeepSeek |
| 5 | | 93.9 | DeepSeek |
| 6 | | 93.3 | xAI |
| 7 | | 92.8 | DeepSeek |
| 8 | | 92.0 | Google DeepMind |
| 9 | | 90.2 | DeepSeek |
| 10 | | 89.0 | Google DeepMind |
| 11 | | 87.5 | Anthropic |
| 12 | | 83.9 | DeepSeek |
| 13 | | 83.1 | Alibaba (Qwen Team) |
| 14 | | 82.0 | Anthropic |
| 15 | | 77.0 | Meta AI |
| 16 | | 76.6 | OpenAI |
| 17 | | 73.8 | Meta AI |
| 18 | | 73.0 | Mistral AI |
| 19 | | 41.8 | Mistral AI |
Showing top 19 models with published data on at least one of the benchmarks above. Scores are weighted averages on a 0–100 scale.
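
As a rough illustration of how a 70/30 weighted average on a 0–100 scale could be computed, here is a minimal Python sketch. The function name `weighted_score` and the input values are hypothetical. Since the ranking includes models with data on only one benchmark, the sketch assumes the remaining weight is renormalized in that case; the page does not state how missing results are actually handled.

```python
from typing import Optional

# Weights as listed under "Benchmarks used".
WEIGHTS = {"MATH": 0.70, "GSM8K": 0.30}


def weighted_score(scores: dict[str, Optional[float]]) -> float:
    """Combine per-benchmark scores (0-100) into one weighted average.

    Assumption: if a benchmark score is missing (absent or None), the
    weights of the available benchmarks are renormalized to sum to 1.
    """
    available = {
        name: value
        for name, value in scores.items()
        if name in WEIGHTS and value is not None
    }
    if not available:
        raise ValueError("need a score on at least one benchmark")
    total_weight = sum(WEIGHTS[name] for name in available)
    weighted_sum = sum(WEIGHTS[name] * value for name, value in available.items())
    return weighted_sum / total_weight


# Hypothetical inputs, not taken from the table above.
both = weighted_score({"MATH": 90.0, "GSM8K": 96.0})      # 0.7*90 + 0.3*96
math_only = weighted_score({"MATH": 90.0, "GSM8K": None})  # falls back to MATH alone
print(round(both, 1), round(math_only, 1))
```

Under this assumption, a model with only one published result is scored on that benchmark alone, so its score is not directly comparable with scores built from both benchmarks.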