MMLU and MMLU-Pro combined.

Best AI models for general knowledge.

MMLU measures breadth across 57 academic subjects; MMLU-Pro is a harder successor covering similar domains, with ten answer options per question instead of four. A high combined score indicates broad knowledge the model can recall before it has to reason.

Benchmarks used: MMLU (50%) · MMLU-Pro (50%)

#	Model	License	Score	From
1	GPT-5	closed	89.3	OpenAI
2	GPT-4o	closed	88.7	OpenAI
3	Claude 3.5 Sonnet	closed	88.7	Anthropic
4	GPT-4 Turbo	closed	86.4	OpenAI
5	Claude Opus 4.7	closed	86.0	Anthropic
6	Llama 3.1 70B	open	86.0	Meta AI
7	Mistral Large 2	open	84.0	Mistral AI
8	DeepSeek R1	open	84.0	DeepSeek
9	Gemini 2.5 Pro	closed	83.5	Google DeepMind
10	DeepSeek V3	open	82.2	DeepSeek
11	Claude Sonnet 4.6	closed	82.0	Anthropic
12	GPT-4o Mini	closed	82.0	OpenAI
13	Gemini 1.5 Pro	closed	81.9	Google DeepMind
14	Llama 3.1 405B	open	80.9	Meta AI
15	Grok 3	closed	79.9	xAI
16	Qwen 2.5 72B	open	78.6	Alibaba (Qwen Team)
17	Kimi K2	open	78.5	Moonshot AI
18	Claude Haiku 4.5	closed	78.0	Anthropic
19	Mixtral 8x22B	open	77.8	Mistral AI
20	Llama 3.3 70B	open	77.5	Meta AI
21	Command R+	open	74.6	Cohere
22	Llama 3.1 8B	open	73.0	Meta AI
23	Llama 3.2 11B Vision	open	73.0	Meta AI
24	Gemma 3 27B	open	71.9	Google DeepMind
25	DeepSeek R1 Distill Llama 70B	open	70.0	DeepSeek

Showing top 25 models with published data on at least one of the benchmarks above. Scores are weighted averages on a 0–100 scale.
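
As a rough illustration of how a 50/50 weighted average over these two benchmarks could be computed, here is a minimal Python sketch. The function name `combined_score`, the `WEIGHTS` table, and the example inputs are hypothetical. The page does not say how a model with only one published benchmark is scored, so renormalizing the weights over the available benchmarks is an assumption, not the site's documented method.

```python
from typing import Mapping, Optional

# Weights as listed under "Benchmarks used" (50% each).
WEIGHTS = {"MMLU": 0.5, "MMLU-Pro": 0.5}


def combined_score(
    scores: Mapping[str, Optional[float]],
    weights: Mapping[str, float] = WEIGHTS,
) -> Optional[float]:
    """Weighted average on a 0-100 scale over the benchmarks that have data.

    ASSUMPTION: when a benchmark score is missing (None or absent), the
    remaining weights are renormalized. The leaderboard does not state
    how it handles this case.
    """
    available = {
        name: value
        for name, value in scores.items()
        if name in weights and value is not None
    }
    if not available:
        return None  # no published data on any listed benchmark
    total_weight = sum(weights[name] for name in available)
    weighted_sum = sum(weights[name] * value for name, value in available.items())
    return round(weighted_sum / total_weight, 1)


# Hypothetical inputs, not taken from the table above.
both = combined_score({"MMLU": 88.0, "MMLU-Pro": 76.0})
only_mmlu = combined_score({"MMLU": 86.0, "MMLU-Pro": None})
print(both, only_mmlu)
```

Under this assumption, a model with only an MMLU score is ranked on that score alone, which can place it above models whose average is pulled down by the harder MMLU-Pro.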
