by xAI

Grok 3.

multimodal closed 1M ctx Transformer Quality 84.0
Cheapest input
$3.0/M
on xAI API
Cheapest output
$15.0/M
on xAI API
Fastest
112 tok/s
on OpenRouter
Hosted equiv.
~$5.4/hr
@ 100 tok/s on xAI API
Capability snapshot

What it's best at.

Math 93.3
Coding 88.4
Graduate-level science 84.6
Reasoning 79.9

Scores normalised against benchmark ceilings (100 = perfect). Coloured by tier — coral 80+ frontier, lavender 65+ strong, sage 50+ solid, slate below.

Benchmarks

Published scores.

Benchmark Score Source
GPQA 84.6 official ↗
MATH 93.3 official ↗
MMLU-Pro 79.9 official ↗
HumanEval 88.4 official ↗
Leaderboard standing

Independent rankings.

Artificial Analysis Quality Index
73.0
Composite of reasoning + coding + tool-use benchmarks
View on Artificial Analysis ↗
Description

About Grok 3.

Grok 3 is xAI's frontier model, released February 2025 from the Memphis Colossus supercluster (~200,000 H100s). Notable for native real-time grounding via X (Twitter) data and the built-in DeepSearch agent. Reasoning mode (Think) matches OpenAI o3-mini and Claude Opus on math and coding benchmarks. Available via the X subscription tiers (Premium+ for app access, separate SuperGrok for high-volume API access) and the xAI API for enterprise.

Architecture

How it's built.

Architecture
Transformer
Knowledge cutoff
Jan 2025
47 days from cutoff to release.
Context window

How much it can remember.

1M tokens ≈ 750,000 English words
4K 32K 128K 1M
Max output per call: 8K tokens
Capabilities

What it can do.

Vision input
· Audio input
· Video input
Function calling
Tool use
JSON mode
Streaming
· Fine-tuning