by DeepSeek

DeepSeek V3.

text open weights datacenter 671B params 37B active 128K ctx MoE Quality 76.1
Cheapest input
$0.27/M
on DeepSeek API
Cheapest output
$0.89/M
on DeepInfra
Fastest
30 tok/s
on OpenRouter
Smallest GPU
2× AMD MI325
Capability snapshot

What it's best at.

Math 90.2
General knowledge 88.5
Coding 82.6
Reasoning 75.9

Scores normalised against benchmark ceilings (100 = perfect). Coloured by tier — coral 80+ frontier, lavender 65+ strong, sage 50+ solid, slate below.

Benchmarks

Published scores.

Benchmark Score Source
GPQA 59.1 official ↗
MATH 90.2 official ↗
MMLU 88.5 official ↗
MMLU-Pro 75.9 official ↗
HumanEval 82.6 official ↗
Leaderboard standing

Independent rankings.

Artificial Analysis Quality Index
70.0
Composite of reasoning + coding + tool-use benchmarks
View on Artificial Analysis ↗
Description

About DeepSeek V3.

DeepSeek V3 is DeepSeek AI's flagship MoE model — 671B parameters total, 37B activated per token, trained on 14.8T tokens for an estimated $5.5M (vastly cheaper than comparable frontier models, made possible by FP8 mixed-precision training and Multi-head Latent Attention). Benchmark performance rivals GPT-4o and Claude 3.5 Sonnet at a fraction of the inference cost. Available on Hugging Face under DeepSeek's permissive commercial license. The V3 release in late 2024 was a watershed moment for open-weight models — it proved frontier capability didn't require frontier budgets.

Architecture

How it's built.

Architecture
MoE
Mixture of Experts — 37B params active per token out of 671B total.
Trained on
14.8T tokens
22 tokens per parameter — below the Chinchilla optimum.
Knowledge cutoff
Jul 2024
178 days from cutoff to release.
Context window

How much it can remember.

128K tokens ≈ 96,000 English words
4K 32K 128K 1M
Max output per call: 8K tokens
Capabilities

What it can do.

· Vision input
· Audio input
· Video input
Function calling
Tool use
JSON mode
Streaming
Fine-tuning
All access providers

Every place this model is hosted.