DeepSeek R1.
DeepSeek's reasoning model — RL-trained, frontier-class, MIT-licensed.
2× AMD MI325.
Most-aggressive quantisation we have a working recommendation for. Lower precision = less VRAM = cheaper hardware, at a small accuracy cost.
Cheapest hosted endpoints.
Speed across providers.
Tokens/sec and time-to-first-token measured against the same prompt template on each provider's API.
| Provider | Tokens/sec | TTFT | Total |
|---|---|---|---|
| OpenRouter | 21.8 | 13392 ms | 18065 ms |
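As a rough illustration of how the three columns relate, the sketch below derives them from per-token arrival timestamps on a streamed response. It assumes tokens/sec is measured over the streaming phase only (after the first token); the page does not say which convention it uses, and `speed_metrics` is a hypothetical helper, not any provider's API.

```python
import math


def speed_metrics(request_start: float, token_times: list[float]) -> dict[str, float]:
    """Derive TTFT, total latency and decode throughput from timestamps.

    request_start: time the request was sent, in seconds.
    token_times:   arrival time of each streamed token, in seconds.
    Assumption: throughput counts only tokens after the first one,
    divided by the time spent streaming them.
    """
    if not token_times:
        raise ValueError("no tokens received")

    ttft_s = token_times[0] - request_start    # time to first token
    total_s = token_times[-1] - request_start  # end-to-end latency
    decode_s = total_s - ttft_s                # time spent streaming
    tokens_after_first = len(token_times) - 1

    tps = tokens_after_first / decode_s if decode_s > 0 else math.nan
    return {
        "ttft_ms": ttft_s * 1000,
        "total_ms": total_s * 1000,
        "tokens_per_sec": tps,
    }


if __name__ == "__main__":
    # Synthetic timestamps: first token after 1 s, then one every 0.5 s.
    print(speed_metrics(0.0, [1.0, 1.5, 2.0, 2.5, 3.0]))
```

For the OpenRouter row above, Total minus TTFT leaves 18065 − 13392 = 4673 ms of streaming, so most of the wall-clock time is spent waiting for the first token.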
Smaller models distilled from DeepSeek R1.
Lightweight student models trained to mimic DeepSeek R1's outputs.
70B Llama distilled from DeepSeek R1's reasoning traces.
32B Qwen base distilled from DeepSeek R1.
14B distilled R1 — laptop-friendly reasoning.
7B distilled R1 — runs on any modern GPU.
Tiny distilled R1 — phone / browser deployable.
Variants in the DeepSeek family.
DeepSeek's flagship MoE — 671B total, 37B active, frontier-class.
70B Llama distilled from DeepSeek R1's reasoning traces.
32B Qwen base distilled from DeepSeek R1.
14B distilled R1 — laptop-friendly reasoning.
7B distilled R1 — runs on any modern GPU.
Tiny distilled R1 — phone / browser deployable.
Frequently asked.
How do I run DeepSeek R1?
Where can I access DeepSeek R1?
How much does it cost to run DeepSeek R1?
Is DeepSeek R1 open-source or proprietary?
Cheapest hardware per quantisation.
Each row is one quantisation tier (the same weights compressed differently). Lower precision → lower VRAM → cheaper hardware, at the cost of a small accuracy loss. $/hr is refreshed hourly from each provider's API.
| Quantisation | Cheapest GPU config | Total VRAM | Live $/hr | Tokens/sec |
|---|---|---|---|---|
| FP16 (half precision, default) | | 2048 GB | — | — |
| FP8 (8-bit float, Hopper / Blackwell) | | 1024 GB | — | — |
| INT8 (8-bit integer) | | 640 GB | $3.84/hr | — |
| INT4 (4-bit integer, ~4× VRAM saving) | | 512 GB | — | — |
What it costs per month across providers.
Estimate your monthly bill for DeepSeek R1 across every host that publishes per-token pricing. Adjust your monthly input and output token volumes; the chart and table re-rank providers cheapest-first.
Cheapest provider on the left.
Total monthly cost — input + output tokens combined.
Bill breakdown.
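The per-token estimate can be sketched as below: input and output tokens are priced separately per million tokens, summed, and providers are sorted by total. The provider names and prices are hypothetical placeholders, not live quotes, and `Pricing`, `monthly_bill` and `rank_cheapest_first` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    provider: str
    input_per_mtok: float   # USD per 1M input tokens (hypothetical)
    output_per_mtok: float  # USD per 1M output tokens (hypothetical)


def monthly_bill(p: Pricing, input_tokens: int, output_tokens: int) -> dict:
    """Split one provider's monthly bill into input, output and total."""
    input_cost = input_tokens / 1_000_000 * p.input_per_mtok
    output_cost = output_tokens / 1_000_000 * p.output_per_mtok
    return {
        "provider": p.provider,
        "input": input_cost,
        "output": output_cost,
        "total": input_cost + output_cost,
    }


def rank_cheapest_first(pricings, input_tokens: int, output_tokens: int) -> list:
    """Return one bill per provider, cheapest total first."""
    bills = [monthly_bill(p, input_tokens, output_tokens) for p in pricings]
    return sorted(bills, key=lambda b: b["total"])


if __name__ == "__main__":
    # Placeholder prices chosen only to show the ranking flipping.
    providers = [
        Pricing("provider-a", input_per_mtok=0.50, output_per_mtok=2.00),
        Pricing("provider-b", input_per_mtok=0.80, output_per_mtok=1.50),
    ]
    for bill in rank_cheapest_first(providers, 100_000_000, 20_000_000):
        print(bill)
```

Because input and output are priced separately, the ranking depends on the traffic mix: with these placeholder prices, an input-heavy workload favours `provider-a` and an output-heavy one favours `provider-b`.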
Rent the GPU instead of paying per token.
For an open-weights model like DeepSeek R1, you can rent a GPU and serve inference yourself. The math: cheapest GPU rental ($/hr) × 730 hours/month + your electricity rate ($/kWh) × power draw (kW) × the same 730 hours.
Assumes the GPU runs 24/7 at ~85% utilisation. If your traffic is bursty, you'll pay less for the API and probably more for the GPU (idle hours still cost rental). The breakeven analysis lives on the Self-host vs API breakeven tool.
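A minimal sketch of that arithmetic follows. It assumes rental is billed for all 730 hours regardless of load, that the electricity term is electricity rate × power draw × hours scaled by utilisation, and that the API price used for the breakeven is a hypothetical blended $/1M-token figure. The function names are illustrative.

```python
HOURS_PER_MONTH = 730  # figure used in the text


def monthly_self_host_cost(
    rental_per_hr: float,
    power_kw: float = 0.0,
    electricity_per_kwh: float = 0.0,
    utilisation: float = 0.85,
) -> float:
    """Monthly cost of a GPU that is rented 24/7.

    Assumption: rental accrues for every hour, idle or not, while the
    energy term scales with utilisation. Leave power_kw at 0 when the
    rental price already includes power.
    """
    rental = rental_per_hr * HOURS_PER_MONTH
    energy = electricity_per_kwh * power_kw * HOURS_PER_MONTH * utilisation
    return rental + energy


def breakeven_mtok(monthly_cost: float, api_price_per_mtok: float) -> float:
    """Millions of tokens per month at which API spend equals monthly_cost."""
    if api_price_per_mtok <= 0:
        raise ValueError("api_price_per_mtok must be positive")
    return monthly_cost / api_price_per_mtok


if __name__ == "__main__":
    cost = monthly_self_host_cost(3.84)  # the $3.84/hr figure quoted on this page
    print(f"rental only: ${cost:,.2f}/month")
    # 2.0 is a placeholder API price, not a quoted one.
    print(f"breakeven: {breakeven_mtok(cost, 2.0):,.1f}M tokens/month")
```

Below the breakeven volume the API is cheaper; above it the rented GPU wins, provided the hardware can actually sustain that throughput.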
What it's best at.
Scores normalised against benchmark ceilings (100 = perfect). Coloured by tier — coral 80+ frontier, lavender 65+ strong, sage 50+ solid, slate below.
Published scores.
| Benchmark | Score | Source |
|---|---|---|
| GPQA | 71.5 | official ↗ |
| MATH | 97.3 | official ↗ |
| MMLU-Pro | 84.0 | official ↗ |
| HumanEval | 90.0 | official ↗ |
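The tier colouring described above amounts to a threshold lookup. The sketch below assumes scores are already normalised to 0–100 and that thresholds are inclusive; `tier` is a hypothetical helper, not the site's own code.

```python
def tier(score: float) -> tuple[str, str]:
    """Map a normalised score (0-100) to (colour, label).

    Thresholds follow the legend: 80+ frontier, 65+ strong, 50+ solid.
    Assumption: each threshold is inclusive.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 80:
        return ("coral", "frontier")
    if score >= 65:
        return ("lavender", "strong")
    if score >= 50:
        return ("sage", "solid")
    return ("slate", "below 50")


if __name__ == "__main__":
    published = {"GPQA": 71.5, "MATH": 97.3, "MMLU-Pro": 84.0, "HumanEval": 90.0}
    for name, score in published.items():
        print(name, score, tier(score))
```

Applied to the published scores, GPQA lands in the lavender "strong" tier and the other three in coral "frontier".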
Independent rankings.
About DeepSeek R1.
DeepSeek R1 is DeepSeek AI's reasoning model, trained primarily with reinforcement learning on top of DeepSeek V3. Its precursor, R1-Zero, used RL with no SFT bootstrap; R1 itself adds a small cold-start SFT stage before RL. It popularised the "thinking aloud" style of chain-of-thought and rivals OpenAI's o1 on math and coding reasoning benchmarks. The January 2025 launch sent shockwaves through the AI industry because the model was released under the permissive MIT license, with the training methodology documented in a public technical report. It also triggered the "DeepSeek shock" sell-off of US tech stocks. Under the hood it is a 671B-parameter MoE with 37B parameters active per token. Distilled versions (1.5B to 70B), commonly run in quantised form, were also released and are widely deployed for cost-sensitive reasoning workloads.