by DeepSeek

DeepSeek R1 Distill Qwen 14B.

text · open weights · laptop+ · 15B params · 128K ctx · Transformer (distilled)
🧬 Distilled from DeepSeek R1 — smaller, cheaper to run, similar reasoning style.
Cheapest input
$1.6/M
on Together AI
Cheapest output
$1.6/M
on Together AI
Smallest GPU
1× Nvidia RTX 3080
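As a rough illustration of what the listed per-token prices mean for a single request, here is a minimal sketch. It assumes the $1.6/M figures above apply unchanged to both input and output tokens; `estimate_cost_usd` is a hypothetical helper, not part of any provider SDK, and actual billing may differ.

```python
# Prices copied from the listing above; treat them as a snapshot, not a quote.
INPUT_USD_PER_M = 1.6   # USD per 1M input tokens
OUTPUT_USD_PER_M = 1.6  # USD per 1M output tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for one request (hypothetical helper)."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 2,000-token prompt that triggers an 8,000-token reasoning trace.
    print(f"${estimate_cost_usd(2_000, 8_000):.4f}")
```

Because reasoning models tend to generate long traces, the output term usually dominates the bill even when both prices are equal.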
Capability snapshot

What it's best at.

Math 93.9

Scores normalised against benchmark ceilings (100 = perfect). Coloured by tier — coral 80+ frontier, lavender 65+ strong, sage 50+ solid, slate below.

Benchmarks

Published scores.

Benchmark Score Source
MATH 93.9 official ↗
Description

About DeepSeek R1 Distill Qwen 14B.

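DeepSeek R1 Distill Qwen 14B runs on a single consumer GPU once quantized. At FP16 the weights alone take roughly 30 GB, which exceeds the 24 GB of an RTX 4090 and leaves little headroom on a 32 GB RTX 5090; 8-bit weights (about 15 GB) fit a 4090, and 4-bit weights (about 7.5 GB) fit an RTX 3080. A good size for hobbyist reasoning experiments and on-prem workloads where a single workstation card is the budget.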
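The hardware fit can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes the "15B params" figure from this page and the VRAM of the base RTX 3080 (10 GB), RTX 4090 (24 GB) and RTX 5090 (32 GB). It counts weights only, so KV cache, activations and framework overhead come on top; the helper names are illustrative.

```python
# Parameter count taken from the "15B params" tag on this page.
PARAMS = 15e9

# VRAM in GB for the cards named on this page (assumed: base-model specs).
GPU_VRAM_GB = {"RTX 3080": 10, "RTX 4090": 24, "RTX 5090": 32}


def weights_gb(params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


def cards_that_fit(bits_per_param: int) -> list[str]:
    """Cards whose VRAM covers the weight footprint (no runtime overhead)."""
    need = weights_gb(PARAMS, bits_per_param)
    return [name for name, vram in GPU_VRAM_GB.items() if vram >= need]


if __name__ == "__main__":
    for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
        print(f"{label}: {weights_gb(PARAMS, bits):.1f} GB -> {cards_that_fit(bits)}")
```

Under these assumptions, FP16 weights fit only the 32 GB card, 8-bit weights fit a 24 GB card, and 4-bit weights fit a 10 GB card, which is consistent with the RTX 3080 being listed as the smallest GPU.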

Architecture

How it's built.

Architecture: Transformer (distilled)
Context window

How much it can remember.

128K tokens ≈ 96,000 English words
Max output per call: 33K tokens
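To make the window and output limits concrete, the following sketch converts tokens to approximate words and computes how much prompt fits once room is reserved for the reply. It assumes "128K" and "33K" mean 128,000 and 33,000 tokens, that prompt and output share one context window, and that the 0.75 words-per-token ratio implied by the figures above holds for your text. All names are hypothetical.

```python
CONTEXT_TOKENS = 128_000     # "128K tokens" (assumed to mean 128,000)
MAX_OUTPUT_TOKENS = 33_000   # "33K tokens" per call (assumed to mean 33,000)
WORDS_PER_TOKEN = 0.75       # ratio implied by 128K tokens ≈ 96,000 words


def approx_words(tokens: int) -> int:
    """Rough English word count for a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT_TOKENS) -> int:
    """Prompt budget left after reserving space for the reply.

    Assumes prompt and output share the same context window; the
    reservation is capped at the per-call output limit.
    """
    reserved = min(reserved_output, MAX_OUTPUT_TOKENS)
    return CONTEXT_TOKENS - reserved


if __name__ == "__main__":
    print(approx_words(CONTEXT_TOKENS))   # whole window, in words
    print(max_prompt_tokens())            # prompt budget with full output reserved
    print(max_prompt_tokens(8_000))       # prompt budget with a shorter reply
```

Reasoning traces count as output, so reserving close to the full output limit is the safer default for hard problems.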
Capabilities

What it can do.

Not supported: vision input, audio input, video input, function calling, tool use, JSON mode
Supported: streaming, fine-tuning