by DeepSeek

DeepSeek R1 Distill Qwen 32B.

text · open weights · workstation · 33B params · 128K ctx · Transformer (distilled) · Quality 80.0
🧬 Distilled from DeepSeek R1 — smaller, cheaper to run, similar reasoning style.
Cheapest input: $0.29/M tokens (DeepSeek Platform)
Cheapest output: $0.29/M tokens (DeepSeek Platform)
Fastest: 21 tok/s (OpenRouter)
Smallest GPU: 1× Nvidia RTX A5000
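As a rough illustration of what the per-token prices above mean for a single request, the sketch below estimates the cost from token counts. The helper name `estimate_cost_usd` and the example token counts are hypothetical; the only figures taken from this page are the $0.29 per million tokens input and output prices.

```python
# Prices listed on this page, in USD per million tokens.
INPUT_PRICE_PER_M = 0.29
OUTPUT_PRICE_PER_M = 0.29


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request (hypothetical helper, not a provider API)."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a 2,000-token prompt with an 8,000-token, reasoning-heavy reply.
    print(f"${estimate_cost_usd(2_000, 8_000):.4f}")
```

Reasoning models tend to emit long outputs, so the output token count usually dominates the bill even when both prices are equal.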
Capability snapshot

What it's best at.

Math 94.3
Coding 80.0

Scores are normalised against benchmark ceilings (100 = perfect) and coloured by tier: coral 80+ (frontier), lavender 65+ (strong), sage 50+ (solid), slate below 50.
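The tier legend can be read as a simple threshold lookup. The sketch below assumes the thresholds are inclusive (a score of exactly 80 counts as frontier); the function name `tier` is hypothetical, and the page gives the lowest band a colour but no name.

```python
def tier(score: float) -> str:
    """Map a normalised score (0-100) to the tier names used in the legend."""
    if score >= 80:
        return "frontier"
    if score >= 65:
        return "strong"
    if score >= 50:
        return "solid"
    return "below solid"  # unnamed on the page; shown in slate


if __name__ == "__main__":
    for name, score in [("Math", 94.3), ("Coding", 80.0)]:
        print(f"{name}: {score} -> {tier(score)}")
```

Under that assumption, both scores listed for this model fall in the frontier tier.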

Benchmarks

Published scores.

Benchmark    Score    Source
MATH         94.3     official
HumanEval    80.0     official
Description

About DeepSeek R1 Distill Qwen 32B.

DeepSeek R1 Distill Qwen 32B uses Qwen 2.5 32B as its base and is post-trained on R1's reasoning traces. It is a sweet spot for self-hosted reasoning: it runs on a single H100 at FP16, or on a single RTX 5090 or A40 at INT4. It is one of the most popular reasoning models on Hugging Face for cost-sensitive deployment.
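The hardware claims above follow from a back-of-the-envelope estimate of weight memory: parameter count times bits per parameter. The sketch below works that out for the 33B parameter count listed on this page. The VRAM capacities in `GPU_VRAM_GB` are assumptions based on common configurations of each card and are not stated on this page, and the estimate covers weights only, so KV cache, activations, and quantisation metadata need extra headroom.

```python
BYTES_PER_GB = 1e9  # decimal gigabytes


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, ignoring KV cache and activations."""
    return num_params * bits_per_param / 8 / BYTES_PER_GB


# Assumed VRAM capacities in GB (common configurations, not from this page).
GPU_VRAM_GB = {"H100": 80, "A40": 48, "RTX 5090": 32, "RTX A5000": 24}

if __name__ == "__main__":
    params = 33e9  # parameter count listed on this page
    for label, bits in [("FP16", 16), ("INT4", 4)]:
        need = weight_memory_gb(params, bits)
        fits = [gpu for gpu, vram in GPU_VRAM_GB.items() if vram > need]
        print(f"{label}: ~{need:.1f} GB of weights; fits on: {', '.join(fits)}")
```

At FP16 the weights come to roughly 66 GB, which only the 80 GB card accommodates; at INT4 they shrink to roughly 16.5 GB, which leaves room for context on the smaller cards.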

Architecture

How it's built.

Architecture: Transformer (distilled)
Knowledge cutoff: Jul 2024
203 days from cutoff to release.
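The 203-day figure can be reproduced with simple date arithmetic. The sketch below rests on two assumptions that are not stated on this page: that "Jul 2024" means 1 July 2024, and that the release date is 20 January 2025.

```python
from datetime import date

# Assumptions: the cutoff month is taken as its first day, and the release
# date (20 January 2025) is not stated on this page.
KNOWLEDGE_CUTOFF = date(2024, 7, 1)
ASSUMED_RELEASE = date(2025, 1, 20)


def days_between(start: date, end: date) -> int:
    """Whole days elapsed from start to end."""
    return (end - start).days


if __name__ == "__main__":
    print(days_between(KNOWLEDGE_CUTOFF, ASSUMED_RELEASE))  # 203
```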
Context window

How much it can remember.

128K tokens ≈ 96,000 English words
Max output per call: 33K tokens
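The word estimate above implies a ratio of about 0.75 English words per token (96,000 / 128,000). The sketch below applies that ratio and also works out how much of the window is left for the prompt if the full output allowance is reserved. It assumes that output tokens count against the same context window, which this page does not state; the helper names are hypothetical.

```python
CONTEXT_TOKENS = 128_000    # "128K tokens" as listed on this page
MAX_OUTPUT_TOKENS = 33_000  # "33K tokens" as listed on this page
WORDS_PER_TOKEN = 0.75      # ratio implied by 96,000 words / 128,000 tokens


def tokens_to_words(tokens: int) -> int:
    """Approximate English word count for a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def prompt_budget(reserved_output: int = MAX_OUTPUT_TOKENS) -> int:
    """Tokens left for the prompt, assuming output counts against the window."""
    return CONTEXT_TOKENS - reserved_output


if __name__ == "__main__":
    print(tokens_to_words(CONTEXT_TOKENS))   # 96000
    print(prompt_budget())                   # 95000
    print(tokens_to_words(prompt_budget()))  # 71250
```

The words-per-token ratio varies with the tokenizer and the text, so treat these figures as planning estimates and count tokens with the model's own tokenizer when precision matters.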
Capabilities

What it can do.

Supported: Streaming, Fine-tuning
Not supported: Vision input, Audio input, Video input, Function calling, Tool use, JSON mode