by DeepSeek

DeepSeek R1 Distill Qwen 1.5B.

text · open weights · edge · 2B params · 128K ctx · Transformer (distilled)
🧬 Distilled from DeepSeek R1 — smaller, cheaper to run, similar reasoning style.
Cheapest input: $0.18 per million tokens, on Together AI
Cheapest output: $0.18 per million tokens, on Together AI
Smallest GPU: 1× Nvidia Titan V
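
To make the pricing concrete, here is a minimal sketch of a per-request cost estimate. It reads the listed $0.18/M as US dollars per million tokens for both input and output; the function name and the example workload are hypothetical, chosen only for illustration.

```python
# Listed prices, read as US dollars per million tokens.
INPUT_PRICE_PER_M = 0.18
OUTPUT_PRICE_PER_M = 0.18


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in US dollars for one request."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Hypothetical workload: 2,000 prompt tokens and 8,000 generated tokens.
print(f"${estimate_cost(2_000, 8_000):.4f}")  # prints "$0.0018"
```

Reasoning-style models tend to produce long outputs, so the output term usually dominates even when both prices are equal.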
Capability snapshot

What it's best at.

Math 83.9

Scores are normalised against benchmark ceilings (100 = perfect) and coloured by tier: coral for frontier (80+), lavender for strong (65+), sage for solid (50+), and slate for anything below 50.
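
The normalisation and tier thresholds can be sketched as follows. This assumes the ceiling for MATH is 100 (a percentage score), which the page does not state; the function names are hypothetical, and the lowest tier has no name on the page, so it is labelled "below 50" here.

```python
def normalise(raw_score: float, ceiling: float) -> float:
    """Scale a raw benchmark score so that the ceiling maps to 100."""
    return raw_score / ceiling * 100


def tier(score: float) -> str:
    """Map a normalised score to the tier thresholds given in the legend."""
    if score >= 80:
        return "frontier"
    if score >= 65:
        return "strong"
    if score >= 50:
        return "solid"
    return "below 50"  # unnamed on the page; label chosen for this sketch


# MATH 83.9, assuming a ceiling of 100.
print(tier(normalise(83.9, 100)))  # prints "frontier"
```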

Benchmarks

Published scores.

Benchmark   Score   Source
MATH        83.9    official
Description

About DeepSeek R1 Distill Qwen 1.5B.

DeepSeek R1 Distill Qwen 1.5B is the smallest of the R1 distill family, designed for mobile and browser (WebGPU) deployment. It is competitive on math for its size, with a published MATH score of 83.9. It runs in 4 GB of RAM at INT4.
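
As a rough check on the memory figure, the sketch below estimates the size of the weights alone at different precisions. It assumes a nominal 1.5 billion parameters (taken from the model name) and ignores the KV cache, activations, and runtime overhead, so it is a lower bound rather than a full footprint; the function name is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NOMINAL_PARAMS = 1.5e9  # assumption: nominal count from the model name

for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{label}: {weight_memory_gb(NOMINAL_PARAMS, bits):.2f} GB")
```

At INT4 the weights come to about 0.75 GB under these assumptions, which leaves most of a 4 GB budget for the KV cache and the runtime itself.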

Architecture

How it's built.

Architecture: Transformer (distilled)
Context window

How much it can remember.

128K tokens ≈ 96,000 English words
Max output per call: 33K tokens
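
The word estimate and the relationship between the context window and the output limit can be sketched like this. Several things here are assumptions: "128K" and "33K" are read as 128,000 and 33,000 (the exact limits may be powers of two), the ratio of 0.75 English words per token is inferred from the figures above, and generated tokens are assumed to count against the same context window as the prompt. The function names are hypothetical.

```python
CONTEXT_WINDOW = 128_000  # "128K" read as 128,000 tokens (assumption)
MAX_OUTPUT = 33_000       # "33K" read as 33,000 tokens (assumption)
WORDS_PER_TOKEN = 0.75    # implied by 128K tokens ≈ 96,000 words


def approx_words(tokens: int) -> int:
    """Rough English word count for a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def max_prompt_tokens(reserved_output: int) -> int:
    """Tokens left for the prompt after reserving room for the reply."""
    if not 0 <= reserved_output <= MAX_OUTPUT:
        raise ValueError("reserved_output must be between 0 and MAX_OUTPUT")
    return CONTEXT_WINDOW - reserved_output


print(approx_words(CONTEXT_WINDOW))    # prints 96000
print(max_prompt_tokens(MAX_OUTPUT))   # prints 95000
```

Reserving the full output budget still leaves about 95,000 tokens for the prompt under these assumptions.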
Capabilities

What it can do.

Supported:
· Streaming
· Fine-tuning
Not supported:
· Vision input
· Audio input
· Video input
· Function calling
· Tool use
· JSON mode