by DeepSeek

DeepSeek R1 Distill Qwen 7B.

text · open weights · edge · 8B params · 128K ctx · Transformer (distilled)
🧬 Distilled from DeepSeek R1 — smaller, cheaper to run, similar reasoning style.
Smallest GPU: 1× NVIDIA P102-100
Capability snapshot

What it's best at.

Math 92.8

Scores normalised against benchmark ceilings (100 = perfect). Coloured by tier — coral 80+ frontier, lavender 65+ strong, sage 50+ solid, slate below.
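The tier rule above can be sketched as a small lookup. This is only an illustration of the stated thresholds: the function name `tier` is hypothetical, the thresholds are assumed to be inclusive (reading "80+" as "80 or higher"), and the label "below" for the slate band is a placeholder, since the page gives that band a colour but no name.

```python
from typing import Tuple


def tier(score: float) -> Tuple[str, str]:
    """Map a normalised score (0-100) to (tier label, colour).

    Thresholds follow the legend: coral 80+, lavender 65+, sage 50+,
    slate for anything lower. Inclusive bounds are an assumption.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 80:
        return ("frontier", "coral")
    if score >= 65:
        return ("strong", "lavender")
    if score >= 50:
        return ("solid", "sage")
    # The legend names no tier for this band; "below" is a placeholder label.
    return ("below", "slate")


print(tier(92.8))  # ('frontier', 'coral')
```

Under this reading, the Math score of 92.8 falls in the coral (frontier) band.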

Benchmarks

Published scores.

Benchmark | Score | Source
MATH | 92.8 | official
Description

About DeepSeek R1 Distill Qwen 7B.

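DeepSeek R1 Distill Qwen 7B is listed for GPUs with ≥16 GB VRAM, such as the 16 GB RTX 4060 Ti. The RTX 3060 and RTX 4070 have at most 12 GB, so on those cards it would need a quantized build or CPU offloading. It scores 92.8 on MATH despite its size, and is positioned as the default open-weight reasoning model for laptop and edge deployments.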
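A rough way to sanity-check a VRAM figure like the one above is to multiply the parameter count by the bytes stored per parameter. The sketch below assumes that rule of thumb. The page does not say which precision its VRAM figure refers to, and the helper name `weight_memory_gb` and the precision table are illustrative. The estimate covers weights only and ignores the KV cache, activations, and runtime overhead, so real usage will be higher.

```python
# Bytes needed to store one parameter at common precisions (assumed values).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, precision: str = "fp16") -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    params_billion * 1e9 parameters * bytes per parameter / 1e9 bytes per GB
    simplifies to params_billion * bytes per parameter.
    """
    return params_billion * BYTES_PER_PARAM[precision]


# The page header lists "8B params" while the model name says 7B, so show both.
for params in (7, 8):
    for precision in ("fp16", "int8", "int4"):
        gb = weight_memory_gb(params, precision)
        print(f"{params}B @ {precision}: ~{gb:.1f} GB for weights")
```

Under these assumptions, 16-bit weights for an 8B-parameter model come to about 16 GB, while a 4-bit build needs about 4 GB for weights, which is the kind of gap that decides whether a 12 GB card is enough.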

Architecture

How it's built.

Architecture: Transformer (distilled)
Context window

How much it can remember.

128K tokens ≈ 96,000 English words
Max output per call: 33K tokens
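The figures above imply a conversion rate of about 0.75 English words per token (96,000 / 128,000). A minimal sketch of that arithmetic follows, along with a prompt budget check. It reads "128K" as 128,000 and "33K" as 33,000, and it assumes that output tokens count against the same context window. The page states none of these readings explicitly, and all names are illustrative.

```python
CONTEXT_TOKENS = 128_000     # "128K tokens", read as 128,000 (assumption)
MAX_OUTPUT_TOKENS = 33_000   # "33K tokens", read as 33,000 (assumption)
WORDS_PER_TOKEN = 0.75       # implied by 128K tokens ≈ 96,000 English words


def tokens_to_words(tokens: int) -> int:
    """Estimate English word count from a token count."""
    return round(tokens * WORDS_PER_TOKEN)


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT_TOKENS,
                      context: int = CONTEXT_TOKENS) -> int:
    """Tokens left for the prompt after reserving room for the reply.

    Assumes prompt and output share one context window.
    """
    if reserved_output > context:
        raise ValueError("reserved output exceeds the context window")
    return context - reserved_output


print(tokens_to_words(CONTEXT_TOKENS))  # 96000
print(max_prompt_tokens())              # 95000
```

If the full output allowance is reserved, roughly 95K tokens remain for the prompt under these assumptions.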
Capabilities

What it can do.

· Vision input
· Audio input
· Video input
· Function calling
· Tool use
· JSON mode
Streaming
Fine-tuning