DeepSeek.

DeepSeek's flagship MoE — 671B total, 37B active, frontier-class.

DeepSeek R1

DeepSeek's reasoning model — RL-trained, frontier-class, MIT-licensed.

DeepSeek R1 Distill Llama 70B

70B

70B Llama distilled from DeepSeek R1's reasoning traces.

DeepSeek R1 Distill Qwen 32B

33B

32B Qwen base distilled from DeepSeek R1.

DeepSeek R1 Distill Qwen 14B

15B

14B distilled R1 — laptop-friendly reasoning.

DeepSeek R1 Distill Qwen 7B

7B distilled R1; small enough to run on a single consumer GPU.

DeepSeek R1 Distill Qwen 1.5B

by DeepSeek · DeepSeek Coder · 128,000 ctx

Tiny distilled R1 — phone / browser deployable.

DeepSeek Coder V2 236B

236B

DeepSeek's MoE coding model — 236B total, 21B active.

DeepSeek Coder V2 Lite

16B

by DeepSeek · DeepSeek Coder · 128,000 ctx

16B MoE / 2.4B active — laptop-class coder.

DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction-following and coding abilities of the previous vers...

DeepSeek: DeepSeek V3 0324

DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team...

DeepSeek: DeepSeek V3.1

DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prom...

DeepSeek: DeepSeek V3.1 Terminus

DeepSeek-V3.1 Terminus is an update to [DeepSeek V3.1](/deepseek/deepseek-chat-v3.1) that maintains the model's original capabilities whi...

DeepSeek: DeepSeek V3.2

by DeepSeek · 131,072 ctx

DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use pe...

DeepSeek: DeepSeek V3.2 Exp

DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek as an intermediate step between V3.1 and future architectu...

DeepSeek: DeepSeek V3.2 Speciale

DeepSeek-V3.2-Speciale is a high-compute variant of DeepSeek-V3.2 optimized for maximum reasoning and agentic performance. It builds on D...

DeepSeek: DeepSeek V4 Flash

by DeepSeek · 1,048,576 ctx

DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated paramete...

DeepSeek: DeepSeek V4 Pro

by DeepSeek · 1,048,576 ctx

DeepSeek V4 Pro is a large-scale Mixture-of-Experts model from DeepSeek with 1.6T total parameters and 49B activated parameters, supporti...

DeepSeek OCR 2

by DeepSeek · 8,192 ctx

DeepSeek: R1 0528

May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1). Performance on par with [OpenAI o1](/openai/o1), but open-sourced an...

DeepSeek-V3-0324

DeepSeek-V3-0324, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, 37B of which are activated for each token, an impro...

DeepSeek-V3.1