DeepSeek Coder V2 236B.
DeepSeek's MoE coding model — 236B total, 21B active.
1× AMD MI300.
The most aggressive quantisation for which we have a working recommendation. Lower precision means less VRAM and cheaper hardware, at a small accuracy cost.
Cheapest hosted endpoints.
Smaller models distilled from DeepSeek Coder V2 236B.
Lightweight student models trained to mimic DeepSeek Coder V2 236B's outputs.
Variants in the DeepSeek Coder family.
Frequently asked.
How do I run DeepSeek Coder V2 236B?
Where can I access DeepSeek Coder V2 236B?
How much does it cost to run DeepSeek Coder V2 236B?
Is DeepSeek Coder V2 236B open-source or proprietary?
Cheapest hardware per quantisation.
Each row is one quantisation tier (the same weights compressed differently). Lower precision → lower VRAM → cheaper hardware, at the cost of a small accuracy loss. $/hr is refreshed hourly from each provider's API.
| Quantisation | Cheapest GPU config | Total VRAM | Live $/hr | tokens/sec |
|---|---|---|---|---|
| FP16: half precision (default) | | 768 GB | n/a | n/a |
| FP8: 8-bit float (Hopper / Blackwell) | | 384 GB | n/a | n/a |
| INT4: 4-bit integer (~4× VRAM saving) | | 192 GB | n/a | n/a |
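To make the precision-to-VRAM relationship concrete, here is a minimal sketch of the weights-only footprint at each tier. It assumes 1 GB = 10^9 bytes and that all 236B parameters are held in memory, as is usual when serving an MoE model. The function name and tier mapping are illustrative. The totals in the table are larger than these figures, presumably because they include headroom for the KV cache and round up to whole GPUs; that reading is an assumption, not something the table states.

```python
def weight_footprint_gb(total_params_billion: float, bits_per_param: int) -> float:
    """Weights-only memory in GB, taking 1 GB = 1e9 bytes."""
    # billions of params * (bits / 8) bytes per param = billions of bytes = GB
    return total_params_billion * bits_per_param / 8


TOTAL_PARAMS_B = 236  # total parameters, from the model description
TIERS = {"FP16": 16, "FP8": 8, "INT4": 4}  # bits per parameter at each tier

footprints = {
    name: weight_footprint_gb(TOTAL_PARAMS_B, bits) for name, bits in TIERS.items()
}

for name, gb in footprints.items():
    print(f"{name}: ~{gb:.0f} GB of weights")
```

Halving the bits per parameter halves the footprint, which matches the 4:2:1 ratio between the table's three VRAM totals.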
What it costs per month across providers.
Estimate your monthly bill for DeepSeek Coder V2 236B across every host that publishes per-token pricing. Adjust your token volumes; the chart and table re-rank providers from cheapest to most expensive.
No priced API access rows on file for DeepSeek Coder V2 236B yet.
Rent the GPU instead of paying per token.
For an open-weights model like DeepSeek Coder V2 236B, you can rent a GPU and serve inference yourself. The math: cheapest GPU rental ($/hr) × 730 hours/month + your electricity rate ($/kWh) × power draw (kW) × 730 hours/month.
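The monthly estimate can be sketched as a small function. The function name, parameters, and example rates are hypothetical placeholders, not quoted prices. The electricity term is multiplied by hours so that both terms come out in dollars per month, and it defaults to zero for rentals where power is already included in the hourly rate.

```python
HOURS_PER_MONTH = 730  # hours per month used in the estimate above


def monthly_self_host_cost(
    rental_usd_per_hr: float,
    electricity_usd_per_kwh: float = 0.0,
    power_draw_kw: float = 0.0,
    hours: float = HOURS_PER_MONTH,
) -> float:
    """Rental plus electricity for one month of continuous operation."""
    rental = rental_usd_per_hr * hours
    electricity = electricity_usd_per_kwh * power_draw_kw * hours
    return rental + electricity


# Placeholder inputs for illustration only: $2.00/hr rental, $0.10/kWh, 1 kW draw.
example = monthly_self_host_cost(2.0, electricity_usd_per_kwh=0.10, power_draw_kw=1.0)
print(f"Estimated monthly cost: ${example:,.2f}")
```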
Assumes the GPU runs 24/7 at ~85% utilisation. If your traffic is bursty, you'll pay less for the API and probably more for the GPU (idle hours still cost rental). The breakeven analysis lives on the Self-host vs API breakeven tool.
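A rough way to reason about the breakeven point is to compare a fixed monthly GPU bill against a per-token API price. The sketch below is a simplification under stated assumptions: it uses a single blended API price per million tokens, treats the GPU bill as fixed regardless of traffic, and does not check whether the GPU can actually serve the volume. All names and numbers are hypothetical.

```python
def breakeven_tokens_per_month(
    gpu_monthly_usd: float, api_usd_per_million_tokens: float
) -> float:
    """Monthly token volume at which the API bill equals the fixed GPU bill."""
    if api_usd_per_million_tokens <= 0:
        raise ValueError("API price must be positive")
    return gpu_monthly_usd / api_usd_per_million_tokens * 1_000_000


def cheaper_option(
    tokens_per_month: float, gpu_monthly_usd: float, api_usd_per_million_tokens: float
) -> str:
    """Return 'self-host' if the fixed GPU bill undercuts the API bill, else 'api'."""
    api_cost = tokens_per_month / 1_000_000 * api_usd_per_million_tokens
    return "self-host" if gpu_monthly_usd < api_cost else "api"


# Placeholder inputs: a $1,460/month GPU bill versus a $1.00 per million token API.
threshold = breakeven_tokens_per_month(1460.0, 1.0)
print(f"Breakeven volume: {threshold:,.0f} tokens/month")
print(cheaper_option(500_000_000, 1460.0, 1.0))
```

Below the threshold the idle GPU hours dominate and the API is cheaper; above it, the fixed rental is spread over enough tokens to win, provided the hardware can sustain that throughput.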
About DeepSeek Coder V2 236B.
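DeepSeek Coder V2 236B is the largest of DeepSeek's specialised coding models. It uses a mixture-of-experts (MoE) architecture with 236B total parameters, of which 21B are active per token. It was further pre-trained from a DeepSeek V2 checkpoint on an additional 6T tokens, most of them code.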