GPT-OSS 20B.
20B GPT-OSS — single-GPU local target.
1× Nvidia GeForce RTX 4080.
Most-aggressive quantisation we have a working recommendation for. Lower precision = less VRAM = cheaper hardware, at a small accuracy cost.
Cheapest hosted endpoints.
Speed across providers.
Tokens/sec and time-to-first-token measured against the same prompt template on each provider's API.
| Provider | Tokens/sec | TTFT | Total |
|---|---|---|---|
| OpenRouter | 125.5 | — | 3983 ms |
Variants in the GPT-OSS family.
Frequently asked.
How do I run GPT-OSS 20B?
Where can I access GPT-OSS 20B?
How much does it cost to run GPT-OSS 20B?
Is GPT-OSS 20B open-source or proprietary?
Cheapest hardware per quantisation.
Each row is one quantisation tier (the same weights compressed differently). Lower precision → lower VRAM → cheaper hardware, at the cost of small accuracy loss. $/hr refreshed hourly from each provider's API.
| Quantisation | Cheapest GPU config | Total VRAM | Live $/hr | tokens/sec | |
|---|---|---|---|---|---|
|
FP16
FP16 — half precision (default)
|
64 GB | — | — | Compare → | |
|
FP8
FP8 — 8-bit float (Hopper / Blackwell)
|
32 GB | $0.023/hr | — | Compare → | |
|
INT4
INT4 — 4-bit integer (~4× VRAM saving)
|
16 GB | — | — | Compare → |
What it costs per month across providers.
Estimate your monthly bill for GPT-OSS 20B across every host that publishes per-token pricing. Slide your token volumes; the chart + table re-rank cheapest-first.
Cheapest provider on the left.
Total monthly cost — input + output tokens combined.
Bill breakdown.
Rent the GPU instead of paying per token.
For an open-weights model like GPT-OSS 20B, you can rent a GPU and serve inference yourself. The math: cheapest GPU rental × 730 hours/month + your electricity rate × power draw.
Assumes the GPU runs 24/7 at ~85% utilisation. If your traffic is bursty, you'll pay less for the API and probably more for the GPU (idle hours still cost rental). The breakeven analysis lives on the Self-host vs API breakeven tool.
How it's built.
How much it can remember.
What it can do.
Every place this model is hosted.
OpenAI API
api directPricing inferred from OpenRouter (within ~5-20% of direct API rates).