OpenAI: GPT-5.4.
At 10,000,000 input + 2,000,000 output tokens per month, the OpenAI API is the only listed host, at $55.00/month.
Cheapest provider on the left.
Total monthly cost — input + output tokens combined.
Bill breakdown.
Three steps to your monthly estimate.
01. Pick the model.
Use the search box to find an AI model: Claude, GPT, Llama, DeepSeek, Qwen, anything we track. The picker lists every model where at least one provider publishes per-token pricing.
02. Estimate volume.
Slide the monthly input + output token counts to match your expected workload. A typical chat app handles 1-10M input tokens per active user per month; an agent that re-reads context every turn can hit 100M+.
03. Read the spread.
The chart + table list every provider that hosts the model, sorted cheapest-first. Click a provider name to open its detail page, which shows pricing history, throughput benchmarks, and the affiliate signup link.
Frequently asked.
How is the monthly bill calculated?
Total = (input rate × input tokens / 1M) + (output rate × output tokens / 1M). Rates are the per-token prices, quoted per 1M tokens, that we pull from each provider's official pricing page or /v1/models API. The total is recomputed on every page load, with no caching beyond a brief edge TTL.
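As a minimal sketch of that formula in Python: the function name and the example rates are ours, chosen only for illustration, and are not pulled from any provider's price list.

```python
def monthly_bill(input_rate, output_rate, input_tokens, output_tokens):
    """Monthly cost in USD; both rates are USD per 1M tokens."""
    input_cost = input_rate * input_tokens / 1_000_000
    output_cost = output_rate * output_tokens / 1_000_000
    return input_cost + output_cost


# Hypothetical rates: $2.50 per 1M input tokens, $15.00 per 1M output tokens.
print(monthly_bill(2.50, 15.00, 10_000_000, 2_000_000))  # 55.0
```

With those assumed rates, 10M input + 2M output tokens comes to $25 of input and $30 of output.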
Where does the pricing data come from?
Direct from each inference provider — Anthropic, OpenAI, OpenRouter, Together AI, Fireworks AI, DeepInfra, z.ai, Groq, and a dozen others. The daily refresh job (RefreshAiModelCatalogJob) re-pulls each provider's /v1/models endpoint and updates our AiModelAccess rows.
Should I always pick the cheapest provider?
Cheapest by $/M tokens isn't always cheapest by total cost. Watch for: (1) caching discounts that aggregators like OpenRouter don't pass through fully, (2) rate-limit ceilings on the smaller hosts that force you onto a more expensive tier under load, (3) per-request latency overhead from aggregators (extra ~50ms). For low-volume or bursty workloads, the absolute cheapest is usually right. For sustained production traffic, factor in throughput + reliability.
What's the difference between OpenRouter and the model maker's direct API?
OpenRouter is an aggregator: it routes your request to one of several upstream providers and adds a small markup (typically 5-20%). The model maker's direct API (e.g. api.anthropic.com for Claude) gives you the bare price + access to native features like Anthropic's prompt caching or Google's context caching. Direct is cheaper at scale; OpenRouter wins when you want one key to access dozens of models.
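To see what that markup range means in dollars, here is a rough sketch. It assumes the markup is a flat percentage on top of the direct bill, which is a simplification, and the function name is hypothetical.

```python
def aggregator_bill(direct_bill, markup):
    """Direct monthly bill plus a flat fractional markup (0.05 means 5%)."""
    if markup < 0:
        raise ValueError("markup must be non-negative")
    return direct_bill * (1 + markup)


# Illustrative only: a $55.00/month direct bill at both ends of the 5-20% range.
for markup in (0.05, 0.20):
    print(f"{markup:.0%} markup: ${aggregator_bill(55.0, markup):.2f}/month")
```

On a $55.00 direct bill, that range works out to roughly $57.75 to $66.00 per month.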
Are input + output prices the same?
No — output tokens are typically 3-5× more expensive than input. The 'cheapest input' provider isn't always 'cheapest output'. The 'monthly bill' column accounts for both; sort by that column for the real total.
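A small sketch of why the two rankings can disagree. The two providers and their rates are made up for illustration; they are not real listings.

```python
def monthly_bill(input_rate, output_rate, input_tokens, output_tokens):
    """Monthly cost in USD; both rates are USD per 1M tokens."""
    return (input_rate * input_tokens + output_rate * output_tokens) / 1_000_000


# Hypothetical providers: (input $/1M tokens, output $/1M tokens).
providers = {
    "Provider A": (1.00, 6.00),
    "Provider B": (1.25, 4.00),
}
input_tokens, output_tokens = 10_000_000, 2_000_000

# Rank once by input rate alone, once by the combined monthly bill.
cheapest_input = min(providers, key=lambda name: providers[name][0])
cheapest_total = min(
    providers,
    key=lambda name: monthly_bill(*providers[name], input_tokens, output_tokens),
)

print(cheapest_input)  # Provider A
print(cheapest_total)  # Provider B
```

Provider A has the lower input rate but bills $22.00 for this workload, while Provider B bills $20.50 because its output rate is lower.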
How does prompt caching affect this estimate?
Cached input tokens are usually 50-90% cheaper than fresh input. We don't currently model caching because it depends on your workload pattern: long system prompts re-used across requests benefit, while chat with fresh context each turn doesn't. For high-volume workloads that re-use a single prompt, halve the input cost as a rough adjustment when comparing Anthropic, Google, or OpenAI direct.
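If you want to apply a caching adjustment yourself, a rough sketch looks like this. The function, the cached fraction, and the discount are all assumptions you would set from your own workload; this tool does not compute them, and real providers may also charge for cache writes, which this ignores.

```python
def cached_input_cost(input_rate, input_tokens, cached_fraction, discount):
    """Input cost in USD when part of the input is served from cache.

    input_rate:      USD per 1M fresh input tokens
    cached_fraction: share of input tokens that hit the cache (0.0 to 1.0)
    discount:        how much cheaper cached tokens are (0.5 means 50% off)
    """
    if not 0.0 <= cached_fraction <= 1.0 or not 0.0 <= discount <= 1.0:
        raise ValueError("cached_fraction and discount must be between 0 and 1")
    # Each cached token is billed at (1 - discount) of the fresh rate.
    effective_tokens = input_tokens * (1 - cached_fraction * discount)
    return input_rate * effective_tokens / 1_000_000


# Hypothetical: $2.50 per 1M input tokens, 10M tokens, all cached at 50% off.
print(cached_input_cost(2.50, 10_000_000, cached_fraction=1.0, discount=0.5))
```

With every input token cached at a 50% discount, the input cost is exactly half of the uncached figure, which matches the "halve the input cost" rule of thumb above.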
How often is this pricing refreshed?
Daily, via a scheduled background job at 4:15am UTC. Updated prices show up here within ~24 hours of a provider changing them on its pricing page. For breaking price drops (e.g. DeepSeek's R1 launch) we re-run the job manually.
What about fine-tuned variants?
Fine-tuned model deployments are priced separately from the base model and often have different rate structures (hourly compute + per-token blended). This tool covers the base-model token pricing only. For fine-tuned costs, check the provider's per-deployment pricing page directly.