Tool · API pricing

Google: Gemini 2.0 Flash.

At 10,000,000 input + 2,000,000 output tokens per month, OpenRouter is the only listed host, at $1.80/month.

Cheapest: $1.80/month (OpenRouter)
Input: $0.10 per million tokens
Output: $0.40 per million tokens
Providers with priced rows: 1
Monthly bill

Chart: total monthly cost per provider (input + output tokens combined), cheapest provider on the left.
Per provider

Bill breakdown.

Provider      Monthly total
OpenRouter    $1.80
How it works

Three steps to your monthly estimate.

  1. Pick the model.

    Use the search box to find an AI model — Claude, GPT, Llama, DeepSeek, Qwen, anything we track. The picker lists every model where at least one provider publishes per-token pricing.

  2. Estimate volume.

    Slide the monthly input + output token counts to match your expected workload. A typical chat app handles 1-10M input tokens per active user per month; an agent that re-reads context every turn can hit 100M+.

  3. Read the spread.

    The chart + table list every provider that hosts the model, sorted cheapest-first. Click a provider name to open its detail page — pricing history, throughput benchmarks, and the affiliate signup link.
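The "100M+" figure for agents in step 2 comes from how re-read context compounds: if every turn re-sends the whole history, turn k costs k times the per-turn size, so total input grows with the square of the turn count. A minimal sketch, assuming identical sessions with a fixed number of new tokens per turn and no caching or truncation (the helper names and the 50-turn / 2,000-token / 40-session numbers are hypothetical):

```python
def agent_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Input tokens for one session where every turn re-sends the full history.

    Turn k sends k * tokens_per_turn tokens, so the session total is
    tokens_per_turn * (1 + 2 + ... + turns) = tokens_per_turn * turns * (turns + 1) / 2.
    """
    # turns * (turns + 1) is always even, so the integer division is exact.
    return tokens_per_turn * turns * (turns + 1) // 2


def monthly_input_tokens(sessions: int, turns: int, tokens_per_turn: int) -> int:
    # Hypothetical workload: every session has the same shape.
    return sessions * agent_input_tokens(turns, tokens_per_turn)


print(f"{agent_input_tokens(50, 2_000):,}")        # 2,550,000 per session
print(f"{monthly_input_tokens(40, 50, 2_000):,}")  # 102,000,000 per month
```

Forty 50-turn sessions adding only 2,000 tokens per turn already pass 100M input tokens a month, even though the new text totals just 4M tokens.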

FAQ

Frequently asked.

How is the monthly bill calculated?

Total = (input rate × input tokens / 1M) + (output rate × output tokens / 1M). Per-token prices come from each provider's official pricing page or /v1/models API, and the total is recomputed on every page load, with no caching beyond a brief edge TTL.
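The formula translates directly into a few lines of code. A minimal sketch (`monthly_bill` is a hypothetical name, not the tool's actual code), checked against this page's default volume: 10M × $0.10/M + 2M × $0.40/M = $1.00 + $0.80 = $1.80.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def monthly_bill(input_rate: Decimal, output_rate: Decimal,
                 input_tokens: int, output_tokens: int) -> Decimal:
    """Total = (input rate x input tokens / 1M) + (output rate x output tokens / 1M).

    Rates are in $ per million tokens. Decimal keeps small prices exact
    instead of picking up binary floating-point rounding noise.
    """
    return (input_rate * input_tokens / MILLION
            + output_rate * output_tokens / MILLION)


# Gemini 2.0 Flash via OpenRouter at the default volume on this page.
bill = monthly_bill(Decimal("0.10"), Decimal("0.40"), 10_000_000, 2_000_000)
print(f"${bill:.2f}")  # $1.80
```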

Where does the pricing data come from?

Direct from each inference provider — Anthropic, OpenAI, OpenRouter, Together AI, Fireworks AI, DeepInfra, z.ai, Groq, and a dozen others. The daily refresh job (RefreshAiModelCatalogJob) re-pulls each provider's /v1/models endpoint and updates our AiModelAccess rows.
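Model-listing endpoints do not necessarily quote prices per million tokens; some report USD per single token as strings, so a refresh step has to normalize them before they fit the $/M columns on this page. A minimal sketch, assuming a payload with a `data` array whose entries carry `pricing.prompt` and `pricing.completion` as per-token USD strings (the shape OpenRouter's model listing uses). The sample payload and the model id in it are illustrative, and this is not how RefreshAiModelCatalogJob is actually written:

```python
import json
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)

# Illustrative payload; field names follow the assumed per-token shape.
SAMPLE = """
{"data": [
  {"id": "google/gemini-2.0-flash-001",
   "pricing": {"prompt": "0.0000001", "completion": "0.0000004"}},
  {"id": "example/unpriced-model"}
]}
"""


def per_million_rates(payload: str) -> dict[str, tuple[Decimal, Decimal]]:
    """Map model id -> ($/M input, $/M output) from per-token price strings."""
    rates = {}
    for model in json.loads(payload)["data"]:
        pricing = model.get("pricing") or {}
        if "prompt" not in pricing or "completion" not in pricing:
            continue  # no per-token pricing published, so leave it out
        # Decimal parses the strings exactly; float would give 0.09999...
        rates[model["id"]] = (
            Decimal(pricing["prompt"]) * PER_MILLION,
            Decimal(pricing["completion"]) * PER_MILLION,
        )
    return rates


for model_id, (inp, out) in per_million_rates(SAMPLE).items():
    print(model_id, f"${inp:.2f}/M in", f"${out:.2f}/M out")
```

Skipping models without per-token prices mirrors the picker rule in "How it works": a model appears only when at least one provider publishes per-token pricing.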

Should I always pick the cheapest provider?

Cheapest by $/M tokens isn't always cheapest by total cost. Watch for: (1) caching discounts that aggregators like OpenRouter don't pass through fully, (2) rate-limit ceilings on the smaller hosts that force you onto a more expensive tier under load, (3) per-request latency overhead from aggregators (extra ~50ms). For low-volume or bursty workloads, the absolute cheapest is usually right. For sustained production traffic, factor in throughput + reliability.
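Point (2) is easiest to see with numbers. A minimal sketch, assuming a hypothetical host that serves only a fixed monthly volume at its headline rate and sends everything above that cap to a pricier tier; the hosts, rates, and 50M cap are invented for illustration:

```python
def blended_cost(volume_m: float, rate: float, cap_m: float,
                 overflow_rate: float) -> float:
    """Monthly $ when a host serves at most cap_m million tokens at `rate`.

    Tokens above the cap spill to a fallback tier priced at overflow_rate.
    Volumes are in millions of tokens; rates are $ per million tokens.
    """
    served = min(volume_m, cap_m)
    overflow = max(0.0, volume_m - cap_m)
    return served * rate + overflow * overflow_rate


# Hypothetical host A: $0.30/M up to 50M tokens/month, then $0.60/M.
# Hypothetical host B: flat $0.40/M with no cap.
for volume in (40, 100):
    a = blended_cost(volume, 0.30, 50, 0.60)
    b = blended_cost(volume, 0.40, float("inf"), 0.0)
    print(f"{volume}M tokens: A=${a:.2f}  B=${b:.2f}")
```

At 40M tokens host A is cheaper ($12.00 vs $16.00); at 100M the overflow tier flips it ($45.00 vs $40.00), even though A has the lower headline rate.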

What's the difference between OpenRouter and the model maker's direct API?

OpenRouter is an aggregator — they route your request to one of several upstream providers and add a small markup (typically 5-20%). The model maker's direct API (e.g. api.anthropic.com for Claude) gives you the bare price + access to native features like Anthropic's prompt caching or Google's context caching. Direct is cheaper at scale; OpenRouter wins when you want one key to access dozens of models.

Are input + output prices the same?

No. Output tokens are typically 3-5× more expensive than input, so the cheapest-input provider isn't always the cheapest-output one. The 'Monthly total' column accounts for both; sort by that column for the real total.
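A minimal sketch of why the sort key should be the monthly total rather than either rate alone, using two hypothetical hosts (names and prices invented) at this page's default 10M input / 2M output volume:

```python
from dataclasses import dataclass


@dataclass
class Quote:
    provider: str
    input_rate: float   # $ per million input tokens
    output_rate: float  # $ per million output tokens

    def monthly_total(self, input_m: float, output_m: float) -> float:
        # Same formula as the bill: both sides, volumes in millions of tokens.
        return self.input_rate * input_m + self.output_rate * output_m


# Hypothetical quotes for one model; not real provider prices.
quotes = [
    Quote("host-a", input_rate=0.08, output_rate=0.60),
    Quote("host-b", input_rate=0.10, output_rate=0.40),
]

cheapest_input = min(quotes, key=lambda q: q.input_rate)
by_total = sorted(quotes, key=lambda q: q.monthly_total(10, 2))
print(cheapest_input.provider)           # host-a
print([q.provider for q in by_total])    # ['host-b', 'host-a']
```

host-a wins on input price but loses on the total ($2.00 vs $1.80), because its higher output rate outweighs the input savings.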

How does prompt caching affect this estimate?

Cached input tokens are usually 50-90% cheaper than fresh input. We don't currently model caching because it depends on your workload pattern (long system prompts re-used across requests benefit; chat with fresh context each turn doesn't). For high-volume workloads that reuse the same long prompt, roughly halve the input cost when comparing Anthropic/Google/OpenAI direct.
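The "halve the input cost" rule of thumb is one point on a simple curve: input cost scales by (1 − hit share × discount), so halving corresponds to hit share × discount = 0.5, for example every token a cache hit at 50% off. A minimal sketch, assuming a single flat cache discount and ignoring any extra charge a provider may levy for writing to the cache (`input_cost` and its parameters are hypothetical):

```python
def input_cost(rate_per_m: float, input_m: float,
               cache_hit_share: float, cache_discount: float) -> float:
    """Input cost in $ when part of the input is served from cache.

    cache_hit_share: fraction of input tokens that are cache hits (0..1).
    cache_discount: price cut on cached tokens (0.5 means 50% cheaper).
    Cache-write surcharges are deliberately not modeled.
    """
    if not (0.0 <= cache_hit_share <= 1.0 and 0.0 <= cache_discount <= 1.0):
        raise ValueError("share and discount must be between 0 and 1")
    fresh = input_m * (1 - cache_hit_share)
    cached = input_m * cache_hit_share
    return rate_per_m * (fresh + cached * (1 - cache_discount))


# 10M input tokens at $0.10/M under three caching scenarios.
for label, share, discount in [
    ("no caching", 0.0, 0.0),
    ("all hits, 50% off", 1.0, 0.5),
    ("80% hits, 90% off", 0.8, 0.9),
]:
    print(f"{label}: ${input_cost(0.10, 10, share, discount):.2f}")
```

The three scenarios come out to $1.00, $0.50, and $0.28, so a workload with a high hit rate and a steep discount can do much better than the halving rule suggests.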

How often is this pricing refreshed?

Daily, via a scheduled background job at 4:15am UTC. Price changes show up within ~24 hours of a provider updating its pricing page. For breaking price drops (e.g. DeepSeek's R1 launch), we re-run the job manually.
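The "~24 hours" bound follows from the schedule: a provider's change is picked up at the next 4:15am UTC run, so the worst case is a change made just after a run. A minimal sketch with a hypothetical `next_refresh` helper (not the actual scheduler; it ignores job runtime and manual re-runs):

```python
from datetime import datetime, time, timedelta, timezone

REFRESH_AT = time(4, 15, tzinfo=timezone.utc)  # 4:15am UTC daily run


def next_refresh(now: datetime) -> datetime:
    """Next scheduled refresh at or after `now` (must be timezone-aware)."""
    now_utc = now.astimezone(timezone.utc)
    # combine() takes its tzinfo from REFRESH_AT, so the result is UTC-aware.
    candidate = datetime.combine(now_utc.date(), REFRESH_AT)
    if candidate < now_utc:
        candidate += timedelta(days=1)  # today's run already happened
    return candidate


changed = datetime(2025, 3, 10, 5, 0, tzinfo=timezone.utc)
print(next_refresh(changed))            # 2025-03-11 04:15:00+00:00
print(next_refresh(changed) - changed)  # 23:15:00
```

A change at 05:00 UTC waits 23 hours 15 minutes; one at 04:00 UTC is picked up 15 minutes later.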

What about fine-tuned variants?

Fine-tuned model deployments are priced separately from the base model and often have different rate structures (hourly compute + per-token blended). This tool covers the base-model token pricing only. For fine-tuned costs, check the provider's per-deployment pricing page directly.
