API aggregators
OpenAI-compatible
US
DeepInfra.
Low-cost serverless inference for open-weight LLMs. OpenAI-compatible endpoints, no commitment.
Cheapest 12 models
Where the floor is.
Sorted cheapest-first by $/M input. Useful when you're looking for the floor before picking a model.
Loading...
At a glance
- Service type
- API aggregators
- Trust tier
- Tier 2
- Headquarters
- US
- OpenAI-compat
- Yes
- Open weights
- Yes
- Proprietary
- No
When to pick DeepInfra
Best for
- Building once and swapping models freely — same key, same endpoint shape.
- Workloads that benefit from automatic failover across upstreams.
- Anyone who wants per-token billing without managing N separate accounts.
Avoid for
- Workloads needing the absolute lowest per-token price (first-party usually wins).
- Anything requiring real-time price quotes from the original maker.
Models on DeepInfra
Pricing + measured speed + self-host alternative, one row per model. Click a column header to sort.
| Model ↕ | Maker ↕ | Access ↕ | $/M in ↕ | $/M out ↕ | Tokens/sec ↕ | TTFT ↕ | Self-host on ↕ | |
|---|---|---|---|---|---|---|---|---|
| Nemotron-3-Nano-Omni-30B-A3B-Reasoning | Nvidia | hosted inference | $0.2 | $0.8 | — | — | 1× Nvidia RTX 4000 Ada · INT4 | Open → |
| DeepSeek: DeepSeek V4 Pro | DeepSeek | hosted inference | $1.3 | $2.6 | — | — | API only | Open → |
| DeepSeek: DeepSeek V4 Flash | DeepSeek | hosted inference | $0.1 | $0.2 | — | — | API only | Open → |
| Kimi K2.6 | Moonshot AI | hosted inference | $0.75 | $3.5 | — | — | 4× AMD MI300 · INT4 | Open → |
| Xiaomi: MiMo-V2.5 | Xiaomi | hosted inference | $0.4 | $2.0 | — | — | API only | Open → |
| Xiaomi: MiMo-V2.5-Pro | Xiaomi | hosted inference | $1.0 | $3.0 | — | — | 1× Nvidia RTX 6000 Ada · INT4 | Open → |
| Qwen: Qwen3.6 35B A3B | Alibaba (Qwen Team) | hosted inference | $0.15 | $0.95 | — | — | 1× Nvidia RTX A5000 · INT4 | Open → |
| GLM-5.1 | Zhipu AI | hosted inference | $1.05 | $3.5 | — | — | 1× Nvidia RTX 4000 Ada SFF · INT4 | Open → |
| StepFun: Step 3.5 Flash | Stepfun | hosted inference | $0.09 | $0.3 | — | — | API only | Open → |
| Qwen: Qwen3.5 397B A17B | Alibaba (Qwen Team) | hosted inference | $0.49 | $3.6 | — | — | 1× AMD MI325 · INT4 | Open → |
| Google: Gemma 4 26B A4B | Google DeepMind | hosted inference | $0.07 | $0.34 | — | — | API only | Open → |
| Google: Gemma 4 31B | Google DeepMind | hosted inference | $0.13 | $0.38 | — | — | API only | Open → |
| Qwen: Qwen3.5-122B-A10B | Alibaba (Qwen Team) | hosted inference | $0.29 | $2.4 | — | — | 1× Nvidia H100 · INT4 | Open → |
| NVIDIA-Nemotron-3-Super-120B-A12B | Nvidia | hosted inference | $0.1 | $0.5 | — | — | 1× Nvidia A100 · INT4 | Open → |
| GLM-5 | Zhipu AI | hosted inference | $0.6 | $2.08 | — | — | 1× AMD MI325 · INT4 | Open → |
| MiniMax: MiniMax M2.5 | MiniMax | hosted inference | $0.15 | $1.15 | — | — | API only | Open → |
| Qwen: Qwen3 Max | Alibaba (Qwen Team) | hosted inference | $1.2 | $6.0 | — | — | 1× AMD MI300 · INT4 | Open → |
| Qwen: Qwen3 Max Thinking | Alibaba (Qwen Team) | hosted inference | $1.2 | $6.0 | — | — | API only | Open → |
| Kimi K2.5 | Moonshot AI | hosted inference | $0.45 | $2.25 | — | — | 4× AMD MI300 · INT4 | Open → |
| Z.ai: GLM 4.7 Flash | Zhipu AI | hosted inference | $0.06 | $0.4 | — | — | API only | Open → |
| DeepSeek: DeepSeek V3.2 | DeepSeek | hosted inference | $0.26 | $0.38 | — | — | API only | Open → |
| Seed-1.8 | Bytedance | hosted inference | $0.25 | $2.0 | — | — | 1× Nvidia H200 · INT4 | Open → |
| Seed-2.0-code | Bytedance | hosted inference | $0.5 | $3.0 | — | — | API only | Open → |
| ByteDance Seed: Seed-2.0-Mini | Bytedance Seed | hosted inference | $0.1 | $0.4 | — | — | API only | Open → |
| Seed-2.0-pro | Bytedance | hosted inference | $0.5 | $3.0 | — | — | API only | Open → |
| MythoMax 13B | Gryphe | hosted inference | $0.4 | $0.4 | — | — | 1× Nvidia RTX 3080 · INT4 | Open → |
| Nous: Hermes 3 405B Instruct | Nous Research | hosted inference | $1.0 | $1.0 | — | — | 1× AMD MI325 · INT4 | Open → |
| Hermes-3-Llama-3.1-70B | Nous Research | hosted inference | $0.3 | $0.3 | — | — | 1× Nvidia A40 · INT4 | Open → |
| Qwen2.5 72B Instruct | Alibaba (Qwen Team) | hosted inference | $0.36 | $0.4 | — | — | 1× Nvidia A40 · INT4 | Open → |
| Qwen: Qwen3 14B | Alibaba (Qwen Team) | hosted inference | $0.12 | $0.24 | — | — | 1× Nvidia RTX 3080 · INT4 | Open → |
| Qwen3-235B-A22B-Instruct-2507 | Alibaba (Qwen Team) | hosted inference | $0.071 | $0.1 | — | — | 1× AMD MI300 · INT4 | Open → |
| Qwen: Qwen3 235B A22B Thinking 2507 | Alibaba (Qwen Team) | hosted inference | $0.23 | $2.3 | — | — | 1× AMD MI300 · INT4 | Open → |
| Qwen: Qwen3 30B A3B | Alibaba (Qwen Team) | hosted inference | $0.09 | $0.45 | — | — | 1× Nvidia RTX 4000 Ada · INT4 | Open → |
| Qwen: Qwen3 32B | Alibaba (Qwen Team) | hosted inference | $0.08 | $0.28 | — | — | 1× Nvidia RTX 4000 Ada · INT4 | Open → |
| Qwen3-Coder-480B-A35B-Instruct-Turbo | Alibaba (Qwen Team) | hosted inference | $0.3 | $1.0 | — | — | 2× AMD MI300 · INT4 | Open → |
| Qwen: Qwen3 Next 80B A3B Instruct | Alibaba (Qwen Team) | hosted inference | $0.09 | $1.1 | — | — | 1× Nvidia A16 · INT4 | Open → |
| Qwen: Qwen3 VL 235B A22B Instruct | Alibaba (Qwen Team) | hosted inference | $0.2 | $0.88 | — | — | 1× AMD MI300 · INT4 | Open → |
| Qwen: Qwen3 VL 30B A3B Instruct | Alibaba (Qwen Team) | hosted inference | $0.15 | $0.6 | — | — | 1× Nvidia RTX 4000 Ada · INT4 | Open → |
| Qwen3.5-0.8B | Alibaba (Qwen Team) | hosted inference | $0.01 | $0.05 | — | — | 1× Nvidia P102-100 · INT4 | Open → |
| Qwen: Qwen3.5-27B | Alibaba (Qwen Team) | hosted inference | $0.26 | $2.6 | — | — | 1× Nvidia RTX 4000 Ada · INT4 | Open → |
| Qwen3.5-2B | Alibaba (Qwen Team) | hosted inference | $0.02 | $0.1 | — | — | 1× Nvidia Titan V · FP8 | Open → |
| Qwen: Qwen3.5-35B-A3B | Alibaba (Qwen Team) | hosted inference | $0.14 | $1.0 | — | — | 1× Nvidia RTX A5000 · INT4 | Open → |
| Qwen3.5-4B | Alibaba (Qwen Team) | hosted inference | $0.03 | $0.15 | — | — | 1× Nvidia Titan V · INT4 | Open → |
| Qwen: Qwen3.5-9B | Alibaba (Qwen Team) | hosted inference | $0.04 | $0.15 | — | — | 1× Nvidia GeForce RTX 2060 · INT4 | Open → |
| Qwen: Qwen3.6 27B | Alibaba (Qwen Team) | hosted inference | $0.32 | $3.2 | — | — | 1× Nvidia RTX 4000 Ada · INT4 | Open → |
| L3-8B-Lunaris-v1-Turbo | Sao10k | hosted inference | $0.04 | $0.05 | — | — | 1× Nvidia P102-100 · INT4 | Open → |
| L3.1-70B-Euryale-v2.2 | Sao10k | hosted inference | $0.85 | $0.85 | — | — | 1× Nvidia A40 · INT4 | Open → |
| Claude Haiku 4.5 | Anthropic | hosted inference | $1.0 | $5.0 | — | — | API only | Open → |
| Claude Opus 4.7 | Anthropic | hosted inference | $5.0 | $25.0 | — | — | API only | Open → |
| Claude Sonnet 4.6 | Anthropic | hosted inference | $3.0 | $15.0 | — | — | API only | Open → |
| DeepSeek: R1 0528 | DeepSeek | hosted inference | $0.5 | $2.15 | — | — | 2× AMD MI325 · INT4 | Open → |
| DeepSeek R1 Distill Llama 70B | DeepSeek | hosted inference | $0.7 | $0.8 | — | — | 1× Nvidia L40S · INT4 | Open → |
| DeepSeek V3 | DeepSeek | hosted inference | $0.32 | $0.89 | — | — | 2× AMD MI325 · INT4 | Open → |
| DeepSeek-V3-0324 | DeepSeek | hosted inference | $0.2 | $0.77 | — | — | 2× AMD MI325 · INT4 | Open → |
| DeepSeek-V3.1 | DeepSeek | hosted inference | $0.21 | $0.79 | — | — | API only | Open → |
| DeepSeek: DeepSeek V3.1 Terminus | DeepSeek | hosted inference | $0.27 | $0.95 | — | — | API only | Open → |
| Google: Gemini 2.5 Flash | Google DeepMind | hosted inference | $0.3 | $2.5 | — | — | API only | Open → |
| Gemini 2.5 Pro | Google DeepMind | hosted inference | $1.25 | $10.0 | — | — | API only | Open → |
| Google: Gemini 3.1 Flash Lite | Google DeepMind | hosted inference | $0.25 | $1.5 | — | — | API only | Open → |
| gemini-3.1-pro | Google DeepMind | hosted inference | $2.0 | $12.0 | — | — | API only | Open → |
| Google: Gemma 3 12B | Google DeepMind | hosted inference | $0.04 | $0.13 | — | — | API only | Open → |
| Gemma 3 27B It | Google DeepMind | hosted inference | $0.08 | $0.16 | — | — | 1× Nvidia RTX 4000 Ada · INT4 | Open → |
| Gemma 3 4b it | Google DeepMind | hosted inference | $0.04 | $0.08 | — | — | 1× Nvidia Titan V · INT4 | Open → |
| gemma-4-31B-it-turbo | Google DeepMind | hosted inference | $0.12 | $0.37 | — | — | 1× Nvidia RTX 4000 Ada · INT4 | Open → |
| Llama-3.2-11B-Vision-Instruct | Meta AI | hosted inference | $0.245 | $0.245 | — | — | 1× Nvidia RTX 3070 Ti · INT4 | Open → |
| Meta Llama 3.3 70B Instruct Turbo | Meta AI | hosted inference | $0.1 | $0.32 | — | — | 1× Nvidia A40 · INT4 | Open → |
| Llama 4 Maverick Instruct (17Bx128E) FP8 | Meta AI | hosted inference | $0.15 | $0.6 | — | — | 1× Nvidia GTX 1080 Ti · INT4 | Open → |
| Llama 4 Scout Instruct (17Bx16E) | Meta AI | hosted inference | $0.08 | $0.3 | — | — | 1× Nvidia GTX 1080 Ti · INT4 | Open → |
| Meta: Llama Guard 4 12B | Meta AI | hosted inference | $0.18 | $0.18 | — | — | 1× Nvidia GTX 1070 Ti · INT4 | Open → |
| Meta-Llama-3.1-70B-Instruct | Meta AI | hosted inference | $0.4 | $0.4 | — | — | 1× Nvidia A40 · INT4 | Open → |
| Meta Llama 3.1 70B Instruct Turbo | Meta AI | hosted inference | $0.4 | $0.4 | — | — | 1× Nvidia A40 · INT4 | Open → |
| Meta-Llama-3.1-8B-Instruct | Meta AI | hosted inference | $0.02 | $0.05 | — | — | 1× Nvidia P102-100 · INT4 | Open → |
| Meta Llama 3.1 8B Instruct Turbo | Meta AI | hosted inference | $0.02 | $0.03 | — | — | 1× Nvidia P102-100 · INT4 | Open → |
| Phi-4 | Microsoft | hosted inference | $0.07 | $0.14 | — | — | 1× Nvidia RTX 3080 · INT4 | Open → |
| Mistral-Nemo-Instruct-2407 | Mistral AI | hosted inference | $0.02 | $0.04 | — | — | API only | Open → |
| Mistral: Mistral Small 3 | Mistral AI | hosted inference | $0.05 | $0.08 | — | — | 1× Nvidia GeForce RTX 4080 · INT4 | Open → |
| Mistral-Small-3.2-24B-Instruct-2506 | Mistral AI | hosted inference | $0.075 | $0.2 | — | — | 1× Nvidia RTX 4060 Ti · INT4 | Open → |
| NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 | Nvidia | hosted inference | $0.1 | $0.4 | — | — | 1× Nvidia Tesla V100 SXM2 32GB · INT4 | Open → |
| Nvidia Nemotron Nano 9B V2 | Nvidia | hosted inference | $0.04 | $0.16 | — | — | 1× Nvidia RTX A2000 · INT4 | Open → |
| NVIDIA: Nemotron 3 Nano 30B A3B | Nvidia | hosted inference | $0.05 | $0.2 | — | — | 1× Nvidia RTX 4000 Ada · INT4 | Open → |
| GPT-OSS 120B | OpenAI | hosted inference | $0.039 | $0.19 | — | — | 1× Nvidia H100 · INT4 | Open → |
| gpt-oss-120b-Turbo | OpenAI | hosted inference | $0.15 | $0.6 | — | — | 1× Nvidia A100 · INT4 | Open → |
| GPT-OSS 20B | OpenAI | hosted inference | $0.03 | $0.14 | — | — | 1× Nvidia GeForce RTX 4080 · INT4 | Open → |
| GLM-4.6 | Zhipu AI | hosted inference | $0.43 | $1.74 | — | — | 1× AMD MI325 · INT4 | Open → |
| GLM-4.7 | Zhipu AI | hosted inference | $0.4 | $1.75 | — | — | 1× Nvidia GTX 1660 Ti · INT4 | Open → |