API aggregators OpenAI-compatible US

DeepInfra.

Low-cost serverless inference for open-weight LLMs. OpenAI-compatible endpoints, no commitment.

Cheapest 12 models

Where the floor is.

Sorted cheapest-first by $/M input. Useful when you're looking for the floor before picking a model.

Loading...

At a glance

Service type
API aggregators
Trust tier
Tier 2
Headquarters
US
OpenAI-compat
Yes
Open weights
Yes
Proprietary
No

When to pick DeepInfra

Best for

  • Building once and swapping models freely — same key, same endpoint shape.
  • Workloads that benefit from automatic failover across upstreams.
  • Anyone who wants per-token billing without managing N separate accounts.

Avoid for

  • Workloads needing the absolute lowest per-token price (first-party usually wins).
  • Anything requiring real-time price quotes from the original maker.

Models on DeepInfra

Pricing + measured speed + self-host alternative, one row per model. Click a column header to sort.

85 models · 0 benchmarked
Model ↕ Maker ↕ Access ↕ $/M in ↕ $/M out ↕ Tokens/sec ↕ TTFT ↕ Self-host on ↕
Nemotron-3-Nano-Omni-30B-A3B-Reasoning Nvidia hosted inference $0.2 $0.8 1× Nvidia RTX 4000 Ada · INT4 Open →
DeepSeek: DeepSeek V4 Pro DeepSeek hosted inference $1.3 $2.6 API only Open →
DeepSeek: DeepSeek V4 Flash DeepSeek hosted inference $0.1 $0.2 API only Open →
Kimi K2.6 Moonshot AI hosted inference $0.75 $3.5 4× AMD MI300 · INT4 Open →
Xiaomi: MiMo-V2.5 Xiaomi hosted inference $0.4 $2.0 API only Open →
Xiaomi: MiMo-V2.5-Pro Xiaomi hosted inference $1.0 $3.0 1× Nvidia RTX 6000 Ada · INT4 Open →
Qwen: Qwen3.6 35B A3B Alibaba (Qwen Team) hosted inference $0.15 $0.95 1× Nvidia RTX A5000 · INT4 Open →
GLM-5.1 Zhipu AI hosted inference $1.05 $3.5 1× Nvidia RTX 4000 Ada SFF · INT4 Open →
StepFun: Step 3.5 Flash Stepfun hosted inference $0.09 $0.3 API only Open →
Qwen: Qwen3.5 397B A17B Alibaba (Qwen Team) hosted inference $0.49 $3.6 1× AMD MI325 · INT4 Open →
Google: Gemma 4 26B A4B Google DeepMind hosted inference $0.07 $0.34 API only Open →
Google: Gemma 4 31B Google DeepMind hosted inference $0.13 $0.38 API only Open →
Qwen: Qwen3.5-122B-A10B Alibaba (Qwen Team) hosted inference $0.29 $2.4 1× Nvidia H100 · INT4 Open →
NVIDIA-Nemotron-3-Super-120B-A12B Nvidia hosted inference $0.1 $0.5 1× Nvidia A100 · INT4 Open →
GLM-5 Zhipu AI hosted inference $0.6 $2.08 1× AMD MI325 · INT4 Open →
MiniMax: MiniMax M2.5 MiniMax hosted inference $0.15 $1.15 API only Open →
Qwen: Qwen3 Max Alibaba (Qwen Team) hosted inference $1.2 $6.0 1× AMD MI300 · INT4 Open →
Qwen: Qwen3 Max Thinking Alibaba (Qwen Team) hosted inference $1.2 $6.0 API only Open →
Kimi K2.5 Moonshot AI hosted inference $0.45 $2.25 4× AMD MI300 · INT4 Open →
Z.ai: GLM 4.7 Flash Zhipu AI hosted inference $0.06 $0.4 API only Open →
DeepSeek: DeepSeek V3.2 DeepSeek hosted inference $0.26 $0.38 API only Open →
Seed-1.8 Bytedance hosted inference $0.25 $2.0 1× Nvidia H200 · INT4 Open →
Seed-2.0-code Bytedance hosted inference $0.5 $3.0 API only Open →
ByteDance Seed: Seed-2.0-Mini Bytedance Seed hosted inference $0.1 $0.4 API only Open →
Seed-2.0-pro Bytedance hosted inference $0.5 $3.0 API only Open →
MythoMax 13B Gryphe hosted inference $0.4 $0.4 1× Nvidia RTX 3080 · INT4 Open →
Nous: Hermes 3 405B Instruct Nous Research hosted inference $1.0 $1.0 1× AMD MI325 · INT4 Open →
Hermes-3-Llama-3.1-70B Nous Research hosted inference $0.3 $0.3 1× Nvidia A40 · INT4 Open →
Qwen2.5 72B Instruct Alibaba (Qwen Team) hosted inference $0.36 $0.4 1× Nvidia A40 · INT4 Open →
Qwen: Qwen3 14B Alibaba (Qwen Team) hosted inference $0.12 $0.24 1× Nvidia RTX 3080 · INT4 Open →
Qwen3-235B-A22B-Instruct-2507 Alibaba (Qwen Team) hosted inference $0.071 $0.1 1× AMD MI300 · INT4 Open →
Qwen: Qwen3 235B A22B Thinking 2507 Alibaba (Qwen Team) hosted inference $0.23 $2.3 1× AMD MI300 · INT4 Open →
Qwen: Qwen3 30B A3B Alibaba (Qwen Team) hosted inference $0.09 $0.45 1× Nvidia RTX 4000 Ada · INT4 Open →
Qwen: Qwen3 32B Alibaba (Qwen Team) hosted inference $0.08 $0.28 1× Nvidia RTX 4000 Ada · INT4 Open →
Qwen3-Coder-480B-A35B-Instruct-Turbo Alibaba (Qwen Team) hosted inference $0.3 $1.0 2× AMD MI300 · INT4 Open →
Qwen: Qwen3 Next 80B A3B Instruct Alibaba (Qwen Team) hosted inference $0.09 $1.1 1× Nvidia A16 · INT4 Open →
Qwen: Qwen3 VL 235B A22B Instruct Alibaba (Qwen Team) hosted inference $0.2 $0.88 1× AMD MI300 · INT4 Open →
Qwen: Qwen3 VL 30B A3B Instruct Alibaba (Qwen Team) hosted inference $0.15 $0.6 1× Nvidia RTX 4000 Ada · INT4 Open →
Qwen3.5-0.8B Alibaba (Qwen Team) hosted inference $0.01 $0.05 1× Nvidia P102-100 · INT4 Open →
Qwen: Qwen3.5-27B Alibaba (Qwen Team) hosted inference $0.26 $2.6 1× Nvidia RTX 4000 Ada · INT4 Open →
Qwen3.5-2B Alibaba (Qwen Team) hosted inference $0.02 $0.1 1× Nvidia Titan V · FP8 Open →
Qwen: Qwen3.5-35B-A3B Alibaba (Qwen Team) hosted inference $0.14 $1.0 1× Nvidia RTX A5000 · INT4 Open →
Qwen3.5-4B Alibaba (Qwen Team) hosted inference $0.03 $0.15 1× Nvidia Titan V · INT4 Open →
Qwen: Qwen3.5-9B Alibaba (Qwen Team) hosted inference $0.04 $0.15 1× Nvidia GeForce RTX 2060 · INT4 Open →
Qwen: Qwen3.6 27B Alibaba (Qwen Team) hosted inference $0.32 $3.2 1× Nvidia RTX 4000 Ada · INT4 Open →
L3-8B-Lunaris-v1-Turbo Sao10k hosted inference $0.04 $0.05 1× Nvidia P102-100 · INT4 Open →
L3.1-70B-Euryale-v2.2 Sao10k hosted inference $0.85 $0.85 1× Nvidia A40 · INT4 Open →
Claude Haiku 4.5 Anthropic hosted inference $1.0 $5.0 API only Open →
Claude Opus 4.7 Anthropic hosted inference $5.0 $25.0 API only Open →
Claude Sonnet 4.6 Anthropic hosted inference $3.0 $15.0 API only Open →
DeepSeek: R1 0528 DeepSeek hosted inference $0.5 $2.15 2× AMD MI325 · INT4 Open →
DeepSeek R1 Distill Llama 70B DeepSeek hosted inference $0.7 $0.8 1× Nvidia L40S · INT4 Open →
DeepSeek V3 DeepSeek hosted inference $0.32 $0.89 2× AMD MI325 · INT4 Open →
DeepSeek-V3-0324 DeepSeek hosted inference $0.2 $0.77 2× AMD MI325 · INT4 Open →
DeepSeek-V3.1 DeepSeek hosted inference $0.21 $0.79 API only Open →
DeepSeek: DeepSeek V3.1 Terminus DeepSeek hosted inference $0.27 $0.95 API only Open →
Google: Gemini 2.5 Flash Google DeepMind hosted inference $0.3 $2.5 API only Open →
Gemini 2.5 Pro Google DeepMind hosted inference $1.25 $10.0 API only Open →
Google: Gemini 3.1 Flash Lite Google DeepMind hosted inference $0.25 $1.5 API only Open →
gemini-3.1-pro Google DeepMind hosted inference $2.0 $12.0 API only Open →
Google: Gemma 3 12B Google DeepMind hosted inference $0.04 $0.13 API only Open →
Gemma 3 27B It Google DeepMind hosted inference $0.08 $0.16 1× Nvidia RTX 4000 Ada · INT4 Open →
Gemma 3 4b it Google DeepMind hosted inference $0.04 $0.08 1× Nvidia Titan V · INT4 Open →
gemma-4-31B-it-turbo Google DeepMind hosted inference $0.12 $0.37 1× Nvidia RTX 4000 Ada · INT4 Open →
Llama-3.2-11B-Vision-Instruct Meta AI hosted inference $0.245 $0.245 1× Nvidia RTX 3070 Ti · INT4 Open →
Meta Llama 3.3 70B Instruct Turbo Meta AI hosted inference $0.1 $0.32 1× Nvidia A40 · INT4 Open →
Llama 4 Maverick Instruct (17Bx128E) FP8 Meta AI hosted inference $0.15 $0.6 1× Nvidia GTX 1080 Ti · INT4 Open →
Llama 4 Scout Instruct (17Bx16E) Meta AI hosted inference $0.08 $0.3 1× Nvidia GTX 1080 Ti · INT4 Open →
Meta: Llama Guard 4 12B Meta AI hosted inference $0.18 $0.18 1× Nvidia GTX 1070 Ti · INT4 Open →
Meta-Llama-3.1-70B-Instruct Meta AI hosted inference $0.4 $0.4 1× Nvidia A40 · INT4 Open →
Meta Llama 3.1 70B Instruct Turbo Meta AI hosted inference $0.4 $0.4 1× Nvidia A40 · INT4 Open →
Meta-Llama-3.1-8B-Instruct Meta AI hosted inference $0.02 $0.05 1× Nvidia P102-100 · INT4 Open →
Meta Llama 3.1 8B Instruct Turbo Meta AI hosted inference $0.02 $0.03 1× Nvidia P102-100 · INT4 Open →
Phi-4 Microsoft hosted inference $0.07 $0.14 1× Nvidia RTX 3080 · INT4 Open →
Mistral-Nemo-Instruct-2407 Mistral AI hosted inference $0.02 $0.04 API only Open →
Mistral: Mistral Small 3 Mistral AI hosted inference $0.05 $0.08 1× Nvidia GeForce RTX 4080 · INT4 Open →
Mistral-Small-3.2-24B-Instruct-2506 Mistral AI hosted inference $0.075 $0.2 1× Nvidia RTX 4060 Ti · INT4 Open →
NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 Nvidia hosted inference $0.1 $0.4 1× Nvidia Tesla V100 SXM2 32GB · INT4 Open →
Nvidia Nemotron Nano 9B V2 Nvidia hosted inference $0.04 $0.16 1× Nvidia RTX A2000 · INT4 Open →
NVIDIA: Nemotron 3 Nano 30B A3B Nvidia hosted inference $0.05 $0.2 1× Nvidia RTX 4000 Ada · INT4 Open →
GPT-OSS 120B OpenAI hosted inference $0.039 $0.19 1× Nvidia H100 · INT4 Open →
gpt-oss-120b-Turbo OpenAI hosted inference $0.15 $0.6 1× Nvidia A100 · INT4 Open →
GPT-OSS 20B OpenAI hosted inference $0.03 $0.14 1× Nvidia GeForce RTX 4080 · INT4 Open →
GLM-4.6 Zhipu AI hosted inference $0.43 $1.74 1× AMD MI325 · INT4 Open →
GLM-4.7 Zhipu AI hosted inference $0.4 $1.75 1× Nvidia GTX 1660 Ti · INT4 Open →