Inference providers

Where to call the API.

Services that host AI models behind APIs and sell per-token access. First-party APIs, aggregators, specialty serverless, consumer chat surfaces, and hyperscaler gateways — sorted so you can pick by business model.

34 providers · 526 models hosted · 12 OpenAI-compatible · 10 China-HQ
Filter
OpenRouter
OpenAI-compat
340 models · open + proprietary · US

Unified API in front of every major frontier API + open-weight host. Single key, one billing source, transparent per-model markup.

Together AI
OpenAI-compat
177 models · open-weight · US

Serverless inference for 100+ open-weight models. Fast cold starts, per-token pricing, fine-tuning available.

Fireworks AI
OpenAI-compat
13 models · open-weight · US

Fast inference on popular open-weight models. Speculative decoding + custom kernels keep latency low. Per-token billing.

Replicate
Tier 1
8 models · open-weight · US

Per-second billing serverless model API. Strong for image / video / audio models alongside LLMs.

DeepInfra
OpenAI-compat
85 models · open-weight · US

Low-cost serverless inference for open-weight LLMs. OpenAI-compatible endpoints, no commitment.

SiliconFlow
Tier 2
4 models · open-weight · CN

China-based aggregator with strong coverage of Chinese open-weight models (GLM, Qwen, DeepSeek, MiniMax).

Anthropic API
Tier 1
14 models · proprietary only · US

Anthropic's first-party API for the Claude family. Direct billing relationship; prompt caching + batch tier discount.

OpenAI API
OpenAI-compat
66 models · proprietary only · US

OpenAI's first-party API for GPT, o-series, embedding, and image models.

Google AI Studio
Tier 1
3 models · proprietary only · US

First-party API for the Gemini family. Free tier available for testing; production via paid quota.

Mistral La Plateforme
OpenAI-compat
38 models · open + proprietary · FR

Mistral's first-party API serving both their closed (Large) and open (Small / Codestral) families.

DeepSeek Platform
OpenAI-compat
25 models · open-weight · CN

DeepSeek's first-party API for V3, R1, and the R1-Distill family. Among the lowest published per-token rates on frontier-tier models.

Moonshot AI Platform
OpenAI-compat
7 models · open + proprietary · CN

Moonshot AI's developer API for the Kimi family. Both open-weight (K2 series) and closed flagship (Moonshot v1) accessible from one endpoint.

Zhipu BigModel
Tier 1
15 models · open-weight · CN

Zhipu AI's first-party API for the GLM family. Open-weight GLM-4.5 + closed enterprise tiers.

Alibaba DashScope
Tier 1
99 models · open-weight · CN

Alibaba's first-party API for the Qwen family. Backend for Aliyun's AI services.

Tencent Cloud (Hunyuan)
Tier 1
3 models · open-weight · CN

Tencent's first-party API for the Hunyuan family.

MiniMax Platform
Tier 1
12 models · open-weight · CN

MiniMax's first-party API for the abab and MiniMax-Text families.

01.AI Platform
OpenAI-compat
2 models · open + proprietary · CN

01.AI's first-party API for the Yi family — closed Yi-Lightning + open Yi-34B variants.

Cohere
Tier 1
5 models · open + proprietary · CA

Cohere's first-party API — Command series for chat + Embed series for retrieval.

xAI
OpenAI-compat
5 models · proprietary only · US

Elon Musk's AI lab — makers of the Grok family. First-party API with OpenAI-compatible endpoints; integrates tightly with X (Twitter) as the consumer surface.

z.ai
OpenAI-compat
8 models · open + proprietary · CN

Zhipu AI's international platform for the GLM family. Same models as Zhipu BigModel but with English docs, simpler signup, and a consumer chat surface at cha...

Groq
OpenAI-compat
5 models · open-weight · US

Ultra-low-latency inference on custom LPU silicon. Open-weight LLMs at >500 tokens/sec; OpenAI-compatible API.

fal.ai
Tier 1
4 models · open-weight · US

Image and video model serverless API. Strong for Stable Diffusion / Flux variants with per-second billing.

Black Forest Labs API
Tier 1
1 models · open + proprietary · DE

Black Forest Labs' first-party API for the Flux image-generation family.

Stability AI Platform
Tier 2
1 models · open + proprietary · GB

Stability AI's first-party API for Stable Diffusion + audio/video models.

Ollama
Tier 2
44 models · open-weight · US

Local-first runtime for open-weight LLMs — run them on your own machine or rented GPU. Library indexes Llama, Kimi, GLM, Qwen, …

Kimi.com
Tier 1
4 models · open + proprietary · CN

Moonshot AI's full consumer platform — chat, file uploads, search, agents, code interpreter, image generation. Free tier with per-message limits + paid subsc...

ChatGPT
Tier 1
1 models · proprietary only · US

OpenAI's flagship consumer platform — chat, projects, custom GPTs, voice, image generation, browsing, code interpreter. Free tier + paid Plus / Pro / Enterpr...

Claude.ai
Tier 1
1 models · proprietary only · US

Anthropic's consumer platform — chat, projects, artifacts, computer use, agents, file uploads. Free tier + paid Pro / Max / Team / Enterprise tiers.

Poe
Tier 2
1 models · open + proprietary · US

Quora's multi-model platform — chat across many proprietary and open-weight bots from a single subscription, plus a creator tool for building custom bots.

X Premium+
Tier 2
1 models · proprietary only · US

X.com Premium+ subscription gives access to Grok inside the X feed. Consumer chat surface, no API access — for that use xAI's first-party API instead.

AWS Bedrock
Tier 1
4 models · open + proprietary · US

AWS's managed model gateway — Claude, Llama, Mistral, Titan, Cohere behind IAM + VPC isolation.

Google Vertex AI
Tier 1
5 models · open + proprietary · US

GCP's managed model platform — Gemini, Claude, Llama under Google IAM + the rest of the GCP stack.

Azure OpenAI Service
Tier 1
4 models · proprietary only · US

Microsoft Azure's enterprise gateway to OpenAI models under Azure IAM + compliance frameworks.

Hugging Face Inference Endpoints
Tier 1
8 models · open-weight · US

Hugging Face's managed inference for any model on the Hub. Auto-scales; backed by AWS/Azure/GCP.