Where to call the API.
Services that host AI models behind APIs and sell per-token access. First-party APIs, aggregators, specialty serverless, consumer chat surfaces, and hyperscaler gateways — sorted so you can pick by business model.
Unified API in front of every major frontier API + open-weight host. Single key, one billing source, transparent per-model markup.
Serverless inference for 100+ open-weight models. Fast cold starts, per-token pricing, fine-tuning available.
Fast inference on popular open-weight models. Speculative decoding + custom kernels keep latency low. Per-token billing.
Per-second billing serverless model API. Strong for image / video / audio models alongside LLMs.
Low-cost serverless inference for open-weight LLMs. OpenAI-compatible endpoints, no commitment.
China-based aggregator with strong coverage of Chinese open-weight models (GLM, Qwen, DeepSeek, MiniMax).
Anthropic's first-party API for the Claude family. Direct billing relationship; prompt caching + batch tier discount.
OpenAI's first-party API for GPT, o-series, embedding, and image models.
First-party API for the Gemini family. Free tier available for testing; production via paid quota.
Mistral's first-party API serving both their closed (Large) and open (Small / Codestral) families.
DeepSeek's first-party API for V3, R1, and the R1-Distill family. Among the lowest published per-token rates on frontier-tier models.
Moonshot AI's developer API for the Kimi family. Both open-weight (K2 series) and closed flagship (Moonshot v1) accessible from one endpoint.
Zhipu AI's first-party API for the GLM family. Open-weight GLM-4.5 + closed enterprise tiers.
Alibaba's first-party API for the Qwen family. Backend for Aliyun's AI services.
Tencent's first-party API for the Hunyuan family.
MiniMax's first-party API for the abab and MiniMax-Text families.
01.AI's first-party API for the Yi family — closed Yi-Lightning + open Yi-34B variants.
Cohere's first-party API — Command series for chat + Embed series for retrieval.
Elon Musk's AI lab — makers of the Grok family. First-party API with OpenAI-compatible endpoints; integrates tightly with X (Twitter) as the consumer surface.
Zhipu AI's international platform for the GLM family. Same models as Zhipu BigModel but with English docs, simpler signup, and a consumer chat surface at cha...
Ultra-low-latency inference on custom LPU silicon. Open-weight LLMs at >500 tokens/sec; OpenAI-compatible API.
Image and video model serverless API. Strong for Stable Diffusion / Flux variants with per-second billing.
Black Forest Labs' first-party API for the Flux image-generation family.
Stability AI's first-party API for Stable Diffusion + audio/video models.
Local-first runtime for open-weight LLMs — run them on your own machine or rented GPU. Library indexes Llama, Kimi, GLM, Qwen, …
Moonshot AI's full consumer platform — chat, file uploads, search, agents, code interpreter, image generation. Free tier with per-message limits + paid subsc...
OpenAI's flagship consumer platform — chat, projects, custom GPTs, voice, image generation, browsing, code interpreter. Free tier + paid Plus / Pro / Enterpr...
Anthropic's consumer platform — chat, projects, artifacts, computer use, agents, file uploads. Free tier + paid Pro / Max / Team / Enterprise tiers.
Quora's multi-model platform — chat across many proprietary and open-weight bots from a single subscription, plus a creator tool for building custom bots.
X.com Premium+ subscription gives access to Grok inside the X feed. Consumer chat surface, no API access — for that use xAI's first-party API instead.
AWS's managed model gateway — Claude, Llama, Mistral, Titan, Cohere behind IAM + VPC isolation.
GCP's managed model platform — Gemini, Claude, Llama under Google IAM + the rest of the GCP stack.
Microsoft Azure's enterprise gateway to OpenAI models under Azure IAM + compliance frameworks.
Hugging Face's managed inference for any model on the Hub. Auto-scales; backed by AWS/Azure/GCP.