Every way to use the major models.

Frontier reasoning and long-form coding from Anthropic.

Claude Sonnet 4.6

Best price-performance from Anthropic. Default for production agents.

Claude 3.5 Sonnet

Anthropic's 3.5 generation — still in active production.

Claude Haiku 4.5

Fast, cheap Claude variant for high-throughput inference.

Claude 3.5 Haiku

Fast/cheap Claude 3.5 variant — production fallback for Haiku 4.5.

GPT-5

by OpenAI · GPT · 256,000 ctx

OpenAI's frontier multimodal reasoning model.

GPT-4 Turbo

by OpenAI · GPT · 128,000 ctx

OpenAI's pre-GPT-5 flagship — still extensively deployed.

AI21: Jamba Large 1.7

by Ai21 · 256,000 ctx

Jamba Large 1.7 is the latest model in the Jamba open family, offering improvements in grounding, instruction-following, and overall effi...

AionLabs: Aion-3.0

by Aion Labs · 131,072 ctx

Aion-3.0 is a multi-model roleplaying and storytelling system from AionLabs, built on the GLM family of models. It uses a collaborative g...

AionLabs: Aion-3.0-Mini

by Aion Labs · 131,072 ctx

Aion-3.0 Mini is a multi-model roleplaying and storytelling system from AionLabs, built on the DeepSeek family of models. It uses a colla...

Amazon: Nova Micro 1.0

by Amazon · 128,000 ctx

Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low c...

Body Builder (beta)

by Openrouter · 128,000 ctx

Transform your natural language requests into structured OpenRouter API request objects. Describe what you want to accomplish with AI mod...

Cohere: Command A

by Cohere · 256,000 ctx

Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, mult...

Cohere: Command R (08-2024)

by Cohere · 128,000 ctx

command-r-08-2024 is an update of the [Command R](/models/cohere/command-r) with improved performance for multilingual retrieval-augmente...

Cohere: Command R+ (08-2024)

by Cohere · 128,000 ctx

command-r-plus-08-2024 is an update of the [Command R+](/models/cohere/command-r-plus) with roughly 50% higher throughput and 25% lower l...

Cohere: Command R7B (12-2024)

by Google DeepMind · 8,192 ctx

by Cohere · 128,000 ctx

Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in December 2024. It excels at RAG, tool use, agents, an...

Google: Gemma 2 27B

27B

Gemma 2 27B by Google is an open model built from the same research and technology used to create the [Gemini models](/models?q=gemini). ...

Google: Gemma 3n 4B

by Google DeepMind · 32,768 ctx

Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It support...

Inflection: Inflection 3 Pi

by Inflection · 8,000 ctx

Inflection 3 Pi powers Inflection's [Pi](https://pi.ai) chatbot, including backstory, emotional intelligence, productivity, and safety. I...

Inflection: Inflection 3 Productivity

by Inflection · 8,000 ctx

Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or precise adherence to p...

NVIDIA: Nemotron 3 Ultra

by Nvidia · 1,000,000 ctx

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (...

OpenAI: GPT-3.5 Turbo

by OpenAI · 16,385 ctx

GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and tradition...

OpenAI: GPT-3.5 Turbo 16k

by OpenAI · 16,385 ctx

This model offers four times the context length of gpt-3.5-turbo, allowing it to support approximately 20 pages of text in a single reque...

OpenAI: GPT-3.5 Turbo Instruct

by OpenAI · 4,095 ctx

This model is a variant of GPT-3.5 Turbo tuned for instructional prompts and omitting chat-related optimizations. Training data: up to Se...

OpenAI: GPT-3.5 Turbo (older v0613)

by OpenAI · 4,095 ctx

GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and tradition...

OpenAI: GPT-4

by OpenAI · 8,191 ctx

OpenAI's flagship model, GPT-4 is a large-scale multimodal language model capable of solving difficult problems with greater accuracy tha...

OpenAI: GPT-4 (older v0314)

by OpenAI · 8,191 ctx

GPT-4-0314 is the first version of GPT-4 released, with a context length of 8,192 tokens, and was supported until June 14. Training data:...

OpenAI: GPT-4o-mini Search Preview

GPT-4o mini Search Preview is a specialized model for web search in Chat Completions. It is trained to understand and execute web search ...

OpenAI: GPT-4o Search Preview

GPT-4o Search Previewis a specialized model for web search in Chat Completions. It is trained to understand and execute web search queries.

OpenAI: GPT-4 Turbo (older v1106)

The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to ...

OpenAI: GPT-4 Turbo Preview

The preview GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Traini...

OpenAI: gpt-oss-safeguard-20b

20B

by OpenAI · 131,072 ctx

gpt-oss-safeguard-20b is a safety reasoning model from OpenAI built upon gpt-oss-20b. This open-weight, 21B-parameter Mixture-of-Experts ...

OpenAI: o3 Mini

by OpenAI · 200,000 ctx

OpenAI o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and...

OpenAI: o3 Mini High

by OpenAI · 200,000 ctx

OpenAI o3-mini-high is the same model as [o3-mini](/openai/o3-mini) with reasoning_effort set to high. o3-mini is a cost-efficient langua...

OpenRouter: Fusion

by Openrouter · 128,000 ctx

Fusion turns your prompt into a small multi-model deliberation. A panel of expert models (see below) analyzes your prompt in parallel wit...

Owl Alpha

by Openrouter · 1,048,756 ctx

Owl Alpha is a high-performance foundation model designed for agentic workloads. Natively supports tool use, and long-context tasks, with...

Pareto Code Router

by Openrouter · 2,000,000 ctx

The Pareto Router maintains a tiered shortlist of strong coding models, ranked by [Artificial Analysis](https://artificialanalysis.ai/) c...

Perplexity: Sonar Deep Research

by Perplexity · 128,000 ctx

Sonar Deep Research is a research-focused model designed for multi-step retrieval, synthesis, and reasoning across complex topics. It aut...

Poolside: Laguna M.1

by Poolside · 262,144 ctx

Laguna M.1 is the flagship coding agent model from [Poolside](https://poolside.ai/), optimized for complex software engineering tasks. De...

Poolside: Laguna XS.2

by Poolside · 262,144 ctx

Laguna XS.2 is the second-generation model in the XS size class from [Poolside](https://poolside.ai/), their efficient coding agent serie...

Poolside: Laguna XS 2.1

by Poolside · 262,144 ctx

Laguna XS 2.1 is the latest coding agent model in the 33B-A3B category from [Poolside](https://poolside.ai/) and a step forward from thei...

Tencent: Hy3

by Tencent · 262,144 ctx

Hy3 is a 295B-parameter Mixture-of-Experts model from Tencent (21B active, 192 experts with top-8 routing) built for reasoning, agentic w...

Z.ai: GLM 5.2