AI models

Every way to use the major models.

Closed models like Claude and GPT: link to the cheapest API provider. Open-weights models like Llama, Kimi, and DeepSeek: choose hosted inference or self-host on rented GPUs.

34 tracked · 0 open weights · 34 closed APIs · cheapest input $0.04/M
Quality × Price

Find the sweet spot.

Higher = stronger benchmark composite · further left = cheaper input
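
One way to read "sweet spot" on a quality-versus-price chart is as the set of models that no other model beats on both axes at once (a Pareto frontier). The sketch below is only an illustration of that reading, not necessarily how this page computes anything. The `Model` type, the `sweet_spots` helper, and every name, price, and score in `catalog` are hypothetical placeholders, not values from this listing.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    input_price: float  # USD per million input tokens (hypothetical value)
    quality: float      # benchmark composite, higher is better (hypothetical value)


def sweet_spots(models: list[Model]) -> list[Model]:
    """Return the models that are not dominated on (lower price, higher quality)."""
    frontier: list[Model] = []
    best_quality = float("-inf")
    # Walk from cheapest to most expensive; at equal price, try the stronger model first.
    for m in sorted(models, key=lambda m: (m.input_price, -m.quality)):
        # Keep a model only if it beats every cheaper (or equally priced) model on quality.
        if m.quality > best_quality:
            frontier.append(m)
            best_quality = m.quality
    return frontier


# Hypothetical catalog: placeholder names and numbers for illustration only.
catalog = [
    Model("model-a", 0.10, 40.0),
    Model("model-b", 0.50, 55.0),
    Model("model-c", 0.80, 50.0),  # costs more than model-b but scores lower
    Model("model-d", 3.00, 70.0),
]

for m in sweet_spots(catalog):
    print(f"{m.name}: ${m.input_price:.2f}/M input, quality {m.quality}")
```

In this placeholder data, `model-c` drops out because `model-b` is both cheaper and stronger; the remaining models each represent a different price-for-quality trade-off.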

Closed / API-only models.

Direct API, aggregator (OpenRouter, Bedrock), or chat UI.

Claude Opus 4.7

text
by Anthropic · Claude · 200,000 ctx

Frontier reasoning and long-form coding from Anthropic.

Claude Sonnet 4.6

text
by Anthropic · Claude · 200,000 ctx

Best price-performance from Anthropic. Default for production agents.

Claude 3.5 Sonnet

text
by Anthropic · Claude · 200,000 ctx

Anthropic's 3.5 generation — still in active production.

Claude Haiku 4.5

text
by Anthropic · Claude · 200,000 ctx

Fast, cheap Claude variant for high-throughput inference.

Claude 3.5 Haiku

text
by Anthropic · Claude · 200,000 ctx

Fast/cheap Claude 3.5 variant — production fallback for Haiku 4.5.

GPT-5

text
by OpenAI · GPT · 256,000 ctx

OpenAI's frontier multimodal reasoning model.

GPT-4 Turbo

text
by OpenAI · GPT · 128,000 ctx

OpenAI's pre-GPT-5 flagship — still extensively deployed.

AI21: Jamba Large 1.7

text
by AI21 · 256,000 ctx

Jamba Large 1.7 is the latest model in the Jamba open family, offering improvements in grounding, instruction-following, and overall effi...

Amazon: Nova Micro 1.0

text
by Amazon · 128,000 ctx

Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low c...

Body Builder (beta)

text
by OpenRouter · 128,000 ctx

Transform your natural language requests into structured OpenRouter API request objects. Describe what you want to accomplish with AI mod...

Cohere: Command A

text
by Cohere · 256,000 ctx

Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, mult...

Cohere: Command R (08-2024)

text
by Cohere · 128,000 ctx

command-r-08-2024 is an update of the [Command R](/models/cohere/command-r) with improved performance for multilingual retrieval-augmente...

Cohere: Command R+ (08-2024)

text
by Cohere · 128,000 ctx

command-r-plus-08-2024 is an update of the [Command R+](/models/cohere/command-r-plus) with roughly 50% higher throughput and 25% lower l...

Cohere: Command R7B (12-2024)

text
by Cohere · 128,000 ctx

Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in December 2024. It excels at RAG, tool use, agents, an...

Google: Gemma 2 27B

27B
by Google DeepMind · 8,192 ctx

Gemma 2 27B by Google is an open model built from the same research and technology used to create the [Gemini models](/models?q=gemini). ...

Google: Gemma 3n 4B

4B
by Google DeepMind · 32,768 ctx

Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It support...

Inflection: Inflection 3 Pi

text
by Inflection · 8,000 ctx

Inflection 3 Pi powers Inflection's [Pi](https://pi.ai) chatbot, including backstory, emotional intelligence, productivity, and safety. I...

Inflection: Inflection 3 Productivity

text
by Inflection · 8,000 ctx

Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or precise adherence to p...

OpenAI: GPT-3.5 Turbo

text
by OpenAI · 16,385 ctx

GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and tradition...

OpenAI: GPT-3.5 Turbo 16k

text
by OpenAI · 16,385 ctx

This model offers four times the context length of gpt-3.5-turbo, allowing it to support approximately 20 pages of text in a single reque...

OpenAI: GPT-3.5 Turbo Instruct

text
by OpenAI · 4,095 ctx

This model is a variant of GPT-3.5 Turbo tuned for instructional prompts and omitting chat-related optimizations. Training data: up to Se...

OpenAI: GPT-3.5 Turbo (older v0613)

text
by OpenAI · 4,095 ctx

GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and tradition...

OpenAI: GPT-4

text
by OpenAI · 8,191 ctx

OpenAI's flagship model, GPT-4 is a large-scale multimodal language model capable of solving difficult problems with greater accuracy tha...

OpenAI: GPT-4 (older v0314)

text
by OpenAI · 8,191 ctx

GPT-4-0314 is the first version of GPT-4 released, with a context length of 8,192 tokens, and was supported until June 14. Training data:...

OpenAI: GPT-4o-mini Search Preview

text
by OpenAI · 128,000 ctx

GPT-4o mini Search Preview is a specialized model for web search in Chat Completions. It is trained to understand and execute web search ...

OpenAI: GPT-4o Search Preview

text
by OpenAI · 128,000 ctx

GPT-4o Search Preview is a specialized model for web search in Chat Completions. It is trained to understand and execute web search queries.

OpenAI: GPT-4 Turbo (older v1106)

text
by OpenAI · 128,000 ctx

The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to ...

OpenAI: GPT-4 Turbo Preview

text
by OpenAI · 128,000 ctx

The preview GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Traini...

OpenAI: gpt-oss-safeguard-20b

20B
by OpenAI · 131,072 ctx

gpt-oss-safeguard-20b is a safety reasoning model from OpenAI built upon gpt-oss-20b. This open-weight, 21B-parameter Mixture-of-Experts ...

OpenAI: o3 Mini

text
by OpenAI · 200,000 ctx

OpenAI o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and...

OpenAI: o3 Mini High

text
by OpenAI · 200,000 ctx

OpenAI o3-mini-high is the same model as [o3-mini](/openai/o3-mini) with reasoning_effort set to high. o3-mini is a cost-efficient langua...

Owl Alpha

text
by OpenRouter · 1,048,756 ctx

Owl Alpha is a high-performance foundation model designed for agentic workloads. Natively supports tool use and long-context tasks, with...

Pareto Code Router

text
by OpenRouter · 2,000,000 ctx

The Pareto Router maintains a tiered shortlist of strong coding models, ranked by [Artificial Analysis](https://artificialanalysis.ai/) c...

Perplexity: Sonar Deep Research

text
by Perplexity · 128,000 ctx

Sonar Deep Research is a research-focused model designed for multi-step retrieval, synthesis, and reasoning across complex topics. It aut...