
AI models

Every way to use the major models.

Closed models like Claude and GPT: link to the cheapest API provider. Open-weights models like Llama, Kimi, and DeepSeek: choose hosted inference or self-host on rented GPUs.

378 tracked · 335 open weights · 43 closed APIs · cheapest input $0.01/M
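Prices such as $0.01/M are quoted per million tokens, so the cost of a request scales linearly with its token count. A minimal sketch of that arithmetic follows; the function name, the token counts, and the $0.04/M output price are hypothetical and not taken from any provider.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD of one request, given prices quoted per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Hypothetical rates: $0.01/M input (the cheapest input price quoted above)
# and an assumed $0.04/M output price.
cost = request_cost_usd(input_tokens=50_000, output_tokens=2_000,
                        input_price_per_m=0.01, output_price_per_m=0.04)
print(f"${cost:.6f}")
```

Output tokens are usually priced separately from input tokens, which is why the sketch takes two rates.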

Quality × Price

Find the sweet spot.

Higher = stronger benchmark composite · further left = cheaper input
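One way to read the chart: a model sits in the sweet spot when no other model is both cheaper and stronger. A minimal sketch of that Pareto filter follows, assuming each model is reduced to a name, an input price, and a quality score. The `Model` type and the entries below are hypothetical placeholders, not values from this page.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    input_price: float  # USD per million input tokens (lower is better)
    quality: float      # benchmark composite (higher is better)


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Return the models that are not dominated on price and quality."""
    frontier = []
    best_quality = float("-inf")
    # Walk from cheapest to most expensive; at equal price, strongest first.
    for m in sorted(models, key=lambda m: (m.input_price, -m.quality)):
        # Keep a model only if it beats every cheaper model on quality.
        if m.quality > best_quality:
            frontier.append(m)
            best_quality = m.quality
    return frontier


# Hypothetical data for illustration only.
models = [
    Model("model-a", 0.10, 40.0),
    Model("model-b", 0.50, 55.0),
    Model("model-c", 0.60, 50.0),  # dominated by model-b: pricier and weaker
    Model("model-d", 3.00, 70.0),
]
sweet_spot = [m.name for m in pareto_frontier(models)]
print(sweet_spot)
```

Sorting by price first means a single pass is enough: each model only has to be compared against the best quality seen so far.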


Modality: All · Text · Multimodal · Code · Image · Audio · Video · Vision · Embedding

License: All · Open weights · Closed / API

Size: All · ≤8B (edge) · 8–30B (laptop) · 30–100B (workstation) · 100B+ (datacenter)
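The size buckets can be expressed as a small lookup. This is a sketch under two assumptions the page does not state: exactly 30B falls in the laptop bucket and exactly 100B in the datacenter bucket, and mixture-of-experts models are classified by total rather than active parameters. The function name is hypothetical.

```python
def size_class(params_billion: float) -> str:
    """Map a parameter count (in billions) to a size bucket.

    Boundary handling at 30B and 100B is an assumption, not the page's rule.
    """
    if params_billion <= 8:
        return "edge"
    if params_billion <= 30:
        return "laptop"
    if params_billion < 100:
        return "workstation"
    return "datacenter"


for size in (8, 14, 70, 405):
    print(f"{size}B -> {size_class(size)}")
```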


Open-weights models.

Run yourself on cheap GPUs, or use a hosted-inference provider.
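When deciding whether to self-host, a rough first check is whether the weights fit in GPU memory. A back-of-the-envelope sketch follows, assuming memory is simply parameter count times bytes per parameter. It ignores the KV cache, activations, and runtime overhead, so real requirements are higher. The function name is hypothetical.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int = 16) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 parameters * (bits / 8) bytes, divided by 1e9 bytes per GB
    return params_billion * bits_per_param / 8


for bits in (16, 8, 4):
    print(f"70B at {bits}-bit: ~{weight_memory_gb(70, bits):.0f} GB")
```

For mixture-of-experts models, the total parameter count is typically the relevant input, since all experts usually need to be resident in memory even though only the active ones run per token.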

Llama 3.3 70B

by Meta AI · Llama · 128,000 ctx

Meta's best-in-class open-weight LLM — 70B class.

Llama 3.1 70B

by Meta AI · Llama · 128,000 ctx

Llama 3.1 70B — production workhorse, superseded by 3.3 but still widely deployed.

Llama 3.1 8B

by Meta AI · Llama · 128,000 ctx

Meta's most popular open-weight small LLM — fits anywhere.

Llama 3.1 405B

by Meta AI · Llama · 128,000 ctx

Meta's largest open-weight LLM — dense 405B, frontier-class at launch.

Llama 3.2 3B

by Meta AI · Llama · 128,000 ctx

3B Llama — laptop-class chat + RAG.

Llama 3.2 1B

by Meta AI · Llama · 128,000 ctx

Meta's smallest Llama — mobile + on-device target.

DeepSeek V3

by DeepSeek · DeepSeek · 128,000 ctx

DeepSeek's flagship MoE — 671B total, 37B active, frontier-class.

DeepSeek R1

by DeepSeek · DeepSeek · 128,000 ctx

DeepSeek's reasoning model — RL-trained, frontier-class, MIT-licensed.

DeepSeek R1 Distill Llama 70B

by DeepSeek · DeepSeek · 128,000 ctx

70B Llama distilled from DeepSeek R1's reasoning traces.

DeepSeek R1 Distill Qwen 32B

by DeepSeek · DeepSeek · 128,000 ctx

32B Qwen base distilled from DeepSeek R1.

DeepSeek R1 Distill Qwen 14B

by DeepSeek · DeepSeek · 128,000 ctx

14B distilled R1 — laptop-friendly reasoning.

DeepSeek R1 Distill Qwen 7B

by DeepSeek · DeepSeek · 128,000 ctx

7B distilled R1 — runs on any modern GPU.

DeepSeek R1 Distill Qwen 1.5B

by DeepSeek · DeepSeek · 128,000 ctx

Tiny distilled R1 — phone / browser deployable.

Gemma 2 27B

by Google DeepMind · Gemma 2 · 8,192 ctx

Google's pre-Gemma-3 open-weight workhorse.

Gemma 2 9B

by Google DeepMind · Gemma 2 · 8,192 ctx

9B Gemma 2 — single-GPU local target.

Gemma 2 2B

by Google DeepMind · Gemma 2 · 8,192 ctx

Tiny 2B Gemma 2 — laptop / mobile.

Gemma 3 1B

by Google DeepMind · Gemma · 32,768 ctx

1B Gemma 3 — edge / mobile.

Qwen 2.5 72B

by Alibaba (Qwen Team) · Qwen · 128,000 ctx

Alibaba's flagship open-weight LLM — 72B dense.

Qwen 2.5 7B

by Alibaba (Qwen Team) · Qwen · 128,000 ctx

7B Qwen 2.5 — most popular Qwen variant on Ollama.

Qwen 2.5 14B

by Alibaba (Qwen Team) · Qwen · 128,000 ctx

14B Qwen 2.5 — sweet spot for single-GPU local hosting.

Qwen 2.5 32B

by Alibaba (Qwen Team) · Qwen · 128,000 ctx

32B Qwen 2.5, a workstation-class workhorse.

Qwen 2.5 3B

by Alibaba (Qwen Team) · Qwen · 32,768 ctx

3B Qwen 2.5 — laptop / edge target.

Mistral Large 2

by Mistral AI · Mistral Large · 128,000 ctx

Mistral's flagship open-weight model — 123B dense.

Mistral 7B v0.3

by Mistral AI · Mistral 7B · 32,768 ctx

The current Mistral 7B — adds function calling + extended vocab.

Mistral 7B v0.2

by Mistral AI · Mistral 7B · 32,768 ctx

Mistral 7B v0.2 — earlier 32K context revision.

Mistral 7B v0.1

by Mistral AI · Mistral 7B · 8,192 ctx

Original Mistral 7B — historical reference.

Mixtral 8x22B

by Mistral AI · Mixtral · 65,536 ctx

Mistral's open MoE — 141B total, 39B active.

Mistral Nemo 12B

by Mistral AI · Mistral · 128,000 ctx

Mistral × Nvidia collab — 12B Apache-licensed, multilingual.

Kimi K2

by Moonshot AI · Kimi · 256,000 ctx

Moonshot's frontier open-weight MoE — 1T total, 32B active.

Kimi K2 Thinking

by Moonshot AI · Kimi · 256,000 ctx

Moonshot's open-weight reasoning variant — extended chain-of-thought training on top of Kimi K2.

Kimi K2.6

by Moonshot AI · Kimi · 256,000 ctx

Long-horizon coding + autonomous-execution upgrade over K2.5.

GLM-4.5

by Zhipu AI · GLM · 128,000 ctx

Zhipu's frontier open-weight MoE — 355B total, 32B active. Strong agentic + reasoning marks for an open model.

GLM-4.5-Air

by Zhipu AI · GLM · 128,000 ctx

Smaller, cheaper sibling of GLM-4.5. 106B total, 12B active.

GLM-5.1

by Zhipu AI · GLM · 202,752 ctx

Zhipu's GLM 5.1 series — successor to GLM-5 on z.ai's API.

GLM-5

by Zhipu AI · GLM · 202,752 ctx

Zhipu's GLM 5 generation — closed flagship between GLM-4.7 and GLM-5.1.

GLM-5 Turbo

by Zhipu AI · GLM · 202,752 ctx

Faster, cheaper sibling of GLM-5 on z.ai.

Command R+

by Cohere · Command · 128,000 ctx

Cohere's open-weight RAG-optimized LLM — multilingual + tool use.

GLM-4.7

by Zhipu AI · GLM · 202,752 ctx

Mid-generation GLM 4.7 released between GLM-4.6 and GLM-5.

GLM-4.6

by Zhipu AI · GLM · 202,752 ctx

Incremental upgrade on GLM-4.5 — improved reasoning, same context window.

GLM-4.5-Flash

by Zhipu AI · GLM

Lowest-latency, lowest-cost variant of GLM-4.5 on z.ai.

Yi-Lightning

by 01.AI · Yi · 16,000 ctx

01.AI's fastest production model. Tops the LMSYS Arena Chinese leaderboard.

Yi-34B

by 01.AI · Yi · 32,000 ctx

Earlier open-weight Yi release — bilingual EN/ZH, 32K-token context.

Hunyuan-Large

by Tencent · Hunyuan · 256,000 ctx

Tencent's open-weight MoE — 389B total, 52B active. Largest open MoE at launch.

InternLM 2.5 20B

by Shanghai AI Lab · InternLM · 1,000,000 ctx

Shanghai AI Lab's dense 20B open-weight. Strong long-context + tool use for its size.

MiniMax-Text-01

by MiniMax · MiniMax-Text · 4,000,000 ctx

MiniMax's first open MoE — 456B total, 45.9B active. 1M+ context via Lightning Attention.

Baichuan2-13B

by Baichuan Inc. · Baichuan · 4,096 ctx

Bilingual EN/ZH open-weight. Strong for its size on Chinese-language benchmarks.

Qwen 3 235B

by Alibaba (Qwen Team) · Qwen 3 · 128,000 ctx

Alibaba's frontier MoE — 235B total / 22B active.

Qwen 3 32B

by Alibaba (Qwen Team) · Qwen 3 · 128,000 ctx

Dense 32B Qwen 3.

Qwen 3 14B

by Alibaba (Qwen Team) · Qwen 3 · 128,000 ctx

Qwen 3 8B

by Alibaba (Qwen Team) · Qwen 3 · 128,000 ctx

Qwen 3 4B

by Alibaba (Qwen Team) · Qwen 3 · 32,768 ctx

Phi-4

by Microsoft · Phi · 16,384 ctx

Microsoft's 14B small-LM workhorse — punches above its weight.

Phi-3.5 Mini

by Microsoft · Phi · 128,000 ctx

3.8B Phi — laptop / edge target.

Phi-3 Medium

by Microsoft · Phi · 128,000 ctx

Phi-3 Mini

by Microsoft · Phi · 128,000 ctx

GPT-OSS 120B

by OpenAI · GPT-OSS · 128,000 ctx

OpenAI's first open-weight LLM in years — Apache-licensed MoE.

GPT-OSS 20B

by OpenAI · GPT-OSS · 128,000 ctx

20B GPT-OSS — single-GPU local target.

TinyLlama 1.1B

by TinyLlama Project · TinyLlama · 2,048 ctx

1.1B Llama-arch model — 3T training tokens.

OLMo 3 7B

by Allen Institute for AI (AI2) · OLMo · 4,096 ctx

Allen AI's latest fully-open OLMo — model + training data + checkpoints.

OLMo 7B

by Allen Institute for AI (AI2) · OLMo · 2,048 ctx

Hermes 3 70B

by Nous Research · Hermes · 128,000 ctx

Nous Research's flagship Llama fine-tune — agent-friendly.

Hermes 3 8B

by Nous Research · Hermes · 128,000 ctx

IBM Granite 3.1 8B

by IBM Research · Granite · 128,000 ctx

IBM's enterprise-focused open-weight LLM.

AionLabs: Aion-1.0

by Aion Labs · 131,072 ctx

Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepS...

AionLabs: Aion-1.0-Mini

by Aion Labs · 131,072 ctx

Aion-1.0-Mini is a 32B-parameter model distilled from DeepSeek-R1, designed for strong performance in reasoning domains s...

AionLabs: Aion-2.0

by Aion Labs · 131,072 ctx

Aion-2.0 is a variant of DeepSeek V3.2 optimized for immersive roleplaying and storytelling. It is particularly strong at introducing ten...

AionLabs: Aion-RP 1.0 (8B)

by Aion Labs · 32,768 ctx

Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of the RPBench-Auto benchmark, a roleplaying-specific variant ...

AlfredPros: CodeLLaMa 7B Instruct Solidity

by Alfredpros · 4,096 ctx

A fine-tuned 7-billion-parameter Code LLaMA Instruct model for generating Solidity smart contracts, using 4-bit QLoRA fine-tuning provided by...

AllenAI: Olmo 3 32B Think

by Allen Institute for AI (AI2) · 65,536 ctx

Olmo 3 32B Think is a large-scale, 32-billion-parameter model purpose-built for deep reasoning, complex logic chains and advanced instruc...

Arcee AI: Coder Large

by Arcee Ai · 32,768 ctx

Coder‑Large is a 32 B‑parameter offspring of Qwen 2.5‑Instruct that has been further trained on permissively‑licensed GitHub, CodeSearchN...

Arcee AI: Maestro Reasoning

by Arcee Ai · 131,072 ctx

Maestro Reasoning is Arcee's flagship analysis model: a 32 B‑parameter derivative of Qwen 2.5‑32 B tuned with DPO and chain‑of‑thought RL...

Arcee AI: Trinity Large Thinking

by Arcee Ai · 262,144 ctx

Trinity Large Thinking is a powerful open source reasoning model from the team at Arcee AI. It shows strong performance in PinchBench, ag...

Arcee AI: Trinity Mini

by Arcee Ai · 131,072 ctx

Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language model featuring 128 experts with 8 active per token. Engin...

Arcee AI: Virtuoso Large

by Arcee Ai · 131,072 ctx

Virtuoso‑Large is Arcee's top‑tier general‑purpose LLM at 72 B parameters, tuned to tackle cross‑domain reasoning, creative writing and e...

Arize AI Qwen 2 1.5B Instruct

by Togethercomputer · 32,768 ctx

Baidu: ERNIE 4.5 21B A3B

by Baidu · 131,072 ctx

A sophisticated text-based Mixture-of-Experts (MoE) model featuring 21B total parameters with 3B activated per token, delivering exceptio...

Baidu: ERNIE 4.5 21B A3B Thinking

by Baidu · 131,072 ctx

ERNIE-4.5-21B-A3B-Thinking is Baidu's upgraded lightweight MoE model, refined to boost reasoning depth and quality for top-tier performan...

Baidu: ERNIE 4.5 300B A47B

by Baidu · 131,072 ctx

ERNIE-4.5-300B-A47B is a 300B parameter Mixture-of-Experts (MoE) language model developed by Baidu as part of the ERNIE 4.5 series. It ac...

Cogito V1 Preview Llama 70B

by Deepcogito · 131,072 ctx

Cogito V1 Preview Llama 70B Turbo

by Deepcogito · 131,072 ctx

Cogito V1 Preview Llama 8B

by Deepcogito · 131,072 ctx

Cogito V1 Preview Qwen 14B

by Deepcogito · 131,072 ctx

Cogito V1 Preview Qwen 32B

by Deepcogito · 131,072 ctx

Deepcoder 14B Preview

by Togethercomputer · 131,072 ctx

Deep Cogito: Cogito v2.1 671B

by Deepcogito · 128,000 ctx

Cogito v2.1 671B MoE represents one of the strongest open models globally, matching performance of frontier closed and open models. This ...

Deepseek Coder 33B Instruct

by DeepSeek · 16,384 ctx

DeepSeek: DeepSeek V3

by DeepSeek · 163,840 ctx

DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous vers...

DeepSeek: DeepSeek V3 0324

by DeepSeek · 163,840 ctx

DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team...

DeepSeek: DeepSeek V3.1

by DeepSeek · 163,840 ctx

DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prom...

DeepSeek: DeepSeek V3.1 Terminus

by DeepSeek · 163,840 ctx

DeepSeek-V3.1 Terminus is an update to DeepSeek V3.1 that maintains the model's original capabilities whi...

DeepSeek: DeepSeek V3.2

by DeepSeek · 131,072 ctx

DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use pe...

DeepSeek: DeepSeek V3.2 Exp

by DeepSeek · 163,840 ctx

DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek as an intermediate step between V3.1 and future architectu...

DeepSeek: DeepSeek V3.2 Speciale

by DeepSeek · 163,840 ctx

DeepSeek-V3.2-Speciale is a high-compute variant of DeepSeek-V3.2 optimized for maximum reasoning and agentic performance. It builds on D...

DeepSeek: DeepSeek V4 Flash

by DeepSeek · 1,048,576 ctx

DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated paramete...

DeepSeek: DeepSeek V4 Pro

by DeepSeek · 1,048,576 ctx

DeepSeek V4 Pro is a large-scale Mixture-of-Experts model from DeepSeek with 1.6T total parameters and 49B activated parameters, supporti...

Deepseek OCR 2

by DeepSeek · 8,192 ctx

DeepSeek: R1 0528

by DeepSeek · 163,840 ctx

May 28th update to the original DeepSeek R1. Performance on par with OpenAI o1, but open-sourced an...

DeepSeek-V3-0324

by DeepSeek · 163,840 ctx

DeepSeek-V3-0324, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token, an impro...

DeepSeek-V3.1

by DeepSeek · 163,840 ctx

DeepSeek-V3.1 is post-trained on top of DeepSeek-V3.1-Base, which is built upon the original V3 base checkpoint through a two-phase l...

Devstral Small 2505

by Mistral AI · 131,072 ctx

EssentialAI Rnj-1 Instruct

by Essentialai · 32,768 ctx

EssentialAI: Rnj 1 Instruct

by Essentialai · 32,768 ctx

Rnj-1 is an 8B-parameter, dense, open-weight model family developed by Essential AI and trained from scratch with a focus on programming,...

Facebook CWM

by Meta AI · 131,072 ctx

Gemma 2 9B It

by Google DeepMind · 8,192 ctx

Gemma 2B It

by Google DeepMind · 8,192 ctx

Gemma 3 1b it

by Google DeepMind · 32,768 ctx

Gemma 3 1B Pt

by Google DeepMind · 32,768 ctx

Gemma 3 270M It

by Google DeepMind · 32,768 ctx

Gemma 3 270M It Lora

by Google DeepMind · 32,768 ctx

Gemma 3 27B It

by Google DeepMind · 65,536 ctx

Gemma 3 27B It Lora

by Google DeepMind

Gemma 3 27B Pt

by Google DeepMind

Gemma 3 4b it

by Google DeepMind · 65,536 ctx

Gemma 4 31B It Lora

by Google DeepMind · 262,144 ctx

gemma-4-31B-it-turbo

by Google DeepMind · 262,144 ctx

Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input and generating te...

Gemma 4 E2B-it

by Google DeepMind · 131,072 ctx

Gemma 4 E4B-it

by Google DeepMind · 131,072 ctx

Glm 4.5 Air Fp8

by Zhipu AI · 131,072 ctx

GLM 4.7 FP4

by Togethercomputer · 202,752 ctx

Glm 4.7 Fp8

by Zhipu AI · 202,752 ctx

GLM 5.1

by zai-org · 202,752 ctx

GLM-5.1 is Z.ai's next-generation flagship model built for agentic engineering, with stronger coding capabilities and sustained performan...

GLM 5.2

by zai-org · 1,048,576 ctx

GLM-5.2 introduces a robust 1M-token context and advanced, multi-effort coding capabilities to significantly enhance performance on long-...

GLM 5 Fp4

by Zhipu AI · 202,752 ctx

GLM OCR

by Zhipu AI · 131,072 ctx

gpt-oss-120b-Turbo

by OpenAI · 131,072 ctx

Hermes-3-Llama-3.1-70B

by Nous Research · 131,072 ctx

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better rolepl...

Holo3 35B A3b

by Hcompany · 262,144 ctx

IBM: Granite 4.0 Micro

by IBM Research · 131,000 ctx

Granite-4.0-H-Micro is a 3B-parameter model from the Granite 4 family. These models are the latest in a series of models released by ...

IBM: Granite 4.1 8B

by IBM Research · 131,072 ctx

Granite 4.1 8B is a dense, decoder-only 8-billion-parameter language model from IBM, part of the Granite 4.1 family. It supports a 131K-t...

Inception: Mercury 2

by Inception · 128,000 ctx

Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Me...

inclusionAI: Ling-2.6-1T

by Inclusionai · 262,144 ctx

Ling-2.6-1T is an instant (instruct) model from inclusionAI and the company’s trillion-parameter flagship, designed for real-world agents...

inclusionAI: Ling-2.6-flash

by Inclusionai · 262,144 ctx

Ling-2.6-flash is an instant (instruct) model from inclusionAI with 104B total parameters and 7.4B active parameters, designed for real-w...

inclusionAI: Ring-2.6-1T

by Inclusionai · 262,144 ctx

Ring-2.6-1T is a 1T-parameter-scale thinking model with 63B active parameters, built for real-world agent workflows that require both str...

Kimi K2.5 FP4

by Togethercomputer · 262,144 ctx

Kwaipilot: KAT-Coder-Pro V2

by Kwaipilot · 256,000 ctx

KAT-Coder-Pro V2 is the latest high-performance model in KwaiKAT’s KAT-Coder series, designed for complex enterprise-grade software engin...

L3.1-70B-Euryale-v2.2

by Sao10k · 131,072 ctx

Euryale 3.1 70B v2.2 is a creative-roleplay model from Sao10k.

L3-8B-Lunaris-v1-Turbo

by Sao10k · 8,192 ctx

LFM2-24B-A2B

by Togethercomputer · 32,768 ctx

LiquidAI: LFM2-24B-A2B

by Liquid · 128,000 ctx

LFM2-24B-A2B is the largest model in the LFM2 family of hybrid architectures designed for efficient on-device deployment. Built as a 24B ...

Llama 3.1 70B

by Meta AI · 131,072 ctx

Llama 3.1 Nemotron 70B Instruct HF

by Nvidia · 32,768 ctx

Llama 3.3 70B Instruct FP8 Lora

by Meta AI · 131,072 ctx

Llama 4 Maverick 17B 128E Instruct Nvfp4

by Meta AI · 1,048,576 ctx

Llama 4 Maverick Instruct (17Bx128E) FP8

by Meta AI · 1,048,576 ctx

Llama 4 Scout 17B 16E Instruct Fp8 Lora

by Meta AI · 10,485,760 ctx

Llama 4 Scout (17Bx16E)

by Meta AI · 262,144 ctx

Llama 4 Scout Instruct (17Bx16E)

by Meta AI · 1,048,576 ctx

Llama Guard 3 8B

by Meta AI · 131,072 ctx

Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be u...

Magistral Small 2506

by Mistral AI · 40,960 ctx

Magnum v4 72B

by Anthracite Org · 32,768 ctx

This is a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet(https://openrouter.ai/anth...

Mancer: Weaver (alpha)

by Mancer · 8,000 ctx

An attempt to recreate Claude-style verbosity, but don't expect the same level of coherence or memory. Meant for use in roleplay/narrativ...

Medgemma 27B Text It

by Google DeepMind · 131,072 ctx

Meta Llama 3.1 405B Instruct

by Meta AI · 4,096 ctx

Meta-Llama-3.1-70B-Instruct

by Meta AI · 131,072 ctx

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction tuned g...

Meta Llama 3.1 70B Instruct Turbo

by Meta AI · 131,072 ctx

Meta Llama 3.1 8B

by Meta AI · 16,384 ctx

Meta-Llama-3.1-8B-Instruct

by Meta AI · 131,072 ctx

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction tuned g...

Meta Llama 3.1 8B Instruct Awq Int4

by Meta AI · 131,072 ctx

Meta Llama 3.1 8B Instruct Turbo

by Meta AI · 131,072 ctx

Meta Llama 3.2 1B Instruct

by Meta AI · 131,072 ctx

Meta Llama 3.2 3B Instruct

by Meta AI · 131,072 ctx

Meta Llama 3.3 70B Instruct Turbo

by Meta AI · 131,072 ctx

Meta: Llama 3 70B Instruct

by Meta AI · 8,192 ctx

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was optimized for high...

Meta Llama 3 70B Instruct Turbo

by Meta AI · 8,192 ctx

Meta Llama 3 8B Instruct

by Meta AI · 8,192 ctx

Meta: Llama 3 8B Instruct

by Meta AI · 8,192 ctx

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high ...

Meta Llama 3 8B Instruct Lite

by Meta AI · 8,192 ctx

Meta Llama 3 8B Instruct Reference

by Meta AI · 8,192 ctx

meta-llama/Llama-2-7b-chat-hf

by Meta AI · 4,096 ctx

meta-llama/Llama-3.3-70B-Instruct

by Meta AI · 131,072 ctx

Microsoft: Phi 4 Mini Instruct

by Microsoft · 131,072 ctx

Phi-4-mini-instruct is a lightweight open model built upon synthetic data and filtered publicly available websites - with a focus on high...

Minimax M1 40K

by MiniMax · 1,048,576 ctx

Minimax M1 80K

by MiniMax · 1,048,576 ctx

MiniMax-M2.5

by MiniMax · 196,608 ctx

MiniMax M2.5 is built for state-of-the-art coding, agentic tool use, search, and office work, extensively trained with reinforcement lear...

MiniMax M2.5 FP4

by MiniMax · 8,192 ctx

MiniMax M2.7

by MiniMax · 196,608 ctx

Mixture-of-Experts language model. M2.7 is capable of building complex agent harnesses and completing highly elaborate productivity tasks...

MiniMax-M2.7-Turbo

by Minimaxai · 196,608 ctx

Speed-optimized MiniMax-M2.7

MiniMax: MiniMax M1

by MiniMax · 1,000,000 ctx

MiniMax-M1 is a large-scale, open-weight reasoning model designed for extended context and high-efficiency inference. It leverages a hybr...

MiniMax: MiniMax M2

by MiniMax · 204,800 ctx

MiniMax-M2 is a compact, high-efficiency large language model optimized for end-to-end coding and agentic workflows. With 10 billion acti...

MiniMax: MiniMax M2.1

by MiniMax · 204,800 ctx

MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized for coding, agentic workflows, and modern application deve...

MiniMax: MiniMax M2.5

by MiniMax · 204,800 ctx

MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digita...

MiniMax: MiniMax M2.7

by MiniMax · 204,800 ctx

MiniMax-M2.7 is a next-generation large language model designed for autonomous, real-world productivity and continuous improvement. Built...

MiniMax: MiniMax M2-her

by MiniMax · 65,536 ctx

MiniMax M2-her is a dialogue-first large language model built for immersive roleplay, character-driven chat, and expressive multi-turn co...

Ministral 3 14B Instruct 2512

by Mistral AI · 262,144 ctx

Mistral (7B) Instruct v0.3

by Mistral AI · 32,768 ctx

Mistral: Codestral 2508

by Mistral AI · 256,000 ctx

Mistral's cutting-edge language model for coding released end of July 2025. Codestral specializes in low-latency, high-frequency tasks su...

Mistral: Devstral 2 2512

by Mistral AI · 262,144 ctx

Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding. It is a 123B-parameter dense transformer...

Mistral: Devstral Medium

by Mistral AI · 131,072 ctx

Devstral Medium is a high-performance code generation and agentic reasoning model developed jointly by Mistral AI and All Hands AI. Posit...

Mistral: Devstral Small 1.1

by Mistral AI · 131,072 ctx

Devstral Small 1.1 is a 24B parameter open-weight language model for software engineering agents, developed by Mistral AI in collaboratio...

Mistral Large

by Mistral AI · 128,000 ctx

This is Mistral AI's flagship model, Mistral Large 2 (version `mistral-large-2407`). It's a proprietary weights-available model and excel...

Mistral Large 2407

by Mistral AI · 131,072 ctx

This is Mistral AI's flagship model, Mistral Large 2 (version mistral-large-2407). It's a proprietary weights-available model and excels ...

Mistral: Mistral 7B Instruct v0.1

by Mistral AI · 4,096 ctx

A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for speed and context length.

Mistral: Mistral Small 3

by Mistral AI · 32,768 ctx

Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache...

Mistral-Nemo-Instruct-2407

by Mistral AI · 131,072 ctx

A 12B model trained jointly by Mistral AI and NVIDIA; it significantly outperforms existing models of smaller or similar size.

Mistral: Saba

by Mistral AI · 32,768 ctx

Mistral Saba is a 24B-parameter language model specifically designed for the Middle East and South Asia, delivering accurate and contextu...

Mixtral 8X22b Instruct V0.1

by Mistral AI · 65,536 ctx

Mixtral-8x7B Instruct v0.1

by Mistral AI · 32,768 ctx

Mixtral 8x7B Instruct V0.1 FP8 Lora

by Mistral AI · 32,768 ctx

Mixtral 8X7b V0.1

by Mistral AI · 32,768 ctx

Molmo 7B D 0924

by Allen Institute for AI (AI2) · 4,096 ctx

MoonshotAI: Kimi K2 0905

by Moonshot AI · 262,144 ctx

Kimi K2 0905 is the September update of Kimi K2 0711. It is a large-scale Mixture-of-Experts (MoE) language model d...

Morph: Morph V3 Fast

by Morph · 81,920 ctx

Morph's fastest apply model for code edits. ~10,500 tokens/sec with 96% accuracy for rapid code transformations. The model requires the p...

Morph: Morph V3 Large

by Morph · 262,144 ctx

Morph's high-accuracy apply model for complex code edits. ~4,500 tokens/sec with 98% accuracy for precise code transformations. The model...

MythoMax 13B

by Gryphe · 4,096 ctx

One of the highest-performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay.

Nemotron 3 Nano Omni 30B A3b Reasoning Fp8

by Nvidia · 131,072 ctx

Nex AGI: DeepSeek V3.1 Nex N1

by Nex Agi · 131,072 ctx

DeepSeek V3.1 Nex-N1 is the flagship release of the Nex-N1 series — a post-trained model designed to highlight agent autonomy, tool use, ...

nim/meta/llama-3.1-70b-instruct

by Meta AI · 16,384 ctx

nim/meta/llama-3.1-8b-instruct

by Meta AI · 16,384 ctx

nim/meta/llama-3.2-11b-vision-instruct

by Nvidia · 16,384 ctx

nim/meta/llama-3.2-90b-vision-instruct

by Meta AI · 16,384 ctx

nim/meta/llama-3.3-70b-instruct

by Meta AI · 16,384 ctx

nim/mistralai/mixtral-8x22b-instruct-v01

by Mistral AI · 16,384 ctx

nim/mistralai/mixtral-8x7b-instruct-v01

by Mistral AI · 16,384 ctx

nim/nvidia/llama-3.1-nemotron-70b-instruct

by Nvidia · 16,384 ctx

nim/nvidia/llama-3.3-nemotron-super-49b-v1

by Nvidia · 16,384 ctx

nim/nv-mistralai/mistral-nemo-12b-instruct

by Nvidia · 16,384 ctx

Nous Hermes 2 Mixtral 8X7B Dpo

by Nous Research · 32,768 ctx

Nous: Hermes 3 405B Instruct

by Nous Research · 131,072 ctx

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better rolepl...

Nous: Hermes 4 405B

by Nous Research · 131,072 ctx

Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Research. It introduces a hybrid reasoning mo...

Nous: Hermes 4 70B

by Nous Research · 131,072 ctx

Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It introduces the same hybrid mode as the large...

NousResearch: Hermes 2 Pro - Llama-3 8B

by Nous Research · 8,192 ctx

Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Datas...

NVIDIA: Llama 3.3 Nemotron Super 49B V1.5

by Nvidia · 131,072 ctx

Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct wit...

NVIDIA: Nemotron 3 Nano 30B A3B

by Nvidia · 262,144 ctx

NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build special...

Nvidia Nemotron 3 Nano 30B A3b Bf16

by Nvidia · 262,144 ctx

NVIDIA: Nemotron 3 Super

by Nvidia · 1,000,000 ctx

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accu...

NVIDIA-Nemotron-3-Super-120B-A12B

by Nvidia · 262,144 ctx

NVIDIA Nemotron 3 Super is a hybrid Mixture-of-Experts (MoE) model engineered for highest compute efficiency and accuracy in multi-agent ...

Nvidia Nemotron 3 Super 120B A12b Bf16

by Nvidia · 262,144 ctx

Nvidia Nemotron 3 Super 120B A12b Fp8

by Nvidia · 262,144 ctx

NVIDIA Nemotron 3 Ultra NVFP4

by Nvidia · 262,144 ctx

Nemotron-3-Ultra-550B-A55B-NVFP4 is a frontier-scale large language model (LLM) trained by NVIDIA, designed to deliver strong agentic, re...

Nvidia Nemotron Nano 9B V2

by Nvidia · 131,072 ctx

NVIDIA: Nemotron Nano 9B V2

by Nvidia · 131,072 ctx

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reaso...

Prime Intellect: INTELLECT-3

by Prime Intellect · 131,072 ctx

INTELLECT-3 is a 106B-parameter Mixture-of-Experts model (12B active) post-trained from GLM-4.5-Air-Base using supervised fine-tuning (SF...

Qwen 2 (1.5B)

by Alibaba (Qwen Team) · 32,768 ctx

Qwen2.5 14B

by Alibaba (Qwen Team) · 131,072 ctx

Qwen 2.5 14B Instruct

by Alibaba (Qwen Team) · 32,768 ctx

Qwen2.5 1.5B

by Alibaba (Qwen Team) · 131,072 ctx

Qwen2.5 1.5B Instruct

by Alibaba (Qwen Team) · 32,768 ctx

Qwen2.5 32B

by Alibaba (Qwen Team) · 131,072 ctx

Qwen2.5 32B Instruct

by Alibaba (Qwen Team) · 32,768 ctx

Qwen2.5 3B Instruct

by Alibaba (Qwen Team) · 32,768 ctx

Qwen2.5 72B

by Alibaba (Qwen Team) · 131,072 ctx

Qwen2.5 72B Instruct

by Alibaba (Qwen Team) · 32,768 ctx

Qwen2.5 72B Instruct Turbo

by Alibaba (Qwen Team) · 131,072 ctx

Qwen2.5 7B

by Alibaba (Qwen Team) · 131,072 ctx

Qwen2.5 7B Instruct

by Alibaba (Qwen Team) · 32,768 ctx

Qwen2.5 7B Instruct Turbo

by Alibaba (Qwen Team) · 32,768 ctx

Qwen 2.5 Coder 32B Instruct

by Alibaba (Qwen Team) · 16,384 ctx

Qwen 2 (72B)

by Alibaba (Qwen Team) · 32,768 ctx

Qwen2 72B Instruct

by Togethercomputer · 32,768 ctx

Qwen 2 (7B)

by Alibaba (Qwen Team) · 32,768 ctx

Qwen 2 Instruct (1.5B)

by Alibaba (Qwen Team) · 32,768 ctx

Qwen2-VL (72B) Instruct

by Alibaba (Qwen Team) · 32,768 ctx

Qwen3 0.6B

by Alibaba (Qwen Team) · 40,960 ctx

Qwen3 0.6B Base

by Alibaba (Qwen Team) · 32,768 ctx

Qwen3 14B Base

by Alibaba (Qwen Team) · 32,768 ctx

Qwen3 1.7B

by Alibaba (Qwen Team) · 40,960 ctx

Qwen3 1.7B Base

by Alibaba (Qwen Team) · 32,768 ctx

Qwen3-235B-A22B-Instruct-2507

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3-235B-A22B-Instruct-2507 is the updated version of the Qwen3-235B-A22B non-thinking mode, featuring significant improvements in gene...

Qwen3 235B A22B Instruct 2507 FP8 Throughput

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3 30B A3b Base

by Alibaba (Qwen Team) · 32,768 ctx

Qwen3 30B A3B Instruct 2507 Lora

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3 4B Base

by Alibaba (Qwen Team) · 32,768 ctx

Qwen3 4B Instruct 2507

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3.5 0.8B Lora

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3.5 122B A10b Fp8

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3.5 122B A10B Lora

by Alibaba (Qwen Team) · 8,192 ctx

Qwen3.5 27B Lora

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3.5 2B Lora

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3.5 35B A3B Base Lora

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3.5 35B A3b LoRa

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3.5 397B A17B Lora

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3.5 4B Lora

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3.5 9B Fp8

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3.5 9B Lora

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3.6 27B Lora

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3.6 35B A3b Fp8

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3.6 35B A3B Lora

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3 8B Base

by Alibaba (Qwen Team) · 32,768 ctx

Qwen3 8B Lora

by Alibaba (Qwen Team) · 40,960 ctx

Qwen3 Coder 480B A35B Instruct Fp8

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3-Coder-480B-A35B-Instruct-Turbo

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3-Coder-480B-A35B-Instruct is Qwen3's most agentic code model, featuring significant performance on agentic coding, agentic brows...

Qwen3 Coder Next Fp8

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3 Next 80B A3b Instruct Fp8

by Alibaba (Qwen Team)

Qwen3-VL-235B-A22B-Instruct-FP8

by Alibaba (Qwen Team) · 262,144 ctx

Qwen: Qwen2.5 7B Instruct

by Alibaba (Qwen Team) · 131,072 ctx

Qwen2.5 7B belongs to the latest series of Qwen large language models. Qwen2.5 brings the following improvements over Qwen2: significantly more...

Qwen: Qwen3 14B

by Alibaba (Qwen Team) · 131,072 ctx

Qwen3-14B is a dense 14.8B parameter causal language model from the Qwen3 series, designed for both complex reasoning and efficient dialo...

Qwen: Qwen3 235B A22B

by Alibaba (Qwen Team) · 131,072 ctx

Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating 22B parameters per forward pass. It supp...

Qwen: Qwen3 235B A22B Instruct 2507

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture...

Qwen: Qwen3 235B A22B Thinking 2507

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning ...

Qwen: Qwen3 30B A3B

by Alibaba (Qwen Team) · 131,072 ctx

Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to e...

Qwen: Qwen3 30B A3B Instruct 2507

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3-30B-A3B-Instruct-2507 is a 30.5B-parameter mixture-of-experts language model from Qwen, with 3.3B active parameters per inference. ...

Qwen: Qwen3 30B A3B Thinking 2507

by Alibaba (Qwen Team) · 131,072 ctx

Qwen3-30B-A3B-Thinking-2507 is a 30B parameter Mixture-of-Experts reasoning model optimized for complex tasks requiring extended multi-st...

Qwen: Qwen3 32B

by Alibaba (Qwen Team) · 131,072 ctx

Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dial...

Qwen: Qwen3.6 Max Preview

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3.6-Max-Preview is a proprietary frontier model from Alibaba Cloud built on a sparse mixture-of-experts architecture with approximate...

Qwen: Qwen3.7 Max

by Alibaba (Qwen Team) · 1,000,000 ctx

Qwen3.7-Max is the flagship model in Alibaba's Qwen3.7 series. It supports text input and output and is designed for agent-centric worklo...

Qwen: Qwen3 8B

by Alibaba (Qwen Team) · 131,072 ctx

Qwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed for both reasoning-heavy tasks and efficient dia...

Qwen: Qwen3 Coder 30B A3B Instruct

by Alibaba (Qwen Team) · 160,000 ctx

Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) model with 128 experts (8 active per forward pass), designed f...

Qwen: Qwen3 Coder 480B A35B

by Alibaba (Qwen Team) · 1,048,576 ctx

Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agenti...

Qwen: Qwen3 Coder Flash

by Alibaba (Qwen Team) · 1,000,000 ctx

Qwen3 Coder Flash is Alibaba's fast, cost-efficient version of its proprietary Qwen3 Coder Plus. It is a powerful coding agent model...

Qwen: Qwen3 Coder Next

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3-Coder-Next is an open-weight causal language model optimized for coding agents and local development workflows. It uses a sparse Mo...

Qwen: Qwen3 Coder Plus

by Alibaba (Qwen Team) · 1,000,000 ctx

Qwen3 Coder Plus is Alibaba's proprietary version of the open-source Qwen3 Coder 480B A35B. It is a powerful coding agent model specializ...

Qwen: Qwen3 Max

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reasoning, instruction following, multilingual ...

Qwen: Qwen3 Max Thinking

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes cognitive tasks that require deep, multi...

Qwen: Qwen3 Next 80B A3B Instruct

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable responses without “thi...

Qwen: Qwen3 Next 80B A3B Thinking

by Alibaba (Qwen Team) · 262,144 ctx

Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that outputs structured “thinking” traces by default. ...

Qwen: Qwen-Plus

by Alibaba (Qwen Team) · 1,000,000 ctx

Qwen-Plus, based on the Qwen2.5 foundation model, is a 131K-context model that balances performance, speed, and cost.

Qwen: Qwen Plus 0728

by Alibaba (Qwen Team) · 1,000,000 ctx

Qwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reasoning model with a balanced performance, speed, an...

Qwen QwQ-32B

by Alibaba (Qwen Team) · 131,072 ctx

Reka Flash 3

by Reka AI · 65,536 ctx

Reka Flash 3 is a general-purpose, instruction-tuned large language model with 21 billion parameters, developed by Reka. It excels at gen...

Relace: Relace Apply 3

by Relace · 256,000 ctx

Relace Apply 3 is a specialized code-patching LLM that merges AI-suggested edits straight into your source files. It can apply updates fr...

Relace: Relace Search

by Relace · 256,000 ctx

The relace-search model uses 4-12 `view_file` and `grep` tools in parallel to explore a codebase and return relevant files to the user re...

ReMM SLERP 13B

by Undi95 · 6,144 ctx

A recreation trial of the original MythoMax-L2-B13, but with updated models.

Sao10K: Llama 3.1 70B Hanami x1

by Sao10K · 16,000 ctx

This is [Sao10K](/sao10k)'s experiment over [Euryale v2.2](/sao10k/l3.1-euryale-70b).

Sao10K: Llama 3.1 Euryale 70B v2.2

by Sao10K · 131,072 ctx

Euryale L3.1 70B v2.2 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor of [Euryale L3...

Sao10K: Llama 3.3 Euryale 70B

by Sao10K · 131,072 ctx

Euryale L3.3 70B is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor of [Euryale L3 70B ...

Sao10K: Llama 3 8B Lunaris

by Sao10K · 8,192 ctx

Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It's a strategic merge of multiple models, designed to balan...

Sao10K: Llama 3 Euryale 70B v2.1

by Sao10K · 8,192 ctx

Euryale 70B v2.1 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). - Better prompt adherence. - Better ana...

Sarvam M

by Sarvam AI · 32,768 ctx

StepFun: Step 3.5 Flash

by StepFun · 262,144 ctx

Step 3.5 Flash is StepFun's most capable open-source foundation model. Built on a sparse Mixture of Experts (MoE) architecture, it select...

Switchpoint Router

by Switchpoint · 131,072 ctx

Switchpoint AI's router instantly analyzes your request and directs it to the optimal AI from an ever-evolving library. As the world of L...

Tencent: Hunyuan A13B Instruct

by Tencent · 131,072 ctx

Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tencent, with a total parameter count of 80B ...

Tencent: Hy3 preview

by Tencent · 262,144 ctx

Hy3 preview is a high-efficiency Mixture-of-Experts model from Tencent designed for agentic workflows and production use. It supports con...

TheDrummer: Cydonia 24B V4.1

by TheDrummer · 131,072 ctx

Uncensored and creative writing model based on Mistral Small 3.2 24B with good recall, prompt adherence, and intelligence.

TheDrummer: Rocinante 12B

by TheDrummer · 32,768 ctx

Rocinante 12B is designed for engaging storytelling and rich prose. Early testers have reported: - Expanded vocabulary with unique and ex...

TheDrummer: Skyfall 36B V2

by TheDrummer · 32,768 ctx

Skyfall 36B v2 is an enhanced iteration of Mistral Small 2501, specifically fine-tuned for improved creativity, nuanced writing, role-pla...

TheDrummer: UnslopNemo 12B

by TheDrummer · 32,768 ctx

UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure writing and role-play scenarios.

Tongyi DeepResearch 30B A3B

by Alibaba (Qwen Team) · 131,072 ctx

Tongyi DeepResearch is an agentic large language model developed by Tongyi Lab, with 30 billion total parameters activating only 3 billio...

Upstage: Solar Pro 3

by Upstage · 128,000 ctx

Solar Pro 3 is Upstage's powerful Mixture-of-Experts (MoE) language model. With 102B total parameters and 12B active parameters per forwa...

WizardLM-2 8x22B

by Microsoft · 65,536 ctx

WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprie...

Writer: Palmyra X5

by Writer · 1,040,000 ctx

Palmyra X5 is Writer's most advanced model, purpose-built for building and scaling AI agents across the enterprise. It delivers industry-...

Xiaomi: MiMo-V2.5-Pro

by Xiaomi · 1,048,576 ctx

MiMo-V2.5-Pro is Xiaomi’s flagship model, delivering strong performance in general agentic capabilities, complex software engineering, an...

Xiaomi: MiMo-V2-Flash

by Xiaomi · 262,144 ctx

MiMo-V2-Flash is an open-source foundation language model developed by Xiaomi. It is a Mixture-of-Experts model with 309B total parameter...

Xiaomi: MiMo-V2-Pro

by Xiaomi · 1,048,576 ctx

MiMo-V2-Pro is Xiaomi's flagship foundation model, featuring over 1T total parameters and a 1M context length, deeply optimized for agent...

Z.ai: GLM 4 32B

by Zhipu AI · 128,000 ctx

GLM 4 32B is a cost-effective foundation language model. It can efficiently perform complex tasks and has significantly enhanced capabili...

Z.ai: GLM 4.7 Flash

by Zhipu AI · 202,752 ctx

As a 30B-class SOTA model, GLM-4.7-Flash offers a new option that balances performance and efficiency. It is further optimized for agenti...

Closed / API-only models.

Direct API, aggregator (OpenRouter, Bedrock), or chat UI.

Claude Opus 4.7

by Anthropic · Claude · 200,000 ctx

Frontier reasoning and long-form coding from Anthropic.

Claude Sonnet 4.6

by Anthropic · Claude · 200,000 ctx

Best price-performance from Anthropic. Default for production agents.

Claude 3.5 Sonnet

by Anthropic · Claude · 200,000 ctx

Anthropic's 3.5 generation — still in active production.

Claude Haiku 4.5

by Anthropic · Claude · 200,000 ctx

Fast, cheap Claude variant for high-throughput inference.

Claude 3.5 Haiku

by Anthropic · Claude · 200,000 ctx

Fast/cheap Claude 3.5 variant — production fallback for Haiku 4.5.

GPT-5

by OpenAI · GPT · 256,000 ctx

OpenAI's frontier multimodal reasoning model.

GPT-4 Turbo

by OpenAI · GPT · 128,000 ctx

OpenAI's pre-GPT-5 flagship — still extensively deployed.

AI21: Jamba Large 1.7

by AI21 · 256,000 ctx

Jamba Large 1.7 is the latest model in the Jamba open family, offering improvements in grounding, instruction-following, and overall effi...

AionLabs: Aion-3.0

by Aion Labs · 131,072 ctx

Aion-3.0 is a multi-model roleplaying and storytelling system from AionLabs, built on the GLM family of models. It uses a collaborative g...

AionLabs: Aion-3.0-Mini

by Aion Labs · 131,072 ctx

Aion-3.0 Mini is a multi-model roleplaying and storytelling system from AionLabs, built on the DeepSeek family of models. It uses a colla...

Amazon: Nova Micro 1.0

by Amazon · 128,000 ctx

Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low c...

Body Builder (beta)

by OpenRouter · 128,000 ctx

Transform your natural language requests into structured OpenRouter API request objects. Describe what you want to accomplish with AI mod...

Cohere: Command A

by Cohere · 256,000 ctx

Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, mult...

Cohere: Command R (08-2024)

by Cohere · 128,000 ctx

command-r-08-2024 is an update of the [Command R](/models/cohere/command-r) with improved performance for multilingual retrieval-augmente...

Cohere: Command R+ (08-2024)

by Cohere · 128,000 ctx

command-r-plus-08-2024 is an update of the [Command R+](/models/cohere/command-r-plus) with roughly 50% higher throughput and 25% lower l...

Cohere: Command R7B (12-2024)

by Cohere · 128,000 ctx

Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in December 2024. It excels at RAG, tool use, agents, an...

Google: Gemma 2 27B

by Google DeepMind · 8,192 ctx

Gemma 2 27B by Google is an open model built from the same research and technology used to create the [Gemini models](/models?q=gemini). ...

Google: Gemma 3n 4B

by Google DeepMind · 32,768 ctx

Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It support...

Inflection: Inflection 3 Pi

by Inflection · 8,000 ctx

Inflection 3 Pi powers Inflection's [Pi](https://pi.ai) chatbot, including backstory, emotional intelligence, productivity, and safety. I...

Inflection: Inflection 3 Productivity

by Inflection · 8,000 ctx

Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or precise adherence to p...

NVIDIA: Nemotron 3 Ultra

by NVIDIA · 1,000,000 ctx

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (...

OpenAI: GPT-3.5 Turbo

by OpenAI · 16,385 ctx

GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and tradition...

OpenAI: GPT-3.5 Turbo 16k

by OpenAI · 16,385 ctx

This model offers four times the context length of gpt-3.5-turbo, allowing it to support approximately 20 pages of text in a single reque...

OpenAI: GPT-3.5 Turbo Instruct

by OpenAI · 4,095 ctx

This model is a variant of GPT-3.5 Turbo tuned for instructional prompts and omitting chat-related optimizations. Training data: up to Se...

OpenAI: GPT-3.5 Turbo (older v0613)

by OpenAI · 4,095 ctx

GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and tradition...

OpenAI: GPT-4

by OpenAI · 8,191 ctx

OpenAI's flagship model, GPT-4 is a large-scale multimodal language model capable of solving difficult problems with greater accuracy tha...

OpenAI: GPT-4 (older v0314)

by OpenAI · 8,191 ctx

GPT-4-0314 is the first version of GPT-4 released, with a context length of 8,192 tokens, and was supported until June 14. Training data:...

OpenAI: GPT-4o-mini Search Preview

by OpenAI · 128,000 ctx

GPT-4o mini Search Preview is a specialized model for web search in Chat Completions. It is trained to understand and execute web search ...

OpenAI: GPT-4o Search Preview

by OpenAI · 128,000 ctx

GPT-4o Search Preview is a specialized model for web search in Chat Completions. It is trained to understand and execute web search queries.

OpenAI: GPT-4 Turbo (older v1106)

by OpenAI · 128,000 ctx

The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to ...

OpenAI: GPT-4 Turbo Preview

by OpenAI · 128,000 ctx

The preview GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Traini...

OpenAI: gpt-oss-safeguard-20b

by OpenAI · 131,072 ctx

gpt-oss-safeguard-20b is a safety reasoning model from OpenAI built upon gpt-oss-20b. This open-weight, 21B-parameter Mixture-of-Experts ...

OpenAI: o3 Mini

by OpenAI · 200,000 ctx

OpenAI o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and...

OpenAI: o3 Mini High

by OpenAI · 200,000 ctx

OpenAI o3-mini-high is the same model as [o3-mini](/openai/o3-mini) with reasoning_effort set to high. o3-mini is a cost-efficient langua...

OpenRouter: Fusion

by OpenRouter · 128,000 ctx

Fusion turns your prompt into a small multi-model deliberation. A panel of expert models (see below) analyzes your prompt in parallel wit...

Owl Alpha

by OpenRouter · 1,048,756 ctx

Owl Alpha is a high-performance foundation model designed for agentic workloads. It natively supports tool use and long-context tasks, with...

Pareto Code Router

by OpenRouter · 2,000,000 ctx

The Pareto Router maintains a tiered shortlist of strong coding models, ranked by [Artificial Analysis](https://artificialanalysis.ai/) c...

Perplexity: Sonar Deep Research

by Perplexity · 128,000 ctx

Sonar Deep Research is a research-focused model designed for multi-step retrieval, synthesis, and reasoning across complex topics. It aut...

Poolside: Laguna M.1

by Poolside · 262,144 ctx

Laguna M.1 is the flagship coding agent model from [Poolside](https://poolside.ai/), optimized for complex software engineering tasks. De...

Poolside: Laguna XS.2

by Poolside · 262,144 ctx

Laguna XS.2 is the second-generation model in the XS size class from [Poolside](https://poolside.ai/), their efficient coding agent serie...

Poolside: Laguna XS 2.1

by Poolside · 262,144 ctx

Laguna XS 2.1 is the latest coding agent model in the 33B-A3B category from [Poolside](https://poolside.ai/) and a step forward from thei...

Tencent: Hy3

by Tencent · 262,144 ctx

Hy3 is a 295B-parameter Mixture-of-Experts model from Tencent (21B active, 192 experts with top-8 routing) built for reasoning, agentic w...

Z.ai: GLM 5.2

by Zhipu AI · 1,048,576 ctx

GLM-5.2 is Z.ai’s flagship model for the era of long-horizon tasks. With a truly usable 1M-token context window, it can handle project-le...