Every way to use the major models.

Meta's best-in-class open-weight LLM — 70B class.

Llama 3.1 70B

Llama 3.1 70B — production workhorse, superseded by 3.3 but still widely deployed.

Llama 3.1 8B

Meta's most popular open-weight small LLM — fits anywhere.

Llama 3.1 405B

405B

Meta's largest open-weight LLM — dense 405B, frontier-class at launch.

Llama 3.2 90B Vision

90B

Meta's largest vision-capable Llama.

Llama 3.2 11B Vision

11B

Meta's open-weight multimodal LLM — vision + text in 11B.

Llama 3.2 3B

3B Llama — laptop-class chat + RAG.

Llama 3.2 1B

Meta's smallest Llama — mobile + on-device target.

DeepSeek V3

671B

DeepSeek's flagship MoE — 671B total, 37B active, frontier-class.

DeepSeek R1

671B

DeepSeek's reasoning model — RL-trained, frontier-class, MIT-licensed.

DeepSeek R1 Distill Llama 70B

70B Llama distilled from DeepSeek R1's reasoning traces.

DeepSeek R1 Distill Qwen 32B

32B Qwen base distilled from DeepSeek R1.

DeepSeek R1 Distill Qwen 14B

15B

14B distilled R1 — laptop-friendly reasoning.

DeepSeek R1 Distill Qwen 7B

7B distilled R1 — runs on any modern GPU.

DeepSeek R1 Distill Qwen 1.5B

Tiny distilled R1, deployable on phones and in browsers.

Gemma 3 27B

27B

by Google DeepMind · Gemma · 128,000 ctx

Google's open-weight multimodal LLM — efficient and license-permissive.

Gemma 2 27B

27B

by Google DeepMind · Gemma 2 · 8,192 ctx

Google's pre-Gemma-3 open-weight workhorse.

Gemma 2 9B

by Google DeepMind · Gemma 2 · 8,192 ctx

9B Gemma 2 — single-GPU local target.

Gemma 2 2B

by Google DeepMind · Gemma 2 · 8,192 ctx

Tiny 2B Gemma 2 — laptop / mobile.

Gemma 3 12B

by Google DeepMind · Gemma · 128,000 ctx

12B Gemma 3 — multimodal, single-GPU target.

Gemma 3 4B

by Google DeepMind · Gemma · 128,000 ctx

4B Gemma 3 — laptop multimodal.

Gemma 3 1B

by Google DeepMind · Gemma · 32,768 ctx

1B Gemma 3 — edge / mobile.

Qwen 2.5 72B

73B

Alibaba's flagship open-weight LLM — 72B dense.

Qwen 2.5 7B

7B Qwen 2.5 — most popular Qwen variant on Ollama.

Qwen 2.5 14B

15B

14B Qwen 2.5 — sweet spot for single-GPU local hosting.

Qwen 2.5 32B

by Alibaba (Qwen Team) · Qwen · 32,768 ctx

32B Qwen 2.5 — laptop-class workhorse.

Qwen 2.5 3B

3B Qwen 2.5 — laptop / edge target.

Qwen 2.5 Coder 32B

Alibaba's open-weight coding model, best in class at 32B.

Mistral Large 2

123B

by Mistral AI · Mistral Large · 128,000 ctx

Mistral's flagship open-weight model — 123B dense.

Mistral 7B v0.3

by Mistral AI · Mistral 7B · 32,768 ctx

The current Mistral 7B — adds function calling + extended vocab.

Mistral 7B v0.2

by Mistral AI · Mistral 7B · 32,768 ctx

Mistral 7B v0.2 — earlier 32K context revision.

Mistral 7B v0.1

by Mistral AI · Mistral 7B · 8,192 ctx

Original Mistral 7B — historical reference.

Mixtral 8x22B

141B

by Mistral AI · Mixtral · 65,536 ctx

Mistral's open MoE — 141B total, 39B active.

Mistral Nemo 12B

by Mistral AI · Mistral · 128,000 ctx

Mistral × Nvidia collab — 12B Apache-licensed, multilingual.

Kimi K2

Moonshot's frontier open-weight MoE — 1T total, 32B active.

Kimi K2 Thinking

Moonshot's open-weight reasoning variant — extended chain-of-thought training on top of Kimi K2.

Kimi K2.5

Multimodal agentic variant — adds a vision encoder to the K2 backbone.

Kimi K2.6

Long-horizon coding + autonomous-execution upgrade over K2.5.

GLM-4.5

355B

by Zhipu AI · GLM · 128,000 ctx

Zhipu's frontier open-weight MoE — 355B total, 32B active. Strong agentic + reasoning marks for an open model.

GLM-4.5-Air

106B

by Zhipu AI · GLM · 128,000 ctx

Smaller, cheaper sibling of GLM-4.5. 106B total, 12B active.

GLM-5.1

32B

Zhipu's GLM 5.1 series — successor to GLM-5 on z.ai's API.

GLM-5

355B

Zhipu's GLM 5 generation — closed flagship between GLM-4.7 and GLM-5.1.

GLM-5 Turbo

Faster, cheaper sibling of GLM-5 on z.ai.

Command R+

104B

by Cohere · Command · 128,000 ctx

Cohere's open-weight RAG-optimized LLM — multilingual + tool use.

GLM-4.7

Mid-generation GLM 4.7 released between GLM-4.6 and GLM-5.

GLM-4.6

355B

Incremental upgrade on GLM-4.5 — improved reasoning, same context window.

GLM-4.5-Flash

by Zhipu AI · GLM

Lowest-latency, lowest-cost variant of GLM-4.5 on z.ai.

Yi-Lightning

by 01.AI · Yi · 16,000 ctx

01.AI's fastest production model. Tops the LMSYS Arena Chinese leaderboard.

Yi-34B

34B

by 01.AI · Yi · 32,000 ctx

Earlier open-weight Yi release — bilingual EN/ZH, 32K-token context.

Whisper Large v3

OpenAI's open-weight speech-to-text — the standard transcription model.

Whisper Medium

769M Whisper variant — half the size of Large, 80% of the accuracy.

Whisper Small

244M Whisper — fits on edge GPUs and CPU.

Whisper Base

74M Whisper — browser / Raspberry Pi-deployable.

Whisper Tiny

39M Whisper, runs in-browser via WebGPU.

Hunyuan-Large

389B

by Tencent · Hunyuan · 256,000 ctx

Tencent's open-weight MoE — 389B total, 52B active. Largest open MoE at launch.

FLUX.1 Pro

by Black Forest Labs · FLUX

Black Forest Labs' flagship image-gen model — closed/API.

FLUX.1 Dev

by Black Forest Labs · FLUX

Open-weight FLUX.1 — non-commercial license.

FLUX.1 Schnell

by Black Forest Labs · FLUX

Distilled fast FLUX.1 — Apache-2.0, commercial-friendly.

InternLM 2.5 20B

20B

by Shanghai AI Lab · InternLM · 1,000,000 ctx

Shanghai AI Lab's dense 20B open-weight. Strong long-context + tool use for its size.

Stable Diffusion 3.5 Large

Stability AI's latest open-weight image-gen — 8.1B params, MMDiT architecture.

Stable Diffusion 3.5 Medium

2.5B SD 3.5 — fits on 12 GB consumer GPUs.

Stable Diffusion XL

Workhorse open-weight image-gen — 3.5B params, runs anywhere.

Stable Diffusion 1.5

The original viral image-gen model, still searched heavily.

MiniMax-Text-01

456B

by MiniMax · MiniMax-Text · 4,000,000 ctx

MiniMax's first open MoE — 456B total, 45.9B active. 1M+ context via Lightning Attention.

Baichuan2-13B

13B

by Baichuan Inc. · Baichuan · 4,096 ctx

Bilingual EN/ZH open-weight. Strong for its size on Chinese-language benchmarks.

Qwen 3 235B

235B

by Alibaba (Qwen Team) · Qwen 3 · 128,000 ctx

Alibaba's frontier MoE — 235B total / 22B active.

Qwen 3 32B

by Alibaba (Qwen Team) · Qwen 3 · 128,000 ctx

Dense 32B Qwen 3.

Open embedding model — 69M Ollama pulls, the local default.

mxbai-embed-large

by Mixedbread AI · Mixedbread Embed · 512 ctx

335M embedding model — top MTEB scores for its size.

BGE-M3

by BAAI (Beijing Academy of AI) · BGE · 8,192 ctx

Multilingual + multifunctional embedding (100+ languages).

Phi-4

15B

by Microsoft · Phi · 16,384 ctx

Microsoft's 14B small-LM workhorse — punches above its weight.

Phi-3.5 Mini

by Microsoft · Phi · 128,000 ctx

3.8B Phi — laptop / edge target.

Phi-3 Medium

14B

by Microsoft · Phi · 128,000 ctx

Phi-3 Mini

by Microsoft · Phi · 128,000 ctx

LLaVA 34B

34B

by LLaVA Project · LLaVA · 4,096 ctx

Largest open-weight LLaVA — vision encoder + Yi-34B backbone.

LLaVA 13B

13B

by LLaVA Project · LLaVA · 4,096 ctx

LLaVA 7B

by LLaVA Project · LLaVA · 4,096 ctx

Code Llama 70B

Meta's largest code-specialised Llama.

Code Llama 34B

34B

Code Llama 13B

13B

Code Llama 7B

DeepSeek Coder V2 236B

236B

by DeepSeek · DeepSeek Coder · 128,000 ctx

DeepSeek's MoE coding model — 236B total, 21B active.

DeepSeek Coder V2 Lite

16B

by DeepSeek · DeepSeek Coder · 128,000 ctx

16B MoE / 2.4B active — laptop-class coder.

DeepSeek Coder 33B

by DeepSeek · DeepSeek Coder · 16,384 ctx

DeepSeek Coder 6.7B

by DeepSeek · DeepSeek Coder · 16,384 ctx

GPT-OSS 120B

120B

by OpenAI · GPT-OSS · 128,000 ctx

OpenAI's first open-weight LLM in years — Apache-licensed MoE.

GPT-OSS 20B

20B

by OpenAI · GPT-OSS · 128,000 ctx

20B GPT-OSS — single-GPU local target.

TinyLlama 1.1B

by TinyLlama Project · TinyLlama · 2,048 ctx

1.1B Llama-arch model — 3T training tokens.

Moondream 1.8B

by Moondream · Moondream · 2,048 ctx

Tiny multimodal — laptop-class image understanding.

OLMo 3 7B

by Allen Institute for AI (AI2) · OLMo · 4,096 ctx

Allen AI's latest fully-open OLMo — model + training data + checkpoints.

OLMo 7B

by Allen Institute for AI (AI2) · OLMo · 2,048 ctx

Hermes 3 70B

by Nous Research · Hermes · 128,000 ctx

Nous Research's flagship Llama fine-tune — agent-friendly.

Hermes 3 8B

by Nous Research · Hermes · 128,000 ctx

IBM Granite 3.1 8B

by IBM Research · Granite · 128,000 ctx

IBM's enterprise-focused open-weight LLM.

IBM Granite Code 8B

by IBM Research · Granite · 4,096 ctx

AionLabs: Aion-1.0

by Aion Labs · 131,072 ctx

Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepS...

AionLabs: Aion-1.0-Mini

by Aion Labs · 131,072 ctx

Aion-1.0-Mini 32B parameter model is a distilled version of the DeepSeek-R1 model, designed for strong performance in reasoning domains s...

AionLabs: Aion-2.0

by Aion Labs · 131,072 ctx

Aion-2.0 is a variant of DeepSeek V3.2 optimized for immersive roleplaying and storytelling. It is particularly strong at introducing ten...

AionLabs: Aion-RP 1.0 (8B)

by Aion Labs · 32,768 ctx

Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of the RPBench-Auto benchmark, a roleplaying-specific variant ...

AlfredPros: CodeLLaMa 7B Instruct Solidity

by AlfredPros · 4,096 ctx

A fine-tuned 7-billion-parameter Code LLaMA Instruct model that generates Solidity smart contracts, using 4-bit QLoRA fine-tuning provided by...

AllenAI: Olmo 3 32B Think

32B

by Allen Institute for AI (AI2) · 65,536 ctx

Olmo 3 32B Think is a large-scale, 32-billion-parameter model purpose-built for deep reasoning, complex logic chains and advanced instruc...

Arcee AI: Coder Large

by Arcee AI · 32,768 ctx

Coder‑Large is a 32 B‑parameter offspring of Qwen 2.5‑Instruct that has been further trained on permissively‑licensed GitHub, CodeSearchN...

Arcee AI: Maestro Reasoning

Maestro Reasoning is Arcee's flagship analysis model: a 32 B‑parameter derivative of Qwen 2.5‑32 B tuned with DPO and chain‑of‑thought RL...

Arcee AI: Spotlight

Spotlight is a 7‑billion‑parameter vision‑language model derived from Qwen 2.5‑VL and fine‑tuned by Arcee AI for tight image‑text groundi...

Arcee AI: Trinity Large Thinking

by Arcee AI · 262,144 ctx

Trinity Large Thinking is a powerful open source reasoning model from the team at Arcee AI. It shows strong performance in PinchBench, ag...

Arcee AI: Trinity Mini

Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language model featuring 128 experts with 8 active per token. Engin...

Arcee AI: Virtuoso Large

Virtuoso‑Large is Arcee's top‑tier general‑purpose LLM at 72 B parameters, tuned to tackle cross‑domain reasoning, creative writing and e...

Arize AI Qwen 2 1.5B Instruct

by Togethercomputer · 32,768 ctx

Baidu: ERNIE 4.5 21B A3B

21B

A sophisticated text-based Mixture-of-Experts (MoE) model featuring 21B total parameters with 3B activated per token, delivering exceptio...

Baidu: ERNIE 4.5 21B A3B Thinking

21B

ERNIE-4.5-21B-A3B-Thinking is Baidu's upgraded lightweight MoE model, refined to boost reasoning depth and quality for top-tier performan...

Baidu: ERNIE 4.5 300B A47B

300B

ERNIE-4.5-300B-A47B is a 300B parameter Mixture-of-Experts (MoE) language model developed by Baidu as part of the ERNIE 4.5 series. It ac...

Baidu: ERNIE 4.5 VL 28B A3B

28B

A powerful multimodal Mixture-of-Experts chat model featuring 28B total parameters with 3B activated per token, delivering exceptional te...

Baidu: ERNIE 4.5 VL 424B A47B

424B