AI models

Every way to use the major models.

For closed models like Claude and GPT, we link to the cheapest API provider. For open-weights models like Llama, Kimi, and DeepSeek, you can choose hosted inference or self-host on rented GPUs.

68 tracked · 68 open weights · 0 closed APIs · cheapest input $0.01/M tokens

68 models match.

Open-weights models.

Run them yourself on cheap GPUs, or use a hosted-inference provider.
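Before renting a GPU, a rough first check is whether the weights fit at all: weight memory is approximately parameter count times bytes per parameter. The sketch below is a back-of-the-envelope estimate under stated assumptions. The function name `weight_memory_gb` and the precision table are illustrative, not taken from any tool, and it deliberately ignores KV cache, activations, and runtime overhead, which all add to the real requirement. For Mixture-of-Experts models such as ERNIE 4.5 VL 28B A3B, pass the total parameter count, because in a typical single-node deployment every expert must be resident even though only about 3B parameters are active per token.

```python
# Rough, weight-only memory estimate for self-hosting a model.
# Assumption: memory ~= parameter count x bytes per parameter.
# KV cache, activations, and framework overhead are NOT included.

BYTES_PER_PARAM = {
    "fp16": 2.0,  # same size as bf16
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(total_params_billion: float, precision: str = "fp16") -> float:
    """Approximate decimal GB needed just to hold the weights.

    For Mixture-of-Experts models, pass the TOTAL parameter count:
    all experts stay in memory even though only some are active per token.
    """
    bytes_per_param = BYTES_PER_PARAM[precision]
    # (billions of params x 1e9) x bytes / 1e9 bytes-per-GB cancels to this product
    return total_params_billion * bytes_per_param


if __name__ == "__main__":
    examples = [("Gemma 3 4B", 4), ("Gemma 3 12B", 12), ("ERNIE 4.5 VL 28B A3B", 28)]
    for name, params_b in examples:
        for precision in ("fp16", "int4"):
            gb = weight_memory_gb(params_b, precision)
            print(f"{name:<22} {precision}: ~{gb:.0f} GB of weights")
```

For example, Gemma 3 12B needs roughly 24 GB for fp16 weights alone, so a GPU with exactly 24 GB leaves no headroom for the KV cache; 4-bit quantization cuts the weights to about 6 GB.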

Gemma 3 27B

27B
by Google DeepMind · Gemma · 128,000 ctx

Google's open-weight multimodal LLM, efficient and released under a permissive license.

Gemma 3 12B

12B
by Google DeepMind · Gemma · 128,000 ctx

12B Gemma 3 — multimodal, single-GPU target.

Gemma 3 4B

4B
by Google DeepMind · Gemma · 128,000 ctx

4B Gemma 3, a multimodal model sized for laptops.

Kimi K2.5

1000B
by Moonshot AI · Kimi · 256,000 ctx

Multimodal agentic variant — adds a vision encoder to the K2 backbone.

Arcee AI: Spotlight

7B
by Arcee AI · 131,072 ctx

Spotlight is a 7‑billion‑parameter vision‑language model derived from Qwen 2.5‑VL and fine‑tuned by Arcee AI for tight image‑text groundi...

Baidu: ERNIE 4.5 VL 28B A3B

28B
by Baidu · 131,072 ctx

A powerful multimodal Mixture-of-Experts chat model featuring 28B total parameters with 3B activated per token, delivering exceptional te...

Baidu: ERNIE 4.5 VL 424B A47B

424B
by Baidu · 131,072 ctx

ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 series, featuring 424B total parameters with...

Baidu: Qianfan-OCR-Fast

multimodal
by Baidu · 65,536 ctx

Qianfan-OCR-Fast is a domain-specific multimodal large model purpose-built for OCR. By leveraging specialized OCR training data while pre...

ByteDance Seed: Seed 1.6

200B
by ByteDance Seed · 262,144 ctx

Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It incorporates multimodal capabilities and adaptive deep thinki...

ByteDance Seed: Seed 1.6 Flash

multimodal
by ByteDance Seed · 262,144 ctx

Seed 1.6 Flash is an ultra-fast multimodal deep thinking model by ByteDance Seed, supporting both text and visual understanding. It featu...

ByteDance Seed: Seed-2.0-Lite

32B
by ByteDance Seed · 262,144 ctx

Seed-2.0-Lite is a versatile, cost‑efficient enterprise workhorse that delivers strong multimodal and agent capabilities while offering n...

ByteDance Seed: Seed-2.0-Mini

multimodal
by ByteDance Seed · 262,144 ctx

Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasizing fast response and flexible inference...

ByteDance: UI-TARS 7B

7B
by ByteDance · 128,000 ctx

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobil...

gemini-3.1-pro

multimodal
by Google DeepMind · 1,000,000 ctx

Bring any idea to life with state-of-the-art reasoning to help you learn, build, and plan anything. Best for complex tasks and bringing c...

Kimi K2.5

multimodal
by Moonshot AI · 262,144 ctx

Kimi K2.5 is Moonshot AI's flagship agentic model and a new SOTA open model. It unifies vision and text, thinking and non-thinking modes,...

Kimi K2.6

multimodal
by Moonshot AI · 262,144 ctx

Kimi K2.6 is an open-source, native multimodal agentic model that advances practical capabilities in long-horizon coding, coding-driven d...

Llama-3.2-11B-Vision-Instruct

11B
by Meta AI · 131,072 ctx

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It exc...

Meta: Llama 4 Maverick

multimodal
by Meta AI · 1,048,576 ctx

Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architec...

Meta: Llama 4 Scout

multimodal
by Meta AI · 10,000,000 ctx

Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of ...

Meta: Llama Guard 4 12B

12B
by Meta AI · 163,840 ctx

Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous v...

MiniMax: MiniMax-01

multimodal
by MiniMax · 1,000,192 ctx

MiniMax-01 combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billion parameters, wi...

Mistral: Ministral 3 14B 2512

14B
by Mistral AI · 262,144 ctx

The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistra...

Mistral: Ministral 3 3B 2512

3B
by Mistral AI · 131,072 ctx

The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.

Mistral: Ministral 3 8B 2512

8B
by Mistral AI · 262,144 ctx

A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.

Mistral: Mistral Large 3 2512

multimodal
by Mistral AI · 262,144 ctx

Mistral Large 3 2512 is Mistral’s most capable model to date, featuring a sparse mixture-of-experts architecture with 41B active paramete...

Mistral: Mistral Medium 3

multimodal
by Mistral AI · 131,072 ctx

Mistral Medium 3 is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly r...

Mistral: Mistral Medium 3.1

multimodal
by Mistral AI · 131,072 ctx

Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language model designed to del...

Mistral: Mistral Medium 3.5

multimodal
by Mistral AI · 262,144 ctx

Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It supports text and image inputs with text output, and i...

Mistral: Mistral Small 3.1 24B

24B
by Mistral AI · 128,000 ctx

Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal...

Mistral: Mistral Small 3.2 24B

24B
by Mistral AI · 128,000 ctx

Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following, repetition reduct...

Mistral: Mistral Small 4

multimodal
by Mistral AI · 262,144 ctx

Mistral Small 4 is the next major release in the Mistral Small family, unifying the capabilities of several flagship Mistral models into ...

Mistral: Pixtral Large 2411

multimodal
by Mistral AI · 131,072 ctx

Pixtral Large is a 124B-parameter, open-weight, multimodal model built on top of Mistral Large 2. The mo...

Mistral-Small-3.2-24B-Instruct-2506

24B
by Mistral AI · 128,000 ctx

Mistral-Small-3.2-24B-Instruct is a drop-in upgrade over the 3.1 release, with markedly better instruction following, roughly half the in...

MoonshotAI Kimi Latest

1000B
by Moonshot AI · 262,144 ctx

This model always redirects to the latest model in the MoonshotAI Kimi family.

Nemotron-3-Nano-Omni-30B-A3B-Reasoning

30B
by Nvidia · 262,144 ctx

Nemotron 3 Nano Omni is an open multimodal model built on a hybrid Mixture-of-Experts (MoE) architecture, engineered for high efficiency ...

Perceptron: Perceptron Mk1

multimodal
by Perceptron · 32,768 ctx

Perceptron Mk1 (Mark One) is Perceptron's highest-quality vision-language model for video and embodied reasoning. It accepts image and ...

Qwen3.5-0.8B

0.8B
by Alibaba (Qwen Team) · 262,144 ctx

Qwen3.5-0.8B is Alibaba's smallest model in the Qwen3.5 series, featuring a hybrid Gated Delta Networks and sparse Mixture-of-Experts arc...

Qwen3.5-2B

2B
by Alibaba (Qwen Team) · 262,144 ctx

Qwen3.5-2B is a compact yet capable model from Alibaba's Qwen3.5 series. It features a 262K token context window, support for 201 languag...

Qwen3.5-4B

4B
by Alibaba (Qwen Team) · 262,144 ctx

Qwen3.5-4B is a mid-size model from Alibaba's Qwen3.5 series that delivers a strong balance of performance and efficiency. It features a ...

Qwen: Qwen2.5 VL 72B Instruct

72B
by Alibaba (Qwen Team) · 131,072 ctx

Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing ...

Qwen: Qwen3.5-122B-A10B

122B
by Alibaba (Qwen Team) · 262,144 ctx

The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a ...

Qwen: Qwen3.5-27B

27B
by Alibaba (Qwen Team) · 262,144 ctx

The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balanc...

Qwen: Qwen3.5-35B-A3B

35B
by Alibaba (Qwen Team) · 262,144 ctx

The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechani...

Qwen: Qwen3.5 397B A17B

397B
by Alibaba (Qwen Team) · 262,144 ctx

The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism ...

Qwen: Qwen3.5-9B

9B
by Alibaba (Qwen Team) · 262,144 ctx

Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understandi...

Qwen: Qwen3.5-Flash

multimodal
by Alibaba (Qwen Team) · 1,000,000 ctx

The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sp...

Qwen: Qwen3.5 Plus 2026-02-15

multimodal
by Alibaba (Qwen Team) · 1,000,000 ctx

The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with...

Qwen: Qwen3.5 Plus 2026-04-20

235B
by Alibaba (Qwen Team) · 1,000,000 ctx

Qwen3.5 Plus (April 2026) is a large-scale multimodal language model from Alibaba. It accepts text, image, and video input and produces t...

Qwen: Qwen3.6 27B

27B
by Alibaba (Qwen Team) · 262,144 ctx

Qwen3.6 27B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba, released in April 2026. It features hybrid mult...

Qwen: Qwen3.6 35B A3B

35B
by Alibaba (Qwen Team) · 262,144 ctx

Qwen3.6-35B-A3B is an open-weight multimodal model from Alibaba Cloud with 35 billion total parameters and 3 billion active parameters pe...

Qwen: Qwen3.6 Flash

multimodal
by Alibaba (Qwen Team) · 1,000,000 ctx

Qwen3.6 Flash is a fast, efficient language model from Alibaba's Qwen 3.6 series. It supports text, image, and video input with a 1M toke...

Qwen: Qwen3.6 Plus

multimodal
by Alibaba (Qwen Team) · 1,000,000 ctx

Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing, enabling s...

Qwen: Qwen3 VL 235B A22B Instruct

235B
by Alibaba (Qwen Team) · 262,144 ctx

Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across image...

Qwen: Qwen3 VL 235B A22B Thinking

235B
by Alibaba (Qwen Team) · 131,072 ctx

Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. ...

Qwen: Qwen3 VL 30B A3B Instruct

30B
by Alibaba (Qwen Team) · 262,144 ctx

Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its ...

Qwen: Qwen3 VL 30B A3B Thinking

30B
by Alibaba (Qwen Team) · 131,072 ctx

Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its ...

Qwen: Qwen3 VL 32B Instruct

32B
by Alibaba (Qwen Team) · 262,144 ctx

Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across te...

Qwen: Qwen3 VL 8B Instruct

8B
by Alibaba (Qwen Team) · 256,000 ctx

Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning ...

Qwen: Qwen3 VL 8B Thinking

8B
by Alibaba (Qwen Team) · 256,000 ctx

Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, designed for advanced visual and textual rea...

Reka Edge

7B
by Reka AI · 16,384 ctx

Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video+text inputs and generates text outputs. ...

Seed-1.8

200B
by ByteDance · 256,000 ctx

Seed-1.8 is optimized specifically for multimodal agent scenarios. It features enhanced agent capabilities, upgraded multimodal comprehension, and mo...

Seed-2.0-code

multimodal
by ByteDance · 256,000 ctx

A coding model optimized for real-world development environments, with reliable tool use in common coding tools such as Claude Code. It delivers ...

Seed-2.0-pro

multimodal
by ByteDance · 256,000 ctx

Built for the agent era, it delivers stable performance in complex reasoning and long-horizon tasks, including multi-step planning, visua...

Xiaomi: MiMo-V2.5

multimodal
by Xiaomi · 1,048,576 ctx

MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic performance at roughly half the inference cost, while surp...

Xiaomi: MiMo-V2-Omni

multimodal
by Xiaomi · 262,144 ctx

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It comb...

Z.ai: GLM 4.5V

106B
by Zhipu AI · 65,536 ctx

GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 1...

Z.ai: GLM 4.6V

multimodal
by Zhipu AI · 131,072 ctx

GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents,...

Z.ai: GLM 5V Turbo

multimodal
by Zhipu AI · 202,752 ctx

GLM-5V-Turbo is Z.ai’s first native multimodal agent foundation model, built for vision-based coding and agent-driven tasks. It natively ...