Kimi K2.5.
Multimodal agentic variant — adds a vision encoder to the K2 backbone.
4× AMD MI300.
Most-aggressive quantisation we have a working recommendation for. Lower precision = less VRAM = cheaper hardware, at a small accuracy cost.
Cheapest hosted endpoints.
Speed across providers.
Tokens/sec and time-to-first-token measured against the same prompt template on each provider's API.
| Provider | Tokens/sec | TTFT | Total |
|---|---|---|---|
| OpenRouter | 68.2 | — | 7333 ms |
Variants in the Kimi family.
Moonshot's frontier open-weight MoE — 1T total, 32B active.
Moonshot's open-weight reasoning variant — extended chain-of-thought training...
Long-horizon coding + autonomous-execution upgrade over K2.5.
Frequently asked.
How do I run Kimi K2.5?
Where can I access Kimi K2.5?
How much does it cost to run Kimi K2.5?
Is Kimi K2.5 open-source or proprietary?
Cheapest hardware per quantisation.
Each row is one quantisation tier (the same weights compressed differently). Lower precision → lower VRAM → cheaper hardware, at the cost of small accuracy loss. $/hr refreshed hourly from each provider's API.
| Quantisation | Cheapest GPU config | Total VRAM | Live $/hr | tokens/sec | |
|---|---|---|---|---|---|
|
FP8
FP8 — 8-bit float (Hopper / Blackwell)
|
1536 GB | — | — | Compare → | |
|
INT4
INT4 — 4-bit integer (~4× VRAM saving)
|
768 GB | — | — | Compare → |
What it costs per month across providers.
Estimate your monthly bill for Kimi K2.5 across every host that publishes per-token pricing. Slide your token volumes; the chart + table re-rank cheapest-first.
Cheapest provider on the left.
Total monthly cost — input + output tokens combined.
Bill breakdown.
About Kimi K2.5.
Kimi K2.5 is Moonshot AI's multimodal expansion of Kimi K2 — same Mixture-of-Experts language backbone plus a vision encoder for image understanding. Targets agentic workflows that need to interpret screenshots, diagrams, and document images mid-task. Open-weight under modified MIT; available on Hugging Face and via Moonshot's platform. Adds image inputs to the same 256K text context window. Native function calling and tool use carry over from K2.
How it's built.
How much it can remember.
What it can do.
Every place this model is hosted.
Self-hosted on rented GPU cluster
self hostedMultimodal MoE — needs additional VRAM headroom for the vision encoder.
Moonshot AI Platform
api directKimi.com
chat uiConsumer chat with image upload support.