Kimi K2.6.
Long-horizon coding + autonomous-execution upgrade over K2.5.
4× AMD MI300.
This is the most aggressive quantisation we have a working recommendation for. Lower precision means less VRAM and cheaper hardware, at a small accuracy cost.
Cheapest hosted endpoints.
Speed across providers.
Tokens/sec and time-to-first-token measured against the same prompt template on each provider's API.
| Provider | Tokens/sec | TTFT | Total time |
|---|---|---|---|
| OpenRouter | 18.5 | — | 26988 ms |
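The throughput and total-time figures are linked by simple arithmetic. The following is a minimal sketch, assuming "Total" is the wall-clock time for the whole response and that the unreported TTFT is small enough to treat as zero. The function names are illustrative and not part of any provider's API.

```python
def tokens_per_second(output_tokens: int, total_ms: float, ttft_ms: float = 0.0) -> float:
    """Decode throughput: tokens generated divided by the time spent generating."""
    generation_ms = total_ms - ttft_ms
    if generation_ms <= 0:
        raise ValueError("total_ms must exceed ttft_ms")
    return output_tokens / (generation_ms / 1000.0)


def implied_output_tokens(tok_per_s: float, total_ms: float, ttft_ms: float = 0.0) -> float:
    """Invert the formula: how many tokens a run of this length would produce."""
    return tok_per_s * (total_ms - ttft_ms) / 1000.0


# OpenRouter row: 18.5 tokens/sec over 26988 ms, TTFT unreported (assumed 0 here).
approx_tokens = implied_output_tokens(18.5, 26988)
print(round(approx_tokens))
```

Under these assumptions the row corresponds to a response of roughly 500 tokens. A non-zero TTFT would lower that estimate.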
Variants in the Kimi family.
Moonshot's frontier open-weight MoE — 1T total, 32B active.
Moonshot's open-weight reasoning variant — extended chain-of-thought training...
Multimodal agentic variant — adds a vision encoder to the K2 backbone.
Frequently asked.
How do I run Kimi K2.6?
Where can I access Kimi K2.6?
How much does it cost to run Kimi K2.6?
Is Kimi K2.6 open-source or proprietary?
Cheapest hardware per quantisation.
Each row is one quantisation tier (the same weights compressed differently). Lower precision → lower VRAM → cheaper hardware, at the cost of a small accuracy loss. The $/hr column is refreshed hourly from each provider's API.
| Quantisation | Cheapest GPU config | Total VRAM | Live $/hr | Tokens/sec |
|---|---|---|---|---|
| FP8: 8-bit float (Hopper / Blackwell) | | 1536 GB | — | — |
| INT4: 4-bit integer (~4× VRAM saving) | | 768 GB | — | — |
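The relationship between precision and VRAM can be sketched with back-of-envelope arithmetic. This is a rough illustration, not the method used to produce the table. It assumes K2.6 shares the 1T total parameter count listed for the Kimi family, 192 GB of VRAM per GPU (the page names only "AMD MI300"), and a GPU count rounded up to a power of two. It counts weights only and ignores KV cache and activation memory.

```python
import math

BITS_PER_WEIGHT = {"FP8": 8, "INT4": 4}


def weight_memory_gb(total_params: float, bits: int) -> float:
    """Memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return total_params * bits / 8 / 1e9


def cluster_vram_gb(total_params: float, bits: int, gpu_vram_gb: int = 192) -> int:
    """Combined VRAM of the smallest power-of-two GPU count that holds the weights.

    gpu_vram_gb=192 and the power-of-two rounding are assumptions for illustration.
    """
    needed = weight_memory_gb(total_params, bits)
    min_gpus = math.ceil(needed / gpu_vram_gb)
    gpus = 1 << (min_gpus - 1).bit_length()  # round up to 1, 2, 4, 8, ...
    return gpus * gpu_vram_gb


for name, bits in BITS_PER_WEIGHT.items():
    print(name, weight_memory_gb(1e12, bits), cluster_vram_gb(1e12, bits))
```

With these assumptions the sketch yields 1536 GB for FP8 and 768 GB for INT4. Halving the bits per weight halves the weight memory, which is why the lower-precision tier fits on half as many GPUs.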
What it costs per month across providers.
Estimate your monthly bill for Kimi K2.6 across every host that publishes per-token pricing. Slide your token volumes; the chart and table re-rank cheapest-first.
Cheapest provider on the left.
Total monthly cost — input + output tokens combined.
Bill breakdown.
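The monthly estimate described above reduces to multiplying token volumes by per-token prices and sorting by the total. The following is a minimal sketch of that calculation. The provider names and prices are placeholders invented for illustration, not real quotes, and the `Pricing` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    provider: str
    input_usd_per_mtok: float   # USD per million input tokens
    output_usd_per_mtok: float  # USD per million output tokens


def monthly_bill(p: Pricing, input_tokens: int, output_tokens: int) -> dict:
    """Split the bill into input and output cost, plus the combined total."""
    input_cost = input_tokens / 1e6 * p.input_usd_per_mtok
    output_cost = output_tokens / 1e6 * p.output_usd_per_mtok
    return {
        "provider": p.provider,
        "input": input_cost,
        "output": output_cost,
        "total": input_cost + output_cost,
    }


def rank_cheapest_first(prices, input_tokens: int, output_tokens: int) -> list:
    """Compute every provider's bill and sort by total, cheapest first."""
    bills = [monthly_bill(p, input_tokens, output_tokens) for p in prices]
    return sorted(bills, key=lambda b: b["total"])


# Placeholder prices for illustration only (NOT real provider quotes).
prices = [Pricing("provider-a", 1.00, 4.00), Pricing("provider-b", 0.50, 5.00)]
ranked = rank_cheapest_first(prices, input_tokens=200_000_000, output_tokens=50_000_000)
for bill in ranked:
    print(bill["provider"], bill["input"], bill["output"], bill["total"])
```

The ranking depends on the input-to-output ratio: a provider with cheaper input tokens can lose its lead as the output share of the workload grows.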
About Kimi K2.6.
Kimi K2.6 is Moonshot AI's coding- and agent-focused upgrade. It keeps the same MoE base as K2/K2.5 but is trained with stronger emphasis on long-horizon coding workflows and autonomous task execution (multi-turn tool use, file-system manipulation, test-driven refinement). At the time this page was written, it topped SWE-bench-style leaderboards in the open-weight tier. The weights are released under a modified MIT licence. It is available on Hugging Face, Moonshot's platform, Kimi.com (consumer), Ollama, SiliconFlow, and Together AI.
How it's built.
How much it can remember.
What it can do.
Every place this model is hosted.
Self-hosted on rented GPU cluster
self hosted: Multi-GPU MoE deployment.
Moonshot AI Platform
api direct
Kimi.com
chat ui: Consumer chat surface.