Deep Cogito: Cogito v2.1 671B.
Cogito v2.1 671B MoE represents one of the strongest open models globally, matching the performance of frontier closed and open models. This model is trained using self-play with reinforcement learning...
2× AMD MI325.
This is the most aggressive quantisation for which we have a working recommendation. Lower precision means less VRAM and therefore cheaper hardware, at a small accuracy cost.
Cheapest hosted endpoints.
Speed across providers.
Tokens/sec, time-to-first-token (TTFT), and total response time, measured with the same prompt template on each provider's API.
| Provider | Tokens/sec | TTFT | Total |
|---|---|---|---|
| OpenRouter | 41.7 | 717 ms | 2638 ms |
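The three columns can be related with a little arithmetic. The sketch below assumes that the total time is TTFT plus decode time and that tokens/sec is measured over the decode phase only; the page does not state either, so treat the result as a rough consistency check rather than a documented figure. The function name `implied_output_tokens` is illustrative.

```python
def implied_output_tokens(tokens_per_sec: float, ttft_ms: float, total_ms: float) -> float:
    """Estimate how many tokens were generated in one measured request.

    Assumption (not stated by the source): total = TTFT + decode time,
    and tokens/sec describes the decode phase only.
    """
    decode_seconds = (total_ms - ttft_ms) / 1000.0
    return tokens_per_sec * decode_seconds


# OpenRouter row from the table above: 41.7 tokens/sec, 717 ms TTFT, 2638 ms total.
tokens = implied_output_tokens(41.7, 717, 2638)
print(f"{tokens:.1f} tokens")  # prints "80.1 tokens"
```

Under those assumptions the benchmark response is roughly 80 tokens long, which suggests the measurement uses a short completion.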
Frequently asked.
How do I run Deep Cogito: Cogito v2.1 671B?
Where can I access Deep Cogito: Cogito v2.1 671B?
How much does it cost to run Deep Cogito: Cogito v2.1 671B?
Is Deep Cogito: Cogito v2.1 671B open-source or proprietary?
Cheapest hardware per quantisation.
Each row is one quantisation tier (the same weights compressed differently). Lower precision means less VRAM and cheaper hardware, at the cost of a small accuracy loss. $/hr is refreshed hourly from each provider's API.
| Quantisation | Cheapest GPU config | Total VRAM | Live $/hr | Tokens/sec |
|---|---|---|---|---|
| FP16: half precision (default) | | 2048 GB | — | — |
| FP8: 8-bit float (Hopper / Blackwell) | | 1024 GB | — | — |
| INT4: 4-bit integer (~4× VRAM saving) | | 512 GB | — | — |
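The VRAM figures can be sanity-checked against the raw size of the weights. The sketch below is a weights-only estimate: it assumes 671 billion parameters (taken from the model name), decimal gigabytes, and ignores KV cache, activations, and runtime overhead. The names `weight_footprint_gb` and `LISTED_TOTAL_VRAM_GB` are illustrative, and the listed totals are presumably the capacity of the recommended GPU configuration rather than the exact memory the model needs.

```python
# Bits stored per parameter at each quantisation tier.
BITS_PER_PARAM = {"FP16": 16, "FP8": 8, "INT4": 4}

# Total VRAM listed in the table above, in GB.
LISTED_TOTAL_VRAM_GB = {"FP16": 2048, "FP8": 1024, "INT4": 512}

# Assumption: parameter count taken from the "671B" in the model name.
PARAMS = 671e9


def weight_footprint_gb(params: float, bits_per_param: int) -> float:
    """Weights-only memory in decimal GB (no KV cache or activations)."""
    return params * bits_per_param / 8 / 1e9


for tier, bits in BITS_PER_PARAM.items():
    weights = weight_footprint_gb(PARAMS, bits)
    headroom = LISTED_TOTAL_VRAM_GB[tier] - weights
    print(f"{tier}: weights ~{weights:.1f} GB, headroom ~{headroom:.1f} GB")
```

On this estimate the weights take about 1342 GB at FP16, 671 GB at FP8, and 335.5 GB at INT4, so each listed total leaves room for cache and overhead.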
What it costs per month across providers.
Estimate your monthly bill for Deep Cogito: Cogito v2.1 671B across every host that publishes per-token pricing. Adjust your token volumes; the chart and table re-rank providers cheapest-first.
Cheapest provider on the left.
Total monthly cost — input + output tokens combined.
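The same estimate can be reproduced offline. The sketch below assumes per-token pricing quoted in USD per million tokens, with separate input and output rates. The provider names and prices are hypothetical placeholders, not real quotes for this model, and `Pricing`, `monthly_cost`, and `rank_cheapest_first` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    provider: str
    input_usd_per_mtok: float   # USD per 1M input tokens
    output_usd_per_mtok: float  # USD per 1M output tokens


def monthly_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Total monthly cost: input and output tokens combined."""
    return (input_tokens / 1e6) * p.input_usd_per_mtok + (
        output_tokens / 1e6
    ) * p.output_usd_per_mtok


def rank_cheapest_first(pricings, input_tokens: int, output_tokens: int):
    """Return (provider, cost) pairs sorted by ascending monthly cost."""
    costs = [(p.provider, monthly_cost(p, input_tokens, output_tokens)) for p in pricings]
    return sorted(costs, key=lambda pair: pair[1])


# Hypothetical placeholder prices, for illustration only.
example = [
    Pricing("provider-a", input_usd_per_mtok=1.00, output_usd_per_mtok=3.00),
    Pricing("provider-b", input_usd_per_mtok=1.50, output_usd_per_mtok=1.50),
]

# Input-heavy workload: 50M input tokens, 10M output tokens per month.
for provider, cost in rank_cheapest_first(example, 50_000_000, 10_000_000):
    print(f"{provider}: ${cost:,.2f}/month")
```

The ranking depends on the input/output mix: with these placeholder prices, provider-a is cheaper for the input-heavy workload shown, while an output-heavy workload (10M in, 50M out) would put provider-b first.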