by Cohere

Command R+.

Text · Open weights · Datacenter · 104B params · 128K context · Transformer · Quality 72.2
Cheapest input: $2.5/M on Cohere API
Cheapest output: $10.0/M on Cohere API
Fastest: 32 tok/s on OpenRouter
Smallest GPU: 1× Nvidia A100 at $0.48/hr

Cohere's open-weight LLM, optimized for retrieval-augmented generation (RAG), with multilingual and tool-use support.

Smallest GPU to run it

1× Nvidia A100 · $0.48/hr.

This is the most aggressive quantisation for which we have a working recommendation. Lower precision means less VRAM and therefore cheaper hardware, at a small cost in accuracy.
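
The link between precision and VRAM can be made concrete with a back-of-the-envelope estimate. The sketch below assumes weight memory is roughly parameter count × bits per weight / 8 and ignores KV cache, activations, and runtime overhead. The precision labels are illustrative; this page does not state which quantisation or which A100 memory variant the recommendation uses.

```python
# Rough weight-memory estimate for a dense model at different precisions.
# Assumption: memory ~= parameter count x bits per weight / 8.
# KV cache, activations, and runtime overhead are NOT included.

PARAMS = 104e9  # 104B parameters, from the spec line on this page


def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Return approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


# Illustrative precisions; the page does not say which one it recommends.
for label, bits in [("FP16", 16), ("INT8", 8), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gb(PARAMS, bits):.0f} GB")
```

Under these assumptions the weights alone come to about 208 GB at FP16, 104 GB at INT8, and 52 GB at 4-bit, so only the lowest of these would leave headroom on a single 80 GB card.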

Where to use it

Cheapest hosted endpoints.

Provider     Access       $/M in   $/M out
Cohere API   api direct   $2.5     $10.0
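
To turn the per-million-token prices into a per-request figure, a minimal sketch follows. The prices are the Cohere API values listed above; the token counts in the example are hypothetical and chosen only for illustration.

```python
# Prices from the Cohere API row above, in USD per million tokens.
INPUT_PRICE_PER_M = 2.5
OUTPUT_PRICE_PER_M = 10.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the cost of one request in USD at the listed prices."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical RAG-style request: 8,000 prompt tokens, 500 generated tokens.
cost = request_cost_usd(8_000, 500)
print(f"${cost:.4f} per request")
```

Because output tokens cost four times as much as input tokens here, long generations dominate the bill faster than long prompts do.
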
Capability snapshot

What it's best at.

General knowledge 74.6
Coding 70.7
Performance

Speed across providers.

Tokens/sec and time-to-first-token (TTFT) are measured with the same prompt template on each provider's API.

Provider     Tokens/sec   TTFT     Total
OpenRouter   31.9         855 ms   3979 ms
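
The three columns are related: if tokens/sec is measured over the decode phase only (an assumption, since the page does not state the methodology), the output length of the test response can be estimated from the other two figures. A minimal sketch:

```python
# Figures from the OpenRouter row above.
TOKENS_PER_SEC = 31.9
TTFT_MS = 855
TOTAL_MS = 3979


def implied_output_tokens(tokens_per_sec: float, ttft_ms: float, total_ms: float) -> float:
    """Estimate generated tokens, assuming throughput covers only the
    time after the first token arrives (an assumption, not stated on the page)."""
    decode_seconds = (total_ms - ttft_ms) / 1000
    return tokens_per_sec * decode_seconds


print(round(implied_output_tokens(TOKENS_PER_SEC, TTFT_MS, TOTAL_MS)))
```

Under that assumption the benchmark response is roughly 100 tokens long. If throughput were instead measured over the full request time, the same figures would imply about 127 tokens.
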

FAQ

Frequently asked.

How do I run Command R+?
Command R+ is open-weight, so you can self-host it on rented GPUs. See the Run It Yourself tab for GPU configurations and cost estimates, or use one of the hosted inference providers listed on this page.
Where can I access Command R+?
Command R+ is available via the Cohere API and OpenRouter. Each access option lists its own pricing (per million tokens or hourly hosting).
How much does it cost to run Command R+?
API pricing starts at $2.5/M input tokens and $10.0/M output tokens. Self-hosting cost depends on the GPU you rent — see the Run It Yourself tab.
Is Command R+ open-source or proprietary?
Command R+ is open-weight, released under the CC-BY-NC-4.0 license, which does not permit commercial use. You can download the weights and self-host the model.