
Ollama.

Local-first runtime for open-weight LLMs: run them on your own machine or a rented GPU. Its library indexes Llama, Kimi, GLM, Qwen, …
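As a rough illustration of what running a model on your own machine looks like in practice, the sketch below sends one prompt to a local Ollama server over its native REST API. It assumes the server is listening on the default address `http://localhost:11434` and that the model tag you pass has already been pulled. The helper names `build_generate_request` and `generate` are hypothetical and not part of Ollama.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; change it if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for the /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text (requires a running server)."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        # With "stream": False the server replies with a single JSON object.
        return json.loads(resp.read())["response"]
```

With a server running, a call such as `generate("llama3.2:1b", "Say hello in five words.")` would return the completion as a string. The tag is only an example; substitute whichever model you have pulled.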

At a glance

Service type: Serverless specialty
Trust tier: Tier 2
Headquarters: US
OpenAI-compat: No
Open weights: Yes
Proprietary: No

When to pick Ollama

Best for

  • Ultra-low-latency inference (Groq's LPU silicon, Cerebras).
  • Image / video / audio generation via per-second billing.
  • Workloads where the specialty's hardware advantage outweighs cost.

Avoid for

  • General LLM workloads where a generalist aggregator is cheaper.
  • Workloads needing feature parity across many models.

Models on Ollama

Pricing, measured speed, and self-host alternative, one row per model.

44 models · 0 benchmarked
Model | Maker | Access | $/M in | $/M out | Tokens/sec | TTFT | Self-host on
Kimi K2 Thinking | Moonshot AI | self hosted | - | - | - | - | 4× AMD MI300 · INT4
Kimi K2.5 | Moonshot AI | self hosted | - | - | - | - | 4× AMD MI300 · INT4
Kimi K2.6 | Moonshot AI | self hosted | - | - | - | - | 4× AMD MI300 · INT4
Llama 3.1 405B | Meta AI | self hosted | - | - | - | - | 1× AMD MI325 · INT4
Qwen 2.5 72B | Alibaba (Qwen Team) | self hosted | - | - | - | - | 1× Nvidia L40S · INT4
Qwen 2.5 Coder 32B | Alibaba (Qwen Team) | self hosted | - | - | - | - | 1× Nvidia RTX A5000 · INT4
Llama 3.1 8B | Meta AI | self hosted | - | - | - | - | 1× Nvidia P102-100 · INT4
Llama 3.1 70B | Meta AI | self hosted | - | - | - | - | 1× Nvidia L40S · INT4
Llama 3.2 1B | Meta AI | self hosted | - | - | - | - | 1× Nvidia Titan V · FP8
Llama 3.2 3B | Meta AI | self hosted | - | - | - | - | 1× Nvidia Titan V · INT4
Qwen 2.5 7B | Alibaba (Qwen Team) | self hosted | - | - | - | - | 1× Nvidia P102-100 · INT4
Qwen 2.5 14B | Alibaba (Qwen Team) | self hosted | - | - | - | - | 1× Nvidia RTX 3080 · INT4
Qwen 2.5 32B | Alibaba (Qwen Team) | self hosted | - | - | - | - | 1× Nvidia RTX A5000 · INT4
Qwen 2.5 3B | Alibaba (Qwen Team) | self hosted | - | - | - | - | 1× Nvidia GeForce GTX 1050 · INT4
Qwen 3 235B | Alibaba (Qwen Team) | self hosted | - | - | - | - | 1× AMD MI300 · INT4
Qwen 3 32B | Alibaba (Qwen Team) | self hosted | - | - | - | - | 1× Nvidia RTX A5000 · INT4
Qwen 3 14B | Alibaba (Qwen Team) | self hosted | - | - | - | - | 1× Nvidia RTX 3080 · INT4
Qwen 3 8B | Alibaba (Qwen Team) | self hosted | - | - | - | - | 1× Nvidia GeForce RTX 2060 · INT4
Qwen 3 4B | Alibaba (Qwen Team) | self hosted | - | - | - | - | 1× Nvidia Titan V · INT4
Gemma 2 27B | Google DeepMind | self hosted | - | - | - | - | 1× Nvidia RTX 4000 Ada · INT4
Gemma 2 9B | Google DeepMind | self hosted | - | - | - | - | 1× Nvidia GeForce RTX 2060 · INT4
Gemma 2 2B | Google DeepMind | self hosted | - | - | - | - | 1× Nvidia GeForce GTX 1050 · INT4
Gemma 3 12B | Google DeepMind | self hosted | - | - | - | - | 1× Nvidia GTX 1070 Ti · INT4
Gemma 3 4B | Google DeepMind | self hosted | - | - | - | - | 1× Nvidia Titan V · INT4
Gemma 3 1B | Google DeepMind | self hosted | - | - | - | - | 1× Nvidia Titan V · FP16
LLaVA 34B | LLaVA Project | self hosted | - | - | - | - | 1× Nvidia RTX A5000 · INT4
LLaVA 13B | LLaVA Project | self hosted | - | - | - | - | 1× Nvidia RTX 3080 · INT4
LLaVA 7B | LLaVA Project | self hosted | - | - | - | - | 1× Nvidia P102-100 · INT4
Code Llama 70B | Meta AI | self hosted | - | - | - | - | 1× Nvidia L40S · INT4
Code Llama 34B | Meta AI | self hosted | - | - | - | - | 1× Nvidia RTX A5000 · INT4
Code Llama 13B | Meta AI | self hosted | - | - | - | - | 1× Nvidia RTX 3080 · INT4
Code Llama 7B | Meta AI | self hosted | - | - | - | - | 1× Nvidia P102-100 · INT4
DeepSeek Coder V2 236B | DeepSeek | self hosted | - | - | - | - | 1× AMD MI300 · INT4
DeepSeek Coder V2 Lite | DeepSeek | self hosted | - | - | - | - | 1× Nvidia RTX 3080 · INT4
DeepSeek Coder 33B | DeepSeek | self hosted | - | - | - | - | 1× Nvidia RTX A5000 · INT4
Mistral Nemo 12B | Mistral AI | self hosted | - | - | - | - | 1× Nvidia GTX 1070 Ti · INT4
GPT-OSS 120B | OpenAI | self hosted | - | - | - | - | 1× Nvidia H100 · INT4
GPT-OSS 20B | OpenAI | self hosted | - | - | - | - | 1× Nvidia GeForce RTX 4080 · INT4
IBM Granite Code 8B | IBM Research | self hosted | - | - | - | - | 1× Nvidia P102-100 · INT4
Hermes 3 70B | Nous Research | self hosted | - | - | - | - | 1× Nvidia L40S · INT4
Hermes 3 8B | Nous Research | self hosted | - | - | - | - | 1× Nvidia P102-100 · INT4
OLMo 3 7B | Allen Institute for AI (AI2) | self hosted | - | - | - | - | 1× Nvidia P102-100 · INT4
Gemma 3 27B | Google DeepMind | self hosted | - | - | - | - | 1× Nvidia RTX 4000 Ada · INT4
Llama 3.3 70B | Meta AI | self hosted | - | - | - | - | 1× Nvidia L40S · INT4
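The "Self-host on" column pairs each model with a GPU and a quantization level. The page does not say how those pairings were chosen, so the sketch below shows only the back-of-envelope arithmetic that such a pairing has to satisfy: weight size is roughly parameter count times bits per weight divided by 8. It ignores the KV cache, activations, and runtime overhead. The function names and the 10% headroom factor are hypothetical, and the 48 GB figure for the Nvidia L40S is an assumption you should check against the vendor spec sheet.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    # billions of parameters * bits per parameter / 8 bits per byte = GB
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: int,
         gpu_memory_gb: float, gpu_count: int = 1, headroom: float = 0.9) -> bool:
    """True if the weights fit within a `headroom` fraction of total GPU memory.

    `headroom` is a hypothetical safety margin, not a measured value.
    """
    usable_gb = gpu_memory_gb * gpu_count * headroom
    return weight_memory_gb(params_billions, bits_per_weight) <= usable_gb
```

For example, a 70B-parameter model at INT4 needs about `weight_memory_gb(70, 4)` = 35 GB for weights, which passes `fits(70, 4, 48)` for a single 48 GB card, while the same model at FP16 (140 GB) does not.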